
Dynamics of Selecting your Open Source AI


The landscape of open source AI is vast, and identifying the right open source tools for the AI product you envision is a herculean task. A toolkit that seems convenient today may prove costly when your software needs to scale, so the selection is a strategic decision. We at CereLabs have developed a set of criteria for choosing an open source AI toolkit.


  1. Vision / Reason for Open Source

To trust an open source platform, start with the vision statement with which it was launched. The vision statement reflects the commitment of the company or community behind the toolkit.

The following are the vision statements of a few reputed open source AI platforms:

OpenCog: “OpenCog is a unique and ambitious open-source software project. Our aim is to create an open source framework for Artificial General Intelligence, intended to one day express general intelligence at the human level and beyond. That is: We're undertaking a serious effort to build a thinking machine.”


TensorFlow by Google: “By sharing what we believe to be one of the best machine learning toolboxes in the world, we hope to create an open standard for exchanging research ideas and putting machine learning in products. Google engineers really do use TensorFlow in user-facing products and services, and our research group intends to share TensorFlow implementations alongside many of our research publications.”


DMTK by Microsoft: “We believe that in order to push the frontier of distributed machine learning, we need the collective effort from the entire community, and need the organic combination of both machine learning innovations and system innovations. This belief strongly motivates us to open source the DMTK project.”

Theano (when it was launched): “This is the vision we have for Theano. This is to give people an idea of what to expect in the future of Theano, but we can’t promise to implement all of it. This should also help you to understand where Theano fits in relation to other computational tools.
    • Support tensor and sparse operations
    • Support linear algebra operations
    • Graph Transformations    
        • Differentiation/higher order differentiation        
        • ‘R’ and ‘L’ differential operators        
        • Speed/memory optimizations        
        • Numerical stability optimizations        
    • Can use many compiled languages, instructions sets: C/C++, CUDA, OpenCL, PTX, CAL, AVX, ...
    • Lazy evaluation
    • Loop
    • Parallel execution (SIMD, multi-core, multi-node on cluster, multi-node distributed)
    • Support all NumPy/basic SciPy functionality
    • Easy wrapping of library functions in Theano”

It should be noted that most of the promises Theano made in its vision were later fulfilled, proving the project's commitment to a full-fledged open source AI toolkit. One such fulfilled item, symbolic differentiation, is sketched below.
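
As a small taste of that vision in practice, here is a minimal sketch of Theano's symbolic differentiation, one of the graph transformation items listed above. It assumes Theano is installed and uses only its documented core API.

```python
# A minimal sketch of symbolic differentiation in Theano, assuming Theano
# is installed; only the documented core API is used.
import theano
import theano.tensor as T

x = T.dscalar("x")        # a symbolic double-precision scalar
y = x ** 2 + 3 * x        # an expression built on top of it
dy_dx = T.grad(y, x)      # symbolic derivative: 2x + 3

f = theano.function([x], dy_dx)  # compile the graph into a callable
print(f(4.0))                    # prints 11.0
```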


   
  2. Machine Learning Libraries

An ideal AI toolkit provides the Machine Learning (ML) libraries that cover your AI requirements. Building AI products involves many different needs, so the libraries should offer a broad range of machine learning algorithms, and most AI toolkits today ship libraries that track current research in machine learning. Any AI toolkit you choose must support supervised learning, unsupervised learning and reinforcement learning; at a minimum, its libraries should include Support Vector Machines, artificial neural networks, clustering algorithms and Bayesian networks to cover your basic AI needs. A quick coverage check is sketched below.
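
To make the checklist concrete, the following minimal sketch uses scikit-learn (a library not discussed in this post, chosen purely for illustration) to exercise one supervised algorithm, an SVM, and one unsupervised algorithm, k-means clustering, on the same dataset.

```python
# A minimal sketch of that coverage check using scikit-learn, chosen purely
# for illustration: one supervised and one unsupervised algorithm.
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.svm import SVC

iris = datasets.load_iris()

# Supervised learning: fit a Support Vector Machine on labelled data.
clf = SVC(kernel="rbf").fit(iris.data, iris.target)
print("SVM training accuracy:", clf.score(iris.data, iris.target))

# Unsupervised learning: cluster the same data without using the labels.
km = KMeans(n_clusters=3, n_init=10).fit(iris.data)
print("cluster sizes:", [list(km.labels_).count(c) for c in range(3)])
```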

   
  3. Support
       
The extent of support provided by the maintainers of an open source AI toolkit sends a clear message about their willingness to improve it. Tracking the changes made to the toolkit over time gives a fair idea of that commitment. Google launched TensorFlow with support for Python 2.7 only; there was a lot of demand for Python 3, and within weeks Google shipped Python 3 support for TensorFlow, proving its commitment to continuous support. Likewise, the promises Theano made in its vision statement were fulfilled in later releases, from supporting tensors to better GPU support. Follow the release updates to get a clear picture of how well your product's AI needs will be met; the sketch below shows one way to verify what you actually have installed.
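
One low-effort way to act on those release updates is to verify in code that the installed toolkit and interpreter meet your product's requirements. The sketch below does this; the minimum versions are hypothetical placeholders, and TensorFlow stands in for any toolkit exposing a __version__ attribute.

```python
# A minimal sketch of pinning your product to toolkit releases you have
# verified. The minimum versions are hypothetical placeholders; TensorFlow
# stands in for any toolkit that exposes a __version__ attribute.
import re
import sys

import tensorflow as tf

MIN_PYTHON = (3, 0)    # hypothetical: our product targets Python 3
MIN_TOOLKIT = "0.6.0"  # hypothetical: the release that added the features we rely on

def version_tuple(v):
    """Turn a dotted version string like '0.6.0' into a comparable tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", v)[:3])

assert sys.version_info >= MIN_PYTHON, "this product requires Python 3"
assert version_tuple(tf.__version__) >= version_tuple(MIN_TOOLKIT), \
    "installed toolkit is older than the release we verified against"
print("toolkit %s on Python %s" % (tf.__version__, sys.version.split()[0]))
```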

  4. Followers

The number of followers an open source AI toolkit has gives you an impression of its user base. More followers generally means more improvements in future releases; a large community also tests the toolkit at a scale that a niche toolkit would miss. AI runs on data, and a huge user base supplies the data that makes a toolkit more intelligent. Access to data is a major reason corporations open source their toolkits in the first place, and the general consensus is that only the toolkits with enough data and users will win this ongoing race. The size and activity of the community is therefore a strong signal of any open source AI toolkit's future, and it is easy to measure, as the sketch below shows.
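
A rough but concrete proxy for followers is repository activity. The sketch below queries the public GitHub API for stars, forks and open issues; the repository paths are only examples, and unauthenticated requests are rate-limited.

```python
# A rough sketch of measuring community size through the public GitHub API.
# The repository paths are examples; substitute the toolkits you are
# evaluating. Unauthenticated requests are rate-limited by GitHub.
import json
from urllib.request import urlopen

REPOS = ["tensorflow/tensorflow", "Theano/Theano", "opencog/opencog"]

for repo in REPOS:
    with urlopen("https://api.github.com/repos/%s" % repo) as resp:
        info = json.load(resp)
    print("%-25s stars: %6d  forks: %6d  open issues: %5d" % (
        repo, info["stargazers_count"], info["forks_count"],
        info["open_issues_count"]))
```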
       
  5. Hardware Compatibility

Much of AI engineering is about making machine learning algorithms run faster. All major open source toolkits have strong integration with and support for GPUs; with a few lines of code you can distribute your computation across multiple GPUs, as the sketch below illustrates.
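
As an illustration, here is a minimal sketch of explicit device placement in TensorFlow's graph API, the API current when this post was written. It splits a matrix multiplication across two GPUs and combines the partial results on the CPU; it assumes a machine with at least two GPUs.

```python
# A minimal sketch of multi-GPU device placement in TensorFlow's graph API.
# One matrix multiplication runs per GPU and the partial results are summed
# on the CPU. Assumes two GPUs; allow_soft_placement falls back to whatever
# devices are actually available.
import tensorflow as tf

parts = []
for i in range(2):                      # one slice of work per GPU
    with tf.device("/gpu:%d" % i):
        a = tf.random_normal([1000, 1000])
        b = tf.random_normal([1000, 1000])
        parts.append(tf.matmul(a, b))

with tf.device("/cpu:0"):               # combine partial products on the CPU
    total = tf.add_n(parts)

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(total)[:2, :2])
```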

Facebook recently open sourced 'Big Sur', its hardware design for AI. The release announcement for 'Big Sur' says:

“We want to make it a lot easier for AI researchers to share techniques and technologies. As with all hardware systems that are released into the open, it's our hope that others will be able to work with us to improve it. We believe that this open collaboration helps foster innovation for future designs, putting us all one step closer to building complex AI systems that bring this kind of innovation to our users and, ultimately, help us build a more open and connected world.”


Assess your product's hardware needs before deciding on a toolkit. Building a product first and hunting for hardware optimizations later can turn out expensive if you discover your toolkit doesn't offer strong GPU support.

  6. Performance

Every new version of an AI toolkit brings both software and hardware performance improvements; a commitment to support leads to a steady stream of them. The kind of processing your product needs will help you choose an ideal AI toolkit. Performance deserves an article of its own, and we will cover it in future posts, but a simple first-pass check is sketched below.
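
Until then, a simple first-pass check is to time the operation your product will run most often. The sketch below benchmarks a dense matrix multiply with NumPy as a baseline; the sizes are arbitrary, and you would swap in your toolkit's equivalent op to compare.

```python
# A minimal first-pass benchmark: time the operation your product will run
# most often, here a dense matrix multiply with NumPy as the baseline. The
# sizes are arbitrary; swap in your toolkit's equivalent op to compare.
import timeit

import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# Run the multiply several times and report the best wall-clock time.
best = min(timeit.repeat(lambda: a.dot(b), number=1, repeat=5))
print("best of 5 matmuls: %.3f s" % best)
```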

  7. Documentation

A serious effort to document a toolkit reflects dedication to its user community. Thorough documentation, along with tutorials, helps AI programmers adopt the toolkit easily, which in turn grows its user base. The commitment of Theano and TensorFlow to detailed documentation and tutorials is helping them attract more followers.
   
  8. Available Skillset in the Market

The programming languages an AI toolkit supports determine whether there is enough skill available in the market for you to hire. One reason Google chose Python as the language for TensorFlow is Python's vast ecosystem of libraries, especially in NLP. Google has promised to add support for other languages, including Java. Such steps ensure that your product has a strong pool of AI engineers to hire from.

As more and more companies open source their AI, selecting an ideal AI toolkit for your product will only get more challenging. Proper planning and critical thinking, guided by the above criteria, will give you enough leverage to build successful AI products.

