Dynamics of Selecting your Open Source AI

The landscape of open source AI is vast, and identifying the right open source tools for your dream AI product is a herculean task. Selecting an AI toolkit can prove costly later, when you need to scale your software, which makes it a strategic decision. At CereLabs we have developed a set of criteria for choosing an open source AI toolkit.

  1. Vision / Reason for Open Source

If you need to trust an open source platform, start with the vision statement with which it was launched. The vision statement reflects the commitment of the company or community behind the toolkit.

Following are the visions of a few reputed open source AI platforms:

    OpenCog: “OpenCog is a unique and ambitious open-source software project. Our aim is to create an open source framework for Artificial General Intelligence, intended to one day express general intelligence at the human level and beyond. That is: We're undertaking a serious effort to build a thinking machine.”

    TensorFlow by Google: “By sharing what we believe to be one of the best machine learning toolboxes in the world, we hope to create an open standard for exchanging research ideas and putting machine learning in products. Google engineers really do use TensorFlow in user-facing products and services, and our research group intends to share TensorFlow implementations alongside many of our research publications.”

    DMTK by Microsoft: “We believe that in order to push the frontier of distributed machine learning, we need the collective effort from the entire community, and need the organic combination of both machine learning innovations and system innovations. This belief strongly motivates us to open source the DMTK project.”

    Theano (when it was launched): “This is the vision we have for Theano. This is to give people an idea of what to expect in the future of Theano, but we can’t promise to implement all of it. This should also help you to understand where Theano fits in relation to other computational tools.
    • Support tensor and sparse operations
    • Support linear algebra operations
    • Graph Transformations    
        • Differentiation/higher order differentiation        
        • ‘R’ and ‘L’ differential operators        
        • Speed/memory optimizations        
        • Numerical stability optimizations        
    • Can use many compiled languages, instructions sets: C/C++, CUDA, OpenCL, PTX, CAL, AVX, ...
    • Lazy evaluation
    • Loop
    • Parallel execution (SIMD, multi-core, multi-node on cluster, multi-node distributed)
    • Support all NumPy/basic SciPy functionality
    • Easy wrapping of library functions in Theano”

It should be noted that most of the promises made by Theano in its vision were later fulfilled, proving its commitment to a full-fledged open source AI toolkit.

  2. Machine Learning Libraries

An ideal AI toolkit will have all the Machine Learning (ML) libraries needed to assist you with your AI requirements. Building AI products involves many different needs, so the libraries should cover the full range of machine learning algorithms. Today most AI toolkits ship libraries that support the current trends in machine learning research. Any AI toolkit must support supervised learning, unsupervised learning and reinforcement learning. To that end, the libraries should include at least Support Vector Machines, Artificial Neural Networks, clustering algorithms and Bayesian Networks, to fulfil your basic AI needs.
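To make the checklist concrete, here is a minimal sketch of one of the algorithms listed above — k-means clustering — written in pure Python. It is an illustration of the kind of unsupervised-learning routine any candidate toolkit should provide out of the box, not a production implementation.

```python
# Minimal k-means clustering on 2-D points, in pure Python.
# A sketch of one algorithm an AI toolkit's ML libraries should cover.
import random

def kmeans(points, k, iterations=20, seed=0):
    random.seed(seed)
    # Start from k randomly chosen points as initial centroids.
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: (p[0] - centroids[i][0]) ** 2
                                  + (p[1] - centroids[i][1]) ** 2)
            clusters[idx].append(p)
        # Move each centroid to the mean of its assigned points.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids

# Two obvious clusters: one near (0, 0), one near (5, 5).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
cents = sorted(kmeans(points, 2))
```

A toolkit worth adopting lets you replace all of the above with a one-line library call, which is precisely the convenience this criterion asks you to check for.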

  3. Support
The extent of support provided by the AI open source provider sends a clear message about its willingness to improve the toolkit. Tracking the changes made to the toolkit over time gives a fair idea of that commitment. Google launched TensorFlow with Python 2.7 support only, and there was a lot of demand for Python 3. Within weeks Google shipped Python 3 support for TensorFlow, proving its commitment to continuous support. Likewise, the promises Theano made in its vision statement were fulfilled in later releases, from supporting tensors to better GPU support. Follow the release updates to get a clear picture of how well the AI needs of your product will be fulfilled.


The number of followers of an open source AI toolkit gives you an impression of its fan base. More followers generally means more improvements in future releases. A large community also tests the toolkit at a scale that a smaller user base would miss. AI runs on data, and a huge user base supplies the data that makes the toolkit more intelligent. Data is a major reason corporations open source their toolkits, and the general consensus is that only a toolkit with enough data will win this never-ending race. The size of the community may well decide the fate of any open source AI toolkit.
  4. Hardware Compatibility

AI is also about making machine learning algorithms run faster. All major open source toolkits have strong integration with and support for GPUs. With a few lines of code you can distribute your computation across multiple GPUs.
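Multi-GPU distribution in these toolkits follows a split/compute/combine pattern: partition the batch, compute a partial result per device, then merge. The sketch below shows that same pattern on CPU threads using only Python's standard library, so it assumes no GPU or particular toolkit — it is the shape of the code, not the real device-placement API.

```python
# Data parallelism in miniature: split the work, compute partial
# results per worker, combine -- the pattern multi-GPU APIs automate.
# CPU-only sketch; no GPU or specific toolkit assumed.
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # The per-device computation: each worker handles one slice.
    return sum(x * x for x in chunk)

def distributed_sum_of_squares(data, workers=2):
    # Split the batch into one chunk per worker.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Compute in parallel, then combine the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))

result = distributed_sum_of_squares(list(range(10)))  # 0² + 1² + ... + 9²
```

In a toolkit with strong GPU support, the split and combine steps collapse into declarative device placement — which is exactly the "few lines of code" this criterion asks you to verify before committing.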

Facebook recently open sourced 'Big Sur', its hardware design for AI. The release statement for 'Big Sur' reads:

“We want to make it a lot easier for AI researchers to share techniques and technologies. As with all hardware systems that are released into the open, it's our hope that others will be able to work with us to improve it. We believe that this open collaboration helps foster innovation for future designs, putting us all one step closer to building complex AI systems that bring this kind of innovation to our users and, ultimately, help us build a more open and connected world.”

Check the hardware needs of your product, and decide your toolkit selection strategy accordingly. Building a product and then hunting for hardware optimizations can turn out expensive if you realize too late that your toolkit doesn't offer strong GPU support.

  5. Performance

Every new version of an AI toolkit comes with both software and hardware performance improvements. A sustained commitment to support leads to continual performance gains. The processing needs of your product will help you choose an ideal AI toolkit. Performance deserves an article of its own, and we will cover it in future posts.
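Until that article, a simple way to act on this criterion is to benchmark the one operation your product will run most often, once per candidate toolkit. The sketch below uses Python's standard `timeit` module; the pure-Python dot product is only a stand-in for whatever your hot operation actually is.

```python
# A minimal benchmarking harness for comparing candidate toolkits:
# time the operation your product runs most often. The pure-Python
# dot product is a placeholder for that operation.
import timeit

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = list(range(1000))
b = list(range(1000))

# Total wall-clock time for 100 runs of the operation.
elapsed = timeit.timeit(lambda: dot(a, b), number=100)
```

Run the same harness against each toolkit's equivalent call, on the hardware you intend to ship with, and the performance criterion turns from a vague impression into a number.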

  6. Documentation
A serious effort toward documenting the toolkit reflects dedication to its user community. Thorough documentation along with tutorials helps AI programmers adopt the toolkit easily, which in turn grows its user base. The commitment of Theano and TensorFlow to detailed documentation and tutorials is helping them attract more followers.
  7. Available Skillset in the Market

The programming languages supported by an AI toolkit determine whether there is enough of a skillset available in the market for you to hire. One reason Google chose Python for TensorFlow is Python's vast ecosystem of libraries, especially in NLP. Google has also promised to add support for other languages, including Java. Such steps ensure that your product has a strong pool of AI engineers to hire from.

As more and more companies open source their AI, selecting an ideal AI toolkit for your product will remain a challenging task. Proper planning and critical thinking, applying the criteria above, will give you enough leverage to build successful AI products.

