
Helping the Blind See


Most of us take the sense of vision for granted in day-to-day life, and its true value is perhaps best understood by those who are visually impaired. AI-based computer vision systems may soon help blind and visually impaired people navigate.

Tech giants such as Google, Baidu, Facebook, and Microsoft are working on a range of products that apply deep learning to assist the visually impaired. One of these is image captioning technology, in which the system describes the content of an image. To accelerate further research and broaden the possible applications of this technology, Google made the latest version of its image captioning system available as an open-source model in TensorFlow. It is called “Show and Tell: A Neural Image Caption Generator”. The project can be found at https://github.com/tensorflow/models/tree/master/im2txt and the full paper at https://arxiv.org/abs/1609.06647

The Show and Tell model is an example of an encoder-decoder neural network. It works by first "encoding" an image into a fixed-length vector representation, and then "decoding" the representation into a natural language description.

The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state-of-the-art for object recognition and detection. The Inception v3 image recognition model pretrained on the ILSVRC-2012-CLS image classification dataset is used as the encoder.
The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.
Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed-length vector representation that is learned during training.
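
To make the decoding step concrete, here is a minimal sketch of how a caption could be produced one word at a time from an image encoding. It is an illustration under stated assumptions, not the Show and Tell implementation: `greedy_caption`, `toy_step`, and the lookup table are hypothetical names and values, the "decoder" is a fixed table standing in for a trained LSTM, and greedy selection is a simplification (the released model may use a different search strategy, such as beam search).

```python
from typing import Callable, Dict, List, Sequence, Tuple

START, END = "<S>", "</S>"  # hypothetical start and end-of-caption tokens

# Hypothetical types: a decoder step maps (state, previous word) to
# (new state, scores over the vocabulary).
State = Tuple[float, ...]
StepFn = Callable[[State, str], Tuple[State, Dict[str, float]]]


def greedy_caption(image_vector: Sequence[float], step: StepFn,
                   max_len: int = 20) -> List[str]:
    """Decode a caption by repeatedly picking the highest-scoring next word."""
    # The decoder starts from the fixed-length image encoding.
    state: State = tuple(image_vector)
    word = START
    caption: List[str] = []
    for _ in range(max_len):
        state, scores = step(state, word)
        word = max(scores, key=scores.get)
        if word == END:
            break
        caption.append(word)
    return caption


def toy_step(state: State, prev_word: str) -> Tuple[State, Dict[str, float]]:
    """Stand-in for a trained LSTM step: a fixed lookup table, not a real model."""
    table = {
        START: {"a": 0.9, "street": 0.1},
        "a": {"city": 0.6, "street": 0.4},
        "city": {"street": 0.8, END: 0.2},
        "street": {END: 0.7, "a": 0.3},
    }
    return state, table[prev_word]


print(" ".join(greedy_caption([0.1, 0.2, 0.3], toy_step)))  # prints "a city street"
```

In a real system the step function would run the LSTM on the embedding of the previous word, and the scores would come from a softmax over the full vocabulary.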
Caption Generated : a street light with a building in the background.

Caption Generated : a group of motorcycles parked in front of a building.

Caption Generated : a group of people walking down a street.

Caption Generated : a group of motorcycles parked next to each other.

Caption Generated : a city street filled with lots of traffic.

Caption Generated : a bus driving down a street next to tall building.

Caption Generated : a group of cars parked on the side of a street.

We at Cere Labs, an artificial intelligence startup based in Mumbai, have built an application that uses this technique and extends it to video, continuously describing the content as it plays. First, we trained the Show and Tell model on the MSCOCO image captioning dataset to obtain our custom model. We then used OpenCV to extract frames from a video and fed them to the Show and Tell inference algorithm, which captioned each frame individually. To keep inference fast enough, we tuned the rate at which frames are passed to the inference algorithm so that video playback and caption generation stay smooth and in sync. The results were encouraging; the generated captions contain some errors, which can be reduced with more data and training. We further extended the application to caption a live camera feed, so that the description is produced in real time and may someday help the visually impaired and blind. The possibilities are enormous, with applications even in robotics.
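
The frame-rate tuning described above can be sketched roughly as follows. This is an assumption about one way to do it, not our production code: `caption_every_nth`, `read_frames`, and the `stride` parameter are hypothetical names, and the captioner is any function that maps a frame to a string. The idea is to run the slow captioning step only on every `stride`-th frame and reuse the most recent caption for the frames in between.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def caption_every_nth(frames: Iterable[Any],
                      captioner: Callable[[Any], str],
                      stride: int) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, caption), captioning only every `stride`-th frame."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    caption = ""
    for index, frame in enumerate(frames):
        if index % stride == 0:
            caption = captioner(frame)  # the slow inference call
        # Frames in between reuse the most recent caption.
        yield index, caption


def read_frames(path: str) -> Iterator[Any]:
    """Yield frames from a video file using OpenCV."""
    import cv2  # imported lazily so the sampling logic above runs without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of the video or a read failure
                break
            yield frame
    finally:
        capture.release()
```

A hypothetical call would look like `caption_every_nth(read_frames("video.mp4"), my_captioner, stride=10)`, where `my_captioner` wraps the trained model's inference function. A larger `stride` gives smoother playback at the cost of captions that lag behind the scene.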

We further plan to experiment and come up with more innovative applications of this promising technology.


By Amol Bhivarkar,
Researcher / Senior Software Developer,
Cere Labs


