
Implement XOR in TensorFlow

XOR is often called the 'Hello World' of neural networks, which makes it a good problem for your first TensorFlow program.

TensorFlow makes it easy to build a neural network. You define a computation graph, and with a few tweaks you have a network that learns the XOR function.

Why XOR? XOR is one of the classic problems that motivated backpropagation. A single-layer perceptron, although quite successful at learning the AND and OR functions, can't learn XOR (Table 1): it is just a linear classifier, and XOR is not linearly separable (Figure 1). No matter how long it trains, a single-layer perceptron cannot find one line that separates the two classes.
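To make the linear-separability claim concrete, here is a small plain-Python sketch (not TensorFlow). It searches a coarse grid of weights for a single threshold unit. The helper names and the grid are illustrative assumptions. A grid search is not a proof, but it finds weights for AND and OR and none for XOR, which is consistent with the fact that no such weights exist.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def perceptron(w1, w2, b, x1, x2):
    """Single threshold unit: fires when the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def find_weights(targets):
    """Return the first (w1, w2, b) on the grid that reproduces targets, else None."""
    grid = [v / 2 for v in range(-8, 9)]  # -4.0, -3.5, ..., 4.0 (assumed search range)
    for w1, w2, b in product(grid, repeat=3):
        outputs = [perceptron(w1, w2, b, x1, x2) for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, b)
    return None

for name, targets in TARGETS.items():
    print(name, find_weights(targets))
```

AND and OR each print a weight triple, while XOR prints None.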

The backpropagation algorithm comes to the rescue. Applied to a network with a hidden layer, it learns XOR by combining two lines, L1 and L2 (Figure 2). This post assumes you know how the backpropagation algorithm works.
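One way to picture the two lines: each hidden unit draws one line, and the output unit fires only for points that lie between them. The weights below are hand-picked for illustration. They are an assumption, not the values from Figure 2 and not what training would necessarily find.

```python
def step(z):
    """Threshold activation: 1 when z is positive, else 0."""
    return 1 if z > 0 else 0

def xor_two_lines(x1, x2):
    l1 = step(x1 + x2 - 0.5)     # first line: on for every input except (0, 0), i.e. OR
    l2 = step(-x1 - x2 + 1.5)    # second line: on for every input except (1, 1), i.e. NAND
    return step(l1 + l2 - 1.5)   # output: on only when both hidden units are on (AND)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_two_lines(x1, x2))
```

Backpropagation arrives at a similar arrangement by adjusting the weights gradually instead of having them set by hand.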

The following steps implement the neural network in Figure 3 for XOR in TensorFlow:
1. Import necessary libraries

import tensorflow as tf
import math
import numpy as np

2. Declare the number of input, hidden and output layer nodes.

MAX_STEPS = 5000
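The later steps also refer to INPUT_COUNT, HIDDEN_COUNT, OUTPUT_COUNT and LEARNING_RATE. A minimal sketch of the full set of constants could look like this. The input and output sizes follow from the training data used later in the post; the hidden layer size and learning rate are assumed starting points, not tuned values.

```python
INPUT_COUNT = 2      # XOR takes two binary inputs
OUTPUT_COUNT = 2     # two classes (XOR is 0 or 1), one column per class
HIDDEN_COUNT = 2     # assumption: a small hidden layer; tune as needed
LEARNING_RATE = 0.2  # assumption: a starting point, not a tuned value
MAX_STEPS = 5000     # number of training steps, as declared above
```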

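3. Input nodes are created in TensorFlow using placeholders. Placeholders are values that we supply when we ask TensorFlow to run a computation.

Create inputs x consisting of a 2d tensor of floating point numbers
inputs_placeholder = tf.placeholder("float", shape=[None, INPUT_COUNT])
The expected outputs get a placeholder of their own
labels_placeholder = tf.placeholder("float", shape=[None, OUTPUT_COUNT])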
4. Define weights and biases from input layer to hidden layer
WEIGHT_HIDDEN = tf.Variable(tf.truncated_normal([INPUT_COUNT, HIDDEN_COUNT]))
BIAS_HIDDEN = tf.Variable(tf.zeros([HIDDEN_COUNT]))
A variable is a value that lives in TensorFlow's computation graph and can be modified by the computation.
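For intuition about the initial values: tf.truncated_normal draws from a normal distribution (mean 0 and standard deviation 1 by default) and re-draws any value that falls more than two standard deviations from the mean. The plain-Python sketch below imitates that behaviour. The function name and the seed argument are assumptions for illustration, not part of TensorFlow.

```python
import random

def truncated_normal(rows, cols, mean=0.0, stddev=1.0, seed=None):
    """Fill a rows x cols list with normal samples, re-drawing outliers beyond 2 stddev."""
    rng = random.Random(seed)

    def draw():
        while True:
            value = rng.gauss(mean, stddev)
            if abs(value - mean) <= 2 * stddev:
                return value

    return [[draw() for _ in range(cols)] for _ in range(rows)]

weights = truncated_normal(2, 3, seed=0)  # shape [INPUT_COUNT, HIDDEN_COUNT] with 3 hidden nodes
biases = [0.0] * 3                        # biases start at zero, like tf.zeros
print(weights)
print(biases)
```

Small random weights break the symmetry between hidden nodes, so they can learn different features.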
5. Define an activation function for the hidden layer. Here we are using the Sigmoid function, but you can use other activation functions offered by Tensorflow.
AF_HIDDEN = tf.nn.sigmoid(tf.matmul(inputs_placeholder, WEIGHT_HIDDEN) + BIAS_HIDDEN)
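To see what this line computes, here is the same arithmetic in plain Python: a matrix multiply, a bias add, and an element-wise sigmoid. The weights are hypothetical values chosen only to keep the numbers easy to follow.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(inputs, weights, biases):
    """Compute sigmoid(inputs @ weights + biases) for a batch of input rows."""
    activations = []
    for row in inputs:
        activations.append([
            sigmoid(sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j])
            for j in range(len(biases))
        ])
    return activations

# Hypothetical weights, shape [INPUT_COUNT, HIDDEN_COUNT] = [2, 2].
weights = [[1.0, -1.0], [1.0, -1.0]]
biases = [0.0, 0.0]
activations = hidden_layer([[0, 0], [1, 1]], weights, biases)
print(activations)
```

For the input [0, 0] both hidden nodes output sigmoid(0) = 0.5. For [1, 1] the first node sees a pre-activation of 2 and the second sees -2.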
6. Define weights and biases from hidden layer to output layer. The biases are initialized with tf.zeros to make sure they start with zero values.
WEIGHT_OUTPUT = tf.Variable(tf.truncated_normal([HIDDEN_COUNT, OUTPUT_COUNT]))
BIAS_OUTPUT = tf.Variable(tf.zeros([OUTPUT_COUNT]))

7. With one line of code we can calculate the logits tensor that holds the raw output of the network
logits = tf.matmul(AF_HIDDEN, WEIGHT_OUTPUT) + BIAS_OUTPUT
We then compute the softmax probabilities that are assigned to each class
y = tf.nn.softmax(logits)
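Softmax turns the raw logits into probabilities that sum to 1. A minimal plain-Python version is sketched below. The max-subtraction is a common numerical-stability trick and is an addition here, not something the post's code spells out.

```python
import math

def softmax(logits):
    """Exponentiate and normalise so the outputs are positive and sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shifting by the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

equal = softmax([0.0, 0.0])
confident = softmax([3.0, -3.0])
print(equal)
print(confident)
```

Equal logits give equal probabilities. The larger the gap between logits, the closer the winning class gets to 1.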
8. The tf.nn.softmax_cross_entropy_with_logits op is added to compare the output logits to the expected output
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=labels_placeholder, logits=logits)
We then use tf.reduce_mean to average the cross entropy values across the batch dimension as the total loss
loss = tf.reduce_mean(cross_entropy)
loss is the tensor that will contain the loss value
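The loss can be reproduced by hand. The sketch below computes the softmax cross entropy per example and then averages over the batch, mirroring the two ops above. The function names are illustrative, not TensorFlow's.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross entropy between one-hot labels and softmax(logits) for one example."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(labels, log_probs))

def mean_loss(batch_logits, batch_labels):
    """Average the per-example cross entropy across the batch."""
    per_example = [softmax_cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
    return sum(per_example) / len(per_example)

undecided = mean_loss([[0.0, 0.0]], [[1, 0]])   # equals log(2), about 0.693
confident = mean_loss([[5.0, -5.0]], [[1, 0]])  # correct and confident, so much smaller
print(undecided, confident)
```

An undecided network starts at a loss of about 0.693 on a two-class problem, and the loss approaches 0 as the correct class gets more probability.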
9. Next, we instantiate a tf.train.GradientDescentOptimizer that applies gradients with the requested learning rate. Since TensorFlow has access to the entire computation graph, it can find the gradients of the cost with respect to all the variables.
train_step = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)
train_step is the op that performs one step of training each time it is run.
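Each training step applies the update: variable = variable - LEARNING_RATE * gradient. The sketch below shows that rule on a deliberately simplified problem. It updates the logits of one example directly rather than network weights, using the fact that the gradient of softmax cross entropy with respect to the logits is (probabilities - labels). The learning rate is an assumed value.

```python
import math

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_gradient(logits, labels):
    """Return the cross entropy and its gradient with respect to the logits."""
    probs = softmax(logits)
    loss = -sum(t * math.log(p) for t, p in zip(labels, probs))
    gradient = [p - t for p, t in zip(probs, labels)]
    return loss, gradient

LEARNING_RATE = 0.5  # assumed value for illustration
logits = [0.0, 0.0]
labels = [1, 0]
losses = []
for _ in range(5):
    loss, gradient = loss_and_gradient(logits, labels)
    losses.append(loss)
    # Gradient descent: move each value a small step against its gradient.
    logits = [z - LEARNING_RATE * g for z, g in zip(logits, gradient)]
print(losses)
```

The printed losses shrink on every step. In the real network TensorFlow derives the gradients for the weights and biases automatically.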
10. Next we create a tf.Session() to run the graph
with tf.Session() as sess:
We initialize all the variables before we use them
    init = tf.global_variables_initializer()
Then we run the initializer in the session
    sess.run(init)
On every training step we provide the same input and expected output data
    INPUT_TRAIN = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    OUTPUT_TRAIN = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
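The two-column labels are a one-hot encoding: column 0 means "XOR is 0" and column 1 means "XOR is 1". The sketch below rebuilds the same label rows from the inputs. The helper one_hot is a hypothetical name, and plain lists stand in for the NumPy arrays.

```python
INPUT_TRAIN = [[0, 0], [0, 1], [1, 0], [1, 1]]

def one_hot(class_index, num_classes=2):
    """Return a list with a 1 in the position of the class and 0 elsewhere."""
    return [1 if i == class_index else 0 for i in range(num_classes)]

# a ^ b is Python's bitwise XOR, which gives the class index for each input pair.
OUTPUT_TRAIN = [one_hot(a ^ b) for a, b in INPUT_TRAIN]
print(OUTPUT_TRAIN)
```

This encoding is what lets softmax and cross entropy treat XOR as a two-class classification problem.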
We need to create a Python dictionary object with placeholders as keys and feed tensors as values
    feed_dict = {
        inputs_placeholder: INPUT_TRAIN,
        labels_placeholder: OUTPUT_TRAIN,
    }

This is passed into the feed_dict parameter of sess.run to provide the input examples for this step of training.
The following code fetches two values, [train_step, loss], in its run call. Because there are two values to fetch, sess.run returns two items. train_step is an op with no output, so we discard its result and keep the loss. We also print the loss and outputs every 100 steps.
    for step in range(MAX_STEPS):
        _, loss_val = sess.run([train_step, loss], feed_dict)
        if step % 100 == 0:
            print("Step:", step, "loss:", loss_val)
            for input_value in INPUT_TRAIN:
                print(input_value, sess.run(y, feed_dict={inputs_placeholder: [input_value]}))
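To show what one run of the training loop does under the hood, here is a plain-Python sketch of the same network (sigmoid hidden layer, softmax output, mean cross entropy, gradient descent) with the gradients written out by hand. It is an illustration, not TensorFlow's implementation. The hidden size, learning rate, step count, seed, and the use of plain normal initial weights are all assumptions, and whether it reaches a good solution depends on them.

```python
import math
import random

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[1, 0], [0, 1], [0, 1], [1, 0]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_softmax(logits):
    largest = max(logits)
    lse = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return [z - lse for z in logits]

def train(hidden=4, lr=0.5, steps=2000, seed=0):
    """Full-batch gradient descent; returns the mean loss recorded at every step."""
    rng = random.Random(seed)
    w_h = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(2)]
    b_h = [0.0] * hidden
    w_o = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
    b_o = [0.0, 0.0]
    losses = []
    for _ in range(steps):
        g_w_h = [[0.0] * hidden for _ in range(2)]
        g_b_h = [0.0] * hidden
        g_w_o = [[0.0] * 2 for _ in range(hidden)]
        g_b_o = [0.0, 0.0]
        total = 0.0
        for x, y in zip(X, Y):
            # Forward pass: hidden activations, logits, log-probabilities.
            h = [sigmoid(x[0] * w_h[0][j] + x[1] * w_h[1][j] + b_h[j]) for j in range(hidden)]
            logits = [sum(h[j] * w_o[j][k] for j in range(hidden)) + b_o[k] for k in range(2)]
            log_p = log_softmax(logits)
            total -= sum(t * lp for t, lp in zip(y, log_p))
            # Backward pass: gradient of the mean loss for this example.
            d_logits = [(math.exp(lp) - t) / len(X) for lp, t in zip(log_p, y)]
            d_h = [sum(w_o[j][k] * d_logits[k] for k in range(2)) * h[j] * (1 - h[j])
                   for j in range(hidden)]
            for k in range(2):
                g_b_o[k] += d_logits[k]
                for j in range(hidden):
                    g_w_o[j][k] += h[j] * d_logits[k]
            for j in range(hidden):
                g_b_h[j] += d_h[j]
                for i in range(2):
                    g_w_h[i][j] += x[i] * d_h[j]
        losses.append(total / len(X))
        # Gradient descent update, the job train_step does in TensorFlow.
        for j in range(hidden):
            b_h[j] -= lr * g_b_h[j]
            for i in range(2):
                w_h[i][j] -= lr * g_w_h[i][j]
            for k in range(2):
                w_o[j][k] -= lr * g_w_o[j][k]
        for k in range(2):
            b_o[k] -= lr * g_b_o[k]
    return losses

losses = train()
print("first loss:", losses[0], "last loss:", losses[-1])
```

Writing the backward pass by hand is exactly the work that the optimizer's minimize call saves you.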

11. When you run the program, around the 4900th step you will see output similar to the following
[0 0] [[ 0.99858057 0.00141946]]
[0 1] [[ 0.00187515 0.9981249]]
[1 0] [[ 0.00128779 0.99871218]]
[1 1] [[ 0.99883229 0.00116773]]
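To read these probabilities as XOR answers, pick the column with the highest probability in each row. The sketch below does that for the numbers printed above, taking the four rows to correspond to the inputs in the order of INPUT_TRAIN. The helper name is an assumption.

```python
outputs = {
    (0, 0): [0.99858057, 0.00141946],
    (0, 1): [0.00187515, 0.9981249],
    (1, 0): [0.00128779, 0.99871218],
    (1, 1): [0.99883229, 0.00116773],
}

def predicted_class(probabilities):
    """Index of the largest probability: 0 means XOR is 0, 1 means XOR is 1."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

for inputs, probabilities in outputs.items():
    print(inputs, "->", predicted_class(probabilities))
```

The predicted classes match XOR for all four inputs.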
12. The following points should be noted:
  • You will need to experiment to get good results. Play around with HIDDEN_COUNT, LEARNING_RATE and MAX_STEPS.
  • You can try a variety of activation functions and increase the number of hidden nodes to see whether the network trains faster.

