
Implement XOR in TensorFlow


XOR is considered the 'Hello World' of neural networks, which makes it a natural choice for your first TensorFlow program.


TensorFlow makes it easy to build a neural network with a few tweaks. All you have to do is define a computation graph, and you have a neural network that learns the XOR function.


Why XOR? Well, XOR is the reason backpropagation was invented in the first place. A single-layer perceptron, although quite successful at learning the AND and OR functions, cannot learn XOR (Table 1): it is just a linear classifier, and XOR is not a linearly separable pattern (Figure 1). No matter how long it trains, a single-layer perceptron simply cannot fit XOR.
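To make the inseparability concrete, a small NumPy sketch (not part of the original post) can brute-force a grid of linear classifiers of the form sign(w1*x1 + w2*x2 + b) — the only decision rules a single-layer perceptron can realize — and confirm that none of them labels all four XOR points correctly:

```python
import itertools
import numpy as np

# The four XOR examples and their labels (Table 1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Brute-force a grid of linear classifiers sign(w1*x1 + w2*x2 + b).
grid = np.linspace(-2.0, 2.0, 41)
separable = any(
    all((w1 * x1 + w2 * x2 + b > 0) == bool(label)
        for (x1, x2), label in zip(X, y))
    for w1, w2, b in itertools.product(grid, grid, grid)
)
print(separable)  # False: no line in the grid separates XOR
```

By contrast, AND is separable — for example w1 = w2 = 1, b = -1.5 classifies all four AND examples — which is why the perceptron handles it without trouble.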


The backpropagation algorithm comes to the rescue. A network with a hidden layer learns XOR by combining two decision boundaries, L1 and L2 (Figure 2). This post assumes you know how the backpropagation algorithm works.
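Before the TensorFlow version, it can help to see the two-boundary idea by hand. The following NumPy sketch hard-codes a two-hidden-unit network with illustrative (not learned) weights: one hidden unit acts as boundary L1 (an OR rule), the other as boundary L2 (a NAND rule), and the output unit ANDs them together to produce XOR:

```python
import numpy as np

def step(z):
    # Heaviside step activation: 1 where z > 0, else 0.
    return (z > 0).astype(int)

def xor_net(x):
    # Hidden layer: column 1 computes OR, column 2 computes NAND.
    # These correspond to the two boundaries L1 and L2 in Figure 2.
    W_hidden = np.array([[1.0, -1.0],
                         [1.0, -1.0]])
    b_hidden = np.array([-0.5, 1.5])
    h = step(x @ W_hidden + b_hidden)
    # Output unit computes AND of the two hidden units.
    w_out = np.array([1.0, 1.0])
    b_out = -1.5
    return step(h @ w_out + b_out)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_net(X))  # [0 1 1 0]
```

The network below learns an equivalent decomposition from data instead of having it hand-wired.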




Following are the steps to implement the neural network in Figure 3 for XOR in TensorFlow:
 
1. Import necessary libraries

import tensorflow as tf
import numpy as np

2. Declare the number of input, hidden and output layer nodes, along with the learning rate and number of training steps.

INPUT_COUNT = 2
OUTPUT_COUNT = 2
HIDDEN_COUNT = 2
LEARNING_RATE = 0.4
MAX_STEPS = 5000

3. Inputs are fed into TensorFlow through placeholders. Placeholders are values that we supply when we ask TensorFlow to run a computation.

Create the inputs x as a 2-D tensor of floating-point numbers, along with a matching placeholder for the expected labels:
inputs_placeholder = tf.placeholder("float", shape=[None, INPUT_COUNT])
labels_placeholder = tf.placeholder("float", shape=[None, OUTPUT_COUNT])
 
4. Define weights and biases from the input layer to the hidden layer. A variable is a value that lives in TensorFlow's computation graph and can be modified by the computation.
WEIGHT_HIDDEN = tf.Variable(tf.truncated_normal([INPUT_COUNT, HIDDEN_COUNT]))
BIAS_HIDDEN = tf.Variable(tf.zeros([HIDDEN_COUNT]))
5. Define an activation function for the hidden layer. Here we use the sigmoid function, but you can use other activation functions offered by TensorFlow.
AF_HIDDEN = tf.nn.sigmoid(tf.matmul(inputs_placeholder, WEIGHT_HIDDEN) + BIAS_HIDDEN)
6. Define weights and biases from the hidden layer to the output layer. The biases are initialized with tf.zeros so they start at zero.
WEIGHT_OUTPUT = tf.Variable(tf.truncated_normal([HIDDEN_COUNT, OUTPUT_COUNT]))
BIAS_OUTPUT = tf.Variable(tf.zeros([OUTPUT_COUNT]))

7. With one line of code we can calculate the logits tensor that contains the network's raw output
logits = tf.matmul(AF_HIDDEN, WEIGHT_OUTPUT) + BIAS_OUTPUT
We then compute the softmax probabilities assigned to each class
y = tf.nn.softmax(logits)
8. The tf.nn.softmax_cross_entropy_with_logits op compares the output logits to the expected labels
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, labels_placeholder)
tf.reduce_mean then averages the cross-entropy values across the batch dimension to produce the total loss
loss = tf.reduce_mean(cross_entropy)
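As a sanity check on what these two ops compute, here is a small NumPy sketch of softmax followed by cross-entropy (the function names and example logits are illustrative, not from the post):

```python
import numpy as np

def softmax(logits):
    # Shift by the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, labels):
    # Per-example cross entropy: -sum(labels * log(softmax(logits))).
    return -np.sum(labels * np.log(softmax(logits)), axis=1)

# One confident correct prediction, one near-uniform wrong-leaning one.
logits = np.array([[5.0, 0.0], [0.1, 0.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot targets
per_example = softmax_cross_entropy(logits, labels)
loss = per_example.mean()   # plays the role of tf.reduce_mean
print(per_example, loss)
```

The confident correct prediction gets a much smaller cross-entropy than the uncertain one, which is exactly the pressure gradient descent uses to sharpen the network's outputs.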
9. Next, we instantiate a tf.train.GradientDescentOptimizer that applies gradients with the requested learning rate. Since TensorFlow has access to the entire computation graph, it can compute the gradients of the loss with respect to all the variables.
train_step = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)
The returned train_step op, when run, performs one step of gradient descent.
10. Next we create a tf.Session() to run the graph. We initialize all the variables before we use them, then run the initialization op
with tf.Session() as sess:
  init = tf.initialize_all_variables()
  sess.run(init)
For every training step we provide the same input and expected one-hot output data
INPUT_TRAIN = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
OUTPUT_TRAIN = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
We create a Python dictionary with the placeholders as keys and the feed tensors as values
feed_dict = {
inputs_placeholder: INPUT_TRAIN,
labels_placeholder: OUTPUT_TRAIN,
}

This is passed into the sess.run() function's feed_dict parameter to provide the input examples for this step of training.
The following code fetches two values, [train_step, loss], in its run call. Because there are two values to fetch, sess.run() returns a tuple with two items. We also print the loss and outputs every 100 steps.
for step in xrange(MAX_STEPS):
  _, loss_val = sess.run([train_step, loss], feed_dict)
  if step % 100 == 0:
    print "Step:", step, "loss:", loss_val
    for input_value in INPUT_TRAIN:
      print input_value, sess.run(y,
        feed_dict={inputs_placeholder: [input_value]})

11. Around the 4900th step you should see output similar to the following
[0 0] [[ 0.99858057 0.00141946]]
[0 1] [[ 0.00187515 0.9981249]]
[1 0] [[ 0.00128779 0.99871218]]
[1 1] [[ 0.99883229 0.00116773]]
12. The following points should be noted:
  • You will need to experiment to tune the network. Play around with HIDDEN_COUNT, LEARNING_RATE and MAX_STEPS.
  • You can try a variety of activation functions and increase the number of hidden nodes to make training converge faster.
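The steps above can be tied together in a single self-contained NumPy version of the same two-layer network, trained with hand-written backpropagation. This is a sketch, not the post's TensorFlow code: the seed is an arbitrary choice, the hyperparameters mirror the post, and results will vary with initialization.

```python
import numpy as np

rng = np.random.RandomState(0)   # arbitrary seed for reproducibility

# Same XOR data and one-hot labels as in the post.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

W1, b1 = rng.randn(2, 2), np.zeros(2)   # input  -> hidden
W2, b2 = rng.randn(2, 2), np.zeros(2)   # hidden -> output
LEARNING_RATE, MAX_STEPS = 0.4, 5000

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    y = e / e.sum(axis=1, keepdims=True)   # softmax
    return h, y

def mean_cross_entropy(y, T):
    return -np.mean(np.sum(T * np.log(y), axis=1))

_, y = forward(X, W1, b1, W2, b2)
initial_loss = mean_cross_entropy(y, T)

for step in range(MAX_STEPS):
    h, y = forward(X, W1, b1, W2, b2)
    # Gradient of mean softmax cross-entropy w.r.t. the logits is (y - T) / N.
    d_logits = (y - T) / len(X)
    dW2, db2 = h.T @ d_logits, d_logits.sum(axis=0)
    d_h = (d_logits @ W2.T) * h * (1.0 - h)   # sigmoid'(z) = h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    W2 -= LEARNING_RATE * dW2; b2 -= LEARNING_RATE * db2
    W1 -= LEARNING_RATE * dW1; b1 -= LEARNING_RATE * db1

_, y = forward(X, W1, b1, W2, b2)
final_loss = mean_cross_entropy(y, T)
print(initial_loss, "->", final_loss)
```

Every line here has a direct counterpart in the TensorFlow graph above; the only difference is that the gradients are written out by hand instead of being derived automatically from the computation graph.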






