
Understanding Generative Adversarial Networks - Part II

In "Understanding Generative Adversarial Networks - Part I", you gained a conceptual understanding of how GANs work. In this post, let us develop a mathematical understanding of GANs.

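The loss functions are most easily designed using the idea of a zero-sum game, in which the sum of the costs of all players is 0: whatever one player gains, the other loses. For GANs, this gives the following minimax game:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

Here $p_{\text{data}}$ is the distribution of the real data and $p_z$ is the noise distribution fed to the generator.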

Let’s break it down.

Some terminology:
V(D, G) : The value function of the minimax game
E(X) : The expectation of a random variable X, i.e. its average value
D(x) : The discriminator's output for an input x from the real data; it represents the probability that x is real
G(z) : The generator's output when it is given a sample z from the noise distribution
D(G(z)) : Combining the above, the discriminator's output when given a generated image G(z) as input
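Because E(X) is just an average, V(D, G) can be estimated in practice by averaging each log term over a finite batch of samples. The sketch below assumes the discriminator's outputs are already available as plain lists of probabilities; the function name `estimate_value` is hypothetical and not part of the original post.

```python
import math

def estimate_value(d_real, d_fake):
    """Batch estimate of V(D, G) (hypothetical helper).

    d_real: discriminator outputs D(x) for real samples x, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) for generated samples, each in (0, 1).
    Each expectation in V(D, G) is replaced by a sample average.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# giving V = log(0.5) + log(0.5) = -log(4), roughly -1.386.
print(estimate_value([0.5, 0.5], [0.5, 0.5]))
```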

In this game, the discriminator is the maximizer, and hence it tries to maximize
V(D, G). The discriminator wants to correctly label an image from the input data
as real.

Thus, it tries to maximize D(x). At the same time, a generated image (created by the
generator) should have a very low chance of coming from the input data: it should be
judged fake. Thus, D(G(z)) should be small, or equivalently 1 - D(G(z)) should be large.
Since log is an increasing function (log x grows as x grows), pushing D(x) up and
D(G(z)) down increases both log D(x) and log(1 - D(G(z))), and therefore maximizes V(D, G).

The opposite is true for the generator. It wants to increase the chance of the
discriminator incorrectly classifying a generated image as real. Thus, D(G(z)) should be
large. As this term increases, log(1 - D(G(z))) decreases, and so does V(D, G). Note that
the generator can only influence this second term, since log D(x) does not depend on G.
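Both arguments rest on log being an increasing function. The sketch below (a hypothetical helper, not code from the original post) evaluates the two log terms of V(D, G) for a few discriminator outputs p: log p stands for log D(x), and log(1 - p) stands for log(1 - D(G(z))). The first column rises with p, which is what the discriminator wants for real images; the second falls with p, which is what the generator wants for its fakes.

```python
import math

def value_terms(p):
    """Return (log p, log(1 - p)) for a discriminator output p in (0, 1).

    log p       plays the role of log D(x) when a real sample is scored p.
    log(1 - p)  plays the role of log(1 - D(G(z))) when a fake sample is scored p.
    """
    return math.log(p), math.log(1.0 - p)

outputs = [0.1, 0.3, 0.5, 0.7, 0.9]
for p in outputs:
    real_term, fake_term = value_terms(p)
    print(f"D = {p:.1f}   log D = {real_term:+.3f}   log(1 - D) = {fake_term:+.3f}")
```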

Now that we understand the intuition behind the minimax game for adversarial networks,
let's discuss the gradients.



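As explained above, the discriminator has to maximize the value function V(D, G), so it is
trained with gradient ascent (yes, ascent, not descent). Given a mini-batch of m noise
samples $z^{(1)}, \dots, z^{(m)}$ and m real examples $x^{(1)}, \dots, x^{(m)}$, its
parameters $\theta_d$ are updated by ascending the stochastic gradient

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]$$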

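Coming to the generator, it must undergo gradient descent. Given a mini-batch of m noise
samples $z^{(1)}, \dots, z^{(m)}$, its parameters $\theta_g$ are updated by descending the
stochastic gradient

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$

Only the second term of V(D, G) appears here, because log D(x) does not depend on the
generator.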
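To make the two update directions concrete, here is a minimal sketch on a hypothetical 1-D toy problem that is not from the original post: a logistic discriminator D(x) = sigmoid(a*x + b) and a linear generator G(z) = w*z + c, with gradients derived by hand. Real GANs use neural networks and automatic differentiation; the toy data, initial values, and learning rate are all illustrative assumptions. The discriminator step moves its parameters along the gradient (ascent) and the generator step moves against it (descent).

```python
import math
import random

# Hypothetical toy model (illustration only):
#   discriminator D(x) = sigmoid(a*x + b), generator G(z) = w*z + c

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def discriminator_objective(a, b, xs, zs, w, c):
    """(1/m) * sum_i [log D(x_i) + log(1 - D(G(z_i)))]."""
    total = 0.0
    for x, z in zip(xs, zs):
        g = w * z + c
        total += math.log(sigmoid(a * x + b)) + math.log(1.0 - sigmoid(a * g + b))
    return total / len(xs)

def generator_objective(w, c, zs, a, b):
    """(1/m) * sum_i log(1 - D(G(z_i)))."""
    return sum(math.log(1.0 - sigmoid(a * (w * z + c) + b)) for z in zs) / len(zs)

def discriminator_ascent_step(a, b, xs, zs, w, c, lr):
    """One step of gradient ascent on the discriminator objective."""
    grad_a = grad_b = 0.0
    for x, z in zip(xs, zs):
        g = w * z + c
        d_real, d_fake = sigmoid(a * x + b), sigmoid(a * g + b)
        # d/du log(sigmoid(u)) = 1 - sigmoid(u); d/dv log(1 - sigmoid(v)) = -sigmoid(v)
        grad_a += (1.0 - d_real) * x - d_fake * g
        grad_b += (1.0 - d_real) - d_fake
    m = len(xs)
    return a + lr * grad_a / m, b + lr * grad_b / m  # ascent: step along the gradient

def generator_descent_step(w, c, zs, a, b, lr):
    """One step of gradient descent on the generator objective."""
    grad_w = grad_c = 0.0
    for z in zs:
        d_fake = sigmoid(a * (w * z + c) + b)
        # d/dG log(1 - sigmoid(a*G + b)) = -a * sigmoid(a*G + b); dG/dw = z, dG/dc = 1
        grad_w += -a * d_fake * z
        grad_c += -a * d_fake
    m = len(zs)
    return w - lr * grad_w / m, c - lr * grad_c / m  # descent: step against the gradient

random.seed(0)
m = 64
xs = [random.gauss(4.0, 1.0) for _ in range(m)]  # toy "real" data
zs = [random.gauss(0.0, 1.0) for _ in range(m)]  # noise samples
a, b, w, c = 0.1, 0.0, 1.0, 0.0

d_before = discriminator_objective(a, b, xs, zs, w, c)
a, b = discriminator_ascent_step(a, b, xs, zs, w, c, lr=0.05)
d_after = discriminator_objective(a, b, xs, zs, w, c)

zs_g = [random.gauss(0.0, 1.0) for _ in range(m)]  # new noise batch for the generator
g_before = generator_objective(w, c, zs_g, a, b)
w, c = generator_descent_step(w, c, zs_g, a, b, lr=0.05)
g_after = generator_objective(w, c, zs_g, a, b)

print(f"discriminator objective: {d_before:.6f} -> {d_after:.6f} (rises)")
print(f"generator objective:     {g_before:.6f} -> {g_after:.6f} (falls)")
```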


Now comes the actual implementation. As expected, the entire procedure is wrapped in an
outer for loop over the number of training iterations. Inside it, an inner for loop trains
the discriminator for k iterations, after which the generator's weights and biases are
updated once. In other words, the generator is updated only once for every k updates of
the discriminator.
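The loop structure described above can be sketched as follows. This is only the schedule, not a full training routine: `train_gan`, `discriminator_step`, and `generator_step` are hypothetical names, and each step callable is assumed to sample its own mini-batch and apply one gradient update. The check on k reflects that its minimum value is 1.

```python
def train_gan(num_iterations, k, discriminator_step, generator_step):
    """Alternate k discriminator updates with one generator update.

    discriminator_step / generator_step are hypothetical callables; each is
    assumed to sample a mini-batch and apply a single gradient update.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    for _ in range(num_iterations):  # outer loop: training iterations
        for _ in range(k):           # inner loop: k discriminator updates
            discriminator_step()
        generator_step()             # then exactly one generator update

# Usage: record the order in which the two networks would be updated.
updates = []
train_gan(
    num_iterations=2,
    k=3,
    discriminator_step=lambda: updates.append("D"),
    generator_step=lambda: updates.append("G"),
)
print("".join(updates))  # DDDGDDDG
```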


The reason for this is to avoid something called the Helvetica Scenario. Let's go back
to the forger-officer analogy. Suppose that a particular officer is colour blind. If the
forger makes fake money that is identical to real money except for a small but noticeable
difference in colour, this officer will accept the forged money as authentic. Since the
officer gives no feedback on how to improve, the forger has no reason to refine his or her
technique. From then on, all the forged currency will fool that particular officer, but it
will not be what we actually hoped for: currency indistinguishable from the real thing.

This is the gist of the Helvetica Scenario. The generator unintentionally finds a small
weakness in the discriminator and exploits it, succeeding in its immediate goal but
failing in the long term. (In the original GAN paper, the term describes the generator
collapsing too many values of z to the same output x.)

Hence, it is important to train the discriminator first. Once the discriminator is
reasonably confident, it can give valuable feedback to the generator, which in turn helps
achieve our end goal: generating lifelike images.

Coming back to the algorithm, the discriminator's parameters are updated in each of those
k iterations.

Then the generator is trained for one iteration, and this process repeats until
convergence. The value of k can vary a lot; the minimum is, of course, 1 (the original GAN
paper used k = 1, the least expensive option).





By

Aniruddha Karajgi,
Research Intern,
Cere Labs Pvt. Ltd.
