50 Questions about Convolutional Neural Networks

[Figure: A typical CNN architecture]

“Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke.

Well, the Convolutional Neural Network (CNN) is one such technology. What it does, and how it does it, is truly indistinguishable from magic. Read our earlier post, “From Cats to Convolutional Neural Networks”, to understand why CNNs come close to human intelligence. Although the inner workings of a CNN can be explained, the magic remains. Fascinated by CNNs, we decided to come up with as many questions as we could to explore why a CNN is able to classify images, or almost any kind of input, so well.

  1. What is convolution?
  2. What is pooling?
  3. Which pooling function is preferred - max or average?
  4. What is the role of activation functions in a CNN?
  5. Why is ReLU preferred over sigmoid in a CNN?
  6. Why does adding more layers increase the accuracy of the network?
  7. What is the intuition behind CNNs?
  8. What is stride?
  9. Is it necessary to include zero-padding?
  10. What is parameter sharing, and why is it important?
  11. What would happen if we left out the pooling layer in a CNN? Why is pooling so important?
  12. What brings CNNs closer to biological systems?
  13. How do we decide how much training, test and validation data to give the network?
  14. What is cross-validation, and why is it important?
  15. Which cross-validation technique is better - bootstrap or k-fold?
  16. When does a CNN fail?
  17. How can we know for certain whether the network fails because of inadequate input or because it has too few layers?
  18. What are the hidden layers doing?
  19. How does the backpropagation algorithm work across the network?
  20. Can a CNN learn continuously, or must training be completed before inference?
  21. Why are GPUs necessary to train a CNN?
  22. Why does using a pre-trained network speed up the learning of new categories?
  23. When do we say a CNN is not able to learn?
  24. Why is it sufficient to train only the fully connected layer of a pre-trained network to learn new categories?
  25. How important is it to provide the right set of data to train a CNN?
  26. Can we use the features learned by the inner layers of a CNN?
  27. What is generalization?
  28. What is overfitting?
  29. Why is it important to apply distortions to input images when training an image classifier?
  30. What are hyper-parameters?
  31. What is an epoch?
  32. What decides the number of examples per epoch?
  33. What is gradient descent?
  34. What is a loss function?
  35. Why is cross-entropy the preferred cost function in a CNN?
  36. Which is better - batch gradient descent or stochastic gradient descent?
  37. What is the importance of the learning rate in training a CNN?
  38. Which method is optimal - keeping the learning rate constant, or changing it as the network matures?
  39. How has the CNN reduced the work of data scientists in terms of feature selection?
  40. Why is starting a CNN's training with random weights preferable to starting with zero weights?
  41. Why is the Gaussian distribution the preferred choice for random weights?
  42. How does regularization help in preventing overfitting?
  43. How is a trained CNN evaluated?
  44. What is the importance of bias in training a CNN? Is it that significant?
  45. What are the best practices followed in CNNs?
  46. Why is training a CNN a costly affair?
  47. Why can a CNN be applied to any kind of learning, including images, natural language processing and speech?
  48. Why is a CNN capable of computing any kind of function?
  49. How do we tweak the number of convolution and pooling functions in each layer?
  50. What does pre-processing in a CNN mean?
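To make the first few questions concrete, here is a minimal NumPy sketch of the three building blocks many of the questions above revolve around: a single convolution, a ReLU activation, and a max-pooling step. This is our own toy illustration, not code from any particular framework, and, like most deep learning libraries, it actually computes cross-correlation (no kernel flip):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (no-padding) convolution of a single-channel image."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # one dot product per position
    return out

def relu(x):
    """Rectified linear unit: keeps positives, zeroes out negatives."""
    return np.maximum(0, x)

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keeps the strongest response per window."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_kernel = np.array([[-1., 0., 1.],             # responds to vertical edges
                        [-1., 0., 1.],
                        [-1., 0., 1.]])
features = relu(conv2d(image, edge_kernel))
pooled = max_pool(features)
print(features.shape)  # (4, 4)
print(pooled.shape)    # (2, 2)
```

On a 6x6 input, the 3x3 kernel with stride 1 yields a 4x4 feature map, and 2x2 pooling shrinks it to 2x2, which also illustrates how stride and pooling reduce spatial dimensions layer by layer.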

Hope we have covered most of the questions that justify the magic of Convolutional Neural Networks. If you have any more questions about CNNs, please feel free to add in the comments.

