
50 Questions about Convolutional Neural Networks

[Figure: A typical CNN]


“Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke.

Well! A Convolutional Neural Network (CNN) is such a technology. How it does what it does is truly indistinguishable from magic. Read our earlier post, “From Cats to Convolutional Neural Networks”, to understand why CNNs come close to human intelligence. Although the inner workings of a CNN can be explained, the magic remains. Fascinated by CNNs, we set out to come up with as many questions about them as we could, to understand the mystery of why they classify images, or any other kind of input, so well.

  1. What is convolution?
  2. What is pooling?
  3. Which pooling function is preferred - Max or Average?
  4. What is the role of activation functions in a CNN?
  5. Why is ReLU preferred in CNNs rather than Sigmoid?
  6. Why does adding more layers increase the accuracy of the network?
  7. What is the intuition behind CNN?
  8. What is stride?
  9. Is it necessary to include zero-padding?
  10. What is parameter sharing, and why is it important?
  11. What would have happened if we had not included the pooling layer in the CNN? Why is pooling so important?
  12. What brings CNNs closer to biological systems?
  13. How do we decide how much training, validation and test data to give the network?
  14. What is cross-validation and why is it important?
  15. Which cross-validation technique is better - bootstrap or k-fold?
  16. When does a CNN fail?
  17. How can we know for certain whether the network fails because it was given inadequate input or because it has too few layers?
  18. What are the hidden layers doing?
  19. How does the backpropagation algorithm work across the network?
  20. Can a CNN learn continuously, or must training be completed before conducting inference?
  21. Why are GPUs necessary to train a CNN?
  22. Why does using a pre-trained network increase the learning speed of new categories?
  23. When do we say that a CNN is not able to learn?
  24. Why is it sufficient to train only the fully connected layer of a pre-trained network to learn new categories?
  25. How important is it to provide the right set of data to train a CNN?
  26. Can we use the features learned by the inner layers of a CNN?
  27. What is generalization?
  28. What is overfitting?
  29. Why is it important to apply distortions to input images when training an image classifier?
  30. What are hyper-parameters?
  31. What is an epoch?
  32. What decides the number of examples per epoch?
  33. What is gradient descent?
  34. What is a loss function?
  35. Why is cross-entropy the preferred cost function in CNNs?
  36. Which one is better - batch gradient descent or stochastic gradient descent?
  37. What is the importance of the learning rate in training a CNN?
  38. Which method is optimal - keeping the learning rate constant or changing it as the network matures?
  39. How have CNNs reduced the work of data scientists in terms of feature selection?
  40. Why is starting a CNN’s training with random weights preferable to starting it with zero weights?
  41. Why is a Gaussian distribution the preferred choice for random initial weights?
  42. How does regularization help in preventing overfitting?
  43. How is a trained CNN evaluated?
  44. What is the importance of bias in training a CNN? Is it really that significant?
  45. What are the best practices followed in CNNs?
  46. Why is training a CNN a costly affair?
  47. Why can a CNN be applied to any kind of learning, including images, Natural Language Processing and speech?
  48. Why is a CNN capable of computing any kind of function?
  49. How do we tune the number of convolutions and pooling functions in each layer?
  50. What does pre-processing mean in a CNN?
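
To make the first few questions (convolution, pooling, ReLU, stride and zero-padding) a little more concrete, here is a minimal pure-Python sketch. It is only an illustration under our own assumptions, not how any particular library implements these operations: the function names `conv2d`, `relu` and `pool2d`, the toy 4x4 `image` and the 3x3 `edge_kernel` are all hypothetical. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` (lists of lists) and return the feature map."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-padding: surround the image with `padding` rows/columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    # Output size shrinks with the kernel size and with larger strides.
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    # The same kernel weights are reused at every position
                    # (this is the parameter sharing asked about above).
                    total += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: window and stride are both `size`."""
    reduce = max if mode == "max" else (lambda xs: sum(xs) / len(xs))
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


# Hypothetical toy input and a simple vertical-edge filter.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

same_size = conv2d(image, edge_kernel, stride=1, padding=1)  # 4x4, like the input
no_padding = conv2d(image, edge_kernel, stride=1, padding=0)  # shrinks to 2x2
strided = conv2d(image, edge_kernel, stride=2, padding=1)     # 2x2

features = relu(same_size)
pooled_max = pool2d(features, size=2, mode="max")      # [[0.0, 3.0], [2.0, 4.0]]
pooled_avg = pool2d(features, size=2, mode="average")  # [[0.0, 2.25], [0.5, 1.25]]
```

Comparing `same_size` with `no_padding` shows what zero-padding buys (the feature map keeps the input's size), and comparing `pooled_max` with `pooled_avg` shows how the two pooling functions summarize the same 2x2 windows differently.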

We hope we have covered most of the questions that explain the magic of Convolutional Neural Networks. If you have any more questions about CNNs, please feel free to add them in the comments.
