Convolutional Neural Networks - Deep Learning Specialization 4
deeplearning.ai by Andrew Ng on Coursera
W1: Foundations of Convolutional Neural Networks
V1: computer vision problem
- types: classification / object detection / style transfer
- a fully-connected (FC) NN cannot handle high-resolution pictures: reshaping an image into a one-dimensional vector leads to a huge weight matrix
Convolution in DL
V2&V3: edge detection example
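A minimal sketch of the vertical-edge-detection idea, using SciPy's `correlate2d` (cross-correlation, i.e. no filter flip) on a toy image that is bright on the left and dark on the right; the image values and filter here follow the lecture's toy example.

```python
import numpy as np
from scipy.signal import correlate2d

# The 3x3 vertical-edge filter; correlate2d is used because deep-learning
# "convolution" does not flip the filter (see the V5 note below).
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# 6x6 image: bright (10) on the left half, dark (0) on the right half.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

print(correlate2d(image, vertical_edge, mode="valid"))
# The 4x4 output has large positive values (30) in its two middle columns,
# marking the light-to-dark vertical edge.
```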
V4: padding
- add zeros around the images
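A minimal zero-padding sketch (assuming a single-channel $n \times n$ image for brevity); with "same" padding $p = (f-1)/2$ the output keeps size $n$, since a valid convolution outputs $n + 2p - f + 1$. `zero_pad` is a hypothetical helper.

```python
import numpy as np

# Pad p zeros on every side of a 2-D image (H, W).
def zero_pad(image, p):
    return np.pad(image, ((p, p), (p, p)), mode="constant", constant_values=0)

print(zero_pad(np.ones((6, 6)), 1).shape)   # (8, 8)
```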
V5: strided convolutions
‘Convolution’ as used in ML is actually cross-correlation in math, i.e., the filter (response function) is slid over the input without being flipped
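A naive sketch of this strided "convolution" (really cross-correlation) on a single-channel input; output size is $\lfloor (n + 2p - f)/s \rfloor + 1$, with $p = 0$ here. `conv2d` is a hypothetical helper, not a library call.

```python
import numpy as np

# Slide the filter without flipping it; stride controls the step size.
def conv2d(x, w, stride=1):
    f = w.shape[0]
    n_out = (x.shape[0] - f) // stride + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            patch = x[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(patch * w)   # element-wise product, then sum
    return out

print(conv2d(np.random.randn(7, 7), np.random.randn(3, 3), stride=2).shape)  # (3, 3)
```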
V6: convolution over volume
Convolutional Neural Network
V7: one layer of convolutional network
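A quick parameter count for one conv layer, using 10 filters of size $3\times3\times3$ as an example (each filter has 27 weights plus one bias); note the count is independent of the input image size.

```python
# (f*f*n_c_prev + 1) weights-plus-bias per filter, times n_c filters.
f, n_c_prev, n_c = 3, 3, 10
print((f * f * n_c_prev + 1) * n_c)   # 280
```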
V8: simple CNN example (conv layers only)
V9: pooling layer
Reduce the size of the representation / Speed up the computation / Make feature detection more robust
- even though pooling has no parameters to tune, it still affects the backpropagation calculation
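A minimal max-pooling sketch ($f = 2$, stride $= 2$) on a single-channel input; there is nothing to learn, only the hyperparameters $f$ and stride. `max_pool` is a hypothetical helper.

```python
import numpy as np

# Take the max over each f x f window, stepping by the stride.
def max_pool(x, f=2, stride=2):
    n_out = (x.shape[0] - f) // stride + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            out[i, j] = x[i * stride:i * stride + f,
                          j * stride:j * stride + f].max()
    return out

print(max_pool(np.arange(16.0).reshape(4, 4)))
# [[ 5.  7.]
#  [13. 15.]]
```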
V10: full CNN example
Why CNN works
V11: why CNN
W2: Deep Convolutional Models: Case Studies
For engineering work, the most efficient approach is to do case studies and read the literature to learn from others’ CNN architectures, then apply them to your own task
Classic CNN
V2: classic networks
ResNets
V3&V4: ResNets and why (compared to plain networks)
ResNet addresses the vanishing and exploding gradient problems when training very deep neural networks; the shortcut (skip connection) in a ResNet block makes it very easy for the sandwiched layers to learn the identity function (the weights and biases can simply go to zero)
However, using a deeper network doesn’t always help. A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent unbearably slow. More specifically, during gradient descent, as you backprop from the final layer back to the first layer, you are multiplying by the weight matrix on each step, and thus the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and “explode” to take very large values). During training, you might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds. (quoted from the coding exercise)
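A minimal Keras sketch of a ResNet identity block: the shortcut is added back just before the final activation, so the block can fall back to the identity by driving the conv weights toward zero. The two-convolution layout and the filter count (64) are illustrative assumptions, not the exact blocks from the coding exercise.

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    shortcut = x                                    # the skip connection a[l]
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                 # a[l+2] = g(z[l+2] + a[l])
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
print(identity_block(inputs, filters=64).shape)     # (None, 56, 56, 64)
```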
Inception network
V5: 1x1 convolution
V6&V7: inception network
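A minimal Keras sketch of one inception module: parallel 1x1, 3x3, 5x5, and pooling branches concatenated along the channel axis, with 1x1 "bottleneck" convolutions shrinking the channel count before the expensive 3x3/5x5 branches. The filter counts are illustrative assumptions, and `inception_module` is a hypothetical helper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)     # bottleneck
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)     # bottleneck
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])   # stack branches channel-wise

inputs = tf.keras.Input(shape=(28, 28, 192))
print(inception_module(inputs).shape)   # (None, 28, 28, 256)
```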
Transfer learning from open-source implementations
V8: search for open-source implementations on GitHub
- Starting from others’ architectures is a common way to begin your own work
- Replicating an architecture from the paper alone is hard, so it is better to start from a shared implementation of that particular paper
- Some open-source implementations also include pre-trained weights, so you can use them for transfer learning and make progress even faster
V9: transfer learning
- almost always do transfer learning because it works very well for image recognition; train from scratch only when you have an extremely large dataset and enough computational budget
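A minimal Keras transfer-learning sketch, assuming a small 3-class dataset: an ImageNet-pretrained network is reused as a frozen feature extractor and only a new softmax head is trained (with more data you would unfreeze some top layers or fine-tune everything). The choice of MobileNetV2 and the input size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                          # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    layers.Dense(3, activation="softmax"),      # new head for your own classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```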
Data augmentation
V10: data augmentation
- commonly used in image recognition because this kind of task almost always lacks data (a minimal sketch of common augmentation ops follows below)
- the best way is still to start from others’ implementations
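A minimal sketch with Keras preprocessing layers: mirroring, small random rotations, and a contrast shift as a crude stand-in for color shifting, applied on the fly during training; the specific layers and factors are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),    # mirroring
    layers.RandomRotation(0.05),        # small random rotations
    layers.RandomContrast(0.2),         # rough color/contrast shifting
])

images = tf.random.uniform((8, 224, 224, 3))    # a dummy batch
print(augment(images, training=True).shape)     # (8, 224, 224, 3)
```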
State of computer vision and advice
V11: state of computer vision
W3: Object Detection
Convolutional Implementation of Sliding Windows
V1: object localization
V2: landmark detection
V3: object detection with sliding windows
V4: convolutional implementation of sliding windows
the way to reduce the computational cost of sliding windows by sharing computation: the whole image goes through the ConvNet once instead of one forward pass per window
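A minimal Keras sketch of the "FC → convolution" trick, with toy sizes in the spirit of the lecture's example: the 400-unit FC layers become 5x5 and 1x1 convolutions, so the same weights evaluate every sliding-window position of a larger image in a single forward pass. `tiny_detector` is a hypothetical helper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_detector(input_size):
    x = tf.keras.Input(shape=(input_size, input_size, 3))
    y = layers.Conv2D(16, 5, activation="relu")(x)          # conv
    y = layers.MaxPooling2D(2)(y)                           # pool
    y = layers.Conv2D(400, 5, activation="relu")(y)         # "FC" as 5x5 conv
    y = layers.Conv2D(400, 1, activation="relu")(y)         # "FC" as 1x1 conv
    y = layers.Conv2D(4, 1, activation="softmax")(y)        # 4 class scores
    return tf.keras.Model(x, y)

print(tiny_detector(14).output_shape)   # (None, 1, 1, 4): one window
print(tiny_detector(16).output_shape)   # (None, 2, 2, 4): four 14x14 windows at once
```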
YOLO Algorithm
V5: bounding box prediction
Utilizes the idea of “FC → convolution → sliding windows” (each $1\times1\times c_{output}$ slice of the output represents one portion of the whole image), but uses a grid instead of sliding windows, and adds the bounding box to the label
V6: intersection over union (IoU)
to evaluate the performance of an object detector
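A minimal IoU sketch for two boxes given as (x1, y1, x2, y2) corner coordinates; `iou` is a hypothetical helper.

```python
def iou(box_a, box_b):
    # Corners of the intersection rectangle (width/height clipped to 0 if empty).
    xi1, yi1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xi2, yi2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection over union

print(iou((2, 1, 4, 3), (1, 2, 3, 4)))   # 1/7 ≈ 0.143
```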
V7: non-max suppression
to make sure each object is detected only once
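A greedy non-max suppression sketch: keep the highest-scoring box, discard any remaining box whose IoU with it exceeds a threshold, and repeat. It reuses the `iou()` helper above; the 0.5 threshold and `non_max_suppression` name are illustrative assumptions.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = [int(i) for i in np.argsort(scores)[::-1]]   # best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2, 2), (3, 3, 5, 5)]
print(non_max_suppression(boxes, scores=[0.9, 0.8, 0.7]))   # [0, 2]
```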
V8: anchor boxes
to handle the rare case where multiple objects are assigned to a single grid cell
V9: full YOLO algorithm
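The shape of the training label in this grid-plus-anchors setup, assuming a 3x3 grid, 2 anchor boxes, and 3 classes as an illustration: each anchor slot stores $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

```python
import numpy as np

# One (5 + #classes)-vector per grid cell per anchor box.
grid, anchors, classes = 3, 2, 3
y = np.zeros((grid, grid, anchors, 5 + classes))
print(y.shape)   # (3, 3, 2, 8)
```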
R-CNN
V10: R-CNN introduction (different from YOLO)
W4: Special Application: Face Recognition & Neural Style Transfer
Face Recognition
V1: face verification and recognition
V2: one-shot learning
V3: Siamese network
V4: triplet loss
How to train the “encoding” network above (loss function and training set)
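The triplet loss on the learned encodings is $\mathcal{L}(A,P,N)=\max(\lVert f(A)-f(P)\rVert^2-\lVert f(A)-f(N)\rVert^2+\alpha,\,0)$; below is a minimal sketch on precomputed embeddings, where the margin $\alpha = 0.2$ is an illustrative value.

```python
import numpy as np

# alpha is the margin that pushes the negative pair to be at least alpha
# farther from the anchor than the positive pair.
def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    pos_dist = np.sum((f_a - f_p) ** 2)   # anchor vs. positive (same person)
    neg_dist = np.sum((f_a - f_n) ** 2)   # anchor vs. negative (different person)
    return max(pos_dist - neg_dist + alpha, 0.0)
```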
V5: alternative training method – binary classification
Instead of the triplet loss, the Siamese network can also be trained as a binary classifier
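A minimal sketch of that alternative: the element-wise absolute differences of the two embeddings feed a single logistic unit that predicts “same person” (1) vs. “different person” (0); the names `same_person_prob`, `w`, and `b` are illustrative, with `w` and `b` being learned parameters.

```python
import numpy as np

# sigmoid( w . |f(x1) - f(x2)| + b )
def same_person_prob(f_1, f_2, w, b):
    return 1.0 / (1.0 + np.exp(-(w @ np.abs(f_1 - f_2) + b)))
```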
Neural Style Transfer
V1: neural style transfer
V2: what is deep ConvNets learning
V3&V4&V5: cost function
The loss function can be defined to achieve the desired target, and it can take activations from any layer of the model (not just the last layer)
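A minimal sketch of the style cost for one chosen layer: the Gram matrix of the unrolled activations captures channel-to-channel correlations (“style”), and the cost is the normalized squared difference between the style image’s and generated image’s Gram matrices. The activation shapes and helper names are illustrative.

```python
import numpy as np

def gram_matrix(a):
    n_h, n_w, n_c = a.shape                     # unroll to (n_C, n_H * n_W)
    a_unrolled = a.reshape(n_h * n_w, n_c).T
    return a_unrolled @ a_unrolled.T            # (n_C, n_C) channel correlations

def layer_style_cost(a_style, a_gen):
    n_h, n_w, n_c = a_style.shape
    gs, gg = gram_matrix(a_style), gram_matrix(a_gen)
    return np.sum((gs - gg) ** 2) / (4 * (n_c * n_h * n_w) ** 2)

print(layer_style_cost(np.random.randn(8, 8, 16), np.random.randn(8, 8, 16)))
```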