Convolutional Neural Networks - Deep Learning Specialization 4
deeplearning.ai by Andrew Ng on Coursera
W1: Foundations of Convolutional Neural Networks
V1: computer vision problem
- types: classification / object detection / style transfer
- a fully-connected (FC) NN cannot handle high-resolution pictures: reshaping an image into a one-dimensional vector leads to a huge weight matrix
Convolution in DL
V2&V3: edge detection example
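A minimal sketch of the vertical-edge-detection idea, using SciPy's `correlate2d` (cross-correlation, i.e. no filter flip) on a toy image that is bright on the left and dark on the right; the image values and filter here follow the lecture's toy example.

```python
import numpy as np
from scipy.signal import correlate2d

# The 3x3 vertical-edge filter; correlate2d is used because deep-learning
# "convolution" does not flip the filter (see the V5 note below).
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# 6x6 image: bright (10) on the left half, dark (0) on the right half.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

print(correlate2d(image, vertical_edge, mode="valid"))
# The 4x4 output has large positive values (30) in its two middle columns,
# marking the light-to-dark vertical edge.
```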
V4: padding
- add zeros around the images
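A minimal zero-padding sketch (assuming a single-channel $n \times n$ image for brevity); with "same" padding $p = (f-1)/2$ the output keeps size $n$, since a valid convolution outputs $n + 2p - f + 1$. `zero_pad` is a hypothetical helper.

```python
import numpy as np

# Pad p zeros on every side of a 2-D image (H, W).
def zero_pad(image, p):
    return np.pad(image, ((p, p), (p, p)), mode="constant", constant_values=0)

print(zero_pad(np.ones((6, 6)), 1).shape)   # (8, 8)
```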
V5: strided convolutions
‘Convolution’ as used in ML is actually cross-correlation in math, i.e., the filter (response function) is slid over the input without being flipped
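A naive sketch of this strided "convolution" (really cross-correlation) on a single-channel input; output size is $\lfloor (n + 2p - f)/s \rfloor + 1$, with $p = 0$ here. `conv2d` is a hypothetical helper, not a library call.

```python
import numpy as np

# Slide the filter without flipping it; stride controls the step size.
def conv2d(x, w, stride=1):
    f = w.shape[0]
    n_out = (x.shape[0] - f) // stride + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            patch = x[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(patch * w)   # element-wise product, then sum
    return out

print(conv2d(np.random.randn(7, 7), np.random.randn(3, 3), stride=2).shape)  # (3, 3)
```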
V6: convolution over volume
Convolutional Neural Network
V7: one layer of convolutional network
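A quick parameter count for one conv layer, using 10 filters of size $3\times3\times3$ as an example (each filter has 27 weights plus one bias); note the count is independent of the input image size.

```python
# (f*f*n_c_prev + 1) weights-plus-bias per filter, times n_c filters.
f, n_c_prev, n_c = 3, 3, 10
print((f * f * n_c_prev + 1) * n_c)   # 280
```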
V8: simple CNN example (conv layers only)
V9: pooling layer
Reduce the size of the representation / Speed up the computation / Make feature detection more robust
- even though pooling has no parameters to tune, it still affects the backpropagation calculation
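A minimal max-pooling sketch ($f = 2$, stride $= 2$) on a single-channel input; there is nothing to learn, only the hyperparameters $f$ and stride. `max_pool` is a hypothetical helper.

```python
import numpy as np

# Take the max over each f x f window, stepping by the stride.
def max_pool(x, f=2, stride=2):
    n_out = (x.shape[0] - f) // stride + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            out[i, j] = x[i * stride:i * stride + f,
                          j * stride:j * stride + f].max()
    return out

print(max_pool(np.arange(16.0).reshape(4, 4)))
# [[ 5.  7.]
#  [13. 15.]]
```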
V10: full CNN example
Why CNN works
V11: why CNN
W2: Deep Convolutional Models: Case Studies
For engineering work, the most efficient approach is to do case studies and read the literature to learn from others’ CNN architectures, then apply them to your own task
Classic CNN
V2: classic networks
ResNets
V3&V4: ResNets and why (compared to plain networks)
ResNet addresses the vanishing and exploding gradient problems when training very deep neural networks; the shortcut (skip connection) in a ResNet block makes it very easy for the sandwiched layers to learn the identity function (the weights and biases can simply go to zero)
However, using a deeper network doesn’t always help. A huge barrier to training them is vanishing gradients: very deep networks often have a gradient signal that goes to zero quickly, thus making gradient descent unbearably slow. More specifically, during gradient descent, as you backprop from the final layer back to the first layer, you are multiplying by the weight matrix on each step, and thus the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and “explode” to take very large values). During training, you might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds. (quoted from the coding exercise)
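A minimal Keras sketch of a ResNet identity block: the shortcut is added back just before the final activation, so the block can fall back to the identity by driving the conv weights toward zero. The two-convolution layout and the filter count (64) are illustrative assumptions, not the exact blocks from the coding exercise.

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    shortcut = x                                    # the skip connection a[l]
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                 # a[l+2] = g(z[l+2] + a[l])
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
print(identity_block(inputs, filters=64).shape)     # (None, 56, 56, 64)
```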
Inception network
V5: 1x1 convolution
V6&V7: inception network
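A minimal Keras sketch of one inception module: parallel 1x1, 3x3, 5x5, and pooling branches concatenated along the channel axis, with 1x1 "bottleneck" convolutions shrinking the channel count before the expensive 3x3/5x5 branches. The filter counts are illustrative assumptions, and `inception_module` is a hypothetical helper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(x)     # bottleneck
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(x)     # bottleneck
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])   # stack branches channel-wise

inputs = tf.keras.Input(shape=(28, 28, 192))
print(inception_module(inputs).shape)   # (None, 28, 28, 256)
```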
Transfer learning from open-source implementations
V8: search for open-source implementations on GitHub
- Starting from others’ architectures is a common way to begin your own work
- Replicating an architecture from the paper alone is hard, so it is better to start from a shared implementation of that particular paper
- Some open-source implementations also include pre-trained weights, so you can use them for transfer learning and make progress even faster
V9: transfer learning
- almost always do transfer learning because it works very well for image recognition; train from scratch only when you have an extremely large dataset and enough computational budget
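A minimal Keras transfer-learning sketch, assuming a small 3-class dataset: an ImageNet-pretrained network is reused as a frozen feature extractor and only a new softmax head is trained (with more data you would unfreeze some top layers or fine-tune everything). The choice of MobileNetV2 and the input size are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                          # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    layers.Dense(3, activation="softmax"),      # new head for your own classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```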
Data augmentation
V10: data augmentation
- commonly used in image recognition because this kind of task almost always lacks data (a minimal sketch of common augmentation ops follows below)
- the best way is still to start from others’ implementations
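A minimal sketch with Keras preprocessing layers: mirroring, small random rotations, and a contrast shift as a crude stand-in for color shifting, applied on the fly during training; the specific layers and factors are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),    # mirroring
    layers.RandomRotation(0.05),        # small random rotations
    layers.RandomContrast(0.2),         # rough color/contrast shifting
])

images = tf.random.uniform((8, 224, 224, 3))    # a dummy batch
print(augment(images, training=True).shape)     # (8, 224, 224, 3)
```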
State of computer vision and advice
V11: state of computer vision
W3: Object Detection
Convolutional Implementation of Sliding Windows
V1: object localization
V2: landmark detection
V3: object detection with sliding windows
V4: convolutional implementation of sliding windows
the way to reduce the computational cost of sliding windows by sharing computation: the whole image goes through the ConvNet once instead of one forward pass per window
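A minimal Keras sketch of the "FC → convolution" trick, with toy sizes in the spirit of the lecture's example: the 400-unit FC layers become 5x5 and 1x1 convolutions, so the same weights evaluate every sliding-window position of a larger image in a single forward pass. `tiny_detector` is a hypothetical helper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_detector(input_size):
    x = tf.keras.Input(shape=(input_size, input_size, 3))
    y = layers.Conv2D(16, 5, activation="relu")(x)          # conv
    y = layers.MaxPooling2D(2)(y)                           # pool
    y = layers.Conv2D(400, 5, activation="relu")(y)         # "FC" as 5x5 conv
    y = layers.Conv2D(400, 1, activation="relu")(y)         # "FC" as 1x1 conv
    y = layers.Conv2D(4, 1, activation="softmax")(y)        # 4 class scores
    return tf.keras.Model(x, y)

print(tiny_detector(14).output_shape)   # (None, 1, 1, 4): one window
print(tiny_detector(16).output_shape)   # (None, 2, 2, 4): four 14x14 windows at once
```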
YOLO Algorithm
V5: bounding box prediction
Utilizes the idea of “FC → convolution → sliding windows” (each $1\times1\times c_{output}$ slice of the output represents one portion of the whole image), but uses a grid instead of sliding windows, and adds the bounding box to the label
V6: intersection over union (IoU)
to evaluate the performance of an object detector
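A minimal IoU sketch for two boxes given as (x1, y1, x2, y2) corner coordinates; `iou` is a hypothetical helper.

```python
def iou(box_a, box_b):
    # Corners of the intersection rectangle (width/height clipped to 0 if empty).
    xi1, yi1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xi2, yi2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection over union

print(iou((2, 1, 4, 3), (1, 2, 3, 4)))   # 1/7 ≈ 0.143
```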
V7: non-max suppression
to make sure each object is detected only once
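A greedy non-max suppression sketch: keep the highest-scoring box, discard any remaining box whose IoU with it exceeds a threshold, and repeat. It reuses the `iou()` helper above; the 0.5 threshold and `non_max_suppression` name are illustrative assumptions.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = [int(i) for i in np.argsort(scores)[::-1]]   # best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2, 2), (3, 3, 5, 5)]
print(non_max_suppression(boxes, scores=[0.9, 0.8, 0.7]))   # [0, 2]
```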
V8: anchor boxes
to handle the rare case where multiple objects are assigned to a single grid cell
V9: full YOLO algorithm
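The shape of the training label in this grid-plus-anchors setup, assuming a 3x3 grid, 2 anchor boxes, and 3 classes as an illustration: each anchor slot stores $[p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

```python
import numpy as np

# One (5 + #classes)-vector per grid cell per anchor box.
grid, anchors, classes = 3, 2, 3
y = np.zeros((grid, grid, anchors, 5 + classes))
print(y.shape)   # (3, 3, 2, 8)
```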
R-CNN
V10: R-CNN introduction (different from YOLO)
W4: Special Application: Face Recognition & Neural Style Transfer
Face Recognition
V1: face verification and recognition
V2: one-shot learning
V3: Siamese network
V4: triplet loss
How to train the “encoding” network above (loss function and training set)
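The triplet loss on the learned encodings is $\mathcal{L}(A,P,N)=\max(\lVert f(A)-f(P)\rVert^2-\lVert f(A)-f(N)\rVert^2+\alpha,\,0)$; below is a minimal sketch on precomputed embeddings, where the margin $\alpha = 0.2$ is an illustrative value.

```python
import numpy as np

# alpha is the margin that pushes the negative pair to be at least alpha
# farther from the anchor than the positive pair.
def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    pos_dist = np.sum((f_a - f_p) ** 2)   # anchor vs. positive (same person)
    neg_dist = np.sum((f_a - f_n) ** 2)   # anchor vs. negative (different person)
    return max(pos_dist - neg_dist + alpha, 0.0)
```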
V5: alternative training method – binary classification
Instead of the triplet loss, the Siamese network can also be trained as a binary classifier
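A minimal sketch of that alternative: the element-wise absolute differences of the two embeddings feed a single logistic unit that predicts “same person” (1) vs. “different person” (0); the names `same_person_prob`, `w`, and `b` are illustrative, with `w` and `b` being learned parameters.

```python
import numpy as np

# sigmoid( w . |f(x1) - f(x2)| + b )
def same_person_prob(f_1, f_2, w, b):
    return 1.0 / (1.0 + np.exp(-(w @ np.abs(f_1 - f_2) + b)))
```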
Neural Style Transfer
V1: neural style transfer
V2: what is deep ConvNets learning
V3&V4&V5: cost function
The loss function can be defined to achieve the desired target, and it can take activations from any layer of the model (not just the last layer)
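A minimal sketch of the style cost for one chosen layer: the Gram matrix of the unrolled activations captures channel-to-channel correlations (“style”), and the cost is the normalized squared difference between the style image’s and generated image’s Gram matrices. The activation shapes and helper names are illustrative.

```python
import numpy as np

def gram_matrix(a):
    n_h, n_w, n_c = a.shape                     # unroll to (n_C, n_H * n_W)
    a_unrolled = a.reshape(n_h * n_w, n_c).T
    return a_unrolled @ a_unrolled.T            # (n_C, n_C) channel correlations

def layer_style_cost(a_style, a_gen):
    n_h, n_w, n_c = a_style.shape
    gs, gg = gram_matrix(a_style), gram_matrix(a_gen)
    return np.sum((gs - gg) ** 2) / (4 * (n_c * n_h * n_w) ** 2)

print(layer_style_cost(np.random.randn(8, 8, 16), np.random.randn(8, 8, 16)))
```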