Neural Networks and Deep Learning - Deep Learning Specialization 1
deeplearning.ai by Andrew Ng on Coursera
W1: Introduction to Deep Learning
Supervised Learning with Neural Networks
Why Neural Networks Are Taking Off
W2: Neural Network Basics Illustrated by Logistic Regression
Notation and Matrix Layout
logistic regression weights are stacked in rows, and training samples are stacked in columns (the matrix is transposed compared to the traditional ML convention)
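A minimal sketch of this layout (the sizes $n_x$ and $m$ are illustrative):

```python
import numpy as np

n_x, m = 3, 5                             # 3 features, 5 training samples (illustrative)
X = np.random.randn(n_x, m)               # each column is one sample, each row is one feature
Y = np.random.randint(0, 2, size=(1, m))  # labels stacked in columns as a (1, m) row vector
print(X.shape, Y.shape)                   # (3, 5) (1, 5)
```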
Logistic Regression
V2:
- it is better to keep the weight $w$ and the intercept $b$ as separate parameters in a NN (rather than folding $b$ into $w$)
- use the sigmoid function to squash the result into $[0,1]$ so that $\hat{y}$ is the probability that $y=1$ given $x$
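In formulas, as defined in the lectures:

$$\hat{y} = \sigma(w^T x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$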
V3:
- logistic regression uses a special loss function (cross-entropy rather than squared error) that makes the optimization problem convex, so gradient descent can find the global optimum instead of getting stuck in a local optimum
- the loss function is defined for a single sample, while the cost function is defined for the entire training set (the cost of your parameters; the average of the loss over all samples)
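The corresponding formulas:

$$\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1-y)\log(1-\hat{y})\right), \qquad J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}\!\left(\hat{y}^{(i)}, y^{(i)}\right)$$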
Gradient Descent and Computation Graph
V4: gradient descent
- step along the steepest descent direction on a convex cost surface, starting from the initial point
- the step size is controlled by the learning rate
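Each iteration applies the update (with learning rate $\alpha$):

$$w := w - \alpha\,\frac{\partial J}{\partial w}, \qquad b := b - \alpha\,\frac{\partial J}{\partial b}$$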
V7&V8: computation graph
- forward propagation (blue): computes the output step by step
- backward propagation (red): computes gradients/derivatives (the notation $d\mathrm{var}$ stands for the derivative of the final output with respect to the intermediate variable $\mathrm{var}$)
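A minimal numeric sketch using the lectures' small example $J = 3(a + bc)$ (the input values are illustrative):

```python
# forward pass (blue): J = 3 * (a + b*c), built step by step
a, b, c = 5.0, 3.0, 2.0
u = b * c          # intermediate node
v = a + u          # intermediate node
J = 3 * v          # final output, 33.0

# backward pass (red): derivative of J w.r.t. each node, using the dvar notation
dv = 3.0           # dJ/dv
du = dv * 1.0      # dJ/du = dJ/dv * dv/du
da = dv * 1.0      # dJ/da
db = du * c        # dJ/db = dJ/du * du/db
dc = du * b        # dJ/dc
print(da, db, dc)  # 3.0 6.0 9.0
```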
In Logistic Regression Context
V9&10: gradient descent of logistic regression
- use vectorization to avoid large explicit for loops over samples and features
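The per-sample derivatives derived in these videos (with $a$ the prediction and $z = w^T x + b$):

$$dz = a - y, \qquad dw_j = x_j\,dz, \qquad db = dz$$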
Vectorization
V1&V2: use built-in vectorized functions (for example, np.dot for the dot product / matrix multiplication of two arrays) instead of explicit for loops
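A quick sketch of the speed difference (the array size is illustrative):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

t0 = time.time()
c_loop = sum(a[i] * b[i] for i in range(len(a)))   # explicit Python loop
t1 = time.time()
c_vec = np.dot(a, b)                               # vectorized dot product
t2 = time.time()

print(f"loop: {t1 - t0:.3f}s, np.dot: {t2 - t1:.4f}s")  # np.dot is typically orders of magnitude faster
```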
V3&V4: vectorizing logistic regression
- forward propagation
- backward propagation and gradient descent
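A minimal sketch of one vectorized training iteration (variable names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m, alpha = 4, 100, 0.01
X = np.random.randn(n_x, m)                 # samples in columns
Y = np.random.randint(0, 2, size=(1, m))
w, b = np.zeros((n_x, 1)), 0.0

# forward propagation (all m samples at once)
Z = np.dot(w.T, X) + b                      # shape (1, m)
A = sigmoid(Z)
cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# backward propagation
dZ = A - Y                                  # shape (1, m)
dw = np.dot(X, dZ.T) / m                    # shape (n_x, 1)
db = np.sum(dZ) / m

# gradient descent step
w = w - alpha * dw
b = b - alpha * db
```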
V5: broadcasting rules in Python (NumPy)
- how NumPy expands (broadcasts) arrays when their shapes do not match during elementwise operations
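A small illustration of broadcasting (the array values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])             # shape (2, 3)
col_sums = A.sum(axis=0, keepdims=True)     # shape (1, 3)
percentages = 100 * A / col_sums            # the (1, 3) row is broadcast across the 2 rows
print(percentages)

b = np.array([[10.0], [20.0]])              # shape (2, 1)
print(A + b)                                # the (2, 1) column is broadcast across the 3 columns
```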
V6: bug-free numpy coding tips
- avoid rank-1 arrays; commit to an explicit shape such as (n, 1) or (1, n) when creating an array
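A sketch of the rank-1 pitfall and the fix (sizes are illustrative):

```python
import numpy as np

a = np.random.randn(5)        # rank-1 array, shape (5,): neither a row nor a column vector
print(a.shape, a.T.shape)     # (5,) (5,) -- transposing does nothing
print(np.dot(a, a))           # a scalar, which may not be what you expect

b = np.random.randn(5, 1)     # committed column vector
print(np.dot(b.T, b).shape)   # (1, 1)
print(np.dot(b, b.T).shape)   # (5, 5) outer product
assert b.shape == (5, 1)      # the assert makes the intended shape explicit
```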
Programming Practice
- implement each function separately: initialize(), propagate(), optimize(); then combine them in model()
- use assert to verify the shapes and types of matrices
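A minimal sketch of that structure (the function names follow the notes; the signatures and details here are illustrative, not the assignment's reference solution):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize(dim):
    w, b = np.zeros((dim, 1)), 0.0
    assert w.shape == (dim, 1)
    return w, b

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    assert dw.shape == w.shape            # shape checks catch broadcasting bugs early
    return {"dw": dw, "db": db}, cost

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.01):
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
    return w, b

def model(X, Y, num_iterations=100, learning_rate=0.01):
    w, b = initialize(X.shape[0])
    return optimize(w, b, X, Y, num_iterations, learning_rate)
```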
W3: Shallow Neural Network
Introduction to the actual structure of a neural network, building up from the computation graph of logistic regression
Neural Network Representation
Vectorization
V3: with a single sample
- the NN version is essentially doing logistic regression multiple times, once per hidden unit (vectorized over all samples in the sketch below)
V4&V5: with multiple samples
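A minimal sketch of the vectorized forward pass for a 2-layer network over $m$ samples (tanh hidden layer and sigmoid output as in the lectures; the sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h, m = 3, 4, 10                      # input size, hidden units, samples (illustrative)
X = np.random.randn(n_x, m)                 # samples in columns
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(1, n_h) * 0.01, np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1                     # (n_h, m): each row is one hidden unit's z over all samples
A1 = np.tanh(Z1)                            # hidden layer activation
Z2 = np.dot(W2, A1) + b2                    # (1, m)
A2 = sigmoid(Z2)                            # output layer: one "logistic regression" applied to A1
print(A2.shape)                             # (1, 10)
```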
Activation Functions and Gradient Descent
V6: different choices of activation functions
- the activation function determines the output of each unit given its linear combination $z$
- for logistic regression we normally use sigmoid as the activation function, but other functions (tanh, ReLU, leaky ReLU) often work better in hidden layers; see the sketch below
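A minimal sketch of the four activation functions discussed in the video (the leaky-ReLU slope of 0.01 is a common illustrative choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1); good for the output layer of binary classification

def tanh(z):
    return np.tanh(z)                     # output in (-1, 1); zero-centered, usually better than sigmoid in hidden layers

def relu(z):
    return np.maximum(0, z)               # default choice for hidden layers; gradient is 1 for z > 0

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)  # small slope for z < 0 avoids completely "dead" units
```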
V7&V8: why we need non-linear activation functions
- if all activations are linear, then no matter how many hidden layers the network has, it collapses to a single linear function, so the extra layers make no difference to the output
- a special case such as a regression model may use a linear activation in the output layer, but the hidden layers still need non-linear activations
A neural network is essentially a composition (superposition) of non-linear functions
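The one-line argument from the lecture, for two linear layers:

$$a^{[2]} = W^{[2]}\left(W^{[1]}x + b^{[1]}\right) + b^{[2]} = \left(W^{[2]}W^{[1]}\right)x + \left(W^{[2]}b^{[1]} + b^{[2]}\right),$$

which is again just a linear function of $x$.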
V9&V10: gradient descent
Random Initialization
V11:
- if all weights are initialized to 0, every hidden unit computes exactly the same function and receives exactly the same gradient, so all units stay identical (the symmetry is never broken) and having multiple units becomes meaningless
- we initialize the weights randomly with small values, so that tanh or sigmoid units do not start in their flat, saturated regions where gradients are tiny and learning gets stuck
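A minimal sketch of the initialization from this video (the 0.01 scaling factor is the one used in the lecture; layer sizes are illustrative):

```python
import numpy as np

n_x, n_h, n_y = 3, 4, 1                  # layer sizes (illustrative)
W1 = np.random.randn(n_h, n_x) * 0.01    # small random values break symmetry without saturating tanh/sigmoid
b1 = np.zeros((n_h, 1))                  # biases can safely be initialized to zero
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
```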
W4: Deep Neural Network
L-layer Neural Network Notation
V1: L-layer neural network notation
Forward and Backward Building Blocks
V2: forward propagation
- similar to the 2-layer NN, but with an explicit for loop over the layers (this one for loop cannot be vectorized away)
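A minimal sketch of that loop, assuming the parameters are stored in a dict keyed "W1", "b1", ..., with ReLU hidden layers and a sigmoid output (the storage scheme is an illustration, not prescribed by the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def L_model_forward(X, parameters):
    L = len(parameters) // 2          # each layer contributes a W and a b
    A = X
    for l in range(1, L):             # hidden layers 1 .. L-1 use ReLU
        Z = np.dot(parameters["W" + str(l)], A) + parameters["b" + str(l)]
        A = relu(Z)
    ZL = np.dot(parameters["W" + str(L)], A) + parameters["b" + str(L)]
    return sigmoid(ZL)                # output layer uses sigmoid for binary classification

# usage with a tiny 3-layer network (sizes illustrative)
layer_dims = [3, 4, 3, 1]
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

AL = L_model_forward(np.random.randn(3, 5), parameters)
print(AL.shape)                       # (1, 5)
```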
V3: matrix dimensions verification
- (like dimensional analysis in physics)
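The dimension rules to check against (with $n^{[l]}$ units in layer $l$ and $m$ samples):

$$W^{[l]}, dW^{[l]}: (n^{[l]}, n^{[l-1]}), \qquad b^{[l]}, db^{[l]}: (n^{[l]}, 1), \qquad Z^{[l]}, A^{[l]}: (n^{[l]}, m)$$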
V4: why deep networks can work better
V5&V6: building blocks of deep NN
Hyperparameters
V7: hyperparameters
- hyperparameters (learning rate, number of iterations, number of hidden layers, hidden units per layer, choice of activation function) control how the parameters $W$ and $b$ end up being learned