Neural Networks and Deep Learning - Deep Learning Specialization 1

deeplearning.ai by Andrew Ng on Coursera

W1: Introduction to Deep Learning

Supervised Neural Network


Why Neural Networks Are Taking Off


W2: Neural Network Basics Illustrated by Logistic Regression

Notation and Matrix Layout

Logistic regression weights stack in rows, and samples stack in columns (the matrices are transposed compared to the traditional ML convention).


Logistic Regression

V2:

  • in neural networks it is better to keep the weights $w$ and the intercept (bias) $b$ as separate parameters, rather than folding $b$ into $w$
  • use the sigmoid function to squash the result into $[0,1]$, so that $\hat{y}$ can be interpreted as the probability that $y = 1$ given $x$

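The forward step above can be sketched in NumPy (a minimal sketch; `predict_proba` is a name chosen here, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Squash any real z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def predict_proba(w, b, x):
    """yhat = sigmoid(w^T x + b), interpreted as P(y = 1 | x).

    w and x are (n_x, 1) column vectors; b is a scalar.
    """
    return sigmoid(np.dot(w.T, x) + b)

w = np.zeros((3, 1))               # zero weights => z = 0
x = np.ones((3, 1))
print(predict_proba(w, 0.0, x))    # sigmoid(0) = 0.5
```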

V3:

  • logistic regression uses a special loss function (cross-entropy rather than squared error) that makes the optimization problem convex, so gradient descent can find the global optimum instead of getting stuck in a local one
  • the loss function is defined for a single sample, while the cost function is for the entire training set (the cost of your parameters: the average of the loss over all samples)

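The two definitions side by side (a minimal sketch; function names are mine):

```python
import numpy as np

def loss(yhat, y):
    """Cross-entropy loss for a single sample: large when yhat is far from y."""
    return -(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

def cost(yhat, y):
    """Cost of the parameters: the average loss over the whole training set."""
    return np.mean(loss(yhat, y))

yhat = np.array([0.9, 0.2, 0.8])   # confident, mostly correct predictions
y = np.array([1.0, 0.0, 1.0])
print(cost(yhat, y))               # small, since predictions agree with labels
```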

Gradient Descent and Computation Graph

V4: gradient descent

  • starting from the initial point, step along the direction of steepest descent on a convex surface
  • the step size is controlled by the learning rate

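The update rule can be illustrated on a one-dimensional convex function (a toy sketch, not the course code):

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step opposite the gradient (the steepest-descent direction)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# J(w) = (w - 3)^2 is convex, with gradient dJ/dw = 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # converges toward the global minimum w = 3
```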

V7&V8: computation graph

  • forward propagation (blue): compute the output
  • backward propagation (red): compute the gradients/derivatives (use $dvar$ as shorthand for the derivative of the final output with respect to the intermediate variable $var$)

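A small worked example of the two passes (the concrete numbers are illustrative):

```python
# Forward pass (blue): compute J = 3 * (a + b*c) one step at a time.
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# Backward pass (red): dvar means dJ/dvar, built up with the chain rule.
dv = 3.0           # dJ/dv
du = dv * 1.0      # dJ/du = dJ/dv * dv/du
da = dv * 1.0      # dJ/da = dJ/dv * dv/da
db = du * c        # dJ/db = dJ/du * du/db
dc = du * b        # dJ/dc = dJ/du * du/dc
```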

In Logistic Regression Context

V9&10: gradient descent of logistic regression

  • use vectorization to avoid large explicit for loops

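For a single sample, the chain rule through the logistic regression graph gives $dz = a - y$, $dw = x \, dz$, $db = dz$. A quick numeric check (values are illustrative):

```python
import numpy as np

x = np.array([[1.0], [2.0]])   # one sample with two features
w = np.array([[0.0], [0.0]])
b = 0.0
y = 1.0

z = (np.dot(w.T, x) + b).item()
a = 1 / (1 + np.exp(-z))       # a = sigmoid(0) = 0.5
dz = a - y                     # dL/dz = a - y
dw = x * dz                    # dL/dw, one entry per feature
db = dz                        # dL/db
```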

Vectorization

V1&V2: use built-in vectorized functions (for example, np.dot computes the dot/matrix product of two arrays) instead of for loops
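For instance, a dot product computed both ways gives the same result, but the built-in runs in optimized C code:

```python
import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)

# Explicit for loop: slow in Python.
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]

# Vectorized built-in: the same number, without the loop.
print(total, np.dot(a, b))
```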

V3&V4: vectorizing logistic regression

  • forward propagation
  • backward propagation and gradient descent

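Both passes over all $m$ samples at once (a sketch; the `propagate()`/`optimize()` split mirrors the programming practice, but the bodies here are my own):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    """One vectorized forward and backward pass.

    X is (n_x, m) with samples stacked in columns; Y is (1, m); w is (n_x, 1).
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                # forward: (1, m) activations
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dZ = A - Y                                     # backward
    dw = np.dot(X, dZ.T) / m                       # (n_x, 1), no loop over samples
    db = np.mean(dZ)
    return dw, db, cost

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.1):
    """Plain gradient descent using the vectorized gradients."""
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b
```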

V5: broadcasting rules of Python (NumPy)

  • how NumPy expands arrays when their shapes do not match during element-wise operations

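An example of the rule in action, computing each entry as a percentage of its column total (numbers are illustrative):

```python
import numpy as np

A = np.array([[56.0, 0.0, 4.4],
              [1.2, 104.0, 52.0]])     # shape (2, 3)
col_sums = A.sum(axis=0)               # shape (3,), treated as (1, 3) below

# (2, 3) / (1, 3): the row of sums is stretched ("broadcast") down both rows.
percent = 100 * A / col_sums.reshape(1, 3)
print(percent)
```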

V6: tips for bug-free numpy coding

  • avoid rank-1 arrays; commit to an explicit shape (e.g. a column vector) when creating a matrix

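Concretely, a rank-1 array behaves surprisingly under transpose, while a committed column vector does not:

```python
import numpy as np

a = np.random.randn(5)        # rank-1 array: shape (5,) -- avoid
print(a.shape)                # (5,)
print((a.T == a).all())       # True: transposing a rank-1 array does nothing

b = np.random.randn(5, 1)     # commit to a column vector: shape (5, 1)
print(b.T.shape)              # (1, 5): transpose now behaves as expected
print(np.dot(b.T, b).shape)   # (1, 1): a proper inner product
```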

Programming Practice

  • implement each function separately: initialize(), propagate(), optimize(); then build model() on top of them
  • use assert to verify the shapes and types of matrices
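For example, a sketch of an `initialize()` with shape asserts (the body is illustrative, not the assignment's solution):

```python
import numpy as np

def initialize(dim):
    """Zero weights and bias; asserts document and enforce the intended shapes."""
    w = np.zeros((dim, 1))
    b = 0.0
    assert w.shape == (dim, 1), f"w has shape {w.shape}, expected ({dim}, 1)"
    assert isinstance(b, float)
    return w, b

w, b = initialize(4)
```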

W3: Shallow Neural Network

Introduces the real structure of a neural network, built up from the computation graph of logistic regression.

Neural Network Representation


Vectorization

V3: with a single sample

  • the NN version is essentially doing logistic regression multiple times, once per hidden unit

V4&V5: with multiple samples

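A sketch of the vectorized 2-layer forward pass (tanh hidden layer, sigmoid output; shapes follow the course notation, but the code itself is mine):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Vectorized forward pass of a 2-layer network over m samples at once.

    X: (n_x, m), W1: (n_h, n_x), b1: (n_h, 1), W2: (1, n_h), b2: (1, 1).
    """
    Z1 = np.dot(W1, X) + b1    # (n_h, m); b1 broadcasts across the m columns
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2   # (1, m)
    A2 = sigmoid(Z2)
    return A2

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))                       # 3 features, 5 samples
W1, b1 = rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))
print(forward(X, W1, b1, W2, b2).shape)  # (1, 5): one prediction per column
```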

Activation Functions and Gradient Descent

V6: different choices of activation functions

  • the activation function determines a unit's output
  • for logistic regression we normally use sigmoid as the activation function, but other functions often work better

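NumPy sketches of the common choices (the definitions and comments are mine):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))          # output in (0, 1); natural for the output layer

def tanh(z):
    return np.tanh(z)                    # zero-centred; usually beats sigmoid in hidden layers

def relu(z):
    return np.maximum(0, z)              # cheap and gradient-friendly; a common default

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z) # small negative slope instead of a flat 0
```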

V7&V8: why we need non-linear activation functions

  • if linear activation functions are used, then no matter how many hidden layers the network has, the output is equivalent to that of a single linear model
  • a special case such as a regression model may use a linear activation function in the output layer, but the hidden layers still need non-linear ones

    A neural network is like a system of superposed non-linearities
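A quick numeric check that stacked linear layers collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
b1, b2 = rng.standard_normal((4, 1)), rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))

# Two "hidden layers" with identity (linear) activation...
out = np.dot(W2, np.dot(W1, x) + b1) + b2

# ...are exactly one linear layer W x + b.
W = np.dot(W2, W1)
b = np.dot(W2, b1) + b2
print(np.allclose(out, np.dot(W, x) + b))  # True
```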

V9&V10: gradient descent

Random Initialization

V11:

  • if all weight values are set to 0, every hidden unit/node/neuron computes the same function (they are symmetric), so their outputs stay identical after every update, which makes having multiple units meaningless
  • instead, initialize the weights to small random values; keeping them small avoids getting stuck in the flat saturation regions of tanh or sigmoid
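A minimal sketch for a 2-layer network (the small scale factor of 0.01 follows the lecture's suggestion; the function name is mine):

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=2):
    """Small random weights break symmetry; zeros for the biases are fine."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((n_h, n_x)) * 0.01   # small: keeps tanh/sigmoid out of the flat tails
    b1 = np.zeros((n_h, 1))                       # zero biases do not cause symmetry
    W2 = rng.standard_normal((n_y, n_h)) * 0.01
    b2 = np.zeros((n_y, 1))
    return W1, b1, W2, b2
```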

W4: Deep Neural Network

L-layer Neural Network Notation

V1: L-layer neural network notation

Forward and Backward Building Blocks

V2: forward propagation

  • similar to a 2-layer NN, but using a for loop over the layers
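The layer loop can be sketched as follows (ReLU hidden layers with a sigmoid output; the names and structure are mine):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def L_model_forward(X, params):
    """params is a list of (W, b) pairs, one per layer.

    The for loop over layers is one loop that vectorization cannot remove.
    """
    A = X
    for W, b in params[:-1]:
        A = relu(np.dot(W, A) + b)        # hidden layers
    W, b = params[-1]
    return sigmoid(np.dot(W, A) + b)      # output layer
```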

V3: matrix dimensions verification

  • (like dimensional analysis in physics)
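The dimension rules can be written out and checked mechanically (a sketch; the example layer sizes are arbitrary):

```python
# For layer l of an L-layer network with layer sizes n[l] and m samples:
#   W[l]: (n[l], n[l-1])      since Z[l] = W[l] A[l-1] + b[l]
#   b[l]: (n[l], 1)           (broadcast across the m columns)
#   Z[l], A[l]: (n[l], m)
#   dW[l], db[l]: same shapes as W[l], b[l]
import numpy as np

n = [3, 4, 2, 1]                 # n[0] = n_x, then three layers
m = 5
A = np.zeros((n[0], m))
for l in range(1, len(n)):
    W = np.zeros((n[l], n[l - 1]))
    b = np.zeros((n[l], 1))
    Z = np.dot(W, A) + b
    assert Z.shape == (n[l], m)  # matches the rule above
    A = Z
```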

V4: why deeper networks can work better

V5&V6: building blocks of a deep NN

Hyperparameters

V7: hyperparameters

  • hyperparameters (e.g. learning rate, number of iterations, number of hidden layers and units, choice of activation function) determine the final values of the parameters

