Neural Networks and Deep Learning - Deep Learning Specialization 1
deeplearning.ai by Andrew Ng on Coursera
W1: Introduction to Deep Learning
Supervised Learning with Neural Networks
Why Neural Networks Are Taking Off
W2: Neural Network Basics Illustrated by Logistic Regression
Notation and Matrix Layout
logistic regression weights are stacked in rows, and training samples are stacked in columns (the matrix is transposed compared to the traditional ML convention)
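A minimal sketch of this layout (the sizes $n_x$ and $m$ are illustrative):

```python
import numpy as np

n_x, m = 3, 5                             # 3 features, 5 training samples (illustrative)
X = np.random.randn(n_x, m)               # each column is one sample, each row is one feature
Y = np.random.randint(0, 2, size=(1, m))  # labels stacked in columns as a (1, m) row vector
print(X.shape, Y.shape)                   # (3, 5) (1, 5)
```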
Logistic Regression
V2:
- it is better to keep the weight $w$ and the intercept $b$ as separate parameters in a NN (rather than folding $b$ into $w$)
- use the sigmoid function to squash the result into $[0,1]$ so that $\hat{y}$ is the probability that $y=1$ given $x$
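In formulas, as defined in the lectures:

$$\hat{y} = \sigma(w^T x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$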
V3:
- logistic regression uses a special loss function (cross-entropy rather than squared error) that makes the optimization problem convex, so gradient descent can find the global optimum instead of getting stuck in a local optimum
- the loss function is defined for a single sample, while the cost function is defined for the entire training set (the cost of your parameters; the average of the loss over all samples)
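The corresponding formulas:

$$\mathcal{L}(\hat{y}, y) = -\left(y \log \hat{y} + (1-y)\log(1-\hat{y})\right), \qquad J(w,b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}\!\left(\hat{y}^{(i)}, y^{(i)}\right)$$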
Gradient Descent and Computation Graph
V4: gradient descent
- step along the steepest descent direction on a convex cost surface, starting from the initial point
- the step size is controlled by the learning rate
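Each iteration applies the update (with learning rate $\alpha$):

$$w := w - \alpha\,\frac{\partial J}{\partial w}, \qquad b := b - \alpha\,\frac{\partial J}{\partial b}$$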
V7&V8: computation graph
- forward propagation (blue): computes the output step by step
- backward propagation (red): computes gradients/derivatives (the notation $d\mathrm{var}$ stands for the derivative of the final output with respect to the intermediate variable $\mathrm{var}$)
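A minimal numeric sketch using the lectures' small example $J = 3(a + bc)$ (the input values are illustrative):

```python
# forward pass (blue): J = 3 * (a + b*c), built step by step
a, b, c = 5.0, 3.0, 2.0
u = b * c          # intermediate node
v = a + u          # intermediate node
J = 3 * v          # final output, 33.0

# backward pass (red): derivative of J w.r.t. each node, using the dvar notation
dv = 3.0           # dJ/dv
du = dv * 1.0      # dJ/du = dJ/dv * dv/du
da = dv * 1.0      # dJ/da
db = du * c        # dJ/db = dJ/du * du/db
dc = du * b        # dJ/dc
print(da, db, dc)  # 3.0 6.0 9.0
```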
In Logistic Regression Context
V9&10: gradient descent of logistic regression
- use vectorization to avoid large explicit for loops over samples and features
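The per-sample derivatives derived in these videos (with $a$ the prediction and $z = w^T x + b$):

$$dz = a - y, \qquad dw_j = x_j\,dz, \qquad db = dz$$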
Vectorization
V1&V2: use built-in vectorized functions (for example, np.dot for the dot product / matrix multiplication of two arrays) instead of explicit for loops
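A quick sketch of the speed difference (the array size is illustrative):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

t0 = time.time()
c_loop = sum(a[i] * b[i] for i in range(len(a)))   # explicit Python loop
t1 = time.time()
c_vec = np.dot(a, b)                               # vectorized dot product
t2 = time.time()

print(f"loop: {t1 - t0:.3f}s, np.dot: {t2 - t1:.4f}s")  # np.dot is typically orders of magnitude faster
```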
V3&V4: vectorizing logistic regression
- forward propagation
- backward propagation and gradient descent
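A minimal sketch of one vectorized training iteration (variable names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, m, alpha = 4, 100, 0.01
X = np.random.randn(n_x, m)                 # samples in columns
Y = np.random.randint(0, 2, size=(1, m))
w, b = np.zeros((n_x, 1)), 0.0

# forward propagation (all m samples at once)
Z = np.dot(w.T, X) + b                      # shape (1, m)
A = sigmoid(Z)
cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# backward propagation
dZ = A - Y                                  # shape (1, m)
dw = np.dot(X, dZ.T) / m                    # shape (n_x, 1)
db = np.sum(dZ) / m

# gradient descent step
w = w - alpha * dw
b = b - alpha * db
```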
V5: broadcasting rules in Python (NumPy)
- how NumPy expands (broadcasts) arrays when their shapes do not match during elementwise operations
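A small illustration of broadcasting (the array values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])             # shape (2, 3)
col_sums = A.sum(axis=0, keepdims=True)     # shape (1, 3)
percentages = 100 * A / col_sums            # the (1, 3) row is broadcast across the 2 rows
print(percentages)

b = np.array([[10.0], [20.0]])              # shape (2, 1)
print(A + b)                                # the (2, 1) column is broadcast across the 3 columns
```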
V6: bug-free numpy coding tips
- avoid rank-1 arrays; commit to an explicit shape such as (n, 1) or (1, n) when creating an array
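A sketch of the rank-1 pitfall and the fix (sizes are illustrative):

```python
import numpy as np

a = np.random.randn(5)        # rank-1 array, shape (5,): neither a row nor a column vector
print(a.shape, a.T.shape)     # (5,) (5,) -- transposing does nothing
print(np.dot(a, a))           # a scalar, which may not be what you expect

b = np.random.randn(5, 1)     # committed column vector
print(np.dot(b.T, b).shape)   # (1, 1)
print(np.dot(b, b.T).shape)   # (5, 5) outer product
assert b.shape == (5, 1)      # the assert makes the intended shape explicit
```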
Programming Practice
- implement each function separately: initialize(), propagate(), optimize(); then combine them in model()
- use assert to verify the shapes and types of matrices
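A minimal sketch of that structure (the function names follow the notes; the signatures and details here are illustrative, not the assignment's reference solution):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize(dim):
    w, b = np.zeros((dim, 1)), 0.0
    assert w.shape == (dim, 1)
    return w, b

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    assert dw.shape == w.shape            # shape checks catch broadcasting bugs early
    return {"dw": dw, "db": db}, cost

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.01):
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
    return w, b

def model(X, Y, num_iterations=100, learning_rate=0.01):
    w, b = initialize(X.shape[0])
    return optimize(w, b, X, Y, num_iterations, learning_rate)
```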
W3: Shallow Neural Network
Introduction to the actual structure of a neural network, building up from the computation graph of logistic regression
Neural Network Representation
Vectorization
V3: with a single sample
- the NN version is essentially doing logistic regression multiple times, once per hidden unit (vectorized over all samples in the sketch below)
V4&V5: with multiple samples
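A minimal sketch of the vectorized forward pass for a 2-layer network over $m$ samples (tanh hidden layer and sigmoid output as in the lectures; the sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h, m = 3, 4, 10                      # input size, hidden units, samples (illustrative)
X = np.random.randn(n_x, m)                 # samples in columns
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(1, n_h) * 0.01, np.zeros((1, 1))

Z1 = np.dot(W1, X) + b1                     # (n_h, m): each row is one hidden unit's z over all samples
A1 = np.tanh(Z1)                            # hidden layer activation
Z2 = np.dot(W2, A1) + b2                    # (1, m)
A2 = sigmoid(Z2)                            # output layer: one "logistic regression" applied to A1
print(A2.shape)                             # (1, 10)
```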
Activation Functions and Gradient Descent
V6: different choices of activation functions
- the activation function determines the output of each unit given its linear combination $z$
- for logistic regression we normally use sigmoid as the activation function, but other functions (tanh, ReLU, leaky ReLU) often work better in hidden layers; see the sketch below
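A minimal sketch of the four activation functions discussed in the video (the leaky-ReLU slope of 0.01 is a common illustrative choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1); good for the output layer of binary classification

def tanh(z):
    return np.tanh(z)                     # output in (-1, 1); zero-centered, usually better than sigmoid in hidden layers

def relu(z):
    return np.maximum(0, z)               # default choice for hidden layers; gradient is 1 for z > 0

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)  # small slope for z < 0 avoids completely "dead" units
```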
V7&V8: why we need non-linear activation functions
- if all activations are linear, then no matter how many hidden layers the network has, it collapses to a single linear function, so the extra layers make no difference to the output
- a special case such as a regression model may use a linear activation in the output layer, but the hidden layers still need non-linear activations
A neural network is essentially a composition (superposition) of non-linear functions
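The one-line argument from the lecture, for two linear layers:

$$a^{[2]} = W^{[2]}\left(W^{[1]}x + b^{[1]}\right) + b^{[2]} = \left(W^{[2]}W^{[1]}\right)x + \left(W^{[2]}b^{[1]} + b^{[2]}\right),$$

which is again just a linear function of $x$.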
V9&V10: gradient descent
Random Initialization
V11:
- if all weights are initialized to 0, every hidden unit computes exactly the same function and receives exactly the same gradient, so all units stay identical (the symmetry is never broken) and having multiple units becomes meaningless
- we initialize the weights randomly with small values, so that tanh or sigmoid units do not start in their flat, saturated regions where gradients are tiny and learning gets stuck
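A minimal sketch of the initialization from this video (the 0.01 scaling factor is the one used in the lecture; layer sizes are illustrative):

```python
import numpy as np

n_x, n_h, n_y = 3, 4, 1                  # layer sizes (illustrative)
W1 = np.random.randn(n_h, n_x) * 0.01    # small random values break symmetry without saturating tanh/sigmoid
b1 = np.zeros((n_h, 1))                  # biases can safely be initialized to zero
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))
```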
W4: Deep Neural Network
L-layer Neural Network Notation
V1: L-layer neural network notation
Forward and Backward Building Blocks
V2: forward propagation
- similar to the 2-layer NN, but with an explicit for loop over the layers (this one for loop cannot be vectorized away)
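A minimal sketch of that loop, assuming the parameters are stored in a dict keyed "W1", "b1", ..., with ReLU hidden layers and a sigmoid output (the storage scheme is an illustration, not prescribed by the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def L_model_forward(X, parameters):
    L = len(parameters) // 2          # each layer contributes a W and a b
    A = X
    for l in range(1, L):             # hidden layers 1 .. L-1 use ReLU
        Z = np.dot(parameters["W" + str(l)], A) + parameters["b" + str(l)]
        A = relu(Z)
    ZL = np.dot(parameters["W" + str(L)], A) + parameters["b" + str(L)]
    return sigmoid(ZL)                # output layer uses sigmoid for binary classification

# usage with a tiny 3-layer network (sizes illustrative)
layer_dims = [3, 4, 3, 1]
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

AL = L_model_forward(np.random.randn(3, 5), parameters)
print(AL.shape)                       # (1, 5)
```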
V3: matrix dimensions verification
- (like dimensional analysis in physics)
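The dimension rules to check against (with $n^{[l]}$ units in layer $l$ and $m$ samples):

$$W^{[l]}, dW^{[l]}: (n^{[l]}, n^{[l-1]}), \qquad b^{[l]}, db^{[l]}: (n^{[l]}, 1), \qquad Z^{[l]}, A^{[l]}: (n^{[l]}, m)$$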
V4: why deep networks can work better
V5&V6: building blocks of deep NN
Hyperparameters
V7: hyperparameters
- hyperparameters (learning rate, number of iterations, number of hidden layers, hidden units per layer, choice of activation function) control how the parameters $W$ and $b$ end up being learned