Hello :) Today is Day 27!
A quick summary of today:
- covered the 2nd course from DeepLearning.AI’s DL specialization
First of all, I finished what I started yesterday: a simple deep learning model built from scratch.
- Init params
- Do forward prop, compute cost, do backward prop, update params
I don’t know if people do this in practice, but I think it helped me to understand the neural network model more deeply.
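Just as a record for myself, here is a tiny self-contained version of that loop. The toy data, layer sizes, and learning rate are made up for illustration and are not the assignment’s:

import numpy as np

# Tiny from-scratch example: one hidden layer, sigmoid output, cross-entropy cost
np.random.seed(1)
X = np.random.randn(2, 200)                      # toy data: 2 features, 200 examples
Y = (X[0:1] * X[1:2] > 0).astype(float)          # toy labels

n_x, n_h, n_y = 2, 4, 1
W1 = np.random.randn(n_h, n_x) * 0.01; b1 = np.zeros((n_h, 1))   # init params
W2 = np.random.randn(n_y, n_h) * 0.01; b2 = np.zeros((n_y, 1))

lr = 0.5
for i in range(2000):
    # forward prop
    Z1 = W1 @ X + b1; A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2; A2 = 1 / (1 + np.exp(-Z2))
    # compute cost (binary cross-entropy)
    m = X.shape[1]
    cost = -np.mean(Y * np.log(A2 + 1e-8) + (1 - Y) * np.log(1 - A2 + 1e-8))
    # backward prop
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m; db1 = dZ1.sum(axis=1, keepdims=True) / m
    # update params
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2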
Now for the Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization course
It introduced the basic recipe for ML: first check for high bias (the model underfits the training set), then check for high variance (it overfits), and apply fixes for whichever you find.
There are many methods to deal with bias or variance, and one of them is to normalize the input data. After normalizing, the cost surface is more evenly scaled, so gradient descent finds the minimum much more easily.
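A minimal sketch of what that normalization looks like; the toy data and variable names are my own, and in practice the training-set mean and standard deviation are reused at test time:

import numpy as np

X = np.random.randn(3, 100) * 5 + 2      # toy data: 3 features, 100 examples
mu = X.mean(axis=1, keepdims=True)       # per-feature mean
sigma = X.std(axis=1, keepdims=True)     # per-feature standard deviation
X_norm = (X - mu) / (sigma + 1e-8)       # zero mean, unit variance per feature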
We can also try different initializations of the weights and biases of the layers:
- Zero initialization: W and b are set to 0
- Random initialization: W is set to large random values, b is set to 0
- He initialization, shown below:
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])  # scale by sqrt(2 / fan_in)
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
Different inits lead to different results. Random initialization breaks symmetry and makes sure different hidden units learn different things.
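For comparison, this is roughly what the zero and large-random initializations look like; layers_dims and the scale factor of 10 are just illustrative choices, not the assignment’s exact code:

import numpy as np

layers_dims = [2, 4, 1]                  # example layer sizes
parameters = {}
for l in range(1, len(layers_dims)):
    # Zero init (commented out): every unit computes the same thing, so symmetry is never broken.
    # parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
    # Large random init: breaks symmetry, but big values can saturate activations and slow learning.
    parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
    parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))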
Next, regularization
- A model without regularization
- A model using L2 regularization (sketched below)
- A model using a dropout layer (sketched below)
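Here is a rough sketch of both ideas; lambd, keep_prob, and the function names are my own placeholders, not the assignment’s exact API:

import numpy as np

def l2_cost_term(parameters, lambd, m):
    # L2 regularization adds (lambd / (2*m)) * sum of squared weights to the cost.
    weights = [v for k, v in parameters.items() if k.startswith('W')]
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def dropout_forward(A, keep_prob=0.8):
    # Inverted dropout: randomly zero units, then scale up so the expected activation stays the same.
    D = (np.random.rand(*A.shape) < keep_prob).astype(float)   # dropout mask
    return (A * D) / keep_prob, D                              # keep D to reuse in backprop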
Next, optimization
The Adam optimizer has three hyperparameters that affect how it behaves (a minimal update sketch follows this list):
- Beta1: typically set close to 1 (but less than 1), controls the exponential decay rate for the first moment estimates (the mean of the gradients).
- Beta2: Similar to beta1, beta2 is another exponential decay rate parameter, typically also set close to 1. It controls the decay rate for the second moment estimates (the uncentered variance of the gradients). A common default value for beta2 is 0.999.
- Epsilon: This parameter is a small constant added to the denominator to prevent division by zero and to improve numerical stability. It ensures that the optimizer’s calculations don’t explode when the denominator approaches zero. A typical value for epsilon is around 1e-8.
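To tie the three hyperparameters together, here is a minimal sketch of a single Adam update for one parameter; v and s start as zero arrays with the same shape as w, and t is the step count starting at 1:

import numpy as np

def adam_update(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dw              # first moment: running mean of gradients
    s = beta2 * s + (1 - beta2) * dw ** 2         # second moment: running uncentered variance
    v_hat = v / (1 - beta1 ** t)                  # bias correction for the early steps
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)   # epsilon keeps the division numerically stable
    return w, v, s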
Also, the course covered learning rate decay: shrinking the learning rate as training goes on. Several schedules were shown, and one of them is supposed to be the most effective.
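One common schedule from the course divides the initial rate by (1 + decay_rate * epoch); I’m showing it as a sketch, not claiming it is the exact one referred to above:

def decayed_lr(lr0, epoch, decay_rate=1.0):
    # Learning rate decay: alpha = alpha0 / (1 + decay_rate * epoch)
    return lr0 / (1 + decay_rate * epoch)

# For example, lr0=0.2 with decay_rate=1 gives 0.2, 0.1, 0.0667, 0.05 over epochs 0..3.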
Another technique, which also has a slight regularization effect, is batch normalization.
It normalizes the pre-activation values (z) in each layer, which makes training more stable. It uses two learnable parameters, gamma and beta, to rescale and shift the normalized values.
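A minimal sketch of the forward pass, assuming Z has shape (units, batch_size) and gamma/beta have shape (units, 1):

import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    mu = Z.mean(axis=1, keepdims=True)          # mean over the mini-batch
    var = Z.var(axis=1, keepdims=True)          # variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)      # normalized pre-activations
    return gamma * Z_norm + beta                # gamma and beta rescale and shift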
That is all for today!
See you tomorrow :)