(Day 27) Improving deep learning models

Ivan Ivanov · January 28, 2024

Hello :) Today is Day 27!

A quick summary of today:

First of all, I finished what I started yesterday:

I made a simple deep learning model from scratch.

  1. Initialize the params

  2. Loop: do forward prop, compute the cost, do backward prop, update the params

I don’t know if people do this in practice, but I think it helped me to understand the neural network model more deeply.
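
To make those steps concrete, here is a minimal runnable sketch of that loop for a small 2-layer network (tanh hidden layer, sigmoid output). The architecture and hyperparameters are illustrative assumptions, not the exact model I built:

    import numpy as np

    # Minimal 2-layer network trained with plain gradient descent, following the
    # same steps: init params -> forward prop -> cost -> backward prop -> update.
    def train(X, Y, n_h=4, learning_rate=0.1, num_iterations=1000, seed=0):
        rng = np.random.default_rng(seed)
        n_x, n_y = X.shape[0], Y.shape[0]
        m = X.shape[1]

        # 1. Initialize parameters (small random weights, zero biases)
        W1 = rng.standard_normal((n_h, n_x)) * 0.01
        b1 = np.zeros((n_h, 1))
        W2 = rng.standard_normal((n_y, n_h)) * 0.01
        b2 = np.zeros((n_y, 1))

        for i in range(num_iterations):
            # 2a. Forward prop (tanh hidden layer, sigmoid output)
            Z1 = W1 @ X + b1
            A1 = np.tanh(Z1)
            Z2 = W2 @ A1 + b2
            A2 = 1.0 / (1.0 + np.exp(-Z2))

            # 2b. Compute the cross-entropy cost (for monitoring training)
            cost = -np.mean(Y * np.log(A2 + 1e-8) + (1 - Y) * np.log(1 - A2 + 1e-8))
            if i % 100 == 0:
                print(f"iteration {i}: cost {cost:.4f}")

            # 2c. Backward prop
            dZ2 = A2 - Y
            dW2 = dZ2 @ A1.T / m
            db2 = dZ2.sum(axis=1, keepdims=True) / m
            dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
            dW1 = dZ1 @ X.T / m
            db1 = dZ1.sum(axis=1, keepdims=True) / m

            # 2d. Update parameters with gradient descent
            W1 -= learning_rate * dW1
            b1 -= learning_rate * db1
            W2 -= learning_rate * dW2
            b2 -= learning_rate * db2

        return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}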

Now, on to the Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization course.

It introduced a basic recipe for ML for diagnosing and addressing high bias and high variance.


There are many methods to address bias or variance; one of them is to normalize the input data.


After normalizing, the cost contours are more even, so gradient descent finds the minimum much more easily.
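
A small sketch of what I mean (the helper name normalize_inputs is mine; it assumes the course convention of X having shape (features, examples)):

    import numpy as np

    # Normalize inputs to zero mean and unit variance, using training-set statistics.
    def normalize_inputs(X, eps=1e-8):
        mu = X.mean(axis=1, keepdims=True)
        sigma = X.std(axis=1, keepdims=True)
        return (X - mu) / (sigma + eps), mu, sigma

    # The same mu and sigma from the training set should be reused on test data:
    # X_test_norm = (X_test - mu) / (sigma + eps)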

We can also use different initializations for the weights and biases of the layers:

  1. Zero initialization: W and b are all set to 0

  2. Large random initialization: W is set to large random values, b is 0

  3. There is also He initialization:

    # He init: scale W by sqrt(2 / fan_in); biases start at zero
    parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
    parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))


Different initializations lead to different results. Random initialization is good for breaking symmetry, making sure different hidden units learn different things.
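
Pulling the three schemes into one sketch (the function shape, the method argument, and the *10 scale for the "large" case are my own assumptions, not the assignment's exact code):

    import numpy as np

    # The three initialization schemes above, assuming layers_dims = [n_x, ..., n_y].
    def initialize_parameters(layers_dims, method="he", seed=0):
        rng = np.random.default_rng(seed)
        parameters = {}
        for l in range(1, len(layers_dims)):
            shape = (layers_dims[l], layers_dims[l - 1])
            if method == "zeros":            # 1. all zeros: fails to break symmetry
                W = np.zeros(shape)
            elif method == "large_random":   # 2. large random values: slows or derails learning
                W = rng.standard_normal(shape) * 10
            else:                            # 3. He init: scale by sqrt(2 / fan_in)
                W = rng.standard_normal(shape) * np.sqrt(2.0 / layers_dims[l - 1])
            parameters['W' + str(l)] = W
            parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        return parameters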

Next, regularization. Three setups were compared (L2 and dropout are sketched in code below the list):

  1. A model without any regularization

  2. Using L2 regularization (an L2-norm penalty on the weights)

  3. Using a dropout layer
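
Here is a rough sketch of the two techniques (simplified helpers of my own, not the assignment functions). L2 adds a penalty of lambda/(2m) times the sum of squared weights to the cost, and the dropout here is the inverted-dropout version from the course:

    import numpy as np

    # L2 regularization: add (lambda / 2m) * sum of squared weights to the cost.
    def l2_penalty(parameters, lambd, m):
        weights = [v for k, v in parameters.items() if k.startswith('W')]
        return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

    # Inverted dropout on an activation matrix A: zero each unit with probability
    # (1 - keep_prob), then rescale so expected activations stay the same.
    def dropout_forward(A, keep_prob=0.8, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        D = (rng.random(A.shape) < keep_prob).astype(A.dtype)
        return (A * D) / keep_prob, D   # keep the mask D to reuse it in backprop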

Next, optimization

The Adam optimizer has three hyperparameters that affect how it behaves (a sketch of one update step follows the list):

  • Beta1: typically set close to 1 (but less than 1). It controls the exponential decay rate for the first moment estimates (the mean of the gradients). A common default value for beta1 is 0.9.
  • Beta2: another exponential decay rate parameter, also typically set close to 1. It controls the decay rate for the second moment estimates (the uncentered variance of the gradients). A common default value for beta2 is 0.999.
  • Epsilon: a small constant added to the denominator to prevent division by zero and improve numerical stability. It ensures the optimizer's calculations don't blow up when the denominator approaches zero. A typical value for epsilon is around 1e-8.
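
To keep the roles of these three straight for myself, here is a sketch of a single Adam update for one parameter array (my own simplified helper, not the course's exact function):

    import numpy as np

    # One Adam update. m and v are the running first and second moment estimates,
    # t is the (1-based) step count used for bias correction.
    def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad          # first moment: mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2     # second moment: uncentered variance
        m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # epsilon keeps this stable
        return param, m, v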

I was also shown a few methods for adjusting the learning rate during training. Among these, one was presented as the most effective (I sketch a common decay schedule below).
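
For example, here is one common decay schedule; I am not certain this is the specific one the course singled out as most effective:

    # A common learning rate decay schedule: alpha = alpha0 / (1 + decay_rate * epoch)
    def decayed_learning_rate(lr0, epoch, decay_rate=1.0):
        return lr0 / (1 + decay_rate * epoch)

    # Example: lr0 = 0.2 gives 0.2, 0.1, 0.067, 0.05, ... over epochs 0, 1, 2, 3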

Another technique, which also has a slight regularization effect, is batch normalization.


It is a method that normalizes the pre-activation values (z) in a neural network, allowing for more stable training. It uses two learnable parameters, gamma and beta, to adjust the mean and standard deviation of the normalized values.
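
A minimal sketch of the forward computation (training-time batch statistics only; I leave out the running averages used at test time):

    import numpy as np

    # Batch norm on pre-activations Z of shape (n_units, m_examples): normalize each
    # unit over the batch, then rescale/shift with the learned gamma and beta.
    def batch_norm_forward(Z, gamma, beta, eps=1e-8):
        mu = Z.mean(axis=1, keepdims=True)
        var = Z.var(axis=1, keepdims=True)
        Z_norm = (Z - mu) / np.sqrt(var + eps)
        return gamma * Z_norm + beta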



That is all for today!

See you tomorrow :)

Original post in Korean