(Day 44) Batch vs Layer vs Group Normalization and GANs (+ found a free KAIST AI course)

Ivan Ivanov · February 14, 2024

Hello :) Today is Day 44!

A quick summary of today:

  • Found KAIST Professor Choi’s Programming for AI lectures link
  • Discovered that there are Layer and Group norm layers (not only Batch norm)
  • Learned about GANs

1) Batch vs Layer vs Group normalization methods

image

In one of his lectures, Professor Choi explained the norm layers shown above (I had not heard of the 2nd, 3rd and 4th). With that, plus some searching online, I worked out the differences.

Firstly, batch norm: given a batch of activations for a specific layer, the mean and std are calculated across the batch, separately for each feature (or each channel, for images). The layer then subtracts the mean and divides by the std to normalize the values (a small epsilon is added to the variance, inside the square root, for numerical stability). After that, a scale factor “gamma” and a shift factor “beta”, both learnable parameters, are applied.
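To make those steps concrete, here is a minimal NumPy sketch of batch norm in training mode. The function name `batch_norm` and the toy shapes are my own assumptions for illustration, and it leaves out the running statistics a real layer keeps for inference.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for an array of shape (batch, features)."""
    mean = x.mean(axis=0)                     # one mean per feature, across the batch
    var = x.var(axis=0)                       # one (biased) variance per feature
    x_hat = (x - mean) / np.sqrt(var + eps)   # eps goes inside the square root
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))   # 8 samples, 4 features
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

With gamma set to 1 and beta set to 0 the layer only normalizes. During training those two parameters are free to move away from that starting point.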

Secondly, layer norm: proposed in 2016, layer norm operates over the feature dimension, i.e. it calculates the mean and variance for each individual sample separately, over all of its features (over all channels, in the case of images), rather than across the batch. Unlike batch norm, it doesn’t depend on the batch size, so it is often used in recurrent models like RNNs, where batch norm performs poorly, and in transformers.

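Thirdly, instance norm: here, the mean and variance are calculated for each individual channel of each individual sample, across both spatial dimensions.

Lastly, group norm sits between layer norm and instance norm: the channels of each sample are split into groups, and the mean and variance are calculated per group.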
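One way I find it easier to compare these layers (including group norm from the title) is to see them as the same operation applied over different axes of an (N, C, H, W) tensor. The sketch below is an assumption-laden illustration in NumPy: the helper `normalize`, the toy shapes and the choice of 2 groups are all made up, and the learnable gamma and beta are left out.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 5, 5))   # (N, C, H, W): 4 images, 6 channels, 5x5 pixels

batch_out = normalize(x, axes=(0, 2, 3))     # per channel, across batch and space
layer_out = normalize(x, axes=(1, 2, 3))     # per sample, across channels and space
instance_out = normalize(x, axes=(2, 3))     # per sample and channel, across space

# Group norm: split the 6 channels into 2 groups of 3, then normalize each
# group of each sample across its channels and spatial positions.
N, C, H, W = x.shape
groups = 2
grouped = x.reshape(N, groups, C // groups, H, W)
group_out = normalize(grouped, axes=(2, 3, 4)).reshape(N, C, H, W)

print(batch_out.shape, layer_out.shape, instance_out.shape, group_out.shape)
```

Only the batch norm line has axis 0 (the batch) in its statistics, which is why the other three behave the same regardless of batch size.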

2) Generative adversarial network (GAN)

I finally got to meet GANs. What are GANs? GANs are networks that aim to learn what the given data looks like and generate new samples that resemble it. Their ‘learning’ process is interesting in that the model ‘fights with itself’. There are 2 networks: a generator and a discriminator. The generator tries to create images that look as if they came from the training set, in order to trick the discriminator, while the discriminator tries to distinguish between the actual training images and the ones coming from the generator.

image

A simple Generator:

image
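For readers who cannot see the image, a simple generator could look roughly like the PyTorch sketch below. This is not the code from the lecture: the layer sizes, `noise_dim=64` and the flattened 28x28 output are assumptions I picked for illustration.

```python
import torch
from torch import nn

class Generator(nn.Module):
    """Maps a noise vector to a flattened image with values in [-1, 1]."""

    def __init__(self, noise_dim=64, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_dim),
            nn.Tanh(),               # squashes outputs into [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

gen = Generator()
z = torch.randn(16, 64)              # a batch of 16 random noise vectors
fake = gen(z)
print(fake.shape)                    # torch.Size([16, 784])
```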

And a discriminator:

image
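Again as a rough stand-in for the image, a simple discriminator could be sketched like this in PyTorch. The layer sizes and the flattened 28x28 input are assumptions for illustration, not the lecture's exact code.

```python
import torch
from torch import nn

class Discriminator(nn.Module):
    """Maps a flattened image to the probability that it is real."""

    def __init__(self, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
            nn.Sigmoid(),            # output in (0, 1): 1 means "real"
        )

    def forward(self, x):
        return self.net(x)

disc = Discriminator()
images = torch.rand(16, 28 * 28) * 2 - 1   # random stand-in images in [-1, 1]
probs = disc(images)
print(probs.shape)                          # torch.Size([16, 1])
```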

They both use their own optimizer:

image

First you train the discriminator on real images and then on fake ones.

image

And then you update the generator:

image
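Putting the pieces together, one training iteration could be sketched as below. Everything here is a toy assumption rather than the code from the images: the tiny networks, the random tensor standing in for a batch of real images, and the Adam settings (lr=2e-4, betas=(0.5, 0.999)), which are just commonly used values for GANs.

```python
import torch
from torch import nn

torch.manual_seed(0)
noise_dim, img_dim, batch = 16, 32, 8

# Tiny stand-in networks so the sketch runs on its own.
gen = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                    nn.Linear(64, img_dim), nn.Tanh())
disc = nn.Sequential(nn.Linear(img_dim, 64), nn.LeakyReLU(0.2),
                     nn.Linear(64, 1), nn.Sigmoid())

# Each network gets its own optimizer.
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
criterion = nn.BCELoss()

real = torch.rand(batch, img_dim) * 2 - 1   # stand-in for a batch of real images
ones = torch.ones(batch, 1)                 # label "real"
zeros = torch.zeros(batch, 1)               # label "fake"

# 1) Discriminator step: real images first, then fake ones.
opt_d.zero_grad()
loss_real = criterion(disc(real), ones)
fake = gen(torch.randn(batch, noise_dim))
loss_fake = criterion(disc(fake.detach()), zeros)  # detach: no generator gradients here
loss_d = loss_real + loss_fake
loss_d.backward()
opt_d.step()

# 2) Generator step: it wants the discriminator to say "real" for its fakes.
opt_g.zero_grad()
loss_g = criterion(disc(fake), ones)
loss_g.backward()   # also fills disc gradients; they are cleared by the next opt_d.zero_grad()
opt_g.step()

print(loss_d.item(), loss_g.item())
```

In a real run this sits inside a loop over the data loader, with `real` coming from the dataset instead of `torch.rand`.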

To get a different perspective, I went to PyTorch’s tutorial page for GANs (more specifically, DCGAN).

A DCGAN is a more complex version of a GAN that builds both networks from conv layers and adds techniques like batch norm and LeakyReLU.

Their DCGAN is trained on a dataset of celebrity faces (Celeb-A):

image

The generator and discriminator work against each other:

image

The generator takes noise initially and turns it into an image to trick the discriminator:

image
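As a sketch of how a DCGAN-style generator turns noise into an image, the block below stacks transposed convolutions with batch norm, doubling the spatial size at each step until it reaches 3x64x64. It follows the general shape of the tutorial's generator, but the class name and the small `ngf=16` width are my own assumptions to keep it light.

```python
import torch
from torch import nn

class DCGANGenerator(nn.Module):
    """Noise of shape (N, nz, 1, 1) -> image of shape (N, 3, 64, 64)."""

    def __init__(self, nz=100, ngf=16, nc=3):
        super().__init__()

        def up(in_ch, out_ch, stride, pad):
            # transposed conv + batch norm + ReLU: one upsampling stage
            return [
                nn.ConvTranspose2d(in_ch, out_ch, 4, stride, pad, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(True),
            ]

        self.net = nn.Sequential(
            *up(nz, ngf * 8, 1, 0),        # 1x1   -> 4x4
            *up(ngf * 8, ngf * 4, 2, 1),   # 4x4   -> 8x8
            *up(ngf * 4, ngf * 2, 2, 1),   # 8x8   -> 16x16
            *up(ngf * 2, ngf, 2, 1),       # 16x16 -> 32x32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),  # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

gen = DCGANGenerator()
z = torch.randn(2, 100, 1, 1)     # 2 noise vectors, shaped as 1x1 "images"
fake = gen(z)
print(fake.shape)                 # torch.Size([2, 3, 64, 64])
```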

While the discriminator takes an image and outputs whether it is fake or real:

image

I ran their code on my laptop. Initially, 5 epochs took 4 hours. Then I realised the device in the code is ‘cuda’, which does not work on a Mac (it should be ‘mps’), so the whole time I had been running on the CPU, wasting time :/ and thinking “wow, this must be computationally expensive” hahaha (model complexity is on my list of things to learn).
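To avoid that trap, a small device check like the sketch below can help. The helper name `pick_device` is made up for illustration. `torch.backends.mps.is_available()` is the PyTorch check for Apple's Metal backend, and it needs a reasonably recent PyTorch version.

```python
import torch

def pick_device():
    """Prefer CUDA, then Apple's MPS backend, and fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(2, 3, device=device)   # tensors (and models) must be moved to the device
print(device, x.device)
```

Printing the device at the start of a run is a cheap way to notice a silent CPU fallback before waiting hours for 5 epochs.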

After the 5 long epochs, the resulting faces are:

image

And this is after 19 epochs (I am going to bed now so I stopped it)

image

It is kinda cool. But …

image

Obviously, DALL·E 2 and Midjourney can create way better fake images than my laptop :D but knowing the concept is key here.

Thank you to Professor Choi from KAIST for sharing his lectures to the world.

Side note! While I was looking for GAN info, I found a tweet from 2016 from Andrej Karpathy

image

Joke or not, in 2021 it was recommended by Google haha

image

And under those tweets, I saw that he left OpenAI and wants to focus on his personal projects

image

Such as YouTube videos

image

More Andrej Karpathy YouTube videos - looking forward to them!

That is all for today!

See you tomorrow :)