(Day 43) Coding up LeNet, VGG, InceptionNet, UNet from scratch

Ivan Ivanov · February 13, 2024

Hello :) Today is Day 43!

A quick summary of today:

  • write LeNet from scratch
  • write VGG from scratch
  • write InceptionNet from scratch
  • write UNet from scratch (again)

Wanting to understand the popular models a bit more, I decided to do the above.

1) Let’s begin with LeNet.

A simple architecture developed in the 1990s, but one that set the groundwork for networks like AlexNet, VGG and ResNet.

[image: LeNet architecture and implementation]

It consists of 2 conv layers, each followed by a maxpool, and ends with 2 fully connected (linear) layers.
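
As a rough sketch (not the exact code from the screenshot above), a PyTorch version of that layout, assuming 1x32x32 grayscale inputs and 10 classes, could look like this:

```python
import torch
from torch import nn

class LeNet(nn.Module):
    """Minimal LeNet-style network: 2 conv+maxpool stages, then linear layers.
    (The classic LeNet-5 uses average pooling and an extra 84-unit linear layer.)"""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 1x32x32 -> 6x28x28
            nn.Tanh(),
            nn.MaxPool2d(2),                   # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),   # -> 16x10x10
            nn.Tanh(),
            nn.MaxPool2d(2),                   # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# quick shape check
print(LeNet()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```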

2) Next is VGG

[image: VGG configuration table from the paper]

The paper proposes several versions (VGG11, VGG13, VGG16, VGG19), but from a Google search VGG16 seems the most popular (version D in the pic). It is much deeper than LeNet, consisting of blocks of conv layers followed by maxpools, with each block increasing the number of filters and decreasing the spatial size of the image.


Instead of writing one fixed version, I created a general model that can be configured into any of the desired VGG architectures.

Below is the implementation. I think this is a nice setup for testing the 4 versions on a dataset of my choosing.

[image: generic VGG implementation]
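
Sketched roughly, that configurable idea looks something like the snippet below (channel counts per version are from the paper, with "M" marking a 2x2 maxpool; the classifier assumes 224x224 inputs):

```python
import torch
from torch import nn

# Conv layer widths per version; "M" marks a 2x2 maxpool.
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

class VGG(nn.Module):
    def __init__(self, version="VGG16", in_channels=3, num_classes=10):
        super().__init__()
        layers = []
        for v in VGG_CONFIGS[version]:
            if v == "M":
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                in_channels = v
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(   # assumes 224x224 input -> 512x7x7 here
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# any of the four versions can be built the same way
print(VGG("VGG16")(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])
```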

3) GoogLeNet / InceptionNet

This is a long one… haha. Funny that a research paper references a meme (and that the Inception name comes from it).

[image: the "we need to go deeper" meme]

GoogLeNet features inception modules, which consist of parallel conv branches with different receptive field sizes. 1x1 convolutions are used to reduce dimensionality and improve efficiency. Towards the end, a global avgpool replaces the fully connected layers to produce a fixed-length feature vector.

[image: GoogLeNet architecture]

and the implementation:

individual convolution block

[image: conv block implementation]
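
Roughly, such a conv block is just conv + batch norm + ReLU (the original paper predates batch norm, but most from-scratch implementations add it):

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """Conv + BatchNorm + ReLU, reused by every branch of the inception modules."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))
```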

individual inception block

[image: inception block implementation]
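
And a rough sketch of the inception block itself, with the four parallel branches concatenated along the channel dimension (it reuses the ConvBlock above; the sizes in the shape check are the "inception 3a" numbers from the paper):

```python
# assumes torch, nn and the ConvBlock sketch above are in scope
class InceptionBlock(nn.Module):
    def __init__(self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_pool):
        super().__init__()
        self.branch1 = ConvBlock(in_channels, out_1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            ConvBlock(in_channels, red_3x3, kernel_size=1),          # 1x1 reduction
            ConvBlock(red_3x3, out_3x3, kernel_size=3, padding=1),
        )
        self.branch3 = nn.Sequential(
            ConvBlock(in_channels, red_5x5, kernel_size=1),          # 1x1 reduction
            ConvBlock(red_5x5, out_5x5, kernel_size=5, padding=2),
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            ConvBlock(in_channels, out_pool, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)]
        return torch.cat(branches, dim=1)

# inception 3a: 192 in -> 64 + 128 + 32 + 32 = 256 out channels
block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```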

And the final GoogLeNet

[image: full GoogLeNet implementation]

4) UNet

Yesterday I attempted it and got a kind-of working model, but I was not sure whether it was correct because I translated it from TensorFlow to PyTorch. Today I found another version of UNet online and copied it into my notes (thank you to this youtuber).

[image: UNet architecture]

Pretty similar to yesterday's. Here is the double conv part, and the init of the UNet itself.

[image: DoubleConv and UNet __init__ code]
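
The double conv is two 3x3 conv + batch norm + ReLU layers in a row; roughly:

```python
import torch
from torch import nn

class DoubleConv(nn.Module):
    """Two 3x3 conv + BatchNorm + ReLU layers, used at every UNet stage."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```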

and then the forward

[image: UNet forward pass code]
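
Putting the init and forward together, a compact UNet along those lines (reusing the DoubleConv above, upsampling with transposed convs and concatenating each skip connection before its double conv; this sketch assumes input sizes divisible by 16) looks roughly like:

```python
# assumes torch, nn and the DoubleConv sketch above are in scope
class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1, features=(64, 128, 256, 512)):
        super().__init__()
        self.downs = nn.ModuleList()
        self.ups = nn.ModuleList()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # encoder: double conv, keep the result for the skip, then downsample
        for f in features:
            self.downs.append(DoubleConv(in_channels, f))
            in_channels = f

        self.bottleneck = DoubleConv(features[-1], features[-1] * 2)

        # decoder: transposed-conv upsample, then double conv on the concatenated skip
        for f in reversed(features):
            self.ups.append(nn.ConvTranspose2d(f * 2, f, kernel_size=2, stride=2))
            self.ups.append(DoubleConv(f * 2, f))

        self.final_conv = nn.Conv2d(features[0], out_channels, kernel_size=1)

    def forward(self, x):
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)
            x = self.pool(x)

        x = self.bottleneck(x)
        skips = skips[::-1]

        for i in range(0, len(self.ups), 2):
            x = self.ups[i](x)                         # upsample
            x = torch.cat([skips[i // 2], x], dim=1)   # skip connection
            x = self.ups[i + 1](x)                     # double conv
        return self.final_conv(x)

print(UNet()(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```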

Actually, I decided to test this UNet on a human image dataset on Kaggle.

[image: samples from the human image dataset]

After 50 epochs with a learning rate of 0.01 and a batch size of 16, using DiceLoss and BCEWithLogitsLoss, the best model I got was at epoch 42: train loss 0.2380, valid loss 0.3435. I am now running a 100-epoch run with a learning rate scheduler. It will take a while, but the results (hopefully good) can be seen here on Kaggle.
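
For reference, the combined objective is roughly a soft Dice loss added to BCEWithLogitsLoss, both taking raw logits (this is a minimal hand-rolled Dice sketch for binary masks; the DiceLoss implementation I actually used may differ slightly):

```python
import torch
from torch import nn

class DiceLoss(nn.Module):
    """Soft Dice loss on sigmoid probabilities, for binary masks."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits).flatten(1)
        targets = targets.flatten(1)
        intersection = (probs * targets).sum(dim=1)
        union = probs.sum(dim=1) + targets.sum(dim=1)
        dice = (2 * intersection + self.eps) / (union + self.eps)
        return 1 - dice.mean()

dice_loss = DiceLoss()
bce_loss = nn.BCEWithLogitsLoss()

def criterion(logits, targets):
    # overlap term (Dice) + per-pixel term (BCE)
    return dice_loss(logits, targets) + bce_loss(logits, targets)
```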

Sample outputs from the 50 epoch model are:

[image: sample predicted masks from the 50-epoch model]

I should try IoU, or maybe another combination of losses too.

Side note! It turns out I had not properly installed PyTorch on my laptop and was running things on the CPU, which is a disaster, but today I set it up properly (still, using Kaggle is a bit faster haha).

That is all for today!

See you tomorrow :)