(Day 270) DE course by Joe Reis - completed

Ivan Ivanov · September 27, 2024

Hello :) Today is Day 270!

A quick summary of today:

Firstly, the DE course by Joe Reis

Today I found out on DeepLearning.AI’s community blog that the errors associated with the last capstone projects had been fixed, so I went in and completed them.

🥳🥳🥳

This course was amazing. It goes in-depth into the way of thinking of a DE, while also providing a good deal of hands-on practice with AWS’s many different tools.

At the end, Joe shared his opinion on where the DE field is going and what the role of a DE might look like.

Next week, on stream, I might go over what I learned in more depth so that I can write a better review of the course.

The Matrix Calculus You Need for Deep Learning

I got this paper from yesterday’s lecture, and it is 100% awesome! I printed it out as a small booklet and read it.

It is definitely going into my ‘to-recommend’ list of sources on math for DL.
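
To give a flavour of what the paper covers (a toy check of my own, not an example taken from the paper): one of the basic rules it builds up to is that for y = Wx, the Jacobian ∂y/∂x is simply W. A quick numerical sanity check:

```python
import numpy as np

# Toy check: for y = W @ x, the Jacobian dy/dx equals W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)

eps = 1e-6
jac = np.zeros((3, 4))
for j in range(4):
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[j] += eps
    x_minus[j] -= eps
    # Central finite difference approximates column j of the Jacobian
    jac[:, j] = (W @ x_plus - W @ x_minus) / (2 * eps)

print(np.allclose(jac, W))  # True: the Jacobian of Wx with respect to x is W
```

The same finite-difference trick is a handy way to double-check any gradient derived by hand.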

DL by Bishop & Bishop

Today, when I went to the library, I first read the above paper and then continued with B&B’s book on DL. I did not record myself reading the paper, but after I finished it I realised I’d better turn on a timelapse video on my phone.

Here is the video on my YouTube channel.

Today I covered:

2 Probabilities

2.1 The Rules of Probability

  • 2.1.1 A medical screening example
  • 2.1.2 The sum and product rules
  • 2.1.3 Bayes’ theorem
  • 2.1.4 Medical screening revisited
  • 2.1.5 Prior and posterior probabilities
  • 2.1.6 Independent variables

2.2 Probability Densities

  • 2.2.1 Example distributions
  • 2.2.2 Expectations and covariances

2.3 The Gaussian Distribution

  • 2.3.1 Mean and variance
  • 2.3.2 Likelihood function
  • 2.3.3 Bias of maximum likelihood
  • 2.3.4 Linear regression

2.4 Transformation of Densities

  • 2.4.1 Multivariate distributions

2.5 Information Theory

  • 2.5.1 Entropy
  • 2.5.2 Physics perspective
  • 2.5.3 Differential entropy
  • 2.5.4 Maximum entropy
  • 2.5.5 Kullback–Leibler divergence
  • 2.5.6 Conditional entropy
  • 2.5.7 Mutual information

2.6 Bayesian Probabilities

  • 2.6.1 Model parameters
  • 2.6.2 Regularization
  • 2.6.3 Bayesian machine learning

3 Standard Distributions

3.1 Discrete Variables

  • 3.1.1 Bernoulli distribution
  • 3.1.2 Binomial distribution
  • 3.1.3 Multinomial distribution

3.2 The Multivariate Gaussian

  • 3.2.1 Geometry of the Gaussian
  • 3.2.2 Moments
  • 3.2.3 Limitations
  • 3.2.4 Conditional distribution
  • 3.2.5 Marginal distribution
  • 3.2.6 Bayes’ theorem
  • 3.2.7 Maximum likelihood
  • 3.2.8 Sequential estimation
  • 3.2.9 Mixtures of Gaussians

3.3 Periodic Variables

  • 3.3.1 Von Mises distribution

3.4 The Exponential Family

  • 3.4.1 Sufficient statistics

3.5 Nonparametric Methods

  • 3.5.1 Histograms
  • 3.5.2 Kernel densities
  • 3.5.3 Nearest-neighbours

4 Single-layer Networks: Regression

4.1 Linear Regression

  • 4.1.1 Basis functions
  • 4.1.2 Likelihood function
  • 4.1.3 Maximum likelihood
  • 4.1.4 Geometry of least squares
  • 4.1.5 Sequential learning
  • 4.1.6 Regularized least squares
  • 4.1.7 Multiple outputs

4.2 Decision Theory

4.3 The Bias–Variance Trade-off

One thing I kept wondering while reading the Kullback–Leibler divergence part was why it is introduced this early in the book, when the very next topic is regression with a single-layer network, where KL divergence is not really used. From a beginner’s point of view, meeting KL divergence so early might be intimidating, and a better place for it might be right before it is actually needed, for example before variational autoencoders; introducing it in a more relevant context could help learners understand its significance and application. On the other hand, it is part of information theory, so including it in that section of the book is also understandable. (I have seen this pattern in other books as well, so it’s fine, just a comment.)
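
For reference, a small sketch of my own (not from the book): the KL divergence D_KL(p ‖ q) = Σ_x p(x) ln(p(x)/q(x)) measures how much a distribution q differs from a reference distribution p. A tiny discrete example with made-up numbers:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # convention: 0 * ln(0 / q) = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two toy distributions over three outcomes (made-up numbers)
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

print(kl_divergence(p, q))  # positive
print(kl_divergence(q, p))  # a different positive value: KL divergence is not symmetric
```

It is non-negative and zero only when the two distributions match, which is exactly the property that later makes it useful as the regularisation term in a variational autoencoder’s objective, and that is why a spot right before VAEs feels like a natural place to (re)introduce it.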