(Day 270) DE course by Joe Reis - completed

Ivan Ivanov · September 27, 2024

Hello :) Today is Day 270!

A quick summary of today:

Firstly, the DE course by Joe Reis

Today I found out on DeepLearning.AI’s community blog that the errors associated with the last capstone projects had been fixed, so I went in and completed them.

🥳🥳🥳

This course was amazing. It goes in-depth into the way of thinking of a DE, while also providing a good deal of hands-on practice with AWS’s many different tools.

At the end, Joe shared his opinion on where the DE field is going and what the role of a DE might look like.

Next week, on stream, I might go over what I learned in more depth so that I can write a better review of the course.

The Matrix Calculus You Need for Deep Learning

I got this paper from yesterday’s lecture, and it is 100% awesome! I printed it out as a small booklet and read it.

It is definitely going into my ‘to-recommend’ list of sources on math for DL.
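
To give a flavour of what the paper covers (a toy check of my own, not an example taken from the paper): one of the basic rules it builds up to is that for y = Wx, the Jacobian ∂y/∂x is simply W. A quick numerical sanity check:

```python
import numpy as np

# Toy check: for y = W @ x, the Jacobian dy/dx equals W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)

eps = 1e-6
jac = np.zeros((3, 4))
for j in range(4):
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[j] += eps
    x_minus[j] -= eps
    # Central finite difference approximates column j of the Jacobian
    jac[:, j] = (W @ x_plus - W @ x_minus) / (2 * eps)

print(np.allclose(jac, W))  # True: the Jacobian of Wx with respect to x is W
```

The same finite-difference trick is a handy way to double-check any gradient derived by hand.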

DL by Bishop & Bishop

Today, when I went to the library, I first read the above paper and then continued with B&B’s book on DL. I did not record myself reading the paper, but after I finished it I realised I’d better turn on a timelapse video on my phone.

Here is the video on my YouTube channel.

Today I covered:

2 Probabilities

2.1 The Rules of Probability

  • 2.1.1 A medical screening example
  • 2.1.2 The sum and product rules
  • 2.1.3 Bayes’ theorem
  • 2.1.4 Medical screening revisited
  • 2.1.5 Prior and posterior probabilities
  • 2.1.6 Independent variables

2.2 Probability Densities

  • 2.2.1 Example distributions
  • 2.2.2 Expectations and covariances

2.3 The Gaussian Distribution

  • 2.3.1 Mean and variance
  • 2.3.2 Likelihood function
  • 2.3.3 Bias of maximum likelihood
  • 2.3.4 Linear regression

2.4 Transformation of Densities

  • 2.4.1 Multivariate distributions

2.5 Information Theory

  • 2.5.1 Entropy
  • 2.5.2 Physics perspective
  • 2.5.3 Differential entropy
  • 2.5.4 Maximum entropy
  • 2.5.5 Kullback–Leibler divergence
  • 2.5.6 Conditional entropy
  • 2.5.7 Mutual information

2.6 Bayesian Probabilities

  • 2.6.1 Model parameters
  • 2.6.2 Regularization
  • 2.6.3 Bayesian machine learning

3 Standard Distributions

3.1 Discrete Variables

  • 3.1.1 Bernoulli distribution
  • 3.1.2 Binomial distribution
  • 3.1.3 Multinomial distribution

3.2 The Multivariate Gaussian

  • 3.2.1 Geometry of the Gaussian
  • 3.2.2 Moments
  • 3.2.3 Limitations
  • 3.2.4 Conditional distribution
  • 3.2.5 Marginal distribution
  • 3.2.6 Bayes’ theorem
  • 3.2.7 Maximum likelihood
  • 3.2.8 Sequential estimation
  • 3.2.9 Mixtures of Gaussians

3.3 Periodic Variables

  • 3.3.1 Von Mises distribution

3.4 The Exponential Family

  • 3.4.1 Sufficient statistics

3.5 Nonparametric Methods

  • 3.5.1 Histograms
  • 3.5.2 Kernel densities
  • 3.5.3 Nearest-neighbours

4 Single-layer Networks: Regression

4.1 Linear Regression

  • 4.1.1 Basis functions
  • 4.1.2 Likelihood function
  • 4.1.3 Maximum likelihood
  • 4.1.4 Geometry of least squares
  • 4.1.5 Sequential learning
  • 4.1.6 Regularized least squares
  • 4.1.7 Multiple outputs

4.2 Decision Theory

4.3 The Bias–Variance Trade-off

One thing I kept wondering while reading the Kullback–Leibler divergence part was why it is introduced this early in the book, when the very next topic is regression with a single-layer network, where KL divergence is not really used. From a beginner’s point of view, meeting KL divergence so early might be intimidating, and a better place for it might be right before it is actually needed, for example before variational autoencoders; introducing it in a more relevant context could help learners understand its significance and application. On the other hand, it is part of information theory, so including it in that section of the book is also understandable. (I have seen this pattern in other books as well, so it’s fine, just a comment.)
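
For reference, a small sketch of my own (not from the book): the KL divergence D_KL(p ‖ q) = Σ_x p(x) ln(p(x)/q(x)) measures how much a distribution q differs from a reference distribution p. A tiny discrete example with made-up numbers:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # convention: 0 * ln(0 / q) = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two toy distributions over three outcomes (made-up numbers)
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

print(kl_divergence(p, q))  # positive
print(kl_divergence(q, p))  # a different positive value: KL divergence is not symmetric
```

It is non-negative and zero only when the two distributions match, which is exactly the property that later makes it useful as the regularisation term in a variational autoencoder’s objective, and that is why a spot right before VAEs feels like a natural place to (re)introduce it.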