Hello :) Today is Day 268!
A quick summary of today:
- continued with Graph Data Modelling Fundamentals
- decided to check UC Berkeley’s free Intro to ML course
- started casually watching a lecture on the math behind neural nets
Today I did a bit of a few things.
Continuing with Graph Data Modelling Fundamentals
Exercises to create new relationships
MATCH (sandy:User {name: 'Sandy Jones'})
MATCH (clinton:User {name: 'Clinton Spencer'})
MATCH (apollo:Movie {title: 'Apollo 13'})
MATCH (sleep:Movie {title: 'Sleepless in Seattle'})
MATCH (hoffa:Movie {title: 'Hoffa'})
MERGE (sandy)-[:RATED {rating:5}]->(apollo)
MERGE (sandy)-[:RATED {rating:4}]->(sleep)
MERGE (clinton)-[:RATED {rating:3}]->(apollo)
MERGE (clinton)-[:RATED {rating:3}]->(sleep)
MERGE (clinton)-[:RATED {rating:3}]->(hoffa)
Here, I created new ‘user rated movie’ relationships
Testing with Instance Model
I ran some basic queries like who acted in what, who directed what, etc. With such queries we can do basic tests on our model.
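For instance, a couple of the test queries looked roughly like this (a sketch, assuming the course's Person and Movie nodes with ACTED_IN and DIRECTED relationships):
// Who acted in Apollo 13?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Apollo 13'})
RETURN p.name
// Who directed what?
MATCH (p:Person)-[:DIRECTED]->(m:Movie)
RETURN p.name, m.title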
Refactoring our graph
- The graph as modeled does not answer all of the use cases
- A new use case has come up that you must account for in your data model
- The Cypher for the use cases does not perform optimally, especially when the graph scales
Steps for refactoring
To refactor a graph data model and a graph we must:
- Design the new data model
- Write Cypher code to transform the existing graph to implement the new data model (see the sketch after this list)
- Retest all use cases, possibly with updated Cypher code
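As a sketch of what the transformation step can look like (my own example, not from the course exercises; it assumes Person nodes with ACTED_IN relationships): add a more specific Actor label to every person who acted in something, so that use-case queries can match on that label directly.
MATCH (p:Person)-[:ACTED_IN]->()
SET p:Actor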
We can use the PROFILE keyword to get a query plan, just like EXPLAIN in SQL.
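For example, profiling one of the rating queries on the instance model above:
PROFILE
MATCH (sandy:User {name: 'Sandy Jones'})-[r:RATED]->(m:Movie)
RETURN m.title, r.rating
The plan shows how many rows each operator processes and how many db hits it costs, which is what we compare before and after refactoring.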
After refactoring, it is important to retest all queries that were affected and make sure they return the expected results.
UC Berkeley’s Intro to ML is available for free?!
That is another top university in the world that publishes its ML course for free, and it is mindblowing. I decided to start doing it. It is the Fall 2024 version, so it is ongoing this semester.
The lecture videos, slides, homeworks, homework solutions, and even the suggested textbook are all available. Everything!
Here is the link
I decided to check it out.
Stanford’s CS109 Probability for Computer Scientists is a legendary course for me: a deep, back-to-basics intro that sets you up to learn ML. So I guess this can count as a next step after CS109 and a competitor to Stanford’s CS229 - the ML course taught by Andrew Ng.
Lecture 1: Intro
Classification
- linear classifier (logistic regression)
- kNN
Neural networks - composing simple (logistic) regression functions over and over again
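One way to write that composition (my own notation, not from the lecture): a two-layer network is a logistic function applied to the outputs of other logistic functions,
f(x) = σ(W2 · σ(W1x + b1) + b2)
where σ is the logistic sigmoid and W1, b1, W2, b2 are the weights and biases of the two layers.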
This is just the intro, so it does not jump directly into neural nets. The next lectures cover MLE, Gaussians, linear regression, classification, gradient descent - all crucial basics to know before the neural net lectures.
Reading 1.1 to 1.2.4 in Bishop & Bishop
The impact of DL
- medical diagnosis
- protein structure understanding and discovery
- image synthesis - generative AI
- LLMs
- linear models - predict the target for some input x: y = y(x, w)
- error functions - a measure of the misfit between the function y(x, w), for any given value of w, and the training data. A popular choice is:
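Presumably this is the sum-of-squares error from the book:
E(w) = ½ ∑ₙ (y(xₙ, w) − tₙ)²
where the sum runs over the N training inputs xₙ and tₙ is the corresponding target value.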
Lecture 2: MLE
Properties of MLE
- consistency: as we get more and more data (drawn from one distribution in our family), we converge to estimating the true value of theta for D
- statistically efficient: making good use of the data (‘least variance’ param estimates)
- the value of p(D | θ) is invariant to re-parameterisation; MLE can still yield a parameter estimate even when the data were not generated from that family
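For reference, the MLE itself (standard definition, in my notation): choose the parameters that maximise the probability of the observed data,
θ̂ = argmax over θ of p(D | θ) = argmax over θ of ∑ₙ log p(xₙ | θ)
where the second form assumes the data points xₙ are i.i.d., so the likelihood factorises and we can take the log.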
Readings from Bishop & Bishop
- 2-2.1.2 (rules of probability: sum, product)
There are two kinds of uncertainty:
- systematic: because we only get to see datasets of finite size
- intrinsic/stochastic (noise): noise arises because we are able to observe only partial information about the world, so to reduce it we need to gather different kinds of data
Sum rule: p(X) = ∑_Y p(X, Y)
Product rule: p(X, Y) = p(Y | X)p(X)
- 2.1.6 (independent RVs)
If the joint distribution of two variables factorises into the product of the marginals, p(X, Y) = p(X)p(Y), then X and Y are said to be independent
- 2.2-2.2.1 (probability densities in continuous spaces)
If x and y are two real variables, then the sum and product rules take the form:
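These should be the standard continuous versions, with the sum over the other variable replaced by an integral:
p(x) = ∫ p(x, y) dy
p(x, y) = p(y | x)p(x)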
- 2.3-2.3.2 (univariate Gaussian, likelihood)
- 3-3.1.3 (Bernoulli, binomial, multinomial, MLE)
These parts are pretty much the same as in the lecture, but still nicely explained and in detail.
The Complete Mathematics of Neural Networks and Deep Learning
I think this 16-year-old student is fairly famous for this lecture video on the math behind neural nets. So I decided to relax and just watch it. It is great, with examples, which is crucial.
It is material that I know and am comfortable with, thanks to the almighty Andrej Karpathy and his forward/backprop ninja video. But I was curious how this student teaches the math, so I started watching, and it seems great.
Today I just covered:
PART I - Introduction
1.1 Prerequisites
1.2 Agenda
1.3 Notation
1.4 Big Picture
1.5 Matrix Calculus Review
1.5.1 Gradients
1.5.2 Jacobians
1.5.3 New Way of Seeing the Scalar Chain Rule
1.5.4 Jacobian Chain Rule
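The key result of 1.5.4, as I understand it: for vector-valued functions, the chain rule becomes a product of Jacobian matrices. If z = f(y) and y = g(x), then
∂z/∂x = (∂z/∂y)(∂y/∂x)
where each factor is a Jacobian (the matrix of partial derivatives), and matrix multiplication replaces the scalar multiplication of the ordinary chain rule.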
That is all for today!
See you tomorrow :)