(Day 277) Neo4j Professional Certificate attempt 1

Ivan Ivanov · October 4, 2024

gnn traditional-machine-learning statistics

Hello :) Today is Day 277!

A quick summary of today:

reading a bit of The Elements of Statistical Learning
attempting the Neo4j Professional Certificate

Reading a bit of The Elements of Statistical Learning

I think this is the book I was looking for. Not sure why I am starting to read it 10 months into my journey but it is what I was looking for. Going in-depth on ‘classical’ models - linear regression and classificaion, trees, boosting.

I went to a cafe to read it, but I could not film myself 😆.

One thing I noticed was the usage of the below two pics

To showcase the difference between a simple OLS and a KNN, this and DL by Bishop&Bishop used those pics, and most likely other books have used them as well. They are good pics and a nice representation of the clear different between the two methods.

I had a more in-depth look into chapter 3

1. Linear Regression Model

models the relationship between input X and the response Y as a linear combination
the goal is to estimate coefficients beta that minimize the residual sum of squares (RSS), representing the difference between predicted and actual values

2. Least Squares Estimation

the OLS method minimizes RSS to estimate parameters beta
OLS is unbiased and computationally efficient but may struggle with multicollinearity or large sets of predictors

3. Subset Selection

Techniques for selecting a subset of input features to reduce model complexity while maintaining accuracy:
- best subset selection
- forward selection
- backward selection

4. Shrinkage Method

Ridge Regression: Adds a penalty based on the size of the coefficients, shrinking them toward zero to reduce variance at the cost of a small bias
Lasso: Similar to ridge, but uses the absolute value of the coefficients, making it useful for feature selection as some coefficients may be shrunk to zero

5. Principal Components Regression (PCR) and Partial Least Squares (PLS)

PCR: Reduces dimensionality by transforming predictors into principal components and performing regression on these components
PLS: Similar to PCR but also takes the response variable into account, often leading to better predictive performance

I need to read and take notes for this book. It reminds me of ISLR

Attempting the Neo4j Professional Certificate

There is no proctor or screen checking so I was surprised I could just check the docs or google for answers. I tried to minimise that just to see what I actually know, and above the the result haha. I didn’t pass however the feedback is great so now I know what I need to focus on to improve and hopefully pass next time.

That is all for today!

See you tomorrow :)