(Day 296) Neo4j & LLM Fundamentals

Ivan Ivanov · October 23, 2024

Hello :) Today is Day 296!

A quick summary of today:

covered creating a PD model from the udemy course
did a neo4j & langchain course on neo4j’s website (on stream)

Continuing with the credit risk modelling course on udemy

Overall, this course seems such a great introduction to credit risk modelling. It introduces popular concepts and measures, and how to derive them in python. The code is not even on github, so I am a bit concious of sharing it here as it is not public at all. Therefore I will just share some key insights from each part.

Section 5, 6: PD model data prep + model creating

Interestingly, it introduced using Weight of Evidence (WoE) and Information Value (IV) for grouping metrics into bins. Unintentionally, I used this method in one of me own projects - Insurance fraud detection so I am aware of how it works but still am excited to see it being applied here as well.

WoE - to what extent an independent variable would predict a dependente var. It shows the extent to which each of the different categories of an independent variable explains the dependent one

WoE for a category i is equal to the log of the ratio between the % of class 1 belonging to i divided by the % of class 0 depending to i. The further away from 0 the better

Information Value - how much information the original independent variable brings w.r.t. explaining the dependent variable. IV can be used for preselection

Section 7 - evaluation

Three metrics were introduced

AUC (ROC-AUC)
Gini (coming from economics) - plots the cumulative proportion of defaulted borrowers as a function of the cumulative proportion of all borrowers
Kolmogorov-Smirnov coefficient - shows to what extent the model separates the actual good from the actual bad borrowers; formally, it is the maximum difference between the cumulative distributions of good and bad borrowers w.r.t. predicted probabilities. The greater the difference, the better the model

Section 8 - decision-making

We can use scorecards or PDs (both can be derived from each other). Furthermore we can set cut-offs to accept applicants based on a PD threshold (which is usually better than setting a (FICO) score cut-off)

Section 9 - monitoring

We can use the Population Stability Index (PSI) to evaluate whether we need to re-train our model because there has been a change in the population w.r.t. a certain feature using categories of the feature.

We can use the table below for guidance.

Values of PSI (0 - 1)	Population Difference
PSI = 0	No difference
PSI < 0.1	Little to no difference
0.1 < PSI < 0.25	Little difference (No action is taken)
PSI > 0.25	Big difference (Action is taken)
PSI = 1	Absolute difference

This is what’s left for tomorrow:

Completed Neo4j & LLM Fundamentals on stream

Had to cut in the middle because of problems with Streamlabs. Overall, a very nice course. It says it is 4 hours but I did it in under 2 hours. Definitely a good course ^^

Screenshot at Oct 23 19-44-42

That is all for today!

See you tomorrow :)