Hello :) Today is Day 296!
A quick summary of today:
- covered creating a PD model from the udemy course
- did a neo4j & langchain course on neo4j’s website (on stream)
Continuing with the credit risk modelling course on udemy
Overall, this course seems such a great introduction to credit risk modelling. It introduces popular concepts and measures, and how to derive them in python. The code is not even on github, so I am a bit concious of sharing it here as it is not public at all. Therefore I will just share some key insights from each part.
Section 5, 6: PD model data prep + model creating
Interestingly, it introduced using Weight of Evidence (WoE) and Information Value (IV) for grouping metrics into bins. Unintentionally, I used this method in one of me own projects - Insurance fraud detection so I am aware of how it works but still am excited to see it being applied here as well.
WoE - to what extent an independent variable would predict a dependente var. It shows the extent to which each of the different categories of an independent variable explains the dependent one
WoE for a category i is equal to the log of the ratio between the % of class 1 belonging to i divided by the % of class 0 depending to i. The further away from 0 the better
Information Value - how much information the original independent variable brings w.r.t. explaining the dependent variable. IV can be used for preselection
Section 7 - evaluation
Three metrics were introduced
- AUC (ROC-AUC)
- Gini (coming from economics) - plots the cumulative proportion of defaulted borrowers as a function of the cumulative proportion of all borrowers
- Kolmogorov-Smirnov coefficient - shows to what extent the model separates the actual good from the actual bad borrowers; formally, it is the maximum difference between the cumulative distributions of good and bad borrowers w.r.t. predicted probabilities. The greater the difference, the better the model
Section 8 - decision-making
We can use scorecards or PDs (both can be derived from each other). Furthermore we can set cut-offs to accept applicants based on a PD threshold (which is usually better than setting a (FICO) score cut-off)
Section 9 - monitoring
We can use the Population Stability Index (PSI) to evaluate whether we need to re-train our model because there has been a change in the population w.r.t. a certain feature using categories of the feature.
We can use the table below for guidance.
Values of PSI (0 - 1) | Population Difference |
---|---|
PSI = 0 | No difference |
PSI < 0.1 | Little to no difference |
0.1 < PSI < 0.25 | Little difference (No action is taken) |
PSI > 0.25 | Big difference (Action is taken) |
PSI = 1 | Absolute difference |
This is what’s left for tomorrow:
Completed Neo4j & LLM Fundamentals on stream
Had to cut in the middle because of problems with Streamlabs. Overall, a very nice course. It says it is 4 hours but I did it in under 2 hours. Definitely a good course ^^
That is all for today!
See you tomorrow :)