(Day 295) Starting a Credit Risk Modelling course on Udemy

Ivan Ivanov · October 22, 2024

Hello :) Today is Day 295!

A quick summary of today:

  • read about optimisation algorithms
  • read about financial calculus
  • started a course on Udemy
  • streamed

Optimization Algorithms Chapter 1 - Introduction to search and optimization

I found this book randomly and decided to check the first few chapters, which cover introductory material on optimisation.

Why care about search and optimisation?

Search is the systematic examination of states, starting from an initial state and ending (hopefully) at the goal state. Optimization techniques are in reality search methods, where the goal is to find an optimal or a near-optimal state within the feasible search space. This feasible search space is a subset of the optimization problem space where all the problem’s constraints are satisfied.

Going from toy problems to the real world

image

Examples of classic and real-world optimisation problems:

image

  • design problems: situations when time is not as important as the quality of the solution, and users are willing to wait to get optimal solutions
  • planning problems: need to be solved in a time span from a few seconds to a few minutes
  • control problems: need to be solved repetitively and quickly, in a time span from a few milliseconds to a few seconds (usually optimality is traded for speed gains)

Basic ingredients of optimisation problems

Optimisation refers to the practice of finding the ‘best’ solution to a given problem, where ‘best’ usually means satisfactory or acceptable, possibly subject to a given set of constraints. The solutions can be classified into feasible, optimal, and near-optimal:

  • feasible solutions are those that satisfy all the given constraints
  • optimal solutions are both feasible and provide the best objective function value
  • near-optimal solutions are feasible and provide an objective function value close to, but not necessarily, the best

Assuming we have a minimisation problem, where the goal is to find the values of a decision variable that minimise a certain objective function, a search space may contain multiple global minima, strong local minima, and weak local minima:

image

Feasible solutions fall within the constraints of the problem.

  • a global optimum is the best of a set of candidate solutions. Mathematically, if f(x) is the objective function, a point x* is the global min if for all x in the domain of f, f(x*) <= f(x)
  • a strong local min is a point where the function's value is strictly lower than the function's values at all other points in a neighbourhood around it, though it may be higher than the global min. Mathematically, a point x* is a strong local min if there is a neighbourhood N of x* such that f(x*) < f(x) for all x in N with x =/= x*
  • a weak local min is a point where the function's value is <= the function's values at neighbouring points, but it is not strictly lower than all of them (nearby points can tie it), so it is not a strong local min. Mathematically, a point x* is a weak local min if there is a neighbourhood N of x* such that f(x*) <= f(x) for all x in N
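
To make these definitions concrete, here is a small sketch (mine, not the book's) that grid-evaluates an arbitrary multimodal function, f(x) = x^4 - 3x^2 + x, which has one global and one strong local minimum:

```python
import numpy as np

# An arbitrary multimodal objective with one global and one strong local minimum
f = lambda x: x**4 - 3*x**2 + x

# Crude grid search over [-2, 2]
xs = np.linspace(-2, 2, 100_001)
ys = f(xs)

# Global minimum: the best value over the whole grid
print("global min near x =", round(float(xs[np.argmin(ys)]), 3))     # ~ -1.301

# Local minima: interior points whose value is lower than both grid neighbours
is_local_min = (ys[1:-1] < ys[:-2]) & (ys[1:-1] < ys[2:])
print("local minima near x =", np.round(xs[1:-1][is_local_min], 3))  # ~ [-1.301, 1.131]
```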

Optimization problems can generally be stated as follows. Find X which optimizes f, subject to a possible set of equality and inequality constraints:

image

There are 3 main components of optimisation problems:

  • decision variables: a set of unknowns or variables that affect the objective function’s value. These are the variables that define the possible solutions to an optimization problem. If X represents the unknowns, also referred to as the independent variables, then f(X) quantifies the quality of the candidate solution or feasible solution.

  • objective functions (also known as criterion, merit function, utility function, or cost function): the quantity to be optimised. Simple mathematical operations like addition, subtraction, multiplication, and division do not change the location of the optimal point. Without loss of generality, optimization can be interpreted as the minimization of a value, since the maximization of a primal function f(x) can be recast as the minimization of a dual problem generated by applying mathematical operations to f(x). This means that if the primal problem is a minimization problem, then the dual problem is a maximization problem (and vice versa). According to this duality aspect of optimization problems, a solution x*, which is the minimum for the primal minimization problem, is also, at the same time, the maximum for the dual maximization problem

image

If we have a single objective function to be optimised, the problem is called a mono-objective optimisation problem. The opposite is multi-objective optimisation.

  • constraints

Constrained optimisation problems have a set of equality and/or inequality constraints that restrict the values assigned to the decision variables. In addition, most problems have a set of boundary constraints, which define the domain of values for each variable. Also, constraints can be hard (must be satisfied) or soft (desirable to satisfy). Soft constraints can be modelled by incorporating a reward/penalty function as part of the obj. function. The function can reward solutions that satisfy the soft constraints and penalise those that do not.

Soft constraints can also be used to make the search algorithm more adaptive. For example, the severity of the penalty can be dynamically changed as the algorithm progresses, imposing less strict penalties at first to encourage exploration, but imposing more severe penalties near the end to generate a result largely bound by the constraint.
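
A minimal sketch of this reward/penalty idea (my own toy example; the quadratic penalty and the linear severity schedule are assumptions, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    return (x - 3.0) ** 2           # toy objective to minimise

def soft_violation(x):
    return max(0.0, x - 2.0)        # soft constraint: prefer x <= 2

def penalised(x, weight):
    # The penalty is folded into the objective: violations cost weight * violation^2
    return objective(x) + weight * soft_violation(x) ** 2

best_x = 0.0
n_iters = 1000
for t in range(n_iters):
    # Penalty severity grows as the search progresses: lenient early
    # (encouraging exploration), strict late (binding the constraint)
    weight = 0.1 + 100.0 * t / n_iters
    candidate = best_x + rng.normal(scale=0.5)   # random-walk neighbour
    if penalised(candidate, weight) < penalised(best_x, weight):
        best_x = candidate

print(round(best_x, 2))  # ends up near x = 2, the soft-constraint boundary
```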

Well-structured problems (WSP) vs ill-structured problems (ISP)

We can classify optimization problems based on their structure and the procedure that exists (or doesn’t exist) for solving them.

WSP

There are 6 key characteristics of WSPs. These include the presence of a clear criterion for testing proposed solutions, the existence of a problem space capable of representing the initial problem state and potential solutions, and the representation of attainable and considerable state changes within the problem space. Moreover, any knowledge acquired by the problem solver can be represented within these spaces, and if the problem involves interacting with the external world, the state changes reflect the laws governing the real world.

ISP

These are complex discrete or continuous problems without algorithmic solutions or general problem solvers. ISPs are characterised by one or more of the following characteristics:

  • a problem space with different views of the problems, unclear goals, multimodality, and a dynamic nature
  • a lack of exact mathematical models or a lack of well-proven algorithmic solutions
  • solutions that are contradictory, consequences that are difficult to predict, and risk that is difficult or impossible to calculate, resulting in a lack of clear evaluation criteria
  • considerable data imperfection in terms of uncertainty, partial observability, vagueness, incomplete information, ambiguity, or unpredictability that makes monitoring the execution of the solutions difficult and sometimes impossible
  • computational intractability

WSP, but ISP in practice

The traveling salesman problem (TSP) is an example of a problem that may be well-structured in principle, but in practice becomes ill-structured. This is because of the impractical amount of computational power required to solve the problem in real time.

The book largely focuses on ISPs, and on WSPs that are ISPs in practice, since WSPs tend to have well-known solving algorithms that often provide trivial, step-by-step procedures, and most problems in the real world are ISPs.

Most of the algorithms explored in the book are derivative-free and stochastic; they use randomness in their parameters and decision processes. These algorithms are often well suited to solving ISPs, as the randomness of their initial states and operators allows the algorithms to escape local minima and find optimal or near-optimal solutions. In contrast, deterministic algorithms use well-defined and procedural paths to reach solutions and generally are not well suited for ISPs, as they either cannot work in unknown search spaces or are unable to return solutions in a reasonable amount of time. Moreover, most of the algorithms covered in this book are black-box solvers that deal with the optimization problem as a black box. This black box provides, for certain decision variable values, the corresponding values of the objective functions and constraint functions. Importantly, this approach eliminates the need to consider various properties of the objective and constraint functions, such as nonlinearity, differentiability, nonconvexity, monotonicity, discontinuities, or even stochastic noise.

Search algorithms and the search dilemma

In the feasible search space, we may find a few reasonably good neighboring solutions, and the question is whether we should exploit this region or keep exploring, looking for better solutions in other regions of the feasible search space.

Exploration (or diversification) is the process of investigating new regions in the feasible search space with the hope of finding other promising solutions. On the other hand, exploitation (or intensification) is the process of directing the search agent to focus on an attractive region of the search space where good solutions have already been found.

image

Chapter 2 - A deeper look at search and optimization

Classifying optimisation problems

The process of optimization involves selecting decision variables from a given feasible search space in such a way as to optimize (minimize or maximize) a given objective function or, in some cases, multiple objective functions.

Optimization problems are characterized by three main components: decision variables or design vectors, objective functions or criteria to be optimized, and a set of hard and soft constraints to be satisfied. The nature of these three components, the permissible time allowed for solving the problem, and the expected quality of the solutions lead to different types of optimization problems

image

Number and type of decision variables

Based on the number of decision variables, optimisation problems can be broadly grouped into univariate and multivariate problems.

Problem classification also varies according to the type of decision variables. A continuous problem involves continuous-valued variables, where xj ∈ R. In contrast, if xj ∈ Z, the problem is an integer or discrete optimization problem. A mixed-integer problem has both continuous-valued and integer-valued variables. For example, optimizing elevator speed and acceleration (continuous variables) and the sequence of picking up passengers (a discrete variable) is a mixed-integer problem. Problems where the solutions are sets, combinations, or permutations of integer-valued variables are referred to as combinatorial optimization problems.

Types of problems:

image

A complexity class P represents all decision problems that can be solved in polynomial time by deterministic algorithms (i.e., algorithms that do not guess at a solution). The NP or nondeterministic polynomial problems are those whose solutions are hard to find but easy to verify; they can be solved by a nondeterministic algorithm in polynomial time. NP-complete problems are those that are both NP-hard and verifiable in polynomial time. Finally, a problem is NP-hard if it is at least as hard as the hardest problems in NP. NP-hard problems are usually solved by approximation or heuristic solvers, as it is hard to find efficient exact algorithms for them.

Landscape and number of objective functions

An obj. function's landscape represents the distribution of the function's values in the feasible search space. In this landscape, we will find the optimal solution or the global minima in the lowest valley, assuming we are dealing with a minimisation problem, or at the highest peak in the case of a maximisation problem.

According to the landscape, if there is only one clear global optimal solution, the problem is unimodal (e.g. convex and concave functions). In a multimodal problem, more than one optimum exists.

The obj. function is called deceptive when the global min lies in a very narrow valley while a strong local minimum has a wide basin of attraction and an obj. function value close to that at the global minimum.

image

As mentioned before, there are mono- and multi-objective problems depending on the number of obj. functions. Also, if a problem does not have an explicit obj. function, it's called a constraint-satisfaction problem (CSP). The goal in such cases is simply to find a solution that satisfies a given set of constraints.

Constraints

Constrained problems have hard or soft constraints for equality, inequality, or both. Hard constraints must be satisfied, while soft constraints are nice to satisfy (but are not mandatory). If there are no constraints to be considered, aside from the boundary constraints, the problem is an unconstrained optimization problem.

If we have a bivariate mono-objective constrained optimisation problem, we can convert it to an unconstrained one using a Lagrange multiplier. The idea is to convert the constrained optimization problem defined by the objective function f(x,y) with an equality constraint g(x,y) into an unconstrained optimization problem using the Lagrangian function L(x,y,λ) = f(x,y) + λg(x,y). This function combines the objective function and the constraint, enabling constrained optimization problems to be formulated as unconstrained ones through the use of Lagrange multipliers. To solve, we take the partial derivatives of the Lagrangian with respect to x, y, and λ, set them to zero, and solve the resulting system of equations with the SymPy solver:

image
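
The book's worked example isn't reproduced here, but a minimal sketch of the same idea in SymPy, using a made-up problem (minimise f(x,y) = x^2 + y^2 subject to g(x,y) = x + y - 1 = 0), might look like this:

```python
import sympy as sp

# Decision variables and the Lagrange multiplier
x, y, lam = sp.symbols('x y lambda')

# Made-up problem: minimise f(x, y) = x^2 + y^2 subject to x + y - 1 = 0
f = x**2 + y**2
g = x + y - 1

# Lagrangian L(x, y, lambda) = f + lambda * g
L = f + lam * g

# Stationarity: set all partial derivatives of L to zero and solve the system
equations = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(equations, (x, y, lam), dict=True))
# [{x: 1/2, y: 1/2, lambda: -1}] -- the constrained minimum is at (1/2, 1/2)
```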

Linearity of objective functions and constraints

If all the obj. functions and associated constraint conditions are linear, the optimisation problem is categorised as a linear optimisation problem or linear programming problem (LPP or LP), where the goal is to find the optimal value of a linear function s.t. linear constraints. Blending problems are a typical application of mixed integer linear programming (MILP), where a number of ingredients are to be blended or mixed to obtain a product with certain characteristics or properties.
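
As a toy LP illustration (a blending problem with made-up numbers; the integrality requirements that would make it a MILP are dropped for simplicity), scipy.optimize.linprog can solve it directly:

```python
from scipy.optimize import linprog

# Made-up blending problem: mix two ingredients at minimum cost while
# meeting a minimum protein content and producing exactly 100 kg of product.
c = [2.0, 3.0]                 # cost per kg of ingredients 1 and 2

# Protein: ingredient 1 has 10%, ingredient 2 has 30%; need >= 20 kg in total.
# linprog expects A_ub @ x <= b_ub, so a >= constraint gets its signs flipped.
A_ub = [[-0.10, -0.30]]
b_ub = [-20.0]

A_eq = [[1.0, 1.0]]            # total mass must be exactly 100 kg
b_eq = [100.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)          # [50. 50.] 250.0 -- 50 kg of each, total cost 250
```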

If one of the objective functions, or at least one of the constraints, is nonlinear, the problem is considered a nonlinear optimization problem or nonlinear programming problem (NLP), and it’s harder to solve than a linear problem.

A special case of NLP, when the objective function is quadratic, is called quadratic programming (QP).

Expected quality and permissible time for the solution

Optimization problems can also be categorized according to the expected quality of the solutions and the time allowed to find the solutions

image

Classifying search and optimisation algorithms

When searching, we try to examine different states to find a path from the start (initial) state to the goal state. Often, an optimization algorithm searches for an optimum solution by iteratively transforming a current state or a candidate solution into a new, hopefully better, solution. Search algorithms can be classified based on the way the search space is explored:

  • local search: uses only local information about the search space surrounding the current solution to produce new solutions. Since only local info is used, local search algorithms (also known as local optimisers) locate local optima, which may or may not be the global optima (see the sketch after this list)
  • global search: uses more information about the search space to locate a global optimum
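
To illustrate the local-search caveat, a gradient-based local optimiser such as scipy.optimize.minimize converges to whichever basin its starting point lies in (my own toy example, reusing the multimodal function from earlier):

```python
from scipy.optimize import minimize

# Global min near x = -1.301, strong local min near x = +1.131
def f(v):
    x = v[0]
    return x**4 - 3*x**2 + x

# The same local optimiser reaches different optima from different starts
for x0 in (-2.0, 0.5, 2.0):
    res = minimize(f, x0=[x0], method="BFGS")
    print(f"start {x0:+.1f} -> x* = {res.x[0]:+.3f}, f(x*) = {res.fun:+.3f}")
# start -2.0 reaches the global minimum; starts +0.5 and +2.0 get
# trapped in the local minimum near x = +1.131
```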

Another classification distinguishes between deterministic and stochastic algorithms:

  • deterministic algorithms: they follow a rigorous procedure in their path, and both the values of their design variables and their functions are repeatable. From the same starting point, they will follow the same path, whether you run the program today or tomorrow. Examples include, but are not limited to, graphical methods, gradient and Hessian-based methods, penalty methods, gradient projection methods, and graph search methods. Graph search methods can be further subdivided into blind search methods (e.g. depth-first, breadth-first, or Dijkstra) and informed search methods (e.g. hill climbing, beam search, best-first, A*, or contraction hierarchies).

  • stochastic algorithms: explicitly use randomness in their parameters, their decision-making process, or both. For example, genetic algorithms use some random or pseudo-random numbers, resulting in individual paths that are not exactly repeatable. With stochastic algorithms, the time taken to obtain an optimal solution cannot be accurately estimated. Solutions do not always get better, and stochastic algorithms sometimes miss the opportunity to find optimal solutions. This behavior can be advantageous, however, because it can prevent them from becoming trapped in local optima. Examples of stochastic algorithms include tabu search, simulated annealing, genetic algorithms, differential evolution algorithms, particle swarm optimization, ant colony optimization, artificial bee colony, firefly algorithm, etc. Most statistical machine learning algorithms are stochastic because they make use of randomness during the learning stage and make predictions during the inference stage with a certain level of uncertainty. Moreover, some machine learning models are, like people, unpredictable. Models trained using human behavior-based data as independent variables are more likely to be unpredictable than those trained using independent variables that strictly follow physical laws. Due to the uncertainty associated with machine learning predictions, machine learning–based algorithms used to solve optimization problems can be considered stochastic methods.

image
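
As a concrete example of a stochastic algorithm, here is a minimal simulated annealing sketch (my own toy version, not from the book; the 1/t cooling schedule and step size are arbitrary choices), run on the same multimodal function as before:

```python
import math
import random

random.seed(42)

def f(x):
    return x**4 - 3*x**2 + x   # global min near -1.301, local min near +1.131

x = 2.0                        # deliberately start in the local minimum's basin
best = x
for t in range(1, 10_001):
    temperature = 1.0 / t                  # cooling schedule (one of many options)
    candidate = x + random.gauss(0, 1.0)   # random neighbour
    delta = f(candidate) - f(x)
    # Always accept improvements; accept worse moves with probability e^(-delta/T),
    # which is what lets the search escape local optima
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        x = candidate
    best = min(best, x, key=f)

print(round(best, 3))          # usually lands near the global minimum x ~ -1.301
```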

Heuristics and metaheuristics

Heuristics (also known as mental shortcuts or rules of thumb) are solution strategies, seeking methods, or rules that can facilitate finding acceptable (optimal or near-optimal) solutions to a complex problem in a practical time. Despite seeking near-optimal solutions at a reasonable computational cost, they cannot guarantee either feasibility or degree of optimality.

Metaheuristic algorithms are often global optimizers that can be applied to different linear and nonlinear optimization problems with relatively few modifications for specific problems. These algorithms are often robust and can handle different problem sizes, problem instances, and random variables.

The goal of metaheuristics is to efficiently explore the search space in order to find optimal or near-optimal solutions. Metaheuristics may incorporate mechanisms to achieve a trade-off between exploration (diversification) and exploitation (intensification) of the search space to avoid getting trapped in confined areas of the search space while also finding optimal or near-optimal solutions in a reasonable amount of time.

Different heuristic search strategies can be used to generate candidate solutions. These strategies include, but are not limited to, search by repeated solution construction (e.g. graph search and ant colony optimization), search by repeated solution modification (e.g. tabu search, simulated annealing, genetic algorithm, and particle swarm optimization), and search by repeated solution recombination (e.g. genetic algorithm and differential evolution).

Nature-inspired algorithms

These algorithms mimic natural phenomena to solve complex, dynamic, and multi-objective problems. These algorithms draw inspiration from molecular dynamics, biological evolution, ethology, and neural networks. Key types include:

  • simulated annealing (from thermal processes)
  • evolutionary computing (e.g. genetic algorithms, evolutionary programming)
  • swarm intelligence (e.g. particle swarm, ant colony optimization)
  • neural networks (mimicking biological neurons)

Other algorithms include bacterial foraging, water cycle, and quantum-inspired methods. The book is organized into five parts, covering various algorithms such as graph search, optimization, and machine learning models.

Credit Risk Modelling in Python

This course on Udemy was recommended to me once I finished the ML for financial risk management book (thank you cookies 😆), and it seems it's the most popular on the topic on Udemy. I did many Udemy courses back in 2020 when I was learning full-stack dev, and it's a bit nostalgic going back now.

The course is very practical and, according to the teacher, will show how to implement some models that follow the Basel regulations:

image

It is common for PD to be modelled using logistic regression, and LGD and EAD using beta regression.
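
As a rough sketch of the PD side (not the course's actual code; the feature columns and data below are made up), a logistic regression PD model in scikit-learn might look like:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical pre-processed loan data with dummy (categorical) features;
# the values are random here, so the resulting AUC is meaningless (~0.5)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "grade_A": rng.integers(0, 2, 1000),
    "term_36m": rng.integers(0, 2, 1000),
    "good_bad": rng.integers(0, 2, 1000),   # 1 = good (non-default), 0 = default
})

X = df.drop(columns="good_bad")
y = df["good_bad"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)

# PD is one minus the predicted probability of the 'good' class
pd_estimates = 1 - model.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```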

I will share little bits, maybe the code parts, because I am not sure how much I can share from Udemy's video lectures.

I will learn to build models like the ones below, following regulators' guidelines, and how to maintain and deploy such models:

image

Under Basel II, companies can calculate the components of credit risk as follows:

|                             | Standardised Approach | Foundation-IRB       | Advanced-IRB         |
|-----------------------------|-----------------------|----------------------|----------------------|
| PD (Probability of Default) | Externally provided   | Internally estimated | Internally estimated |
| LGD (Loss Given Default)    | Externally provided   | Externally provided  | Internally estimated |
| EAD (Exposure at Default)   | Externally provided   | Externally provided  | Internally estimated |

The SA approach is very conservative, as it comes from regulators, so its capital requirements are very high. For that reason, banks may start with the SA but then move on to F-IRB and A-IRB.

IRB approaches:

  • allow banks to establish their own credit ratings
  • allow precise calculation of the capital held against each individual exposure
  • allow banks to better allocate resources to cover losses (and be more profitable)
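
For reference, the three components combine into the expected loss via the standard Basel formula EL = PD × LGD × EAD (my note, added here for context). For example, with PD = 2%, LGD = 40%, and EAD = $100,000, the expected loss is 0.02 × 0.40 × 100,000 = $800.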

There are application and behaviour models for loans.

The former are used to estimate the credit rating at the moment of application; the latter are used to calculate PD and EL once a loan has been granted, and can help decide whether additional loans should be granted to existing customers.

If a credit card holder applies for a mortgage, the bank uses a behavioral model to estimate their credit rating by considering both application and behavioral characteristics. For ease of use by non-statistical users, like credit agents, these models are presented as scorecards, which are simplified representations of probability of default models. Scorecards are based on model coefficients and can be either application or behavioral scorecards, depending on the input characteristics. A feature of the PD model (according to the teacher) is that it's best for all its features to be categorical, so that the resulting scorecard is based on categories and can be easily used.
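
To make the coefficients-to-scorecard link concrete, here is a rough sketch (the scaling convention of a 600 base score at 50:1 odds with 20 points to double the odds is a common industry choice, and the coefficients are made up; none of this is taken from the course):

```python
import math

# Hypothetical logistic-regression coefficients for dummy (categorical) features
coefs = {"grade_A": 0.80, "grade_B": 0.35, "term_36m": 0.25}
intercept = -1.20

# Assumed scaling: 600 points at 50:1 odds, 20 points to double the odds (PDO)
pdo, base_score, base_odds = 20, 600, 50
factor = pdo / math.log(2)                   # points per unit of log-odds
offset = base_score - factor * math.log(base_odds)

# Each category contributes a fixed number of points to the final score
scorecard = {name: round(coef * factor) for name, coef in coefs.items()}
print(round(offset + intercept * factor))    # baseline score (~452) before points
print(scorecard)                             # {'grade_A': 23, 'grade_B': 10, 'term_36m': 7}
```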

This was section 1:

image

Section 2 was setting up python ~

Sections 3 and 4 were about basic preprocessing on the dataset (Lending Club data from 2007 to 2014, from Kaggle); this part is aimed at Python and pandas beginners, as it goes through the different cleaning/preprocessing functions used.

Section 5 is the longest one - Data prep for the PD model

image

I will cover that tomorrow and share whatever I can (maybe just the code)

Financial calculus by Baxter and Rennie

Read the first two chapters of the above book and took basic notes.

images: notes (p0, Chapter 2 Discrete processes P1, Chapter 2 Discrete processes P2)

Stream

I read 2 pages from the sklearn docs on stream today, but they were not very insightful.


That is all for today!

See you tomorrow :)