Hello :) Today is Day 291!
A quick summary of today:
- finally sat down and did the basic math exercises from Before ML Vol. 2 - Calculus
- started Machine Learning for Financial Risk Management with Python on O’Reilly
Basic calculus exercises
Today I finally got some time to sit down with my tablet and work through the exercises in Before ML Vol. 2 - Calculus. It is basic math and, as the title suggests, math needed before doing ML in order to better understand the common underlying mechanisms. I mainly wanted to check out the exercise book because before each exercise there is also a short description/overview of the specific concept.
Here are my notes:
ML for Financial Risk Management with Python
Started this book today. I initially attempted to read it back in January (at the start of my ML journey), but I felt that it was not the book I needed back then, so here I am - it was high time I read it now. And as an ex-risk analyst, this is very exciting to me.
Chapter 1 - Fundamentals of Risk Management
Risk
Risk is challenging to assess due to its abstract nature and is often perceived as hazardous, whether expected or unexpected. Expected risks are typically priced, while unexpected risks can be more devastating. Financially, risk is often viewed as the potential for loss or uncertainty. Risk can be defined as any event that may hinder an organization’s objectives or the likelihood of loss. Though risk is associated with costs, the relationship is not always direct—expected risks tend to have lower costs compared to unexpected ones.
Return
Return refers to the profit gained from an investment over a specific period, representing the upside of risk. There is a trade-off between risk and return: higher assumed risk typically leads to greater realized return, making this relationship a contentious issue in finance. Markowitz (1952) provided clarity by defining risk using standard deviation, which allows for mathematical and statistical analysis in finance. The expected return is the probability-weighted average of possible returns, while portfolio variance is computed from the asset weights, the individual variances, and the covariances between the assets. The portfolio expected return is calculated as a weighted average of the individual returns.
Risk and return go hand-in-hand, but the magnitude of that relationship depends on the individual stocks and the financial market conditions.
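To make these formulas concrete, here is a minimal numpy sketch of the portfolio expected return and variance; the weights, expected returns, and covariance matrix below are made-up illustrative numbers, not from the book:

import numpy as np

# Illustrative two-asset portfolio (all numbers are made up)
w = np.array([0.6, 0.4])        # portfolio weights
mu = np.array([0.08, 0.12])     # expected returns of the individual assets
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])  # covariance matrix of the asset returns

port_return = w @ mu            # weighted average of individual expected returns
port_variance = w @ cov @ w     # combines weights, variances, and covariances
print(f'Expected portfolio return: {port_return:.4f}')
print(f'Portfolio variance: {port_variance:.4f}')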
Risk Management
Financial risk management is the process of dealing with the uncertainties resulting from financial markets. It involves assessing the financial risks facing an organization and developing management strategies consistent with internal priorities and policies.
Risk management is, therefore, not about mitigating risk at all costs. Mitigating risk may require sacrificing return, and that can be tolerable up to a certain level, as companies search for higher return as much as for lower risk. Thus, maximizing profit while lowering risk should be a delicate and well-defined task.
There is a general framework for risk strategies:
- ignore: companies accept all risks and their consequences and prefer to do nothing
- transfer: involves transferring the risks to a third party by hedging or some other way
- mitigate: companies develop a strategy to mitigate risk partly because its harmful effect might be considered too much to bear and/or surpass the benefit attached to it
- accept: if companies embrace the strategy of accepting the risk, they properly identify risks and acknowledge the benefit of them. In other words, when assuming certain risks arising from some activities bring value to shareholders, this strategy can be chosen
Main financial risks
The book defines the following as the main types of risk (but there are others too):
Market risk
This arises from changes in financial market factors, such as interest rate fluctuations or exchange rate changes, which can adversely affect companies, especially those involved in international trade or holding short positions. Changes in commodity prices, influenced by various fundamentals, can also threaten financial sustainability.
Credit Risk
Occurs when a counterparty fails to fulfill debt obligations, leading to potential losses for lenders. Deterioration in credit quality can also decrease the market value of securities held by an organization.
Liquidity Risk
This type of risk gained attention after the 2007–2008 financial crisis. It encompasses two dimensions: trading liquidity risk, which is the ease of executing transactions, and funding liquidity risk, which refers to a company’s ability to raise cash. Inability to convert assets to cash quickly can harm a company’s financial health.
Operational Risk
Involves challenges in managing risks associated with day-to-day operations. This includes risks from external events, internal failures, fraud, regulatory non-compliance, and insufficient training. Companies unprepared for these risks may face significant financial collapse, as historical events have shown.
Information Asymmetry in Financial Risk Management
Information asymmetry and financial risk management go hand in hand, as the cost of financing and firm valuation are deeply affected by information asymmetry. That is, uncertainty in the valuation of a firm's assets might raise the borrowing cost, posing a threat to the firm's sustainability.
Adverse selection
Adverse selection is a type of asymmetric information in which one party tries to exploit its informational advantage. This arises when sellers are better informed than buyers.
Adverse selection is common in the insurance industry. Suppose the consumer utility function (a tool used to represent consumer preferences for goods and services, concave for risk-averse individuals) is:
U(x) = e^(x^gamma)
where x is income and gamma is a parameter that takes on values between 0 and 1.
The ultimate aim of this example is to decide whether or not to buy insurance based on consumer utility.
For the sake of practice, assume that the income is US$2 and the cost of the accident is US$1.5.
Now it is time to calculate the probability of loss, pi, which is exogenously given and uniformly distributed.
As a last step, to find the equilibrium, we have to define supply and demand for insurance coverage:
import matplotlib.pyplot as plt
import numpy as np

# Use the seaborn style for plots ('seaborn' was renamed to 'seaborn-v0_8'
# in newer matplotlib versions)
plt.style.use('seaborn-v0_8')

# Define the utility function U(x) = e^(x^gamma), concave for 0 < gamma < 1
gamma = 0.4

def utility(x):
    return np.exp(x ** gamma)

# Generate random loss probabilities for 20 people and sort them
pi = np.random.uniform(0, 1, 20)
pi = np.sort(pi)

# Print the three highest probabilities of loss
print('The three highest probabilities of loss are {}'.format(pi[-3:]))

# Define constants: income, cost of the accident, and initial supply/demand inputs
y = 2
c = 1.5
Q = 5
D = 0.01

# Supply: average cost of covering the Q riskiest customers
def supply(Q):
    return np.mean(pi[-Q:]) * c

# Demand: number of people whose utility with insurance at premium D exceeds
# their expected utility without insurance
def demand(D):
    return np.sum(utility(y - D) > pi * utility(y - c) + (1 - pi) * utility(y))

# Plot the demand and supply curves
plt.figure()
plt.plot([demand(i) for i in np.arange(0, 1.9, 0.02)], np.arange(0, 1.9, 0.02),
         'r', label='insurance demand')
plt.plot(range(1, 21), [supply(j) for j in range(1, 21)],
         'g', label='insurance supply')
plt.ylabel('Average Cost')
plt.xlabel('Number of People')
plt.legend()
plt.show()
Both curves are downward sloping, implying that as more people demand contracts and more people are added to them, the risk lowers, affecting the price of the contract.
The straight line represents the insurance supply, that is, the average cost of the contracts, while the other line, showing a step-wise downward slope, denotes the demand for insurance contracts. Since the analysis starts with the riskiest customers, adding more and more people to the contract diminishes the level of riskiness, and the average cost falls in parallel.
Moral Hazard
Market failures also result from asymmetric information. In a moral hazard situation, one party to the contract assumes more risk than the other. Formally, moral hazard may be defined as a situation in which the more informed party takes advantage of the private information at their disposal to the detriment of others.
Chapter 2 - Introduction to Time Series Modeling
Time series models are important for analyzing financial data, which often includes a time dimension. Common models include Moving Average (MA), Autoregressive (AR), and ARIMA. MA uses past error terms, AR relies on past values of the time series, and ARIMA combines both.
The superiority of the time series approach comes from the idea that correlations of observations in time better explain the current value. Having data with a correlated structure in time implies a violation of the famous independent and identically distributed (IID) assumption, which is at the heart of many models.
So, due to the correlation in time, the dynamics of a contemporaneous stock price can be better understood from its own historical values. How can we comprehend the dynamics of the data? This is a question we can address by elaborating on the components of a time series.
Time Series Components
A time series has four components: trend, seasonality, cyclicality, and residual.
We can use statsmodels’ seasonal_decompose:
import datetime

import matplotlib.pyplot as plt
import yfinance as yf
from statsmodels.tsa.seasonal import seasonal_decompose

# Download monthly S&P 500 closing prices
ticker = '^GSPC'
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2021, 1, 1)
SP_prices = yf.download(ticker, start=start, end=end, interval='1mo').Close

# Decompose the series into trend, seasonal, and residual components
seasonal_decompose(SP_prices, period=12).plot()
plt.show()
In the 1st row we see the plot of the raw data, and in the 2nd row the trend can be observed, showing an upward movement. In the 3rd row seasonality is exhibited, and finally the residual is presented, showing erratic fluctuations (this includes noise and the cyclical component).
Trend
Trend indicates a general tendency of an increase or decrease during a given time period
In the above graph we can clearly see an upward trend. A line plot is not the only option for understanding trend; there are stronger tools for this, like autocorrelation and partial autocorrelation.
The autocorrelation function (ACF) is a statistical tool to analyze the relationship between the current value of a time series and its lagged values.
This is the ACF plot of the S&P500
The vertical lines represent the correlation coefficients; the first line denotes the correlation of the series at lag 0, that is, the correlation with itself. The second line indicates the correlation between the series values at times t - 1 and t. In light of these, we can conclude that the S&P 500 shows serial dependence: there appears to be a strong dependence between the current and lagged values of the S&P 500 data, because the correlation coefficients, represented by the lines in the ACF plot, decay slowly.
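For reference, here is a minimal sketch of how such an ACF plot can be produced with statsmodels, assuming the SP_prices series from the earlier snippet:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# ACF of the S&P 500 series (SP_prices from the decomposition snippet above)
plot_acf(SP_prices, lags=30)
plt.xlabel('Lags')
plt.ylabel('Correlation Coefficient')
plt.show()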
What are the likely causes of autocorrelation? Here are some potential causes:
- the primary source of autocorrelation is “carryover,” meaning that the preceding observation has an impact on the current one
- model misspecification
- measurement error, which is basically the difference between observed and actual values
- dropping a variable that has explanatory power
Partial autocorrelation function (PACF) is another method of examining the relationship between current and previous values
ACF is commonly considered the useful tool for the MA(q) model: in an MA process the ACF cuts off after lag q, while the PACF does not decay fast and only gradually approaches 0. PACF, on the other hand, works well with the AR(p) process, where it is the PACF that cuts off after lag p.
PACF provides information on the correlation between the current value of a time series and its lagged values while controlling for the other correlations. In other words, PACF measures the correlation between the current and lagged values of the series in a way that isolates the in-between effects.
In interpreting the PACF, we focus on the spikes outside the shaded region representing the confidence interval. The graph exhibits some spikes at different lags, but lag 10 is outside the confidence interval, so it may be wise to select a model with 10 lags to include all the lags up to lag 10.
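Similarly, a minimal sketch for the PACF plot, again assuming the SP_prices series from above:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# PACF of the S&P 500 series, isolating each lag's direct effect
plot_pacf(SP_prices, lags=30)
plt.xlabel('Lags')
plt.ylabel('Correlation Coefficient')
plt.show()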
Seasonality
Seasonality exists if there are regular fluctuations over a given period of time
Here is a plot of energy capacity utilization
The plot indicates periodic ups and downs over a nearly 10-year period with high-capacity utilization during the first months of every year and then going down toward the end of year, confirming that there is seasonality in energy-capacity utilization.
We can also plot an ACF with 30 lags, which confirms the above observation:
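As a hedged sketch of how such a series could be pulled and its ACF plotted, the snippet below uses FRED's total capacity utilization index ('TCU') via pandas_datareader as a stand-in; the book's exact energy-specific series ID may differ:

import datetime

import matplotlib.pyplot as plt
import pandas_datareader.data as pdr
from statsmodels.graphics.tsaplots import plot_acf

# 'TCU' is FRED's total capacity utilization index; it is a stand-in here,
# not necessarily the energy-specific series used in the book
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2020, 1, 1)
capacity = pdr.DataReader('TCU', 'fred', start, end)

plot_acf(capacity, lags=30)
plt.show()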
Cyclicality
Refers to variations that occur over long periods but without a fixed time interval, unlike seasonality.
Residuals
Residuals in time series analysis represent the “leftover” or unexplained variation after fitting a model to the data. They are calculated as the difference between the observed values and the model’s predicted values. Residuals help assess the accuracy of the model—ideally, they should show no patterns and behave like random noise, meaning the model has captured all the underlying structure.
If residuals display patterns, it indicates that the model has missed some aspects of the data, such as trends, seasonality, or cyclicality, suggesting the need for model adjustments or refinements.
Time Series Models
Traditional time series models are univariate models, and they follow these phases:
- identification: here, we explore the data using ACF and PACF, identifying patterns and conducting statistical tests
- estimation: we estimate coefficients via the proper optimization technique
- diagnostics: after estimation, we need to check if information criteria or ACF/PACF suggest that the model is valid. If so, we move on to the forecasting stage
- forecast: this part is more about the performance of the model. In forecasting, we predict future values based on our estimation
White Noise
White noise is a time series with a mean of 0 and constant variance, where successive values are uncorrelated. It is stationary, meaning its properties don’t change over time, and its plot shows random fluctuations around the mean. However, because the values are uncorrelated, white noise is not useful for forecasting future values, as we cannot predict one value based on previous ones.
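A minimal sketch simulating white noise (IID standard normal draws, so mean 0 and constant variance) and plotting it:

import matplotlib.pyplot as plt
import numpy as np

# Simulate white noise: uncorrelated draws with mean 0 and constant variance
np.random.seed(42)
white_noise = np.random.normal(loc=0, scale=1, size=500)

plt.figure()
plt.plot(white_noise)
plt.title('White Noise Process')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()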
We need to identify the optimum number of lags before running the time series model. Deciding the optimal number of lags is a challenging task and the most widely used methods are ACF, PACF, and information criteria.
Information Criteria
AIC (the Akaike information criterion) is introduced as an extension of the maximum likelihood principle. Maximum likelihood is conventionally applied to estimate the parameters of a model once the structure and dimension of the model have been formulated.
AIC can be mathematically defined as:
AIC = -2ln(Maximum Likelihood) + 2d
where d is the total number of parameters. The last term, 2d, aims at reducing the risk of overfitting. It is also called a penalty term, by which the unnecessary redundancy in the model can be filtered out.
The Bayesian information criterion (BIC) is the other information criterion used to select the best model. The penalty term in BIC is larger than that of AIC:
BIC = -2ln(Maximum Likelihood) + ln(n)d
where n is the number of observations
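As a hedged sketch of how these criteria are used in practice, statsmodels exposes both on fitted results; the synthetic series and candidate orders below are illustrative, not from the book:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative stationary series (synthetic white noise, for demonstration only)
np.random.seed(0)
series = np.random.normal(size=300)

# Compare information criteria across a few candidate MA orders;
# the order with the lowest AIC/BIC would be preferred
for q in range(1, 4):
    res = ARIMA(series, order=(0, 0, q)).fit()
    print(f'MA({q}): AIC={res.aic:.2f}, BIC={res.bic:.2f}')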
Hurvich and Tsai (1989):
If the true model is infinite dimensional, a case which seems most realistic in practice, AIC provides an asymptotically efficient selection of a finite dimensional approximating model. If the true model is finite dimensional, however, the asymptotically efficient methods, e.g., Akaike’s FPE (Akaike 1970), AIC, and Parzen’s CAT (Parzen 1977), do not provide consistent model order selections.
Moving Average Model
MA and residuals are closely related: the MA model can be considered a smoothing model, as it takes into account the lagged values of the residuals.
To be consistent, all the below models will use the stock data of Microsoft and Apple.
We can see that there are significant spikes at some lags and, therefore, we’ll choose lag 9 for the short MA model and 22 for the long MA for Apple
The data is preprocessed as follows (see the sketch after this list):
- drop na
- take 1st difference
- split into train and test
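Here is a minimal sketch of that preprocessing plus a short-MA fit; the tickers, dates, 95/5 split, and the use of ARIMA(0, 0, 9) as an MA(9) model are my assumptions, not necessarily the book's exact code:

import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA

# Download daily closing prices (the date range is illustrative)
prices = yf.download(['AAPL', 'MSFT'], start='2019-01-01', end='2021-01-01')['Close']

# Drop NAs and take the first difference
diff_prices = prices.dropna().diff().dropna()

# Split into train and test (the 95/5 ratio is an assumption)
split = int(len(diff_prices) * 0.95)
train = diff_prices['AAPL'].iloc[:split]
test = diff_prices['AAPL'].iloc[split:]

# An ARIMA(0, 0, 9) is equivalent to an MA(9) model (the short MA for Apple)
short_ma = ARIMA(train, order=(0, 0, 9)).fit()
ma_predictions = short_ma.predict(start=len(train), end=len(train) + len(test) - 1)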
MA model prediction results for Apple:
MA model prediction results for Microsoft:
Autoregressive model
The dependence structure of successive terms is the most distinctive feature of the AR model, in the sense that current value is regressed over its own lag values in this model. So we basically forecast the current value of the time series X_t by using a linear combination of its past values
The AR(p) model implies that past values up to order p have somewhat explanatory power on X_t. If the relationship has shorter memory, then it is likely to model X_t with a fewer number of lags.
We have discussed one of the main properties of time series, stationarity; the other important property is invertibility. Having introduced the AR model, it is time to show the invertibility of the MA process: an MA process is said to be invertible if it can be converted into an infinite AR model.
In the following analysis, we run the AR model to predict Apple and Microsoft stock prices. Unlike with MA, the PACF is the useful tool for finding the optimum order in the AR model. This is because, in AR, we aim to find out the relationship of a time series between two different times, say X_t and X_t-k, and to do that we need to filter out the effect of the other lags in between.
In the 1st plot, obtained from the first differenced Apple stock price, we see a significant spike at lag 29, and in the 2nd plot, we have a similar spike at lag 26 for Microsoft. Thus, 29 and 26 are the lags that we are going to use in modeling AR for Apple and Microsoft, respectively
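A hedged sketch of fitting the AR model with statsmodels' AutoReg, reusing the train/test split from the earlier preprocessing sketch; the lag order follows the PACF reading above:

from statsmodels.tsa.ar_model import AutoReg

# AR(29) for the first-differenced Apple series (lag order from the PACF)
ar_model = AutoReg(train, lags=29).fit()
ar_predictions = ar_model.predict(start=len(train), end=len(train) + len(test) - 1,
                                  dynamic=False)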
AR model prediction results:
Autoregressive Integrated Moving Average Model
The ARIMA model is a function of the past values of a time series and white noise. ARIMA was proposed as a generalization of AR and MA, which do not have an integration parameter; this parameter helps us feed the model with raw data. In this respect, even if we include nonstationary data, ARIMA makes it stationary by properly defining the integration parameter.
ARIMA has three parameters, namely p, d, and q. As should be familiar from the previous time series models, p and q refer to the order of AR and MA, respectively. The d parameter controls for the level difference. If d = 1, it amounts to a first difference, and if it has a value of 0, the model reduces to ARMA. It is possible to have a d greater than 1, but it is not as common as having a d of 1. A minimal fitting sketch follows the pros and cons below.
Pros:
- ARIMA allows us to work with raw data without considering if it is stationary
- it performs well with high-frequency data
- it is less sensitive to the fluctuation in the data compared to other models
Cons:
- ARIMA might fail in capturing seasonality
- it works better with long series and short-term (daily, hourly) data
- as no automatic updating occurs in ARIMA, no structural break during the analysis period should be observed
- having no adjustment in the ARIMA process leads to instability
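Here is that fitting sketch, reusing the raw (undifferenced) Apple prices and the split from the preprocessing sketch above and letting d = 1 handle the differencing internally; the (p, d, q) values are illustrative:

from statsmodels.tsa.arima.model import ARIMA

# With d = 1, ARIMA differences the raw prices internally, so we can feed it
# the undifferenced series (the orders here are illustrative, not the book's)
arima_model = ARIMA(prices['AAPL'].iloc[:split], order=(9, 1, 9)).fit()
arima_predictions = arima_model.predict(start=split, end=split + len(test) - 1)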
ARIMA prediction results:
All the code from this chapter is here
Chapter 3 - Deep Learning for Time Series Modeling
This chapter explained what RNN and LSTM models are and their architecture, then used TensorFlow for predictions. The end prediction results are:
RNN prediction results
LSTM prediction results
Next in the book is Part II: Machine Learning for Market, Credit, Liquidity, and Operational Risks, which should be very interesting.
That is all for today!
See you tomorrow :)