Hello :) Today is Day 288!
A quick summary of today:
- started reading Financial Data Engineering on O’Reilly
- explored PDP and ICE plots (through scikit-learn) on stream
Financial Data Engineering Chapter 1 - Foundations
- what is finance?
- what is data engineering? (a proper answer can be found in Joe Reis and Matt Housley’s book haha)
- what is financial data engineering?
The book’s definition:
Financial data engineering is the domain-driven practice of designing, implementing, and maintaining data infrastructure to enable the collection, transformation, storage, consumption, monitoring, and management of financial data coming from mixed sources, with different frequencies, structures, delivery mechanisms, formats, identifiers, and entities, while following secure, compliant, and reliable standards.
Why financial data engineering?
Volume, Variety, and Velocity of Financial Data
Volume
Big data in finance is defined by its large size. Data volumes have increased due to socio-technological changes and advancements in digital technology, such as high-frequency trading and widespread adoption of mobile payments. This increase presents opportunities like better fraud detection and complex pattern recognition, but also technical challenges in data storage, querying, and aggregation.
Velocity
Data velocity refers to the speed of data generation and ingestion. Financial markets, high-frequency trading, and social media have increased the velocity of data. This creates opportunities for quicker reactions and insights but introduces challenges in building systems capable of handling real-time data processing with high accuracy.
Variety
Financial data comes in various types—structured, semi-structured, and unstructured. Recent years have seen growth in diverse data formats, such as regulatory filings, news, and social media posts. While this variety improves prediction and analytics, it also requires advanced infrastructure to manage, clean, and integrate different data types efficiently.
Finance-Specific Data Requirements and Problems
- there is a lack of standardisation in some key areas:
- identification system for financial data
- classification system for financial assets and sectors
- financial information exchange
- lack of established data standards for financial transaction processing
- dispersed and diverse sources of financial data
- adoption of multiple data formats by companies, data vendors, providers, and regulators
- complexity in matching and identifying entities within financial datasets
- lack of reliable methods to define, store, and manage financial reference data
- lack of relevant data for understanding and managing various financial problems due to poor data collection processes
- the constant need to adapt data and tech infrastructure to meet new market and regulatory demands
- the constant need to record, store, and share financial data for various regulatory and market purposes
- absence of standardised practices for cleaning and ensuring the quality of financial data
- difficulty in aggregating data across various silos and divisions within financial institutions
- creating consolidated tapes, which integrate market data from multiple sources including trade and quote information across various venues, continues to pose technological and market challenges
- balancing innovation and competitiveness with regulatory compliance
- persisting concerns regarding security, privacy, and performance in cloud migration strategies
- continued reliance on legacy technological systems due to organisational inertia and risk aversion
Financial ML
- FinGPT
- BloombergGPT
- financial machine learning has proven very successful and is likely to be a major factor in shaping the future of financial markets, but it also presents major challenges that shouldn’t be ignored (like the problem of false discoveries)
FinTech
Regulatory requirements and compliance
To avoid costly financial crises, the financial sector has been subjected to a large number of regulations, both national and international.
A significant part of financial regulatory requirements concerns the way banks should collect, store, aggregate, and report data.
To meet regulatory requirements, financial institutions must establish specialized financial data engineering and management teams to develop and maintain a reliable data infrastructure. This infrastructure should efficiently capture, process, and consolidate relevant data and metadata from various sources, while upholding stringent security standards and ensuring both operational and financial resilience. It must also provide risk and compliance officers with rapid, accurate access to the data necessary for proving regulatory compliance. Additionally, financial data engineers are responsible for establishing and upholding a financial data governance framework that ensures data quality and security, fostering trust among management, stakeholders, and regulators.
Responsibilities and activities of a financial data engineer
The role of a financial data engineer varies based on the job, the hiring institution, and the firm’s data maturity. Data maturity reflects an organization’s progress in managing and utilizing its data and is typically categorized into three stages:
Starting with Data
In the early stages of data maturity, financial institutions may initiate digital transformation efforts. Data engineers handle broad responsibilities, including data engineering, software development, analytics, and infrastructure. The focus is on rapid feature expansion and speed over adherence to best practices.
Scaling with Data
At this stage, the institution evaluates its processes to identify bottlenecks and scalability needs. Financial data engineers work on improving the scalability, security, reliability, and quality of the data infrastructure by adopting best practices, such as governance, security standards, DevOps, and system scalability.
Leading with Data
Once an institution becomes data-driven, its operations are fully automated, and scaling is seamless. Engineers specialize in areas like site reliability, platform engineering, MLOps, and FinOps. There is a continued focus on optimization and new integrations within the well-established data processes and infrastructure.
Skills of a financial data engineer
- financial domain knowledge (fin. instruments, markets, data generation mechanisms, company reports, financial variables, financial theory terms, compliance and privacy concepts, regulation and data protection laws)
- technical data engineering skills (databases, cloud, data workflow and frameworks, infrastructure, programming, analytical skills)
- soft and business skills (explaining complex concepts, identifying data requirements, staying updated on financial technology trends, establishing data access policies, analyzing high-value data needs, and providing guidance on leveraging financial data for analysis and ML)
Chapter 2 - Financial data ecosystem
Sources of financial data
The sources may differ depending on:
- accessibility: whether the data is open access, commercially licensed, or proprietary
- usability: whether the data source is easy to use and extract data from, and whether the data is provided in a raw format or in a clean structure that is ready to use
- coverage: whether the data source offers complete and wide coverage of financial data
- reliability: whether a source is trusted, secure, timely, and durable
- quality: whether the data provided by the sources is of high quality
- documentation: whether additional info is available on the data structures, fields, and content of the data source
There are many different sources, but the six below are identified.
Public financial data
Public financial data includes various types of freely accessible information provided through regulatory disclosures, institutional databases, public research, and free stock market APIs. Regulatory disclosures, such as those required by the SEC, include filings like 10-Ks, 10-Qs, and 8-Ks, which provide essential financial performance details of companies. The SEC’s EDGAR database is a primary source for these filings, handling thousands of documents daily. Additionally, international organizations such as the World Bank, European Central Bank, and the Federal Reserve offer vast collections of economic and financial data. Public research datasets, like the Ken French Data Library, offer specialized financial data for academic purposes. Moreover, free stock market APIs from platforms like Yahoo Finance and Google Finance provide access to real-time data such as stock prices, news, and financial reports, often supporting programmatic access through APIs. The main challenges with public financial data include dealing with raw, unstructured formats and incomplete disclosures designed to balance transparency with competitive protection.
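As an illustration of such programmatic access, here is a minimal sketch using the community-maintained yfinance package (an unofficial Yahoo Finance wrapper, so its API and data availability can change):

```python
# A minimal sketch of pulling free market data, assuming the
# third-party yfinance package (pip install yfinance) is available.
import yfinance as yf

# Daily OHLCV prices for Apple over one year
prices = yf.download("AAPL", start="2024-01-01", end="2024-12-31")
print(prices.head())
```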
Security exchanges
Market exchanges like the LSE and NYSE gather and manage diverse datasets on the transactions they handle, and they provide up-to-date data crucial for trading and investment purposes.
Commercial data vendors, providers, and distributors
Commercial companies specialising in financial data and software providers collect and curate data from a large number of sources, including regulatory filings, news agencies, surveys, company websites, brokers, banks, rating agencies, company reports, bilateral agreements with companies, exchange venues, and many more.
Vendors like Morningstar, Bloomberg, and FactSet offer standardized data, comprehensive documentation, and customer support, with options for customized solutions. Their services vary in data coverage, delivery mechanisms, and pricing, enabling financial institutions to select based on specific needs such as data frequency, quality, or budget. Key considerations when choosing a vendor include the required data types, delivery formats, data update frequency, and technical limitations.
Survey data
Survey data, collected through questionnaires, provides valuable insights by enhancing existing datasets. For example, banks use surveys to assess clients’ financial knowledge and risk levels to personalize investment plans. Organizations like the Institute for Supply Management (ISM) use surveys to create key economic indicators like the Purchasing Managers’ Index (PMI). While survey data is flexible and customizable, challenges include respondent bias, incomplete responses, and potential framing bias, making ethical survey design and bias mitigation crucial for reliable outcomes.
Alternative data
These sources are not inherently embedded in the financial system, unlike trading venues, payment systems, and commercial banking. Examples are:
- satellites (e.g., shipping images)
- social media (tweets)
- articles
- blog posts
- news channels
- weather observers
- online clicking patterns and browsing history
- shipping and pipeline trackers
- emails and chats
- product and service reviews
- security, social, and environmental scores and ratings
Confidential and proprietary data
Confidential and proprietary data in financial institutions includes three main types:
- contextual, nontransactional data: this includes information about clients, employees, suppliers, products, pricing, and branch locations
- transactional data: generated from activities such as payments, deposits, credit card transactions, and trading
- analytical data: created by internal research teams and analysts, covering areas like sales forecasting, customer segmentation, credit scoring, and market analysis
Additionally, financial institutions must report confidential data (e.g., trade details, large transactions, suspicious activity) to regulatory bodies for compliance with AML, KYC, tax, and risk management requirements.
Structures of financial data
Time series data
The time series is the most common data structure in financial markets, mainly due to the dynamic and transactional nature of financial activities that happen over time. Examples of financial activities that generate time-series data include trading, pricing, payment, investment, estimation, valuation, risk measures, volatility figures, corporate events, performance and accounting, and many more.
Cross-sectional data
A cross-section refers to a set of records for one or more variables observed among multiple entities at a specific point in time. In cross-sectional data, the time dimension is not relevant; the focus is instead on the variables themselves. The entities in a cross-section typically share similar characteristics, such as companies within the same industry, investments aligned with a particular strategy, or fund managers.
Financial cross-sectional data arises from the interaction of various participants in financial markets, including financial institutions, brokers, investors, consumers, traders, and managers, as well as diverse entities like financial assets, investment strategies, and the behaviors and choices of consumers and firms.
This type of data is useful for understanding and explaining significant differences among financial entities at a specific time, such as why different stocks yield varying returns, why companies exhibit different behaviors, or why performance levels vary among fund managers. Additionally, cross-sectional analysis serves as a vital tool for identifying correlations, causal relationships, or anomalies across different entities, which can be crucial for investment decisions, policy development, and comprehending market dynamics.
Panel data
Panel data combines both time series and cross-sectional structures, representing data for a set of variables across multiple entities at different points in time.
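A small pandas sketch (tickers and values made up) shows how the three structures relate: a panel is just observations indexed by both entity and time, and slicing it recovers a time series or a cross-section.

```python
import pandas as pd

# Panel data: the same variable observed for multiple entities over time.
panel = pd.DataFrame(
    {"ret": [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]},
    index=pd.MultiIndex.from_product(
        [["AAA", "BBB"], pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"])],
        names=["ticker", "date"],
    ),
)

time_series = panel.xs("AAA", level="ticker")                    # one entity over time
cross_section = panel.xs(pd.Timestamp("2024-01-31"), level="date")  # all entities at one date
```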
Matrix data
The use of matrix structures is quite common in finance. The best example is perhaps portfolio optimization theory, in particular Markowitz’s mean-variance optimization (MVO). According to MVO, when constructing a financial investment portfolio, three elements should be taken into consideration:
- the expected return on the assets in the portfolio
- the variance (risk) of the asset returns
- the covariances between the asset returns
If we have a portfolio with three assets, two matrices can be constructed:
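The book’s illustration isn’t reproduced here, but from the three elements above, the two objects would be the expected-return vector and the covariance matrix (a sketch in standard notation; strictly, the first is a 3 × 1 matrix):

```latex
\mu =
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}
\qquad
\Sigma =
\begin{pmatrix}
\sigma_1^2  & \sigma_{12} & \sigma_{13} \\
\sigma_{21} & \sigma_2^2  & \sigma_{23} \\
\sigma_{31} & \sigma_{32} & \sigma_3^2
\end{pmatrix}
```

A portfolio with weight vector w then has expected return w^T μ and variance w^T Σ w.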
Using these matrix representations, MVO relies on matrix algebra to optimize the portfolio’s expected return for a given level of risk appetite.
Graph data
Financial entities form complex webs of relationships, and network analysis, or network science, is essential for studying these interactions. Traditional time-series or cross-sectional data are insufficient for analyzing financial networks, necessitating graph data, which comprises nodes (entities) and links (relationships).
Graph data can be represented in several ways:
- network visualization: a 2D diagram showing nodes and links, best for small networks
- adjacency matrix: an N × N matrix indicating connections between nodes
- adjacency list: an array where each node links to its neighbors
- edge list: a list of edges defined as tuples of source and target nodes, possibly with attributes
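A toy sketch of the same tiny interbank network in each machine-readable representation (bank names are hypothetical):

```python
# A toy interbank network in three equivalent representations.
nodes = ["BankA", "BankB", "BankC"]

# Edge list: (source, target, attribute), here exposure amounts
edge_list = [("BankA", "BankB", 5.0), ("BankB", "BankC", 2.5)]

# Adjacency list: each node maps to its neighbors
adjacency_list = {
    "BankA": ["BankB"],
    "BankB": ["BankA", "BankC"],
    "BankC": ["BankB"],
}

# Adjacency matrix: N x N, 1 where two nodes are connected
adjacency_matrix = [
    [0, 1, 0],  # BankA
    [1, 0, 1],  # BankB
    [0, 1, 0],  # BankC
]
```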
Financial graph data often stems from other data sources, such as transaction histories, executive datasets, or curated datasets from financial data vendors like S&P Global. Building networks often requires additional processing, such as calculating correlations between stock prices for similarity networks.
The chapter goes in-depth on graph data and types of graphs, which was a bit surprising, since texts usually don’t concentrate on graph data much.
Text data
- it is among the most common forms of financial data
- it has domain-specific characteristics. For example, terms like ‘risk’ and ‘liability’ may carry a negative connotation in general contexts, but they are common and often neutral in finance
Types of financial data
Fundamental data
Financial fundamental data, also called financial statements data, balance sheet data, or corporate finance data, conveys information about firms’ structures, operations, and performance.
Market data
Market data includes price and related information for financial instruments traded on market venues such as the New York Stock Exchange (NYSE) or in off-exchange OTC markets.
Transaction data
A financial transaction is a legal agreement between two or more parties to exchange financial securities, services, goods, risks, or commodities in return for money.
Generally speaking, a financial transaction can be characterized by at least five elements: transaction specifications, transaction parties, initiation date, settlement date, and settlement method.
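As a hedged sketch, those five elements map naturally onto a simple record type (the field names are mine, not the book’s):

```python
from dataclasses import dataclass
from datetime import date

# The five characterizing elements of a financial transaction,
# modeled as a simple record. Field names are illustrative.
@dataclass
class Transaction:
    specifications: str       # what is exchanged, quantity, price
    parties: tuple[str, str]  # counterparties to the agreement
    initiation_date: date
    settlement_date: date
    settlement_method: str    # e.g., cash or physical delivery

txn = Transaction(
    specifications="Buy 100 shares @ 50.00",
    parties=("Buyer Corp", "Seller LLC"),
    initiation_date=date(2024, 5, 1),
    settlement_date=date(2024, 5, 3),  # T+2
    settlement_method="cash",
)
```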
Analytics data
This type of data is often derived from other types of data such as fundamental, market, and transaction data. It is typically computed using simple formulas, statistical models, machine learning techniques, and financial theories.
Examples:
- news and market sentiment (novelty, score, relevance, impact)
- financial risk measures (value at risk, option Greeks, bond duration, implied volatility; a VaR sketch follows this list)
- market indexes (e.g., MSCI Global Indexes)
- ESG scores
- others
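Taking value at risk from the risk-measures item above, a minimal historical-simulation sketch (the returns here are simulated; production VaR methodology is considerably more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=1000)  # simulated daily returns

# 1-day 95% historical VaR: the loss at the 5th percentile of returns
var_95 = -np.percentile(daily_returns, 5)
print(f"1-day 95% VaR: {var_95:.2%} of portfolio value")
```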
Alternative data
Alternative data refers to nontraditional datasets used for financial analysis, investment, or forecasting, including sources like news, social media, satellite images, patents, and consumer behavior. Unlike conventional financial data, alternative data is often unstructured, lacking standard identifiers, and may be incomplete or biased.
The key advantage of alternative data is its novelty and diversity, which can provide a competitive edge in portfolio analysis. However, challenges include the need for significant investment in data structuring and the potential for biases that are difficult to detect.
Reference data
Reference data is the relatively static data used to identify, describe, and classify financial instruments, entities, and transactions: identifiers (e.g., ISIN, CUSIP, LEI), asset class and sector classifications, currency and country codes, exchange codes, calendars, and settlement conventions. Because so many downstream processes join on it, reliable reference data is foundational for trade processing, aggregation, risk management, and compliance, which is why the lack of reliable ways to manage it (noted in Chapter 1) is such a pain point.
Entity data
Entity data provides information about corporate entities and their characteristics, often used in conjunction with reference data for deeper insights. Key elements of entity data include company name, identifiers, year of establishment, legal form, ownership structure, sector classification, associated individuals, credit ratings, ESG scores, risk exposures, and major corporate events.
Commercial datasets, such as LSEG’s legal entity data, Moody’s Orbis database, and SwiftRef’s entity data, are available for various applications, including corporate finance analysis, risk management (e.g., supplier or credit risk), and compliance efforts such as anti-money laundering (AML), know your customer (KYC), and client due diligence (CDD).
Benchmark financial datasets
Notable financial datasets:
- CRSP: offers comprehensive stock data for over 32,000 securities, including price data and corporate actions
- Compustat: standardized dataset for fundamentals of 80,000+ international public companies, useful for historical analysis
- TAQ: high-frequency trade and quote data from major exchanges, with time precision from seconds to nanoseconds
- I/B/E/S: earnings estimates database for over 60,000 companies, contributed by global analysts since 1976
- IvyDB OptionMetrics: historical options data, including quotes and implied volatility for equity and index options
- TRACE: bond transaction data from FINRA, providing real-time information on OTC transactions
- Orbis: global database with information on 450 million companies, focusing on financial strength and ownership
- SDC Platinum: source for corporate finance and market event data, including mergers and acquisitions
- S&P DJI: provider of historical index data for various markets, including detailed information on popular indexes
Chapter 3 - Financial identification systems
Financial identifiers are unique character sequences (e.g., alphanumeric codes) that accurately identify financial entities (like companies, transactions, and assets) across various datasets and systems.
A financial identification system establishes principles for generating, interpreting, storing, and maintaining these identifiers, often referred to as symbology.
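As a concrete taste of symbology, ISINs end in a Luhn check digit computed over the rest of the code; a minimal validation sketch:

```python
def isin_is_valid(isin: str) -> bool:
    """Validate an ISIN's Luhn check digit (format: 2-letter
    country code + 9 alphanumerics + 1 check digit)."""
    if len(isin) != 12:
        return False
    # Letters become two digits (A=10 ... Z=35); digits stay as-is
    expanded = "".join(str(int(c, 36)) for c in isin[:-1])
    total = 0
    for i, ch in enumerate(reversed(expanded)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return (10 - total % 10) % 10 == int(isin[-1])

print(isin_is_valid("US0378331005"))  # Apple's ISIN -> True
```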
Importance of Financial Identifiers
- transaction identification: identifiers are crucial for distinguishing financial instruments and entities involved in market transactions, facilitating efficient transactions and communication among participants
- reporting compliance: regulators require accurate reporting from financial institutions, which necessitates the aggregation and consolidation of data through identifiers. Regulatory frameworks like MiFID II mandate the use of specific identifiers in reporting
- exchange listing and trading: identifiers are essential for listing and trading securities, enabling easy tracking and analysis of financial instruments
- data analysis: in financial analysis, identifiers help in cross-sectional studies by ensuring accurate data sampling, quality checks, and entity matching across datasets
Desired properties of financial identifiers
- uniqueness: a financial identification system must uniquely identify financial entities and never assign the same identifier to distinct entities
- globality: a financial identification system must be able to accommodate the expanding ecosystem of financial activities and entities
- scalability: a scalable financial identification system does not exhaust its pool of assignable identifiers
- completeness: if an identification system is created to identify a set of financial entities, then each entity in the set must have an identifier
- accessibility: financial identifier data should be accessible, meaning it should not be restricted by license fees or usage limits, or monopolized by market players
- timeliness: refers to the system’s ability to process and generate identifiers quickly and efficiently
- authenticity: an authentic financial identification system can confirm whether a specific identifier was generated by it
- granularity: a financial identification system is granular if it can scale to provide different levels of details that reflect market hierarchies
- permanence: a reliable financial identification system must ensure that identifiers and their associated data are permanent and durable
- immutability: a robust financial identification system must ensure the immutability of its identifiers over time
- security
Financial identification systems landscape
The current landscape of financial identification systems is extensive, comprising open source and proprietary systems designed with different properties and for various purposes. Several standardization efforts have been initiated, yet a universally accepted standard has not been established.
Financial identification systems in use globally include, for example, ISIN, CUSIP, SEDOL, the LEI (Legal Entity Identifier), FIGI, and exchange ticker symbols.
Chapter 4 - Financial entity system (FES)
The vast majority of data in the world exists in unstructured formats. To this end, many financial institutions develop systems to extract, recognize, identify, and match financial entities within financial datasets.
The book defines a financial entity as:
A financial entity is a real-world object that may be recognized, identified, referenced, or mentioned as an essential part of financial market operations, activities, reports, events, or news. A financial entity may be human or not. It can be tangible (i.e. an ATM machine), intangible (i.e. common stock), fungible (i.e. one-dollar bills), or infungible (i.e. loans). A financial entity system is an organized set of technologies, procedures, and methods for extracting, identifying, linking, storing, and retrieving financial entities and related information from different sources of financial data and content.
Financial named entity recognition
NER is the task of detecting and recognizing named entities in text, such as persons, companies, locations, events, symbols, time, etc.
Example outcome of NER in finance:
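The book’s example output isn’t reproduced here, but here is a hedged sketch of what NER output looks like with an off-the-shelf spaCy model (entity labels vary by model, and a generic model will miss finance-specific entities like tickers):

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple reported quarterly revenue of $94.9 billion on October 31, 2024.")
for ent in doc.ents:
    # Typical labels here: ORG, MONEY, DATE (exact output is model-dependent)
    print(ent.text, ent.label_)
```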
The chapter also discusses making a knowledge base out of the extracted entities, and it goes in depth into different approaches for developing NER; it seems like a good reference to learn about NER in general.
Chapter 5 - Financial data governance
As financial markets grow, the techniques and applications for collecting, storing, and utilizing financial data are also evolving. This has raised concerns among both the financial sector and regulatory bodies, which are advocating for robust data controls, quality assurance, privacy regulations, and enhanced security measures. Consequently, data governance frameworks have surfaced as an effective strategy for establishing and enforcing rules and principles that guide data practices within financial institutions.
Data governance is critical to securing financial data, ensuring regulatory compliance, and fostering trust among stakeholders. By implementing robust DG, financial institutions can safeguard sensitive information, adhere to legal requirements, and maintain the integrity of their financial operations.
DG definition by Google:
Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle.
Financial data governance defined by the book:
Financial data governance is a technical and cultural framework that establishes a set of rules, roles, practices, controls, and implementation guidelines to ensure the quality, integrity, security, and privacy of financial data in compliance with both general and financial domain–specific internal and external policies, standards, requirements, and regulations.
Implementing a financial data governance framework necessitates significant investment, involving the establishment and integration of the framework within a financial institution’s data infrastructure and training employees to adhere to governance principles. The value of such a framework is especially pronounced in financial organizations due to its impact on performance and risk management.
Effective data governance enhances data quality, a crucial factor in financial operations and decision-making. Poor data quality leads to operational inefficiencies and hinders institutions’ abilities to gain insights, make informed decisions, and communicate accurately. Additionally, robust data governance alleviates the burden of constant data quality checks for employees and boosts confidence among developers and business teams regarding compliance and quality.
Financial institutions face various risks, including cyberattacks, data breaches, biases in applications, erratic data affecting model outcomes, and privacy risks associated with third-party data sharing. Regulatory frameworks like the Sarbanes–Oxley Act, Bank Secrecy Act, and GDPR have emerged in response to these risks, compelling institutions to implement strong data governance frameworks to ensure compliance.
Given the complexity of data governance, this chapter will present a practical framework focused on three key areas common to all financial institutions: data quality, data integrity, and data security and privacy.
Data quality
Data quality in financial institutions refers to how well a dataset meets its intended use in various applications. Since financial data plays a crucial role in decision-making and product development, maintaining high-quality data is essential. Issues with data quality, known as “data downtime,” can disrupt operations, damage customer trust, hinder research, and negatively affect decisions.
To ensure data quality, financial institutions need a Data Quality Framework (DQF), which varies by institution but generally involves assessing key data quality dimensions (DQDs). These dimensions, such as errors, outliers, biases, granularity, and timeliness, help measure the overall quality of financial data. While the specific DQDs may differ depending on the institution’s needs, they form the foundation of a robust data quality strategy.
- dimension 1: data errors: records that have been recorded erroneously and therefore reflect invalid or incorrect values (e.g., a negative stock price, a wrong decimal place)
- dimension 2: data outliers
- dimension 3: data biases
- dimension 4: data granularity: the level of detail within a dataset
- dimension 5: data duplicates
- dimension 6: data availability and completeness
- dimension 7: data timeliness: is the data available and accessible at the time it is expected to be; does the data reflect the most recent observations
- dimension 8: data constraints: the degree to which data conforms to pre-defined technical and business rules and limitations
- dimension 9: data relevance: the degree to which available data aligns with the specific problem or purpose it aims to address
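A minimal sketch of how a few of these dimensions could be checked on a trade table with pandas (the column names and values are hypothetical):

```python
import pandas as pd

trades = pd.DataFrame({
    "trade_id": [1, 2, 2, 3],
    "price": [101.5, -3.0, -3.0, 99.8],  # a negative price: dimension 1
    "ts": pd.to_datetime(["2024-06-03", "2024-06-03", "2024-06-03", "2024-05-01"]),
})

errors = trades[trades["price"] <= 0]        # dimension 1: data errors
duplicates = trades[trades.duplicated()]     # dimension 5: data duplicates
missing = trades.isna().sum()                # dimension 6: completeness
staleness = pd.Timestamp("2024-06-04") - trades["ts"].max()  # dimension 7: timeliness
```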
Data integrity
Data integrity refers to the principles and practices that ensure data remains consistent, accurate, traceable, and reliable throughout its lifecycle. As data undergoes various transformations, movements, and adjustments, maintaining its integrity is crucial. In financial institutions, ensuring data integrity involves adhering to key principles such as standards, backups, archiving, aggregation, data lineage, catalogs, ownership, contracts, and reconciliation. These principles help guarantee that data is trustworthy, usable, and compliant with business and regulatory requirements.
- principle 1: data standards
- principle 2: data backups (protect against data loss accidents)
- principle 3: data archiving (serves data retention purposes)
- principle 4: data aggregation (see the Basel Committee’s BCBS 239 principles on risk data aggregation)
- principle 5: data lineage
- principle 6: data catalogs (set of metadata combined with search and data management tools that allow a data producer or consumer to find and document a data asset within a financial institution)
- principle 7: data ownership (data ownership involves designating an individual or team with the task of overseeing the collection, cleansing, maintenance, sharing, and management of a particular data asset within a financial institution)
- principle 8: data contracts (API-like agreements between Software Engineers who own services and Data Consumers that understand how the business works in order to generate well-modeled, high-quality, trusted, real-time data)
- principle 9: data reconciliation (the process of aligning different sets of records to ensure consistency and accuracy in financial transactions and balances across various systems)
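As a sketch of the last principle, pandas’ merge indicator makes a toy two-way reconciliation straightforward (the records are made up):

```python
import pandas as pd

# Internal ledger vs. bank statement: find unmatched or mismatched records
ledger = pd.DataFrame({"txn_id": [1, 2, 3], "amount": [100.0, 250.0, 75.0]})
bank = pd.DataFrame({"txn_id": [1, 2, 4], "amount": [100.0, 250.0, 60.0]})

recon = ledger.merge(bank, on="txn_id", how="outer",
                     suffixes=("_ledger", "_bank"), indicator=True)
breaks = recon[(recon["_merge"] != "both") |
               (recon["amount_ledger"] != recon["amount_bank"])]
print(breaks)  # txn 3 only in the ledger, txn 4 only at the bank
```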
Data security and privacy
Data privacy
Data privacy involves the governance of data to ensure it is collected, stored, processed, and shared according to legal regulations and the consent of the individual. It protects personally identifiable information (PII), such as names, social security numbers, and financial details. Laws like the EU’s GDPR give individuals rights over their data, including access, deletion, and portability. Other regulations, such as CCPA in the US and PIPEDA in Canada, provide similar protections. In system design, balancing data privacy with data sharing is crucial, as excessive sharing can lead to breaches, while overly restricting it may limit innovation. A privacy culture within organizations, backed by management, is key to ensuring data privacy.
Data anonymisation
It is a practice that enhances data security and privacy by transforming data to obscure its identifiable elements, making it unlinkable to specific individuals. Proper anonymization renders data unusable if breached. In financial institutions, it can be adopted as a best practice and is sometimes legally required. For instance, under GDPR, anonymized data is exempt from certain privacy restrictions.
Anonymization strategies
- identifiability spectrum: data ranges from fully identifiable (i.e. direct identifiers like names or social security numbers) to fully anonymized. Indirect identifiers (quasi-identifiers) can still reveal identity when combined with other data. The goal is to decide how much identifiability is acceptable based on reidentification risks
- analytical integrity: anonymized data must remain useful for analysis, preserving key relationships between variables without distorting the dataset’s analytical value
- reversibility: some anonymization processes may allow reidentification if necessary, such as when internal use data needs to be re-accessed (e.g., credit card numbers)
- simplicity: methods range in complexity. Simple approaches are easier to implement and reverse, while complex methods offer more robust protection but at higher costs
- static vs. dynamic: static anonymization alters data once for storage, while dynamic anonymization applies on-the-fly for query results, impacting speed and performance
- deterministic vs. nondeterministic: deterministic methods yield the same anonymized output consistently, while nondeterministic methods may vary each time they are applied
Techniques for data anonymization
- generalization: replaces precise data (e.g., revenues) with ranges
- suppression: omits sensitive fields entirely
- distortion: alters numerical data by adding noise
- swapping: shuffles data between records to obscure patterns
- masking: obfuscates sensitive data with modified characters
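Toy implementations of two of these techniques, masking and generalization (the formats and buckets are made up):

```python
def mask_account(account: str, visible: int = 4) -> str:
    """Masking: obfuscate all but the last few characters."""
    return "*" * (len(account) - visible) + account[-visible:]

def generalize_revenue(revenue: float) -> str:
    """Generalization: replace a precise value with a range."""
    if revenue < 1e6:
        return "< $1M"
    if revenue < 1e9:
        return "$1M to $1B"
    return ">= $1B"

print(mask_account("4111222233334444"))  # ************4444
print(generalize_revenue(52_400_000))    # $1M to $1B
```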
Data encryption
Data encryption is a key practice in information security, converting plain text into unreadable ciphertext using cryptographic algorithms. It ensures that unauthorized users cannot access the data without the decryption key. Data can be encrypted in three states: at rest, in transit, and in use. Encryption methods include symmetric encryption, which uses a single key (e.g., AES), and asymmetric encryption, which uses a public-private key pair (e.g., RSA). While symmetric encryption is faster, asymmetric encryption is more secure but computationally intensive. In the financial sector, encryption is essential for security and compliance, as exemplified by standards like ISO 9564 for PIN management.
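A minimal symmetric-encryption sketch using the third-party cryptography package’s Fernet recipe (which is AES-based); real deployments also need vetted key management:

```python
# Requires: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # symmetric key: keep this secret
f = Fernet(key)

token = f.encrypt(b"card=4111-2222-3333-4444")  # ciphertext, safe to store
plain = f.decrypt(token)                        # only possible with the key
assert plain == b"card=4111-2222-3333-4444"
```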
Access control
Access control is a data security practice that manages who can access data and resources, defining access privileges and a strategy for handling anomalies. It consists of two main components: authentication, which verifies the identity of the user, and authorization, which determines what resources the authenticated user can access and their level of privilege.
Best practices include the principle of least privileges, granting users only the minimum permissions necessary, and multifactor authentication for enhanced security. Additionally, access control audit logs monitor user activities to detect unauthorized access or anomalies. This practice is essential in financial institutions to prevent data breaches and misuse.
Stream
Today I discovered something new and cool: the inspection module in sklearn. I learned about partial dependence plots (PDP) and individual conditional expectation (ICE) plots, which can provide some level of interpretability to models. I explored them on stream, but have more to dig into on tomorrow’s stream.
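For reference, the kind of snippet I was playing with: a minimal sketch on a toy regression, where kind="both" overlays the ICE curves on the averaged PDP:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# PDP + ICE for the first two features; each thin ICE line is one sample,
# the bold line is their average (the partial dependence)
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1], kind="both")
plt.show()
```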
That is all for today!
See you tomorrow :)