Hello :) Today is Day 364!
A quick summary of today:
- read Real-World ML Systems on Kubernetes - Chapter 2
- read Software Engineering for Data Scientists - Chapter 1
- read a white paper by Google on MLOps
- streamed for the last time
Real-World ML Systems on Kubernetes - Chapter 2 - Fundamentals of Kubernetes
At its core, Kubernetes is a highly extensible orchestration system for containerized workloads.
K8s is cloud-agnostic, which is one of the reasons for its wide adoption by businesses.
K8s architecture
Consists of 2 major components:
- control plane which comprises the internal components that K8s needs to function
- data plane where apps run
The control plane
This is where the brains of a K8s cluster live. It comprises a highly available set of servers running K8s’ system processes and a database
- on the cloud, the control plane is managed by the cloud provider (maintaining, scaling, upgrading, monitoring, troubleshooting)
Components:
- API server: the kube-apiserver is the central control point of the K8s cluster; it exposes the K8s API, which both internal and external components use to interact with the cluster
- controller manager: runs a collection of controllers that manage the state of the cluster, ensuring the current state matches the desired state
- scheduler: decides where to run a particular workload
- cloud controller manager: manages cloud-specific resources like VMs, networking, and storage
- etcd: the cluster’s state is persisted in the highly available etcd key-value store, and the API server interacts with it to read/write this state
The data plane
Consists of the infrastructure that executes the workloads we deploy. Technically, we can run workloads on the same hosts that run a cluster’s control plane, but this is an uncommon practice available only in self-managed clusters.
Key components:
- nodes: VMs that run containers
- CNI: provides networking for containers
- CoreDNS: DNS server in a cluster
More in-depth:
- nodes run 3 processes: the container runtime, responsible for creating, running, and stopping containers (e.g. containerd, Docker, CRI-O); the kubelet, which acts as the agent between the K8s control plane and the container runtime; and kube-proxy, which implements K8s’ network abstraction by maintaining network rules
- autoscaling nodes: a typical data plane is made up of a group of nodes that provide resources to run workloads; over time nodes might run out of resources and more nodes need to be added, and this autoscaling is usually done with the K8s Cluster Autoscaler
- Container Network Interface (CNI): provides network connectivity to containers that run inside a K8s cluster
- CoreDNS: handles name resolution and provides service discovery within a cluster
Interacting with a K8s cluster
The primary way is through the kubectl CLI.
Common commands:
| Command | Operation |
|---|---|
| kubectl get | List all resources |
| kubectl describe | Describe a resource |
| kubectl delete | Delete a resource |
| kubectl edit | Edit a resource |
| kubectl logs | View container logs |
| kubectl patch | Update a resource |
| kubectl apply -f filename.yaml | Create or update a resource |
| kubectl config view | View kubectl configuration |
| kubectl events | Show events |
| kubectl exec | Execute a command inside a running container |
| kubectl port-forward | Forward a port on the local machine to a Pod |
| kubectl run --image=<container_image> | Create a Pod from an image |
K8s Objects
Objects are persistent entities that represent the state of the K8s cluster. When an object is created, the K8s system will constantly work to ensure that object exists, and its desired state is maintained.
Containers
Popularised by Docker, containers are bundled packages that contain all code, dependencies and configs needed to run an app
Pods
A pod contains one or more containers that run on the same node. Typically, there is one container running the main app and sidecar containers that run processes that provide helper functions
All containers within a Pod run on the same node and share the Linux network namespace, which enables them to intercommunicate over localhost (the loopback interface).
Every Pod gets its own IP address, and other Pods in the cluster can connect to it using that IP address (or a DNS name). The Pod’s IP is private to the cluster, so we cannot connect to the Pod directly from outside. However, we can use port forwarding to test an app running inside the cluster.
Port forwarding tunnels traffic through the Kubernetes API server and Kubelet. This tunnel is temporary and exists only for the duration of the kubectl port-forward command.
- resource allocation: within a Pod we can define the minimum and maximum resources each container gets, and the scheduler considers these requirements when assigning a Pod to a node; if no node meets the minimum resource requirements, the Pod will not be scheduled and remains in a Pending state until a suitable node is available
- probes: K8s can detect failures in an app container and restart it or stop sending it more work; probes are used to determine the health and readiness of containers in a Pod and are defined per container
- init containers: normally all containers in a Pod are started simultaneously; init containers are special containers that run before the main app container starts, and their job is to handle any initialisation or config tasks that need to happen before the main app can start (e.g. if a website needs to wait for a database to be up so it can connect to it)
- sidecar containers: a sidecar container can be created by setting an init container’s restartPolicy to Always; the container then starts before the main application, ensuring setup tasks are completed, and keeps running for the Pod’s entire lifetime, acting as a sidecar; this combines the startup ordering of init containers with the persistence of sidecar containers (see the example manifest below)
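To make this concrete, here’s a minimal Pod manifest sketch (my own example, not from the book; the names, image, and port are placeholders, and the native sidecar support via restartPolicy: Always needs a fairly recent K8s version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                     # hypothetical name
  labels:
    app: my-app
spec:
  initContainers:
    - name: heartbeat-sidecar      # init container + restartPolicy: Always = sidecar
      image: busybox:1.36
      restartPolicy: Always
      command: ["sh", "-c", "while true; do echo heartbeat; sleep 30; done"]
  containers:
    - name: main
      image: nginx:1.27            # placeholder app image
      ports:
        - containerPort: 80
      resources:
        requests:                  # the scheduler only places the Pod on a node with this much free
          cpu: "250m"
          memory: "128Mi"
        limits:                    # the container is throttled/killed beyond this
          cpu: "500m"
          memory: "256Mi"
      readinessProbe:              # K8s stops routing traffic to the Pod if this fails
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10
```

After applying it with kubectl apply -f pod.yaml, the app can be tested locally with kubectl port-forward pod/my-app 8080:80.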
Deployments
Apps are deployed in a K8s cluster by creating Deployments which contain declarative configuration that instructs K8s how we’d like to manage and update a workload.
A Deployment is made up of one or more Pods, giving us the ability to manage a group of identical Pods as a single resource.
Behind the scenes, when a Deployment is created, K8s creates a Replica Set which represents a group of pods. This Replica Set maintains the number of desired replicas specified in the config.
K8s Deployments are perfect for horizontal scaling, as we can increase or decrease the number of replicas depending on traffic and optimize resource utilization.
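A minimal Deployment sketch (again my own placeholder names and image, not from the book) showing the replica count, the label selector, and the Pod template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                      # desired number of identical Pods, maintained by a ReplicaSet
  selector:
    matchLabels:
      app: my-app                  # must match the Pod template labels below
  strategy:
    type: RollingUpdate            # the default; "Recreate" is the alternative
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: main
          image: registry.example.com/my-app:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
```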
Horizontal Pod Autoscaler (HPA) helps with automatically scaling the number of Pods in a Deployment. It has configurable scaling policies that define the min and max number of Pods as well as the rate at which Pods are added/removed.
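As a sketch, an HPA targeting the Deployment above might look like this (the 70% CPU target and replica bounds are just example values):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:                  # the workload being scaled
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add Pods when average CPU utilisation exceeds 70%
```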
To update a deployment in Kubernetes, create a new container image for the application, push it to a container registry, and update the deployment’s image field. Kubernetes performs a rolling update by default, creating new Pods with the updated image and terminating old ones gradually to ensure availability. Alternatively, the ‘Recreate’ strategy can be used, where all old Pods are terminated before new ones are created, causing temporary downtime.
We can also easily rollback deployments if needed
Services
As Deployments scale up and down, Pods get created and destroyed, so how can apps in the cluster connect to Pods whose IPs keep changing? A Service acts as a stable network endpoint that our apps can use to find and communicate with each other. A Service gives us a stable IP address and DNS name that doesn’t change as we scale and update Pods.
There are different types of K8s Services:
- ClusterIP: exposes the service on a cluster-internal IP, allowing other apps running in the cluster to access it
- NodePort: exposes the service on each node’s IP at a static port, making it accessible from outside the cluster
- LoadBalancer: provisions a cloud-provided load balancer and assigns it a fixed, external IP to access the service from outside the cluster
- ExternalName: maps the service to an external DNS name, allowing internal apps to access external services
A Kubernetes Service of type LoadBalancer is designed to expose an application to the public internet by provisioning a cloud load balancer. The LoadBalancer Service will have a public IP address or DNS name that external users can connect to. The load balancer then forwards traffic to the appropriate Kubernetes Service and pods.
In the cloud, we’ll mostly deal with ClusterIP (internal) and LoadBalancer (external).
A Service uses labels and selectors to identify the group of Pods to route traffic to (see the sketch below).
- labels are key-value pairs attached to K8s objects like Pods, Deployments, Services, and nodes; they provide identifying metadata
- selectors are used to target a set of objects based on their labels
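Sketch of a ClusterIP Service selecting the Pods labelled app: my-app from the earlier examples (placeholder names and ports):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                     # reachable from other Pods in the namespace as http://my-app
spec:
  type: ClusterIP                  # internal only; change to LoadBalancer for an external IP
  selector:
    app: my-app                    # traffic is routed to Pods carrying this label
  ports:
    - port: 80                     # port the Service exposes
      targetPort: 8080             # port the container listens on
```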
Namespaces
With namespaces, each team can deploy their objects, like apps and services, into their own designated area of the Kubernetes cluster. They can manage and modify their own stuff without interfering with what the other teams are doing.
A Pod can connect to any Service in the same namespace using just the name of the Service. If a Pod wants to call a Service in another namespace, it must add the target’s namespace to the DNS name (e.g. my-service.other-namespace).
Ingress
A LoadBalancer Service provides an external entry point for a single K8s Service, giving each service its own public IP and DNS name. While simple, managing multiple LoadBalancers for numerous services can become expensive and complex.
An Ingress, on the other hand, acts as a centralized entry point for multiple services, routing traffic based on URL paths or hostnames
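A sketch of path-based routing with an Ingress (assumes an ingress controller such as NGINX is installed; the hostname and Service names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx          # assumes the NGINX ingress controller
  rules:
    - host: example.com            # hypothetical hostname
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service  # /api traffic goes to this Service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service  # everything else goes here
                port:
                  number: 80
```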
Volumes
Volumes can be used to attach persistent storage to a container, allowing the app running inside the container to read and write data to a designated location.
For ML workloads, object storage systems (like Amazon S3, GCS) are preferred for their scalability, durability, and compatibility with ML frameworks like TensorFlow and Spark. Applications access data via embedded code or mounted storage systems.
K8s also simplifies traditional storage integration through the Container Storage Interface (CSI). CSI allows storage vendors to develop drivers for seamless K8s compatibility, supporting various storage types (block, file, object). This decoupled approach lets storage providers innovate independently while Kubernetes users access modern storage features efficiently.
We can add storage to workloads through:
- static provisioning: administrators manually allocate Persistent Volumes (PVs) beforehand. Persistent Volume Claims (PVCs) are then created by users to request and attach storage to Pods
- dynamic provisioning: K8s automatically provisions storage volumes based on PVC requests and a defined StorageClass, simplifying management for dynamic workloads
Persistent Volumes (PVs) are the actual storage resources. Access modes include:
- ReadWriteOnce: the volume can be mounted read-write by a single node
- ReadOnlyMany: the volume can be mounted read-only by many nodes
- ReadWriteMany: the volume can be mounted read-write by many nodes
A Persistent Volume Claim (PVC) specifies storage requirements (size, access mode) and is what Pods use to bind to PVs (see the sketch below).
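A sketch of dynamic provisioning: a PVC requesting 10Gi from a (hypothetical) StorageClass, and a Pod mounting it:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce                # mountable read-write by a single node
  storageClassName: standard       # hypothetical StorageClass; triggers dynamic provisioning
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  containers:
    - name: main
      image: python:3.12-slim      # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data         # the app reads/writes under /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc        # binds the Pod to the claim above
```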
ConfigMaps
Stores app configuration as key-value pairs, enabling us to externalize settings like db connection strings or logging levels. It decouples configuration from application code, allowing the configuration to be provided to containers as environment variables or files
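For example (placeholder keys and values), a ConfigMap injected as environment variables could look like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  DB_HOST: "postgres.default.svc.cluster.local"   # hypothetical values
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: main
      image: registry.example.com/my-app:1.0.0    # placeholder image
      envFrom:
        - configMapRef:
            name: app-config       # every key becomes an environment variable
```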
Secrets
Similar to ConfigMaps but for sensitive information.
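A minimal Secret sketch (placeholder credentials; in a real setup these would come from a secret manager rather than being committed to Git):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                        # written as plain text, stored base64-encoded by K8s
  DB_USER: "app"
  DB_PASSWORD: "change-me"
```

It can then be consumed the same way as a ConfigMap, e.g. via envFrom with a secretRef.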
Jobs
The key difference between a Job and a regular Deployment is that a Job will run a task to completion and then stop, rather than continuously running the application.
There are 3 types of jobs:
- non-parallel jobs: simplest type, designed to run a single task to completion
- multiple parallel jobs: for tasks that can be parallelised, like processing a large batch of data
- parallel jobs with fixed completions: similar to multiple parallel jobs but with a specified number of successful completions required before the job is considered done
We can schedule jobs in a cluster using CronJobs which are similar to traditional cron jobs.
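A sketch of a parallel Job with fixed completions, and a CronJob that runs a (hypothetical) retraining task nightly; the images and entrypoints are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-scoring
spec:
  completions: 5                   # done after 5 successful Pod runs
  parallelism: 2                   # run at most 2 Pods at a time
  backoffLimit: 3                  # retries before marking the Job failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: registry.example.com/scorer:1.0.0   # placeholder image
          command: ["python", "score.py"]            # hypothetical entrypoint
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-retrain
spec:
  schedule: "0 2 * * *"            # standard cron syntax: every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: retrain
              image: registry.example.com/trainer:1.0.0
              command: ["python", "train.py"]
```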
StatefulSets
Manage stateful applications that require stable identities, persistent storage, and ordered deployment.
- stable network identities: each Pod gets a predictable DNS name (e.g., web-0, web-1) that persists across rescheduling
- persistent storage: Pods are bound to specific Persistent Volumes (PVs) using volumeClaimTemplates. Data persists even if a Pod is replaced
- ordered Deployment and scaling: Pods are created, deleted, and scaled sequentially to ensure consistent behavior
- headless Service integration: a headless Service (ClusterIP: None) provides direct DNS entries for each Pod, supporting stateful communication patterns
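The points above could translate into something like this sketch: a headless Service plus a StatefulSet with a volumeClaimTemplate (placeholder names and image):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  clusterIP: None                  # headless: gives each Pod a DNS entry (web-0.web, web-1.web, ...)
  selector:
    app: web
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web                 # the headless Service above
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: main
          image: nginx:1.27        # placeholder image
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:            # each replica gets its own PVC (data-web-0, data-web-1, ...)
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```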
DaemonSets
This is for when we need a system service running on every single node in our cluster (for example, to collect logs or metrics): we don’t need multiple replicas, we need exactly one instance per node. A DaemonSet ensures one copy of a Pod is running on every node (or a selected subset of nodes) in a cluster, as in the sketch below.
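A DaemonSet sketch for a node-level log collector (the agent image is just an example):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: agent
          image: fluent/fluent-bit:2.2   # example log-shipping agent
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log         # read the node's own log files
```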
Advanced scheduling techniques
In addition to Deployments, where we let K8s pick the node on which a workload should run, and DaemonSets, which run on every node, there are times when we need finer control over which node runs a particular Pod.
- node labels: we can apply labels to K8s nodes, and then use those labels to specify which nodes should run a particular Pod
- affinity and anti-affinity: we can use affinity rules to schedule a Pod based on node labels or on other Pods already running on a node
- taints and tolerations: taints, like labels, are key-value pairs that we can add to nodes; when a node is tainted, K8s requires that a toleration is added to any Pod that should be scheduled on that node
In ML, this can be used to ensure that only workloads that need a GPU get placed on nodes with GPUs, as in the sketch below.
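A sketch of the GPU case, assuming the nodes carry a hypothetical accelerator=nvidia-gpu label and a gpu=true:NoSchedule taint, and that the NVIDIA device plugin is installed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  nodeSelector:
    accelerator: nvidia-gpu        # only consider nodes with this (hypothetical) label
  tolerations:
    - key: "gpu"                   # allows scheduling onto nodes tainted gpu=true:NoSchedule
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: registry.example.com/trainer:1.0.0   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1        # request one GPU via the device plugin
```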
GitOps
A practice where infrastructure is treated like source code, using Git as the single source of truth for infrastructure and runtime configurations. Changes to workloads or infrastructure are made via pull requests, and GitOps tools, like Flux or ArgoCD, automatically apply these changes to the cluster when detected. Unlike traditional infrastructure-as-code (IaC), which requires manual intervention to apply changes, GitOps automates this process. The infrastructure’s actual state is continuously aligned with the desired state defined in the Git repository, managing Kubernetes cluster configurations like software deployments.
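As a sketch of what this looks like with Argo CD (the repo URL and paths are hypothetical), an Application resource points the cluster at a Git repo and keeps it in sync:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-config.git   # hypothetical config repo
    targetRevision: main
    path: k8s                      # directory containing the manifests
  destination:
    server: https://kubernetes.default.svc                  # the cluster Argo CD runs in
    namespace: default
  syncPolicy:
    automated:                     # apply changes automatically when the repo changes
      prune: true
      selfHeal: true
```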
Software Engineering for Data Scientists
This was another book on my list of ML-related books, and as the author says in the preface: learning to write better code will let you do more data science.
Software Engineering for Data Scientists - Chapter 1 - What is good code?
Why good code matters
Good code is crucial, especially when it integrates with larger systems, like ML models in production or tools for other data scientists. As projects grow, the importance of good code increases.
‘Code as craft’ - much like a carpenter takes pride in building a well-made wooden cabinet. Good code is efficient, elegant, provides satisfaction and is easier to maintain.
However, quick, poorly written code—often referred to as technical debt—can create problems down the line. It may involve missing documentation, bad variable names, or disorganized structure, all of which make the code harder to maintain. While technical debt is sometimes necessary due to deadlines or budget constraints, it can lead to more time spent fixing bugs later.
Adapting to changing requirements
When writing code for ML and other projects, change is inevitable, and good code can be easily adapted to such changes. Writing good code from the beginning matters: set up these best practices and follow them, and the benefits will come as the project grows.
The book divides clean code principles into: simplicity, modularity, readability, performance, and robustness.
Simplicity
Simplicity is key in coding, as complex systems are harder to understand and modify. As projects grow, keeping all details in your head becomes impossible, and complexity can lead to unexpected issues, such as errors caused by missing steps in the workflow.
Accidental complexity can be reduced by keeping the code simple, avoiding repetition, and making it modular.
Example principles are:
- DRY
- avoiding verbose code
Modularity
Modular code:
- makes the code easier to read
- it’s easier to locate where a problem comes from
- it’s easier to reuse code in the next project
It’s important to break down big tasks into smaller modules. This isn’t something that can be done right from the get-go, but it’s important to keep in mind that our code will change as a project evolves.
For instance, for a task like loading data from a CSV file, cleaning it, and plotting it, we can start with a skeleton for each function:
def load_data(csv_file):
pass
def clean_data(input_data, max_length):
pass
def plot_data(clean_data, x_axis_limit, line_width):
pass
Readability
…code is read much more often than it is written…
- PEP8
Code is for humans to read and for machines to execute.
We can follow best practices like PEP8, use consistent naming conventions, remove unnecessary prints, and write docstrings
Performance
Good code needs to be performant, and this can be measured in running time and memory usage. This is especially important for production code.
Robustness
This means code should be reproducible, and respond gracefully if system inputs change unexpectedly.
Can be achieved through:
- properly handling errors
- logging
- writing good tests
I guess this would be a fine book to read. It’s not long either.
Practitioners guide to MLOps - White paper by Google
The MLOps lifecycle addresses the challenges organizations face in deploying and managing machine learning (ML) systems effectively. Despite AI/ML being central to digital transformation, most organizations struggle to move beyond pilot projects, with many failing to deploy models into production or maintain them effectively once deployed.
Challenges
- manual and non-reproducible workflows
- inefficient handoffs between data scientists and IT teams
- lack of mid- to senior-level talent
- poor change-management processes and governance models
- challenges in deployment, scaling, and versioning
The role of MLE
MLE integrates software engineering principles with the unique complexities of ML:
- data preparation and maintenance
- monitoring and tracking model performance
- experimentation with data, algorithms, and hyperparameters
- continuous retraining on fresh data
- addressing data inconsistencies between training and serving environments
- ensuring model fairness and security
MLOps defined
MLOps unifies ML development (Dev) with ML system operations (Ops), automating and standardizing critical steps in building, deploying, and managing ML systems. It mirrors the role of DevOps in app development but is tailored to ML-specific problems, such as changing data patterns, adversarial risks, and model drift.
Benefits of MLOps
- shorter development cycles and faster time-to-market
- improved collaboration across teams
- enhanced reliability, scalability, and security
- streamlined governance and operational processes
- higher return on ML investments
In essence, MLOps provides the framework and tools necessary to ensure ML systems are deployed reliably, monitored effectively, and scaled efficiently, aligning them with evolving business goals.
ML-enabled systems
Data engineering (DE) is essential for supporting operational, analytics, and ML tasks by ingesting, integrating, curating, and refining data. Effective DE processes are crucial not only for BI and analytics but also for ML. ML models are built and deployed using curated data, often provided by the DE team, and are integrated into various application systems like BI tools and process control systems. Integrating models requires ensuring their effectiveness within applications and monitoring their performance. Additionally, tracking relevant business KPIs helps assess the model’s impact and guide adjustments.
The MLOps lifecycle
- ML development consists of experimenting and developing a robust and reproducible model training procedure
- training operationalisation is about automating the process of packaging, testing, and deploying repeatable and reliable training pipelines
- continuous training is repeatedly executing the training pipeline in response to new data or code changes, or on a schedule
- model deployment is about packaging, testing, and deploying a model to a serving env for online experimentation and production serving
- prediction serving is about serving the model that is deployed in prod for inference
- continuous monitoring is about monitoring the effectiveness and efficiency of a deployed model
- data and model management is a central, cross-cutting function for governing ML artifacts to support auditability, traceability, and compliance; this can also promote shareability, reusability, and discoverability of ML assets
An end-to-end workflow
This is not a waterfall workflow that has to sequentially pass through all the processes. The processes can be skipped, or the flow can repeat a given phase or a subsequence of the processes.
- the core activity during the ML dev phase is experimentation. As DSs and ML researchers prototype model architectures and training routines, they create labeled datasets, and they use features and other reusable ML artifacts that are governed through the data and model management process. The primary output of this process is a formalized training procedure, which includes data preprocessing, model architecture, and model training settings
- if the ML system requires continuous training, the training procedure is operationalized as a training pipeline. This requires a CI/CD routine to build, test, and deploy the pipeline to the target execution env
- the continuous training pipeline is executed repeatedly based on retraining triggers, and it produces a model as output. The model is retrained as new data becomes available, or if model performance decay is detected. Other training artifacts and metadata produced by the training pipeline are also tracked. If the pipeline produces a successful model candidate, that candidate is then tracked by the model management process as a registered model
- the registered model is annotated, reviewed, and approved for release and is then deployed to a prod env. This process might be relatively opaque if you are using a no-code solution, or it can involve building a custom CI/CD pipeline for progressive delivery
- the deployed model serves predictions using the deployment pattern that you have specified: online, batch, or streaming predictions. In addition to serving predictions, the serving runtime can generate model explanations and capture serving logs to be used by the continuous monitoring process
- the continuous monitoring process monitors the model for predictive effectiveness and service quality. The primary concern of effectiveness monitoring is detecting model decay - for example, data and concept drift. The model deployment can also be monitored for efficiency metrics like latency, throughput, hardware resource utilisation and execution errors
MLOps platform capabilities
To implement MLOps effectively, orgs should establish core technical capabilities, which can be provided by a single ML platform or a combination of vendor tools and custom services.
Key MLOps platform capabilities include foundational infrastructure (reliable, scalable, and secure compute resources), configuration management, and CI/CD tools. Core MLOps capabilities include experimentation, data processing, model training, evaluation, serving, online experimentation, monitoring, pipelines, and model registries. Additionally, cross-cutting capabilities such as an ML metadata repository and an ML dataset/feature repository are necessary for integration and interaction across workflows.
Experimentation
- provide notebook envs that are integrated with version control
- track experiments, including info about the data, hyperparams, and eval metrics, for reproducibility and comparison
- analyse and visualise data and models
- support exploring datasets, finding experiments, and reviewing implementations
- integrate with other data services and ML services in your platform
Data processing
- support interactive execution for quick experimentation and for long-running jobs in prod
- provide data connectors to a wide range of data sources and services, as well as data encoders and decoders for various data structures and formats
- provide both rich and efficient data transformations and ML feature engineering for structured and unstructured data
- support scalable batch and stream data processing for ML training and serving workloads
Model training
- support common ML frameworks and support custom runtime envs
- support large-scale distributed training with different strategies for multiple GPUs and multiple workers
- enable on-demand use of ML accelerators
- allow efficient hparam tuning and target optimisation at scale
- ideally, provide built-in AutoML functionality
Model evaluation
- perform batch scoring of your models on eval datasets at scale
- compute pre-defined or custom eval metrics for your model on different slices of the data
- track trained-model predictive performance across different continuous-training executions
- visualise and compare performances of different models
- provide tools for what-if analysis and for identifying bias and fairness issues
- enable model behaviour interpretation using various explainable AI techniques
Model serving
- provide support for low-latency, near-real-time (online) predictions and high-throughput batch (offline) predictions
- provide built-in support for common ML serving frameworks and for custom runtime envs
- enable composite prediction routines
- allow efficient use of ML inference accelerators with autoscaling
- support model explainability
- support logging of prediction serving requests and responses for analysis
Online experimentation
- support canary and shadow deployments
- support traffic splitting and A/B testing
- support multi-armed bandit tests
Model monitoring
- measure model efficiency metrics
- detect data skews and concept drift
- integrate monitoring
ML pipelines
- trigger pipelines on demand, on a schedule, or in response to specified events
- enable local interactive execution for debugging during development
- integrate with ML metadata tracking
- provide a set of built-in components for common ML tasks
- run on different envs
- optionally, provide GUI-based tools for designing and building pipelines
Model registry
- register, organize, track, and version trained and deployed models
- store model metadata and runtime dependencies for deployability
- maintain model documentation and reporting
- govern the model launching process
Dataset and feature repos
- enable shareability, discoverability, reusability, and versioning of data assets
- allow real-time ingestion and low-latency serving for event streaming and online prediction workloads
- allow high-throughput batch ingestion and serving for ETL process and model training
- enable feature versioning for point-in-time queries
- support various data modalities
ML metadata and artifact tracking
- provide traceability and lineage tracking of ML artifacts
- share and track experimentation and pipeline parameter configurations
- store, access, investigate, visualise, download and archive ML artifacts
- integrate with all other MLOps capabilities
Deep dive of MLOps processes
The paper then goes deeper into each of these processes in turn: ML development, training operationalization, continuous training, model deployment (including a more complex CI/CD system for model deployment), prediction serving, continuous monitoring, and data and model management (covering ML metadata tracking and model governance).
Putting it all together
Delivering business value through ML is not only about building the best ML model for the use case at hand. Delivering this value is also about building an integrated ML system that operates continuously to adapt to changes in the dynamics of the business environment. Such an ML system involves collecting, processing, and managing ML datasets and features; training, and evaluating models at scale; serving the model for predictions; monitoring the model performance in production; and tracking model metadata and artifacts.
The paper wraps all of these processes up into a single end-to-end MLOps workflow diagram.
This white paper seems condensed, yet it is in-depth and detailed. Definitely something to refer back to when working on a project, preparing for an interview, or just thinking about MLOps.
Streamed - Intro to LangSmith
Today I saw that LangChain announced a new free course - Introduction to LangSmith
Tldr - it’s amazing, and the LangSmith platform is great for anyone doing LLM observability. It’s important to note that I’m just a single dev who is learning, and I saw that it’s paid for companies, so maybe MLflow’s free LLM observability features can suffice.
They also showed the Chat LangChain web app, which is kind of like Ask Astro for Airflow.
I also saw that this stream was my 100th video on my YouTube channel. A nice ending.
That is all for today!
See you tomorrow :)