(Day 246) First deployment on Kubernetes

Ivan Ivanov · September 3, 2024

Hello :) Today is Day 246!

A quick summary of today:

  • deploying a simple model on K8s
  • finishing the Streaming databases book

Deploying a simple model on Kubernetes (K8s)

Who better to explain K8s in an easy and intuitive way than DataTalksClub… I am not surprised haha. In their ML zoomcamp, module 10 is about Kubernetes, and I watched the videos where Alexey (the instructor) explains k8s and goes through a practical example. Below is my understanding of how k8s works, plus a deployed example model. Everything is in my repo.

Kubernetes intro

  • Node ~ server/computer (e.g. an EC2 instance)
  • Pod ~ docker container, runs on a node
  • Deployment ~ group of pods with the image & config
  • Service ~ the entry point of an application, routes requests to pods
    • external = LoadBalancer
    • internal = ClusterIP
  • Ingress ~ the entry point to the cluster (a sketch is included after the routing notes below)
  • HPA ~ Horizontal Pod Autoscaler - ensures the app scales up and down to match the required demand
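
The example in this post doesn't actually use an HPA, but as a quick illustration (assuming the app-deployment created further below and a CPU-based target; the numbers are arbitrary), one could be added imperatively with:

kubectl autoscale deployment app-deployment --cpu-percent=70 --min=1 --max=3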

image

  • Ingress routes traffic to services within the cluster
  • a sample request comes in from a user and is first sent to the LoadBalancer (the external-facing service), which balances the load of incoming requests across multiple Pods within the Gateway Deployment. This means the LoadBalancer routes the request to a Pod in the Gateway Deployment on any Node in the cluster (e.g., Node 1 or Node 2)
  • from there, the Gateway Pod processes the request (e.g., handling authentication, routing, etc.) and forwards it to the internal Model Service
  • the Model Service then assigns the request to one of the Pods in the Model Deployment, which may be running on a different Node within the cluster (e.g., Node 1 or Node 2)
  • finally, once the Model Pod processes the request and generates a response, the response is sent back through the Gateway Pod and ultimately back to the user

The example in this repo (and below) has only one Service and one Deployment; the setup above is a more complicated example used to show routing.
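
The Ingress is also not part of this repo's example, but here is a minimal sketch of one (assuming an ingress controller such as ingress-nginx is installed in the cluster, and pointing at the app Service that gets created further below on port 80; the app-ingress name is just illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress  # hypothetical name, not part of the repo
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app  # the Service defined later in this post
            port:
              number: 80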

Create k8s cluster

kind create cluster
  • next, build the docker image for our app
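
A minimal sketch of that step, assuming the Dockerfile sits in the repo root and tagging the image to match the deployment manifest below:

docker build -t test-model:v1 .
docker run -it --rm -p 9696:9696 test-model:v1  # optional local smoke test on port 9696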

Create deployment

apiVersion: apps/v1 
kind: Deployment
metadata:
  name: app-deployment  # name of the deployment, used to identify the deployment object
spec:
  replicas: 1  # number of pod replicas that should be running at any time
  selector:
    matchLabels:
      app: app  # identifies the pods managed by this deployment using the "app: app" label
  template:
    metadata:
      labels:
        app: app  # the label applied to the pods, used by the deployment to manage the pods
    spec:
      containers:
      - name: app-pod  # name of the container inside the pod.
        image: test-model:v1  # container image to use
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 9696  # exposed port of the container
  • save the above as deployment.yaml, and run
kubectl apply -f deployment.yaml

image

k8s does not know about the test-model:v1 image, so we need to make it available to the cluster first. Load it into kind with

kind load docker-image test-model:v1

The deployment keeps retrying to pull the image, and once the image is loaded, running kubectl get all shows

app-deployment-7bfd99c878-fnbxl 1/1 Running 0 4m32s

image

Test the deployment

image

We can use port-forwarding:

We specify the name of the pod, and then the local and pod ports:

kubectl port-forward app-deployment-58c8855fbd-xknbf 9696:9696

And I can use curl to test this:

curl -X POST http://localhost:9696/predict -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

And I get: {"prediction":0}

Create service.yaml

apiVersion: v1
kind: Service
metadata:
  name: app # name of the service
spec:
  type: LoadBalancer # LoadBalancer means expose the service outside the cluster
  selector:
    app: app # selects the pods with the label "app: app"
  ports:
  - port: 80 # port in the service
    targetPort: 9696 # port on the pod

The service will forward requests to the pods. We as users will communicate with the service.

Apply service:

kubectl apply -f service.yaml 

We can see it with kubectl get svc

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP      10.96.0.1      <none>        443/TCP        29m
app         LoadBalancer   10.96.94.146   <pending>     80:30966/TCP   2m15s

The EXTERNAL-IP stays <pending> on a local cluster like this; with a managed service such as GKE or EKS, a real external IP would be provisioned.

Test the k8s service

kubectl port-forward service/app 8080:80 

Terminal:

Forwarding from 127.0.0.1:8080 -> 9696
Forwarding from [::1]:8080 -> 9696

And if I run the same request (note the port is now 8080):

curl -X POST http://localhost:8080/predict -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

I get back: {"prediction":0}

Overall

Definitely feel more confident about my k8s knowledge ^^

MLOps engineer for 4 years shares their experience

The author reflects on their journey in machine learning and MLOps, beginning with their first major project in 2021: day-ahead electricity consumption forecasting for multiple cities. Initially, they focused on predicting daily electricity use 24 hours in advance using tree-based models like LightGBM and XGBoost. The challenges of model and data drift, as well as external factors like weather and holidays, complicated predictions. To address these, they built automated systems to generate features, select models, and make daily predictions, though manual oversight was still necessary.

Their experience led to a realization of the importance of MLOps in maintaining models and adapting to changes. After this project, they transitioned to working on an MLOps platform in healthcare where they faced new challenges in building production-level software. They had to learn traditional software engineering skills and navigate the complexities of integrating client-specific needs into a scalable platform.

Throughout their career they grappled with the evolving roles in ML and MLOps, questioning their identity as an engineer (MLOps vs MLE vs SE vs DS). They juggled various responsibilities, from research and model development to platform design and deployment. Despite these struggles, having a wide range of skills is of course a benefit, but it is important to balance breadth of knowledge with depth in one field.

Streaming databases: Chapter 7. Emergence of Other Hybrid Data Systems

Data Planes

Emerging real-time systems can be better understood through a Venn diagram that illustrates where these systems fit within different data planes. This approach helps clarify the use cases and deployment models in real-time analytical scenarios.

The diagram considers both streaming and database perspectives. From a streaming database perspective, it involves systems that can consume and emit streams, while executing materialized views asynchronously. This view hints at the potential characteristics of next-generation databases.

The Streaming Plane

The Venn diagram outlines three data planes:

  1. Operational Plane: Houses OLTP databases, which are consistent and use row-based storage. It also includes the applications that utilize these databases.

  2. Analytical Plane: Contains OLAP databases, optimized for analytical queries, featuring columnar storage and eventual consistency.

  3. Streaming Plane: Acknowledged as the only data plane where data is mostly in motion. It bridges the operational and analytical planes, enabling real-time data flow into the analytical plane. This integration allows for quicker, data-driven decisions by combining real-time and historical data.

    • The streaming plane includes source connectors that bring static data into motion and sink connectors that write streaming data into RTOLAP databases for low-latency access.
    • Platforms like Kafka and Kafka Connect, along with stateful stream processors and streaming databases, reside in this plane.

The Overlaps

  • The overlap between the operational and streaming planes, and between the streaming and analytical planes, represents areas where both data at rest and data in motion coexist. This is where streaming databases operate.
  • The overlap between the operational and analytical planes will be further explored later in the chapter.

Consistency Spectrum

image

The above image illustrates the consistency spectrum of stream processors, showing a division between strictly consistent and eventually consistent processors, as well as various storage types. Streaming data flows from the operational plane to the analytical plane, leveraging connectors and streaming platforms like Kafka along the way.

The overlapping areas in these diagrams highlight the crucial interactions between different data planes, essential for understanding the role of streaming systems in the broader analytical ecosystem.

Hybrid Transactional/Analytical Processing (HTAP) Databases

  • Streaming OLTP Databases: These databases (e.g., RisingWave, Materialize) combine stream processing with OLTP features using row-based storage.

  • Streaming OLAP Databases: These databases (e.g., Proton) merge OLAP features with stream processing using column-based storage, optimized for complex queries with eventual consistency.

  • HTAP Databases: Positioned between OLTP and OLAP without streaming, HTAP databases can handle both workloads, enabling real-time decision-making. They leverage in-memory and persistent storage, allowing analytical queries on in-flight transactions, though they lack stream processing features.

  • Market Examples:
    • Unistore (Snowflake): Hybrid storage with both row and columnar data.
    • SingleStoreDB (SingleStore): Supports in-memory row-based and on-disk column-based tables.
    • TiDB (PingCAP): Combines transactional key-value store (TiKV) with columnar analytics (TiFlash).
    • HydraDB: Offers transactional row-based and column-based storage.
  • Limitations: HTAP databases are monolithic and not suited for large-scale historical data, thus requiring separate OLAP systems for deep history. They excel in operational plane analytics but may still need data aggregation from OLAP systems.

  • Hybrid Databases: The evolution of HTAP and hybrid databases addresses real-time analytical challenges but often necessitates additional infrastructure for scalability and optimization.

Motivations for Hybrid Systems

Stream processing systems aim to reduce the complexity often associated with real-time data processing by providing a more user-friendly experience similar to traditional databases. Vendors are working to make stream processing more accessible by simplifying APIs, interfaces, and tools, thereby lowering the barrier to entry for businesses.

A significant challenge in stream processing adoption is the scarcity of skilled data engineers. By integrating a database-like environment, organizations can leverage real-time analytics without needing specialized expertise.

On the other hand, OLTP databases are adopting OLAP and streaming features to better handle analytical queries and meet the demand for real-time data analytics. This helps avoid complex infrastructure typically needed for separate analytical systems. Streaming features also simplify data replication and synchronization in distributed environments by ensuring real-time data consistency.

Both stream processing and OLTP databases are converging toward the goal of providing real-time analytics with less infrastructure and greater accessibility. Hybrid systems often use OLTP databases due to their proximity to user-facing applications, emphasizing the need for data analytics closer to the operational plane. Streaming systems must recognize these needs to improve their reputation and ease of implementation.

Streaming databases: Chapter 8. Zero-ETL or Near-Zero-ETL

ETL Model

image

The above shows existing ETL solutions from no ETL in HTAP databases at the top to the turn-the-database-inside-out distributed solution at the bottom. The lower the solution is on the triangle, the more distributed and complex it becomes. Likewise, at the top, solutions are more centralized and monolithic, and they become more decentralized and modular as you move to the bottom. On the left side of the triangle are transactional databases, while on the right are columnar databases. Midway down the triangle is zero-ETL.

Zero-ETL

Zero-ETL is a pattern first defined by Amazon Web Services (AWS) to simplify data integration from an OLTP database to an OLAP database. In its proposal, zero-ETL is defined as follows:

"A set of integrations that eliminates or minimizes the need to build ETL data pipelines. Extract, transform, and load (ETL) is the process of combining, cleaning, and normalizing data from different sources to get it ready for analytics, artificial intelligence (AI) and machine learning (ML) workloads." - AWS

Zero-ETL refers to an approach or concept in data integration and analytics that aims to minimize or even eliminate the need for traditional ETL processes. The traditional ETL process involves extracting data from source systems, transforming it to meet the requirements of the target system, and then loading it into the destination.

image

Zero-ETL challenges the traditional integration approach and seeks to reduce the complexity, latency, and resource requirements associated with ETL, though this comes at a cost. Key aspects of zero-ETL:

  • Real-time data integration: Minimizing or eliminating batch processing to enable real-time or near-real-time data integration. This is particularly relevant for scenarios where timely insights are crucial.
  • Schema-on-read: Adopting a schema-on-read approach, where the data is not transformed into a predefined schema during the ETL process but is instead interpreted at the time of analysis. This allows for more flexibility in handling diverse and changing data.
  • Data virtualization: Leveraging data virtualization technologies that provide a unified and virtual view of data across multiple sources without physically moving or transforming the data. This can reduce the need for creating and maintaining a separate data warehouse.
  • In-database processing: Performing transformations and analytics directly within the database systems where the data resides, avoiding the need to extract and move large datasets for processing.
  • Event-driven architecture: Adopting event-driven architectures, where data changes trigger immediate updates, reducing the reliance on periodic batch processes.
  • Modern data architectures: Embracing modern data architectures, such as data lakes and cloud-based solutions, that provide scalable and cost-effective options for managing and analyzing data without the traditional ETL bottlenecks.

Near-Zero-ETL

Near-zero-ETL still tries to limit infrastructure for ETL components without losing the flexibility needed to support complex data integration requirements. This involves using data systems that adopt hybrid approaches.

One solution is to leverage an OLTP database that has embedded features to send data to other systems, without the need for self-managing connectors running on a separate infrastructure.

Embedded OLAP

There is a trend to bring smaller analytical workloads closer to the operational plane. HTAP databases like Hydra and SingleStore provide columnar databases for analytical workloads, for example. However, due to their limited capacity, these databases cannot hold the amount of data analytical systems like Snowflake, Databricks, ClickHouse, and Pinot can.

Conversely, bringing bulky analytical systems to the operational plane for faster serving of real-time analytics makes it harder for analytical systems to source historical data. These are the limitations that created the data divide between operational and analytical data planes.

Alternatively, reducing the analytical data to a size that fits the scope of the business domain and the capacity of the operational plane could provide a better solution.

Lambda Architecture

The lambda architecture is a data processing model introduced by Nathan Marz around 2011 and described in his book Big Data: Principles and Best Practices of Scalable Realtime Data Systems. It addresses the need for robust and scalable data processing in big data applications by combining both batch and real-time/streaming data processing.

image

Key Components of Lambda Architecture:

  • Batch layer: Processes large datasets in batch mode for complex analytics and historical queries. Technologies like Apache Hadoop are commonly used.
  • Speed layer: Handles real-time data with low-latency processing, using technologies like Apache Storm or Apache Flink. It processes recent data that hasn’t been handled by the batch layer.
  • Serving layer: Combines results from both the batch and speed layers to provide a unified data view, typically using scalable NoSQL databases like Apache HBase or Cassandra.

Strengths and Challenges:

The architecture’s strength lies in its ability to handle both batch and real-time processing. However, managing two separate processing paths introduces complexities, particularly in maintaining consistency between batch and real-time data views.

Alternatives:

  • Kappa architecture: Proposes a unified approach to stream and batch processing to simplify system design.
  • Stream processor and OLAP database: Using tools like Apache Pinot for serving historical data combined with streaming data processed by technologies like Flink or Pathway (for Python users).

Streaming databases: Chapter 10. Deployment Models

Consistent Streaming Database

A consistent streaming database allows complex asynchronous/stream processing that integrates with application logic. It simplifies infrastructure and allows querying of asynchronous process outputs directly from the database, without requiring extensive streaming knowledge. Examples of such databases include RisingWave and Materialize.

image

Above, a consistent streaming database is shown to bridge the gap between streaming and operational planes. It consumes data from an OLTP database, performs stream processing for analytical transformations, and stores results in row-based storage. Applications can write to the OLTP database and read from the streaming database, separating read and write resources.

The streaming database can also aggregate data from multiple OLTP databases and send it to a data warehouse or lakehouse for internal business analytics. However, limitations arise when user-facing applications require historical data from the warehouse, or when complex analytical queries demand columnar storage.

Pros:

  • Low Latency: Provides data freshness within milliseconds.
  • Scalability: Separates and scales read/write resources independently.
  • Multi-Source Aggregation: Combines inputs from various OLTP databases.
  • Incremental Transformations: Processes data before sending it to the analytical plane.
  • Versatile Querying: Supports both push and pull queries with a single interface.

Cons:

  • Lacks Columnar Storage: Less suited for complex analytical queries.
  • Historical Data Limitations: Challenges in sourcing historical data from the analytical plane.
  • Scalability Issues: Struggles with very large datasets.

Consistent Streaming Processor and RTOLAP

A consistent stream processor, like Pathway, can output streams to a columnar database. These processors execute push queries near the operational plane and integrate with application logic. The output is sent to a streaming platform, such as Kafka, which an RTOLAP database (e.g., Pinot) can consume.

image

The above illustrates the transformation of data within the stream processor, writing results to both a data warehouse/lakehouse and an RTOLAP database. The RTOLAP database can then combine real-time data with historical data from the warehouse/lakehouse, allowing low-latency pull queries at the operational plane.

Pros:

  • Low Latency: Provides data freshness in milliseconds to seconds.
  • Comprehensive Analytics: Includes all historical context in user-facing analytics without burdening operational infrastructure.
  • Fast Queries: Columnar storage in the RTOLAP database optimizes analytical workloads.
  • Topic Reuse: The RTOLAP and stream processor can efficiently reuse topics on the streaming platform.
  • Dual Functionality: The stream processor can handle both real-time analytics and application logic.

Cons:

  • Query Separation: Push and pull queries are handled separately by the stream processor and OLAP database, requiring coordination between engineers, which can be challenging in practice.

Eventually Consistent OLAP Streaming Database

For simplified infrastructure, a streaming OLAP database like Proton can consolidate analytical workloads into one solution. However, due to its eventual consistency, it should not participate in application logic. The diagram below shows data flowing one way from the OLTP database to the streaming database, leveraging the OLAP side's columnar storage for low-latency queries.

image

Proton, being a streaming database, can output its stream processing results to Kafka, allowing other databases to consume the analytical stream and build replicas. Since Proton integrates ClickHouse (an RTOLAP), it has built-in columnar storage for low-latency analytical queries. Kafka can also distribute analytics globally in real-time.

Pros:

  • Low Latency: Provides data freshness in milliseconds to seconds.
  • Comprehensive Historical Data: Offers more context by combining real-time and historical data.
  • Replication: Can emit analytical changes for building replicas.
  • Simplified Infrastructure: Merges stream processing and OLAP technologies.
  • Unified Query Engine: Balances push and pull queries within a single SQL engine.
  • Efficiency: Less bulky solution.

Cons:

  • Eventual Consistency: Not suitable for application logic.

This solution reduces infrastructure complexity and provides more historical data access for user-facing analytics, simplifying the engineering effort.

Eventually Consistent Stream Processor and RTOLAP

image

This solution, commonly used for real-time analytics, typically involves Flink as the stream processor and an RTOLAP database like Pinot (shown in the image above). It is widely adopted in high-scale, real-time applications.

Pros:

  • Low Latency: Provides data freshness in milliseconds to seconds.
  • Comprehensive Analytics: Combines historical and real-time data for a complete view.

Cons:

  • Complexity: The solution is complex and bulky, leading to higher costs.
  • Limited Application Logic: Since Flink is eventually consistent, it should not be used in application logic.
  • Separated Query Engines: Lacks a unified SQL engine for push and pull queries, increasing engineering and organizational complexity.

This solution is ideal when your application requires more historical data for user-facing analytics, and consistency isn’t a critical concern when enriching fact streams with dimensional data.

Eventually Consistent Stream Processor and HTAP

This solution leverages an HTAP database alongside an OLTP database to keep analytical workloads close to the operational plane. An eventually consistent stream processor captures and sends historical data to a data warehouse or lakehouse.

image

The stream processor sources limited historical data for the HTAP database, which, due to its columnar format, supports low-latency analytical queries. However, the HTAP database typically holds limited historical data, often with retention constraints.

Pros:

  • Low Latency: Provides data freshness in milliseconds.
  • Fast Queries: HTAP databases use columnar storage for quick analytical queries.
  • Simplified Infrastructure: Low infrastructural complexity.

Cons:

  • Limited Historical Data: Holds only a small subset of historical data.
  • Retention Complexity: Managing retention for historical data in the HTAP database increases complexity.
  • Application Logic Limitations: The stream processor cannot participate in application logic.

This solution is ideal for use cases that require millisecond-level freshness and a small amount of historical data for user-facing analytics.

ksqlDB

ksqlDB offers continuous refinement (similar to eventual consistency) and is built on the Kafka Streams library, designed for deployment in microservices within the operational plane of an application backend.

ksqlDB is best suited for simpler stream processing operations on Kafka, as it only supports Kafka as its data source and sink. Complex operations, especially involving JOINs, can be difficult to implement correctly due to limited SQL syntax and semantics (e.g., no self-JOINs, no nested JOINs). A high level of stream processing expertise is required to avoid inconsistent logic.

Use Cases: Ideal for preparing data for analytical destinations (e.g., data warehouses or lakes) and enabling point queries through materialized views (TABLEs) for batch-only systems.

Pros:

  • Provides data freshness.
  • Supports stream processing capabilities.
  • Enables point queries using materialized views.

Cons:

  • Limited to Kafka as the source and sink.
  • Requires significant stream processing expertise.
  • Complex stream processing operations are challenging.
  • Limited SQL syntax and semantics support.

Incremental View Maintenance (IVM)

IVM solutions like Feldera, PeerDB, and Epsio support batch-based point queries on preprocessed, fresh data, offering full SQL syntax and semantics. Unlike ksqlDB, they integrate more closely with operational databases like PostgreSQL and do not rely on Kafka. However, these solutions are less flexible, being limited to the data sources and sinks supported by the respective vendor.

Pros:

  • Data freshness
  • Full SQL syntax and semantics
  • Consistency
  • Enables point queries
  • Introduces streaming aspects into the database world

Cons:

  • Restricted to vendor-supported sources/sinks

Postgres Multicorn Foreign Data Wrapper (FDW)

Multicorn is a PostgreSQL extension that facilitates foreign data wrapper (FDW) development using Python. It enables access to non-Postgres databases and integrates operational and analytical planes without relying on Kafka. However, it’s impractical for large organizations that use a variety of databases, as it requires centralizing around PostgreSQL.

Pros:

  • Full SQL syntax and semantics
  • Consistency
  • Access to both OLTP and OLAP databases within Postgres

Cons:

  • Requires Postgres as the central database technology, which is impractical at large scale

Code-Based Stream Processors

Traditional stream processors like Kafka Streams and Flink are still valuable, especially for complex streaming use cases such as fraud detection in microservices. Newer technologies like Quix Streams, Bytewax, and Pathway (Python-based) provide alternatives, with frameworks like Deephaven and GlassFlow also emerging.

Lakehouse/Streamhouse Technologies

Lakehouse technologies (e.g., Databricks’ Delta Tables, Apache Iceberg, and Apache Hudi) increasingly incorporate streaming features. The Streamhouse architecture (e.g., Ververica’s Apache Paimon) integrates stream and batch processing more closely, with Confluent’s Tableflow enabling similar functionality by exposing Kafka data as Iceberg tables. Streambased.io offers optimized SQL querying on Kafka topics using Bloom filters.

These advancements blur the lines between operational, analytical, and streaming planes, moving more processing closer to data in motion and reducing reliance on data warehouses/lakes.

Caching Technologies

For extremely low latency, tools like Redis or Hazelcast provide caching solutions, which can be integrated with streaming platforms.

General Processing and Querying Considerations

When determining where to process and query data, consider the location of your use case, where the data originates, where it is processed, and where it is queried. While processing and querying on the analytical plane is common, it can lead to challenges around data freshness, data sharing, and incremental processing.

Streaming databases simplify stream processing and querying, reducing the need for specialized expertise and making streaming more accessible. Materialized views in streaming databases allow for fresher data and cost-effective incremental processing.

As streaming technologies continue to evolve, the data gravity of the streaming plane increases, blurring the boundaries between operational, analytical, and streaming planes.


That is all for today!

See you tomorrow :)