MLOps 101 Project for a mini-course I teach
After learning a tonne from great online teachers and projects, I decided to pass my knowledge on to undergraduate students who are curious about the life of a model outside the Jupyter notebook
An end-to-end ML system that processes taxi data, stores models in a model registry, serves them through an API deployed to Google Cloud, and keeps logs for observability
Tech: scikit-learn, EvidentlyAI, FastAPI, MLflow, Docker, GitHub Actions, Terraform, Google Cloud (GCS, Logging, Compute Engine, Artifact Registry, Kubernetes Engine)
View Project
Esports Voice Data Pipeline (Zach Wilson's DE bootcamp capstone)
Esports team communication data is normally kept private, but for the first time a team is sharing its full voice communication records. With this data, I present a prototype pipeline that processes the audio to extract communication patterns
I also developed visualizations that reveal communication patterns and dynamics, giving the team actionable insights to improve their gameplay
Tech: Airflow, dbt, Google BigQuery, Google Cloud Storage, Streamlit, Terraform, GitHub Actions, Astronomer
View Project
MLOps Architecture for Real-Time Fraud Detection
An AI-driven solution for real-time credit card fraud detection using MLOps techniques
Fully orchestrated pipelines, including data ingestion, model training, and real-time prediction and monitoring
High fraud-case detection rates using Graph Convolutional Network (GCN), XGBoost, and CatBoost models
Tech: Neo4j graph DB, scikit-learn, PyG, MLflow, Kafka, Grafana, Mage orchestration, Docker, Streamlit
View Project
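For readers new to GCNs, the standard layer-wise propagation rule (Kipf and Welling), which is also what PyG's `GCNConv` computes by default with self-loops and symmetric normalization, is shown below as a reference point. Whether this project uses exactly this formulation, and what its nodes and edges represent (for example cards, merchants, or transactions stored in Neo4j), are assumptions here.

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here A is the graph's adjacency matrix, I_N adds a self-loop to every node, H^(l) holds the node features at layer l, W^(l) is a learned weight matrix, and sigma is a nonlinearity such as ReLU. Each layer therefore mixes a node's features with a degree-normalized average of its neighbours', which is what lets a GCN pick up on suspicious connections that tabular models like XGBoost or CatBoost see only if they are engineered as features.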
Voice-to-Voice Personal Finance Assistant
Talk to your Personal Finance Assistant AI agent by voice to learn about and analyse your spending habits
Frontend and backend communicate via a WebSocket
Ask follow-up questions (the language model can see the chat history)
Detailed observability of live and historical connections to the server via Pydantic Logfire
Tech: Webhooks, FastAPI, PydanticAI, Logfire, SQLite, OpenAI, Ollama, PostgreSQL, React
View on GitHub
Transaction Stream Data Engineering Pipeline
Generate transaction data via Stripe's API
Stream data using Apache Kafka and process it in real time with PySpark Structured Streaming
Store processed data in PostgreSQL
Manage data transformations and modeling using dbt
Visualize data using Grafana
Tech: PostgreSQL DB, Kafka, PySpark, dbt, Grafana
View Project
Glaswegian Audio Dataset and ASR model
Co-create a 120-minute open-source Glaswegian audio dataset
Preprocess raw audio and transcriptions and upload them to HuggingFace
Research audio AI models and fine-tune ASR and TTS models
Tech: HuggingFace, Python, Fine-Tuning, Audio AI
View on HuggingFace
MLOps Architecture for Insurance Fraud Detection
Build an end-to-end MLOps pipeline to detect car insurance fraud
Pipeline orchestration covering data storage, data preprocessing with Information Value (IV) and Weight of Evidence (WoE), model training, deployment, and monitoring
Focus on achieving high recall in fraud detection using a Balanced Random Forest Classifier
Tech: PostgreSQL DB, Terraform, Google Cloud Platform, MLflow, Prefect, Grafana, Evidently, Docker, FastAPI
View Project
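The IV/WoE preprocessing step in the insurance fraud pipeline can be sketched as follows. This is a minimal pure-Python illustration, not the project's code. The function name `woe_iv`, the 0.5 pseudo-count used to avoid log(0) on empty bins, and the sign convention WoE = ln(%non-fraud / %fraud) are all assumptions. Some references flip the ratio, which flips the sign of WoE but leaves IV unchanged.

```python
import math
from collections import defaultdict


def woe_iv(categories, labels, smoothing=0.5):
    """Compute Weight of Evidence per category and the feature's Information Value.

    categories: bin/category value of one feature for each row
    labels: 0/1 target for each row (1 = fraud, 0 = legitimate)
    smoothing: assumed pseudo-count added to every cell so empty cells never hit log(0)
    """
    events = defaultdict(float)      # fraud counts per category
    non_events = defaultdict(float)  # legitimate counts per category
    for cat, y in zip(categories, labels):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    cats = sorted(set(categories), key=str)  # deterministic order
    total_events = sum(events[c] + smoothing for c in cats)
    total_non_events = sum(non_events[c] + smoothing for c in cats)

    woe, iv = {}, 0.0
    for c in cats:
        # Share of all fraud / all legitimate rows that fall into this category
        dist_event = (events[c] + smoothing) / total_events
        dist_non_event = (non_events[c] + smoothing) / total_non_events
        woe[c] = math.log(dist_non_event / dist_event)
        # IV sums each bin's distribution gap weighted by its WoE (always >= 0)
        iv += (dist_non_event - dist_event) * woe[c]
    return woe, iv


if __name__ == "__main__":
    area = ["urban", "urban", "rural", "rural", "rural", "urban"]  # toy feature
    fraud = [0, 0, 1, 1, 0, 0]                                     # toy labels
    woe, iv = woe_iv(area, fraud)
    print({k: round(v, 3) for k, v in woe.items()}, round(iv, 3))
    # {'rural': -1.022, 'urban': 1.435} 1.31
```

A feature whose categories have identical fraud and non-fraud distributions gets a WoE of 0 in every bin and an IV of 0, and IV grows as the distributions diverge. In a typical setup the WoE values replace the raw categories as model inputs and IV is used to rank or screen features.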
Lending Club Data Engineering Pipeline
Build a data pipeline to process and visualize Lending Club data
Extract raw data from Kaggle and load it into Google Cloud Storage
Process data with dbt in BigQuery
Create visualizations using Looker
Manage infrastructure with Terraform
Orchestrate the entire process with Mage
Tech: Docker, Mage orchestration, Google Cloud Platform, Terraform, dbt, Looker
View Project
Other