Projects

MLOps 101: Project for a mini-course I teach

  • After learning a tonne from great online teachers and projects, I decided to pass my knowledge on to undergraduate students who are curious about the life of a model outside the Jupyter notebook
  • An end-to-end ML system that processes taxi data, stores models in a model registry, serves them through an API deployed to Google Cloud, and keeps logs for observability
  • Tech: scikit-learn, EvidentlyAI, FastAPI, MLflow, Docker, GitHub Actions, Terraform, Google Cloud (GCS, Logging, Compute Engine, Artifact Registry, Kubernetes Engine)

    Esports Voice Data Pipeline (Zach Wilson's DE bootcamp capstone)

  • Esports team communication data is normally kept private, but for the first time a team has shared its full voice communication records; using them, I built a prototype pipeline that processes the audio to extract communication patterns
  • I also built visualizations of the team's communication patterns and dynamics, giving the team actionable insights to improve their gameplay
  • Tech: Airflow, dbt, Google BigQuery, Google Cloud Storage, Streamlit, Terraform, GitHub Actions, Astronomer

    MLOps Architecture for Real-Time Fraud Detection

  • An AI-driven system for real-time credit card fraud detection, built with MLOps practices
  • Fully orchestrated pipelines covering data ingestion, model training, real-time prediction, and monitoring
  • Fraud detection with Graph Convolutional Network (GCN), XGBoost, and CatBoost models, aiming for a high fraud detection rate
  • Tech: Neo4j graph DB, scikit-learn, PyG, MLflow, Kafka, Grafana, Mage orchestration, Docker, Streamlit

    Voice-to-Voice Personal Finance Assistant

  • Talk to your Personal Finance Assistant AI agent to learn about and analyse your spending habits, communicating entirely through speech
  • Frontend and backend communicate over a WebSocket
  • Ask follow-up questions (the language model can see the chat history)
  • Detailed observability of live and historical connections to the server via Pydantic Logfire
  • Tech: WebSockets, FastAPI, PydanticAI, Logfire, SQLite, OpenAI, Ollama, PostgreSQL, React

    View on GitHub

    Transaction Stream Data Engineering Pipeline

  • Generate transaction data via Stripe's API
  • Stream data using Apache Kafka and process it in real-time with PySpark Structured Streaming
  • Store processed data in PostgreSQL
  • Manage data transformations and modeling using dbt
  • Visualize data using Grafana
  • Tech: PostgreSQL DB, Kafka, PySpark, dbt, Grafana

    Glaswegian Audio Dataset and ASR model

  • Co-create a 120-minute open-source Glaswegian audio dataset
  • Preprocess the raw audio and transcriptions and upload them to HuggingFace
  • Research audio AI models and fine-tune ASR and TTS models
  • Tech: HuggingFace, Python, Fine-Tuning, Audio AI

    View on HuggingFace

    MLOps Architecture for Insurance Fraud Detection

  • Build an end-to-end MLOps pipeline to detect car insurance fraud
  • Pipeline orchestration covering data storage, data preprocessing (using Information Value (IV) and Weight of Evidence (WoE)), model training, deployment, and monitoring
  • Focus on achieving high recall in fraud detection using a Balanced Random Forest Classifier
  • Tech: PostgreSQL DB, Terraform, Google Cloud Platform, MLflow, Prefect, Grafana, Evidently, Docker, FastAPI

    Lending Club Data Engineering Pipeline

  • Build a data pipeline to process and visualize Lending Club data
  • Extract raw data from Kaggle and load it into Google Cloud Storage
  • Process data with dbt in BigQuery
  • Create visualizations using Looker
  • Manage infrastructure with Terraform
  • Orchestrate the entire process with Mage
  • Tech: Docker, Mage orchestration, Google Cloud Platform, Terraform, dbt, Looker

    Other