Projects

MLOps Architecture for Real-Time Fraud Detection

  • An AI-driven solution for real-time credit card fraud detection using MLOps techniques
  • Fully orchestrated pipelines, including data ingestion, model training, and real-time prediction and monitoring
  • High fraud-detection rate achieved with Graph Convolutional Network (GCN), XGBoost, and CatBoost models
  • Tech: Neo4j graph DB, scikit-learn, PyG (PyTorch Geometric), MLflow, Kafka, Grafana, Mage orchestration, Docker, Streamlit

    View on GitHub
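The GCN mentioned above relies on a propagation rule that can be sketched in plain Python. This sketch uses the standard symmetric-normalised form, ReLU(D^-1/2 (A + I) D^-1/2 H W), which PyG's `GCNConv` layer implements. It assumes a small transaction graph given as an adjacency matrix (for example, cards and merchants as nodes). The helper name `gcn_layer`, the toy graph, and the ReLU activation are illustrative assumptions, not the project's actual model.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).

    Hypothetical helper for illustration: adj is an n x n 0/1 adjacency
    matrix, features is n x f, weights is f x out.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features: A_hat = A + I
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in A_hat (always >= 1 thanks to the self-loop)
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: D^-1/2 A_hat D^-1/2
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ features
    f = len(features[0])
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Linear transform, then ReLU: max(0, agg @ weights)
    out = len(weights[0])
    return [
        [max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f))) for o in range(out)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two connected nodes (e.g. a card and a merchant) with one feature each
    print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Each node's output mixes its own features with its neighbours'. This is why a graph model can flag a card that looks normal in isolation but is linked to suspicious merchants.
<test>
assert gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]) == [[2.0], [2.0]]
assert gcn_layer([[0]], [[2.0]], [[-1.0]]) == [[0.0]]
assert gcn_layer([[0]], [[2.0]], [[3.0]]) == [[6.0]]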

Transaction Stream Data Engineering Pipeline

  • Generate transaction data via Stripe's API
  • Stream data using Apache Kafka and process it in real-time with PySpark Structured Streaming
  • Store processed data in PostgreSQL
  • Manage data transformations and modeling using dbt
  • Visualize data using Grafana
  • Tech: PostgreSQL DB, Kafka, PySpark, dbt, Grafana

    View Project
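In PySpark Structured Streaming, a per-card spending aggregate over a Kafka stream would typically be written as `groupBy(window(...), ...)` on a streaming DataFrame. The plain-Python stand-in below shows what such a tumbling-window aggregation computes. The `Transaction` fields and the 60-second window are hypothetical, and watermarking and late-arriving events, which Spark handles, are ignored here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical event schema; a real Stripe payload has many more fields
    card_id: str
    amount: float
    ts: int  # event time in seconds since the Unix epoch


def tumbling_window_totals(
    events: Iterable[Transaction], window_seconds: int = 60
) -> Dict[Tuple[str, int], float]:
    """Sum amounts per (card_id, window_start) over fixed, non-overlapping windows."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for e in events:
        # Align each event to the start of its window, e.g. ts=75 -> 60 for 60 s windows
        window_start = e.ts - (e.ts % window_seconds)
        totals[(e.card_id, window_start)] += e.amount
    return dict(totals)
```

For example, two payments by card `c1` at 5 s and 59 s fall into the window starting at 0, while a payment at 60 s opens the next window.
<test>
events = [
    Transaction("c1", 10.0, 5),
    Transaction("c1", 5.0, 59),
    Transaction("c1", 1.0, 60),
    Transaction("c2", 2.5, 30),
]
assert tumbling_window_totals(events) == {("c1", 0): 15.0, ("c1", 60): 1.0, ("c2", 0): 2.5}
assert tumbling_window_totals(events, window_seconds=120) == {("c1", 0): 16.0, ("c2", 0): 2.5}
assert tumbling_window_totals([]) == {}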

Your Personal Finance Voice Assistant

  • Chat or talk with your personal finance assistant to learn about and analyse your spending habits
  • Communicate through text or speech
  • Ask follow-up questions (the language model can see the chat history)
  • On the development side, monitor how the RAG pipeline is performing by analysing prompts and retrieved information
  • Tech: SQLite, OpenAI, HuggingFace, LlamaIndex, Arize Phoenix monitoring, Streamlit

    View on GitHub
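The retrieval-augmented flow described above can be illustrated with a toy sketch: retrieve the stored transactions most similar to the question, then send them to the model together with the earlier chat turns so follow-up questions keep their context. Word-count cosine similarity stands in for the OpenAI or HuggingFace embeddings and LlamaIndex retrieval the project uses. The function names, the prompt layout, and the sample records are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for a real embedding model
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing terms, so only shared terms contribute
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query (ties keep input order)."""
    q = _vectorize(query)
    return sorted(documents, key=lambda d: cosine(q, _vectorize(d)), reverse=True)[:k]


def build_prompt(question: str, history: List[Tuple[str, str]], documents: List[str], k: int = 2) -> str:
    """Combine retrieved context with prior turns so the model can see the chat history."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    past = "\n".join(f"{role}: {message}" for role, message in history)
    return f"Context:\n{context}\n\nChat history:\n{past}\n\nUser: {question}\nAssistant:"
```

Logging the prompt returned by `build_prompt` alongside the retrieved records is the kind of trace a tool like Arize Phoenix lets you inspect when judging RAG quality.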

Glaswegian Audio Dataset and ASR Model

  • Co-create a 120-minute open-source Glaswegian audio dataset
  • Preprocess the raw audio and transcriptions and upload them to HuggingFace
  • Research audio AI models and fine-tune ASR and TTS models
  • Tech: HuggingFace, Python, Fine-Tuning, Audio AI

    View on HuggingFace
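Fine-tuned ASR models are usually compared by word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The page does not say which metric this project reported, and Hugging Face's `evaluate` library also offers a ready-made `wer` metric. The minimal sketch below, with a hypothetical function name, shows the calculation itself.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the current ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, transcribing "gonnae no dae that" as "gonnae no do that" is one substitution in four words, giving a WER of 0.25. Dialect words that a general-purpose model normalises to standard English are exactly what a Glaswegian dataset helps a fine-tuned model get right.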

MLOps Architecture for Insurance Fraud Detection

  • Build an end-to-end MLOps pipeline to detect car insurance fraud
  • Pipeline orchestration covering data storage, data preprocessing with Information Value (IV) and Weight of Evidence (WoE), model training, deployment, and monitoring
  • Focus on achieving high recall in fraud detection using a Balanced Random Forest Classifier
  • Tech: PostgreSQL DB, Terraform, Google Cloud Platform, MLflow, Prefect, Grafana, Evidently, Docker, FastAPI

    View Project
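The IV and WoE preprocessing step above can be sketched for a single categorical feature. This version uses the credit-scoring convention WoE = ln(%non-events / %events) and IV = sum((%non-events - %events) * WoE). Some sources flip the WoE sign, which leaves IV unchanged. The function name, the label encoding (1 = fraud), and the additive smoothing are assumptions for illustration, not the project's code.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def woe_iv(
    categories: List[Hashable], labels: List[int], smoothing: float = 0.5
) -> Tuple[Dict[Hashable, float], float]:
    """Weight of Evidence per category and the feature's Information Value.

    labels: 1 = fraud (event), 0 = legitimate claim (non-event).
    With smoothing=0, every category must contain both classes.
    """
    events = Counter(c for c, y in zip(categories, labels) if y == 1)
    non_events = Counter(c for c, y in zip(categories, labels) if y == 0)
    cats = sorted(set(categories), key=str)
    n = len(cats)
    total_events = sum(events.values())
    total_non = sum(non_events.values())
    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for cat in cats:
        # Additive smoothing keeps ln() finite when a category has no fraud (or no legit) cases
        pct_event = (events[cat] + smoothing) / (total_events + smoothing * n)
        pct_non = (non_events[cat] + smoothing) / (total_non + smoothing * n)
        woe[cat] = math.log(pct_non / pct_event)
        iv += (pct_non - pct_event) * woe[cat]
    return woe, iv
```

Categories with negative WoE carry proportionally more fraud. A higher IV means the feature separates fraud from legitimate claims more strongly, which is why IV is often used to shortlist features before training.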

Lending Club Data Engineering Pipeline

  • Build a data pipeline to process and visualize Lending Club data
  • Extract raw data from Kaggle and load it into Google Cloud Storage
  • Process data with dbt in BigQuery
  • Create visualizations using Looker
  • Manage infrastructure with Terraform
  • Orchestrate the entire process with Mage
  • Tech: Docker, Mage orchestration, Google Cloud Platform, Terraform, dbt, Looker

    View Project
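An orchestrator such as Mage runs the steps listed above as a dependency graph of blocks, each starting only after its upstream blocks finish. The sketch below reproduces that ordering idea with Python's standard `graphlib`. It is not Mage's API: the block names in `LENDING_CLUB_DEPS` are hypothetical, and in a real run each block would call Kaggle, Google Cloud Storage, dbt, and Looker rather than a toy function.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical block names mirroring the steps above; each maps to its upstream blocks.
LENDING_CLUB_DEPS: Dict[str, Set[str]] = {
    "extract_from_kaggle": set(),
    "load_to_gcs": {"extract_from_kaggle"},
    "dbt_transform_bigquery": {"load_to_gcs"},
    "refresh_looker": {"dbt_transform_bigquery"},
}


def run_pipeline(blocks: Dict[str, Callable[..., Any]], deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run each block after its upstream blocks, passing their outputs as arguments."""
    results: Dict[str, Any] = {}
    # static_order() yields a valid execution order and raises graphlib.CycleError on cycles
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in sorted(deps.get(name, set()))]
        results[name] = blocks[name](*upstream)
    return results
```

Declaring dependencies rather than a fixed script order is what lets an orchestrator detect cycles up front and retry or rerun a single failed step.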

Other