Hello :) Today is Day 200!
A quick summary of today:
- creating a project plan and timeline, and learning about graph DBs
My lab mate and I considered two project ideas. First I will present our plan for the project we chose, and then the other one, for which I made a simple demo.
Project idea we chose: Real-time fraud analysis using graph data and a graph database
My lab mate said he trusts me to choose the technologies for our project and to set up a plan to follow:
Training Pipeline
Data Collection and Preprocessing:
- Raw dataset obtained from Kaggle containing transaction data.
- Raw data is preprocessed before going into the database (rough sketch below).
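Roughly what I have in mind for this step; the column names are hypothetical until we settle on a dataset:

```python
import pandas as pd

# Hypothetical file and column names; the real Kaggle dataset may differ.
df = pd.read_csv("transactions.csv")

# Drop rows missing an endpoint of the transaction edge, then deduplicate.
df = df.dropna(subset=["customer_id", "merchant_id"]).drop_duplicates()

# Normalise the amount so it can serve as an edge feature later.
df["amount_norm"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()

# Sort by time so the stream simulation can replay transactions in order.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values("timestamp").reset_index(drop=True)
```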
Storage in ArangoDB:
- Transactions are stored in ArangoDB, a popular graph database.
- Customers and merchants are represented as nodes, and transactions as edges (see the sketch below).
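A minimal python-arango sketch of that layout; connection details, keys, and fields are placeholders, not our final schema:

```python
from arango import ArangoClient

# Placeholder connection details.
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("fraud", username="root", password="password")

# One graph: customer and merchant vertex collections, transactions as edges.
if db.has_graph("fraud_graph"):
    graph = db.graph("fraud_graph")
else:
    graph = db.create_graph("fraud_graph")
    graph.create_vertex_collection("customers")
    graph.create_vertex_collection("merchants")
    graph.create_edge_definition(
        edge_collection="transactions",
        from_vertex_collections=["customers"],
        to_vertex_collections=["merchants"],
    )

# Example documents.
graph.vertex_collection("customers").insert({"_key": "c1", "name": "Alice"})
graph.vertex_collection("merchants").insert({"_key": "m1", "name": "Coffee Shop"})
graph.edge_collection("transactions").insert(
    {"_from": "customers/c1", "_to": "merchants/m1", "amount": 42.5}
)
```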
Graph Neural Network with PyG:
- PyTorch Geometric (PyG) is used to build and train a Graph Neural Network (GNN) for fraud detection on the graph data.
- The model is trained on a batch of the dataset and saved for inference (minimal training sketch below).
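A minimal PyG sketch of the kind of model we are considering: two SAGEConv layers for node embeddings, then a linear layer scoring each transaction edge. Dimensions, the toy graph, and the training loop are placeholders, not the final architecture:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class FraudGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        # Scores a transaction edge from its two endpoint embeddings.
        self.edge_scorer = torch.nn.Linear(2 * hidden_dim, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        src, dst = edge_index
        return self.edge_scorer(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)


# Toy graph: 4 nodes (customers and merchants), 3 transaction edges.
x = torch.randn(4, 8)                              # node features
edge_index = torch.tensor([[0, 1, 0], [2, 2, 3]])  # customer -> merchant
y = torch.tensor([0.0, 1.0, 0.0])                  # fraud label per edge

model = FraudGNN(in_dim=8, hidden_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = F.binary_cross_entropy_with_logits(model(x, edge_index), y)
    loss.backward()
    opt.step()

torch.save(model.state_dict(), "fraud_gnn.pt")  # saved for inference
```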
Real-Time Inference Pipeline
Simulated Data Streaming:
- Emulating a live transaction environment by replaying the full downloaded Kaggle dataset as a continuous stream.
Data Ingestion with Kafka:
- Kafka handles the real-time streaming of transaction data, decoupling ingestion from storage and inference (producer sketch below).
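The producer side with kafka-python, replaying the downloaded dataset to emulate the live feed; topic name, broker address, and file are placeholders:

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Replay the Kaggle dataset row by row to emulate live traffic.
df = pd.read_csv("transactions.csv")
for _, row in df.iterrows():
    producer.send("transactions", row.to_dict())
    time.sleep(0.1)  # throttle so it looks like a live feed

producer.flush()
```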
Storage in ArangoDB:
- Real-time transactions are stored in ArangoDB, maintaining the graph structure.
Real-Time Fraud Detection:
- The trained GNN analyses new transactions in real time as they are streamed and stored.
- The saved model is loaded to perform real-time inference and flag fraudulent transactions (consumer sketch below).
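And the consuming side: a sketch of loading the saved model and scoring each incoming transaction. `build_subgraph` is a hypothetical helper standing in for the ArangoDB upsert and subgraph extraction, stubbed here with toy data:

```python
import json

import torch
from kafka import KafkaConsumer

from model import FraudGNN  # assumes the training sketch's class lives in model.py


def build_subgraph(tx):
    """Hypothetical helper: upsert tx into ArangoDB, pull its neighbourhood,
    and convert it to PyG tensors. Stubbed with toy data here."""
    x = torch.randn(4, 8)
    edge_index = torch.tensor([[0, 1, 0], [2, 2, 3]])  # last edge is the new tx
    return x, edge_index


model = FraudGNN(in_dim=8, hidden_dim=16)
model.load_state_dict(torch.load("fraud_gnn.pt"))
model.eval()

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    tx = message.value
    x, edge_index = build_subgraph(tx)
    with torch.no_grad():
        score = torch.sigmoid(model(x, edge_index))[-1].item()  # score the new edge
    if score > 0.9:  # placeholder threshold
        print(f"possible fraud: {tx} (score={score:.2f})")
```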
Also, in Excel, I used a project timeline template to create our schedule:
The competition started on Wednesday and the submission deadline is the 11th of August, so we are starting today. The first task is finding a good dataset with a decent number of transactions.
One of the new things for me is using a graph DB. I looked at Neo4j and ArangoDB, and decided on ArangoDB because the Python setup was easier. While playing with it, I inserted sample transactions and, in the DB's web app (similar to Postgres' pgAdmin), got a visualisation, limited because there are too many nodes and edges:
I am really looking forward to this project and working with my lab mate - Jae-hyeok Choi.
As for the idea that we dismissed - a banking voice assistant
I was up for both, but my lab mate preferred the first one, and looking back I am glad, haha, because it is so exciting.
Nevertheless, setting up a voice assistant demo was surprisingly easy. I set it up using Hugging Face Spaces. Here is the link.
The demo is ~30 lines of code: take voice -> turn it into text -> a language model answers the text -> the answer is transformed into speech and returned to the user (a rough sketch of the same flow is below). To try the demo, one needs an OpenAI API key. To turn this simple voice assistant into a banking one, I would need to use the LM in a RAG app that talks to a DB. Given I made db2chat, it should not be that difficult. But that is a project for later.
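Not the actual Space code, just a sketch of the same flow, assuming the OpenAI Python SDK (Whisper for transcription, a chat model, the TTS endpoint) and a Gradio UI; the model names are placeholder choices:

```python
import gradio as gr
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def assistant(audio_path):
    # 1. Voice -> text.
    with open(audio_path, "rb") as f:
        question = client.audio.transcriptions.create(model="whisper-1", file=f).text

    # 2. A language model answers the text.
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # 3. The answer -> speech, returned to the user.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    with open("answer.mp3", "wb") as f:
        f.write(speech.content)
    return "answer.mp3"


gr.Interface(
    fn=assistant,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=gr.Audio(type="filepath"),
).launch()
```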
That is all for today!
See you tomorrow :)