Hello :) Today is Day 200!
A quick summary of today:
- creating a project plan and timeline, and learning about graph DBs
My lab mate and I considered two project ideas. First I will present our plan for the project we chose, and then the other one, for which I made a simple demo.
Project idea we chose: Real-time fraud analysis using graph data and a graph database
My lab mate said he trusts me to choose the technologies for our project and to set up a plan to follow:
Training Pipeline
Data Collection and Preprocessing:
- Raw dataset obtained from Kaggle containing transaction data.
- Raw data is preprocessed before going into the database (rough sketch below).
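Roughly what I have in mind for this step; the column names are hypothetical until we settle on a dataset:

```python
import pandas as pd

# Hypothetical file and column names; the real Kaggle dataset may differ.
df = pd.read_csv("transactions.csv")

# Drop rows missing an endpoint of the transaction edge, then deduplicate.
df = df.dropna(subset=["customer_id", "merchant_id"]).drop_duplicates()

# Normalise the amount so it can serve as an edge feature later.
df["amount_norm"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()

# Sort by time so the stream simulation can replay transactions in order.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values("timestamp").reset_index(drop=True)
```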
Storage in ArangoDB:
- Transactions are stored in ArangoDB, a popular graph database.
- Customers and merchants are represented as nodes, and transactions as edges (see the sketch below).
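A minimal python-arango sketch of that layout; connection details, keys, and fields are placeholders, not our final schema:

```python
from arango import ArangoClient

# Placeholder connection details.
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("fraud", username="root", password="password")

# One graph: customer and merchant vertex collections, transactions as edges.
if db.has_graph("fraud_graph"):
    graph = db.graph("fraud_graph")
else:
    graph = db.create_graph("fraud_graph")
    graph.create_vertex_collection("customers")
    graph.create_vertex_collection("merchants")
    graph.create_edge_definition(
        edge_collection="transactions",
        from_vertex_collections=["customers"],
        to_vertex_collections=["merchants"],
    )

# Example documents.
graph.vertex_collection("customers").insert({"_key": "c1", "name": "Alice"})
graph.vertex_collection("merchants").insert({"_key": "m1", "name": "Coffee Shop"})
graph.edge_collection("transactions").insert(
    {"_from": "customers/c1", "_to": "merchants/m1", "amount": 42.5}
)
```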
Graph Neural Network with PyG:
- PyTorch Geometric (PyG) is used to build and train a Graph Neural Network (GNN) for fraud detection on the graph data.
- The model is trained on a batch of the dataset and saved for inference (minimal training sketch below).
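A minimal PyG sketch of the kind of model we are considering: two SAGEConv layers for node embeddings, then a linear layer scoring each transaction edge. Dimensions, the toy graph, and the training loop are placeholders, not the final architecture:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class FraudGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        # Scores a transaction edge from its two endpoint embeddings.
        self.edge_scorer = torch.nn.Linear(2 * hidden_dim, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        src, dst = edge_index
        return self.edge_scorer(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)


# Toy graph: 4 nodes (customers and merchants), 3 transaction edges.
x = torch.randn(4, 8)                              # node features
edge_index = torch.tensor([[0, 1, 0], [2, 2, 3]])  # customer -> merchant
y = torch.tensor([0.0, 1.0, 0.0])                  # fraud label per edge

model = FraudGNN(in_dim=8, hidden_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = F.binary_cross_entropy_with_logits(model(x, edge_index), y)
    loss.backward()
    opt.step()

torch.save(model.state_dict(), "fraud_gnn.pt")  # saved for inference
```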
Real-Time Inference Pipeline
Simulated Data Streaming:
- Emulating a live transaction environment by replaying the full downloaded Kaggle dataset as a continuous stream.
Data Ingestion with Kafka:
- Kafka handles the real-time streaming of transaction data, decoupling ingestion from storage and inference (producer sketch below).
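The producer side with kafka-python, replaying the downloaded dataset to emulate the live feed; topic name, broker address, and file are placeholders:

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Replay the Kaggle dataset row by row to emulate live traffic.
df = pd.read_csv("transactions.csv")
for _, row in df.iterrows():
    producer.send("transactions", row.to_dict())
    time.sleep(0.1)  # throttle so it looks like a live feed

producer.flush()
```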
Storage in ArangoDB:
- Real-time transactions are stored in ArangoDB, maintaining the graph structure.
Real-Time Fraud Detection:
- The trained GNN analyses new transactions in real time as they are streamed and stored.
- The saved model is loaded to perform real-time inference and flag fraudulent transactions (consumer sketch below).
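And the consuming side: a sketch of loading the saved model and scoring each incoming transaction. `build_subgraph` is a hypothetical helper standing in for the ArangoDB upsert and subgraph extraction, stubbed here with toy data:

```python
import json

import torch
from kafka import KafkaConsumer

from model import FraudGNN  # assumes the training sketch's class lives in model.py


def build_subgraph(tx):
    """Hypothetical helper: upsert tx into ArangoDB, pull its neighbourhood,
    and convert it to PyG tensors. Stubbed with toy data here."""
    x = torch.randn(4, 8)
    edge_index = torch.tensor([[0, 1, 0], [2, 2, 3]])  # last edge is the new tx
    return x, edge_index


model = FraudGNN(in_dim=8, hidden_dim=16)
model.load_state_dict(torch.load("fraud_gnn.pt"))
model.eval()

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    tx = message.value
    x, edge_index = build_subgraph(tx)
    with torch.no_grad():
        score = torch.sigmoid(model(x, edge_index))[-1].item()  # score the new edge
    if score > 0.9:  # placeholder threshold
        print(f"possible fraud: {tx} (score={score:.2f})")
```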
Also, in Excel, I used a project timeline template to create our schedule:
The competition started on Wednesday and the submission deadline is the 11th of August, so we are starting today. The first task is finding a good dataset with a decent number of transactions.
One of the new things for me is using a graph DB. I looked at Neo4j and ArangoDB, and decided on ArangoDB because the Python setup was easier. While playing with it, I inserted sample transactions and, in the DB's web app (similar to Postgres' pgAdmin), got a visualisation, limited because there are too many nodes and edges:
I am really looking forward to this project and working with my lab mate - Jae-hyeok Choi.
As for the idea that we dismissed - a banking voice assistant
I was up for both, but my lab mate preferred the first one, and looking back I am glad, haha, because it is so exciting.
Nevertheless, setting up a voice assistant demo was surprisingly easy. I set it up using Hugging Face Spaces. Here is the link.
The demo is ~30 lines of code: take voice -> turn it into text -> a language model answers the text -> the answer is transformed into speech and returned to the user (a rough sketch of the same flow is below). To try the demo, one needs an OpenAI API key. To turn this simple voice assistant into a banking one, I would need to use the LM in a RAG app that talks to a DB. Given I made db2chat, it should not be that difficult. But that is a project for later.
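Not the actual Space code, just a sketch of the same flow, assuming the OpenAI Python SDK (Whisper for transcription, a chat model, the TTS endpoint) and a Gradio UI; the model names are placeholder choices:

```python
import gradio as gr
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def assistant(audio_path):
    # 1. Voice -> text.
    with open(audio_path, "rb") as f:
        question = client.audio.transcriptions.create(model="whisper-1", file=f).text

    # 2. A language model answers the text.
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # 3. The answer -> speech, returned to the user.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    with open("answer.mp3", "wb") as f:
        f.write(speech.content)
    return "answer.mp3"


gr.Interface(
    fn=assistant,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=gr.Audio(type="filepath"),
).launch()
```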
That is all for today!
See you tomorrow :)