(Day 200) Kukmin Bank AI competition project idea

Ivan Ivanov · July 19, 2024

Hello :) Today is Day 200!

A quick summary of today:

  • creating a project plan and timeline, and learning about graph DBs

My lab mate and I considered two project ideas. First I will present our plan for the project we chose, and then the other one, for which I put together a simple demo.

Project idea we chose: Real-time fraud analysis using graph data and a graph database

My lab mate said he trusts me to choose the technologies for our project and to set up a plan to follow:

Training Pipeline

Data Collection and Preprocessing:

  • Raw dataset obtained from Kaggle containing transaction data.
  • Raw data is preprocessed before going into the database.
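
To make the preprocessing step concrete, here is a rough sketch of what it might look like, assuming a hypothetical Kaggle CSV with columns like timestamp, amount, category, customer_id and merchant_id (we have not actually picked the dataset yet):

```python
import pandas as pd

# Hypothetical file and column names; the real Kaggle dataset is still to be chosen.
df = pd.read_csv("transactions.csv")

# Basic cleaning before loading into the graph database.
df = df.drop_duplicates()
df = df.dropna(subset=["customer_id", "merchant_id", "amount"])
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["amount"] = df["amount"].astype(float)
df["category"] = df["category"].astype("category").cat.codes  # encode merchant category

df.to_csv("transactions_clean.csv", index=False)
```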

Storage in ArangoDB:

  • Transactions are stored in ArangoDB, a popular graph database.
  • Customers and merchants are represented as nodes, and transactions as edges.
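
A minimal sketch of how this could look with the python-arango driver (host, database name, credentials and field names are placeholders):

```python
from arango import ArangoClient

# Connect to a local ArangoDB instance (placeholder credentials).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("fraud", username="root", password="password")

# Customers and merchants are vertices; transactions are edges between them.
if db.has_graph("transactions_graph"):
    graph = db.graph("transactions_graph")
else:
    graph = db.create_graph("transactions_graph")
    graph.create_vertex_collection("customers")
    graph.create_vertex_collection("merchants")
    graph.create_edge_definition(
        edge_collection="transactions",
        from_vertex_collections=["customers"],
        to_vertex_collections=["merchants"],
    )

# Insert one sample transaction (made-up fields).
graph.vertex_collection("customers").insert({"_key": "c1", "name": "Alice"})
graph.vertex_collection("merchants").insert({"_key": "m1", "name": "Coffee Shop"})
graph.edge_collection("transactions").insert(
    {"_from": "customers/c1", "_to": "merchants/m1", "amount": 4.2, "is_fraud": 0}
)
```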

Graph Neural Network with PyG:

  • PyTorch Geometric (PyG) is used to build and train a Graph Neural Network (GNN) for fraud detection on the graph data.
  • The model is trained on a batch of the dataset and saved for inference.
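
A sketch of what the GNN might look like in PyG, scoring each transaction edge from the embeddings of its two endpoint nodes (the layer type, sizes and training details are placeholders for now):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv


class FraudGNN(torch.nn.Module):
    """Two SAGEConv layers over the customer/merchant graph, then an
    edge classifier that scores each transaction from its endpoints."""

    def __init__(self, in_channels, hidden_channels=64):
        super().__init__()
        self.conv1 = SAGEConv(in_channels, hidden_channels)
        self.conv2 = SAGEConv(hidden_channels, hidden_channels)
        self.edge_classifier = torch.nn.Linear(2 * hidden_channels, 2)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        src, dst = edge_index
        return self.edge_classifier(torch.cat([h[src], h[dst]], dim=-1))


# Training sketch: `data` is a torch_geometric.data.Data object built from the
# ArangoDB export, with node features x, edge_index, per-edge labels y and a
# boolean train_mask over edges.
def train(data, epochs=50):
    model = FraudGNN(in_channels=data.num_node_features)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(epochs):
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()
    torch.save(model.state_dict(), "fraud_gnn.pt")
    return model
```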

Real-Time Inference Pipeline

Simulated Data Streaming:

  • A live transaction environment with continuous data flow is emulated by replaying the full downloaded Kaggle dataset as a stream (see the producer sketch below).

Data Ingestion with Kafka:

  • Kafka handles the real-time streaming of transaction data, ensuring efficient data flow and availability.
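
Putting the simulated stream and Kafka together, here is a producer sketch using kafka-python that replays the cleaned CSV into a topic to emulate a live feed (file name, topic and broker address are placeholders):

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

# Replay the downloaded Kaggle CSV row by row to emulate live transactions.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)

for _, row in pd.read_csv("transactions_clean.csv").iterrows():
    producer.send("transactions", row.to_dict())
    time.sleep(0.1)  # throttle so it behaves like a stream rather than a bulk load

producer.flush()
```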

Storage in ArangoDB:

  • Real-time transactions are stored in ArangoDB, maintaining the graph structure.

Real-Time Fraud Detection:

  • The trained GNN model analyses new transactions in real time as they are streamed and stored.
  • The saved model is loaded to perform real-time inference and flag fraudulent transactions.
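
And a sketch of the consumer side, which would store each incoming transaction in ArangoDB and score it with the saved model. Here insert_transaction and build_subgraph are hypothetical helpers for the database write and for building the local neighbourhood around the new edge, and FraudGNN is the model class from the training sketch above:

```python
import json

import torch
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transactions",                      # placeholder topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

NUM_FEATURES = 16  # placeholder: must match the features used at training time
model = FraudGNN(in_channels=NUM_FEATURES)
model.load_state_dict(torch.load("fraud_gnn.pt"))
model.eval()

for message in consumer:
    txn = message.value
    insert_transaction(txn)                       # hypothetical: write the edge to ArangoDB
    x, edge_index, edge_id = build_subgraph(txn)  # hypothetical: neighbourhood of the new edge
    with torch.no_grad():
        fraud_prob = model(x, edge_index)[edge_id].softmax(dim=-1)[1].item()
    if fraud_prob > 0.5:
        print(f"Possible fraud: {txn} (p={fraud_prob:.2f})")
```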

[image: training and real-time inference pipeline diagram]

Also, in Excel I used a project timeline template to create this:

[image: project timeline in Excel]

The competition started on Wednesday and the submission deadline is the 11th of August, so we are starting today. The first task is finding a good dataset with a decent number of transactions.

One of the new things for me is using a graph db. I saw there are Neo4j and ArangoDB, and I decided to use ArangoDB because the Python setup was easier. While playing with it, I loaded some sample transactions and, through the db's web UI (similar to Postgres' pgAdmin), got a visualisation that is limited because there are too many nodes and edges:

[image: ArangoDB graph visualisation]

I am really looking forward to this project and working with my lab mate - Jae-hyeok Choi.

As for the idea we dismissed: a banking voice assistant.

I was up for both, but my lab mate preferred the first one, and looking back now I am glad haha, because it is so exciting.

Nevertheless, setting up a voice assistant demo was surprisingly easy. I did it using Hugging Face Spaces. Here is the link.

[image: voice assistant demo code on Hugging Face Spaces]

The above ~30 lines of code take voice -> turn it into text -> a language model answers the text -> the answer is transformed to speech and returned to the user. To try the demo, one needs an OpenAI API key. For this simple voice assistant to be useful to a bank, I would need to use the LM in a RAG app that talks to a db. Given I already made db2chat, it should not be that difficult. But that is a project for later.
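
For reference, the voice -> text -> answer -> speech flow could look roughly like this with Gradio and the OpenAI client (a sketch along the lines of the screenshot above, not the exact code; the chat model and voice names are placeholders):

```python
import gradio as gr
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def voice_assistant(audio_path):
    # 1. Speech -> text
    with open(audio_path, "rb") as f:
        question = client.audio.transcriptions.create(model="whisper-1", file=f).text
    # 2. A language model answers the text
    answer = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    # 3. Text -> speech
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    out_path = "answer.mp3"
    with open(out_path, "wb") as f:
        f.write(speech.content)
    return out_path


demo = gr.Interface(fn=voice_assistant, inputs=gr.Audio(type="filepath"), outputs=gr.Audio())
demo.launch()
```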

That is all for today!

See you tomorrow :)