(Day 201) Struggling with neo4j and a fraud GNN

Ivan Ivanov · July 20, 2024

Hello :) Today is Day 201!

A quick summary of today:

  • started using neo4j as a graph database

It is ~3.30am when I am starting to write this as I was occupied with neo4j and trying to get a GNN model to work so I will keep it brief and go to bed.

Why did we switch from arangoDB to neo4j? Neo4j has a free cloud version with up to 400,000 edges so we can use this one for our real-time inference pipeline. Whereas in arangoDB there is no permanent free version. There is a free one but it expires after 14 days, and if anything I’d like to keep the data in the cloud db just in case we need it.

Inserting data into neo4j

This is the dataset we will go with for now. It has a lot of info about the credit card user, transaction, and also we can extract location info about the merchant.

I set up neo4j locally and inserted some data (inserting was easy and fast).

image

In the red square, you can see the amount of nodes, and edges. The right side is an sql-esque type of query space but we need to use neo4j’s language. This is the query which I used to insert credit card nodes and their attributes:

image

And for merchants and transactions - it was similar.

Setting up data for a Graph Neural Network

Academia vs reality strikes. During XCS224W: ML with Graphs, I never really had to worry or think about the data, we used good data and we learned about implementing different models. Well today I got struck with the reality of data. Tldr - I was not sure how to create my graph data exactly for some type of GNN. Yes I knew what were the nodes, what are the edges, node and edge indecis and attributes, but whenever I created some Data or HeteroData (as torch geometric calls the two data class options I tried) I did not manage to get a model working with it. Tomorrow I will continue with this. At the moment my idea is: nodes are CreditCard and Merchant, and edge is TRANSACTION which goes only one way: from CreditCard to Merchant.

That was not the main focus of today, it was about finding a good dataset, which we did and then learning a bit about graph databases. Tomorrow I will try to figure out a way and create even a simple GNN that works.

I am working all locally today so there is not any code I can share at the moment. But hopefully in the next days we will have something set up on github.

That is all for today!

See you tomorrow :)