Hello :) Today is Day 202!
A quick summary of today:
- finally got a GNN to work
- learned how to use Mage for data streaming pipelines
I started today where I ended yesterday - trying to create some kind of graph neural network to predict whether a transaction is fraudulent or not.
TL;DR (as it is ~3:20 am, another late night)
I ended up using PyTorch Geometric's Data class for homogeneous graphs, and the resulting data looks something like:
Data(x=[4290, 18], edge_index=[2, 23278], y=[4290], train_mask=[4290], test_mask=[4290])
The preprocessing undersamples the majority class, so we end up with a balanced dataset.
The dataset has the following number of edges and nodes:
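As a rough illustration of the undersampling idea, here is a minimal sketch in plain Python. The `is_fraud` column name, the toy rows, and the `undersample` helper are all made up for this example and are not the actual preprocessing code.

```python
import random
from collections import defaultdict


def undersample(rows, label_key="is_fraud", seed=42):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)

    balanced = []
    for group in by_class.values():
        # rng.sample picks n_min distinct rows without replacement
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 95 legitimate rows and 5 fraudulent ones
rows = [{"amount": i, "is_fraud": 0} for i in range(95)]
rows += [{"amount": 1000 + i, "is_fraud": 1} for i in range(5)]

balanced = undersample(rows)
```

The trade-off is that most of the majority class is thrown away, so the balanced set is small but the model no longer gets rewarded for always predicting "not fraud".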
Neo4j is nice.
The model I found that works (at least for now, version 0.1) is:
After splitting data into train and test, the best model so far achieved the following results:
Accuracy: 0.8833, Precision: 0.8151, Recall: 0.9909, F1: 0.8945
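For reference, this is how these four metrics are defined for a binary classifier. The sketch below is plain Python with made-up toy labels; `binary_metrics` is a hypothetical helper, not the evaluation code behind the numbers above.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): 3 TP, 3 TN, 1 FP, 1 FN
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

As a sanity check on the reported numbers: 2 * 0.8151 * 0.9909 / (0.8151 + 0.9909) ≈ 0.8944, which matches the reported F1 of 0.8945 up to rounding. The high recall with lower precision means the model catches almost all fraud cases but also flags some legitimate transactions.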
Today I experimented with creating the training pipeline, but nothing is final yet. These are just experiments.
On another note, after getting a model to work - I decided to check out Mage's streaming pipelines.
I set up Kafka services in Docker Compose, and with a Python script:
I started sending sample data to try to access it through Mage. On the Mage side it is quite simple: use a Kafka data_loader block and a Python transformer block to read the messages from the Kafka stream.
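A minimal sketch of what such a producer script could look like, assuming the kafka-python client (`pip install kafka-python`). The topic name, broker address, and message fields are placeholders I made up for illustration; this is not the exact script used here.

```python
import json
import random
import time

TOPIC = "transactions"        # hypothetical topic name
BOOTSTRAP = "localhost:9092"  # assumed broker address exposed by Docker Compose


def make_transaction(rng):
    """Build one fake transaction (all fields are made up)."""
    return {
        "transaction_id": rng.randrange(10**9),
        "amount": round(rng.uniform(1, 500), 2),
        "timestamp": time.time(),
    }


def serialize(message):
    """Encode a dict as UTF-8 JSON bytes, which is what gets sent to Kafka."""
    return json.dumps(message).encode("utf-8")


def send_samples(n=10, delay=1.0):
    """Send n sample messages, one every `delay` seconds."""
    # Imported here so the helpers above work without kafka-python installed
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP,
        value_serializer=serialize,
    )
    rng = random.Random(0)
    for _ in range(n):
        producer.send(TOPIC, value=make_transaction(rng))
        time.sleep(delay)
    producer.flush()  # make sure buffered messages actually go out
    producer.close()
```

With the Kafka container running, calling `send_samples()` would push ten JSON messages to the topic, one per second.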
When I start running this pipeline we see:
Nice ^^ at least I know we can use Mage for the real-time inference pipeline.
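The logic inside the transformer step can be sketched roughly like this. It is plain Python so it runs on its own; in Mage it would sit inside the function decorated with `@transformer`. I am assuming messages may arrive either as already-parsed dicts or as raw JSON bytes/strings, and `parse_messages` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def parse_messages(messages: List[Any]) -> List[Dict]:
    """Normalise a batch of Kafka messages into a list of dicts."""
    parsed = []
    for raw in messages:
        if isinstance(raw, (bytes, bytearray)):
            raw = raw.decode("utf-8")  # bytes -> str
        if isinstance(raw, str):
            raw = json.loads(raw)      # JSON str -> dict
        parsed.append(raw)
    return parsed


# Toy batch (made-up payloads) covering the three assumed input shapes
batch = [b'{"amount": 1.5}', '{"amount": 2}', {"amount": 3}]
records = parse_messages(batch)
for record in records:
    print(record)
```

For real-time inference, this is the spot where each parsed record would be turned into model features and passed to the trained GNN.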
That is all for today!
See you tomorrow :)