Hello :) Today is Day 205!
A quick summary of today:
- no minIO, just plain old postgres and local storage for mlflow
- helping my friend better understand github’s workflow
Yesterday I mentioned I set up minIO as an artifact store for mlflow ~
Well today I removed it haha. The problem was that i wanted some extra AWS credentials in order to work, and to get that I need a working AWS account. I can easilty set that up again (because I deleted my aws account recently), but in order for my project collaborator to set it up, he needs to do it as well. We have enough work as it is, and this extra tech is not essential so I just removed it and set up mlflow just using a local postgres docker service and a folder as the artifact store. Maybe later, we can swap out this whole mlflow to run on GCP but it is not crucial if we leave it as is for now.
While setting up mlflow and testing the connection, I was reminded of the value of “network” in dockere-compose. I had the other services using a network called ‘bridge’, but when I set up mlflow I did not add the network to the docker compose service like:
So within Mage, I could not access mlflow. I thoyght it had something to do with how I set up the ports, and after about 30 mins of trying to fix it, I saw that networks: bridge is missing from the mlflow-server service.
On another note ~ we had another Kukmin Bank (KB) project meeting
We shared what we learned from the data and doing some visualisations. I shared mine in yesterday’s post, but as for his - they are local on his machine, but I will share a link to them when we get them on github.
Talking about GitHub, we set that up too. I also got the chance to help my team partner with his understanding of the git workflow - commits, pushes, fetches, pulls, branches, merge conflicts. I was reminded of my days in Lloyds Banking Group in 2020 when back then I was helping team members learn about GitHub as well.
We set up this repo:
And just practiced commit, push, fetch, pull, switching branches. The repo is private at the moment, but we will make it public once we submit it to the competition.
As for next steps, on Friday we will meet again. Till then our tasks are as follows:
- create a script to do basic data transformation, upload data to neo4j, and extract as well - my task
- translate his visualisations written in R to python and put all visualisations (his and mine) in one notebook so it can live in the repo under a folder called EDA
Another thing ~ I saw Meta released a 405B param model ?!
405 billion ?! I checked on huggingface and its files are near 1TB. But that is big for open source AI as they claim it can compete with the best closed source models like GPT4o
That is all for today!
See you tomorrow :)