Hello :) Today is Day 284!
A quick summary of today:
- all-day fune-tuning movie reviewer LLMs
It is almost 3am, I had to wait for the final model (mistral) to finish fine-tuning.
I had to re-calculate the results from yesterday - rougeLSum for the summary and RMSE for the ratings because I was using the wrong ground truth data.
I made a table in excel that I can keep track of my progress and my professor can easily check-in to check my progress
Prompt v1
Your task is to summarise. You are a helpful assistant that helps me evaluate Korean reviews. For each movie you are given 50 reviews. Analyze the reviews, and for the movie itself return a score(1 to 10) and explanation for each of the following criteria: Emotional, Characters, Plot, Visuals, Pacing. Return the review in Korean.
Train size 185
Evaluation size 15
Evaluation is done using the ground truth dataset and the evaluation dataset
The same hyperparameters were used for all models
Model | Train time* | Text RougeLSum | Ratings RMSE | Link |
---|---|---|---|---|
llama-3.1-8b | 2hr 58min | 0.7037 | 1.4665 | here |
llama-3.2-3b | 1hr 42min | 0.6776 | 1.4550 | here |
llama-3.1-70b | ||||
qwen-2-7b | 2hr 40min | 0.6666 | 1.8658 | here |
qwen-2.5-7b | ||||
qwen-2.5-72b | ||||
gemma-2-9b | 4hr 9min | 0.67602 | 1.6392 | here |
gemma-2-27b | ||||
mistral-7b | 3hr 10min | tomorrow | tomorrow | here |
mistral-small-2409 | ||||
gpt4o-mini |
*Train time is based on using Kaggle’s free T4 GPU
The llama models were trained yesterday but today I fine-tuned and evaluated the Qwen, Gemma and Mistral models.
It was a long day.
That is all for today!
See you tomorrow :)