(Day 285) I was muted on stream :( + more LLM fine-tuning

Ivan Ivanov · October 12, 2024

Hello :) Today is Day 285!

A quick summary of today:

  • streamed for ~2hrs
  • fine-tuned more LLMs for the movie review project

Today I streamed for 2 hours. And the 1st 1hr and 20mins are without audio :(

I only realised it when I checked my streaming software that my mic was muted :(

And what’s worse is the stream I did on Thursday for ~45mins, has no audio at all :( I have no idea when or why I muted the mic in Streamlabs.

Here is the link to it.

I read some scikit-learn docs - a bit about calibrating classification models. It’s an area that I still need to read into to better understand.

Also worked on fine-tuning models. Last night I finished the mistral-7b fine-tuning but the results are just …

I give it the normal input, and the output starts producing more reviews. The model I am fine-tuning is an instruct model (or at least says so), but it seems to just have learned to generate movie reviews. So that’s 3.5hrs down the drain.

I also tried to fine-tune some bigger models (>20B params) but Kaggle’s free T4 GPU runs out of memory on those.

Afterwards, I started fine-tuning base models using a 2nd prompt (a modified version of the instruction task):

Your task is to summarize. You are a helpful assistant who will help me evaluate Korean reviews. Up to 50 reviews are provided for each movie. Analyze the reviews by each dimension: emotion, character, plot, visuals, and pacing, and return a dimension-specific rating (1-10) and a summary description. To do this, please perform the following steps:

1. First, for each review, determine which dimension it is related to: emotion, character, plot, visuals, or pacing.

2. Assign a dimension-specific rating (1-10) to each review.

3. Aggregate the ratings of each individual review by dimension to generate a final dimension-specific rating and summary for the movie.

4. Translate the final dimension-specific summary for each movie into Korean and return it with dimension-specific rating (1-10).

The results so far are:

This is using the 1st version of the prompt:

Your task is to summarise. You are a helpful assistant that helps me evaluate Korean reviews. For each movie you are given 50 reviews. Analyze the reviews, and for the movie itself return a score(1 to 10) and explanation for each of the following criteria: Emotional, Characters, Plot, Visuals, Pacing. Return the review in Korean.

image

And 2nd version:

image

All of the models are on my huggingface account

One of the things I will do tomorrow - use OpenAI’s fine-tuning tool to fine-tune gpt4o-mini.


That is all for today!

See you tomorrow :)