(Day 210) 118 minutes of Glaswegian accent audio clips

Ivan Ivanov · July 29, 2024

Hello :) Today is Day 210!

A quick summary of today:

  • preprocessed the final audio clips to reach our audio dataset target

image

Final dataset for the Glaswegian voice assistant AI (link to HuggingFace). Today I preprocessed the final audio from 2 of Limmy’s YouTube videos (Limmy accidentally kills the city and The writer of Saw called Limmy a …). Here is an update on how the process works now ~

Since our transcription AI is pretty good (according to my Glaswegian-speaking project partner), we pass the full raw audio to our fine-tuned Whisper model hosted on Hugging Face Spaces. The transcript then goes into a docs file, where I first check it for obvious mistakes and flag anything that looks odd and that I still cannot make out after re-listening to the audio. While listening, I split the text into sensible (small) bits, like:

image

(this is the start of Limmy accidentally kills the city)

Then, using an audio tool, I cut the full audio into clips that match the split text, and I pair each clip name with its transcript in Excel. With Python I get each clip's length and sampling rate. Finally I add static info (gender, age, class, location, speaker id) to the data, which gives me a csv that I upload to the Hugging Face dataset along with the cut audio clips.
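For anyone curious about the Python part, here is a minimal sketch of how that step could look. It is not my exact script: it assumes the clips are PCM WAV files (so the standard-library wave module can read them), and the function names, the column names (file_name, transcript, duration_s, sampling_rate) and the static values are all made up for illustration.

```python
import csv
import wave
from pathlib import Path


def clip_info(path):
    """Return (duration_seconds, sampling_rate) for a PCM WAV clip."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()   # samples per second
        frames = wav.getnframes()   # total number of audio frames
    return frames / rate, rate


def build_rows(clips_dir, transcripts, static_info):
    """Combine clip name, transcript, measured info and static info per clip.

    transcripts: dict mapping clip file name -> transcript text
                 (stands in for the Excel sheet; hypothetical format).
    static_info: dict of columns that are identical for every clip.
    """
    rows = []
    for name, text in transcripts.items():
        duration, rate = clip_info(Path(clips_dir) / name)
        rows.append({
            "file_name": name,
            "transcript": text,
            "duration_s": round(duration, 3),
            "sampling_rate": rate,
            **static_info,
        })
    return rows


def write_csv(rows, out_path):
    """Write the rows to a csv file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def total_minutes(rows):
    """Total audio length of all clips, in minutes."""
    return sum(row["duration_s"] for row in rows) / 60
```

Calling build_rows("clips", transcripts, static_info) and then write_csv(rows, "metadata.csv") would produce the csv, and total_minutes(rows) gives a running total of how much audio the dataset holds. For clips in another format (mp3, for example), a library such as soundfile or librosa would have to replace the wave call.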

Next steps are to train a final Whisper model, and then a Text-To-Speech model, using this final dataset. Maybe tomorrow I will leave Whisper to fine-tune while we go to breakfast.

That is all for today!

See you tomorrow :)