(Day 86) Made a youtube video - Chat with your PDF for free in colab using huggingface, mongodb, llama_index, langchain

Ivan Ivanov · March 27, 2024

Hello :) Today is Day 86!

A quick summary of today:

  • Coded, planned, recorded and posted a video tutorial making a chat with your pdf rag system for free
  • All code + colab link + pdf used is on this github repo

Well, after waking up today, I definitely did not expect to plan, execute and upload an almost 1hr tutorial on youtube.

I was looking around chat with your PDF videos, to see what I can improve in my pdf_rag_from_scratch but I saw that most of the videos require an OpenAI api key, and I did not like that, given the availability of so many free resources and models.

And I found this great resource from huggingface - Building A RAG System with Gemma, MongoDB and Open Source Model. Instead of a pdf, they were using some dataframe for films, so I decided to improve upon that, and make the code preprocess a pdf, embed it, upload to mongodb, load gemma, create a prompt and chat with the pdf (kind of a combination of the tutorial + my pdf_rag_from_scratch).

The code itself is not that complicated, but I wanted to write it once/twice to make sure when I write live in the video recording, I do not have problems. So the whole process from idea to published video maybe took me 8 hours, mind that I had to find an app to edit the video (the editing was not much, but the app’s video processing time was long because I wanted it in 1080p).

Anyway ~ below I will provide an overall summary of the code

  1. Download libraries

image

  1. Preprocess PDF

2.1 Load PDF with llama-index

image

2.2 Chunk PDF text using langchain

image

2.3 Embed chunks

image

  1. Set up mongodb

An important part is to set up an atlas vector search

image

3.1 Connect to the db

image

3.2 Delete existing (if any), and insert data

image

  1. Find relevant texts

4.1 Perform vector search in db + get context

image image

  1. Load gemma using huggingface

image

  1. Prompt engineering + talk with your PDF

I used similar base_prompt with the pdf_rag_from_scratch

image image

Query: Do you pay or charge interest? Answer: Yes, the Core Banking Agreement states that interest is paid and charged on a daily basis, and the interest rate applicable to your account(s) is stated in the Product & Services Terms & Conditions or, if no such terms are provided, on the website.

The results are not perfect, but is a good starting point for fine-tuning.

That is all for today!

See you tomorrow :)