(Day 190) Learning about evaluating vector search engines for RAG apps

Ivan Ivanov · July 9, 2024

nlp applying-knowledge

Hello :) Today is Day 190!

A quick summary of today:

All the code from today is on my repo.

The first part of the module was related to doing semantic search using dense vectors in Elasticsearch

(picture is from the course)

first 10 rows from the created dataset:

The ID is needed to connect the sample created questions to the documenta they are related to.

Recall

Measures the number of relevant documents retrieved out of the total number of relevant documents available.
Formula: Recall = (Number of relevant documents retrieved) / (Total number of relevant documents)

Mean Reciprocal Rank

To do evaluation, a few different engines were created and compared based on recall and MRR

All ran with different times, so it is about evaluating what do we care more, speed vs a set % improvement

At the end, I completed the homework which covered similar questions like the above learned content.

That is all for today!

See you tomorrow :)