Why RAG Answers Fall Short and How Query Enrichment Can Help
❓ Why do users of RAG describe its answers as 'correct but not useful'? How can the RAG pipeline be improved to avoid this?

RAG user: 'This answer is good, but it is not useful to me. It should also have mentioned a few more things.'
RAG developer: 'This looks perfectly good to me. I am myself amazed that my program can produce such a beautiful answer.'
One encounters many such interactions while putting RAG into production. During prototyping, when RAG starts producing correct answers, there is a feeling of awe and surprise. But when actual users start relying on it day to day, they find the answers very basic. They say they could find these answers reasonably quickly without the tool.
Why does this happen? 🤨
In my previous posts, I described how RAG works. It basically has three parts:
1️⃣Retrieve the material relevant to users' query.
2️⃣Augment the prompt with the material.
3️⃣Generate the answer using an LLM.
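The three steps above can be sketched in code. Everything here is a toy stand-in: word overlap substitutes for a real embedding model and vector store, and `generate` is a placeholder for an actual LLM call.

```python
def embed(text: str) -> set[str]:
    # Toy "embedding": the set of lowercase words in the text.
    # A real system would use a dense vector from an embedding model.
    return set(text.lower().split())

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Step 1: rank documents by similarity to the query (here, word overlap).
    q = embed(query)
    ranked = sorted(documents, key=lambda d: len(q & embed(d)), reverse=True)
    return ranked[:top_k]

def augment(query: str, passages: list[str]) -> str:
    # Step 2: build a prompt that contains the retrieved material.
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Step 3: placeholder for the LLM call (e.g. an API request).
    return f"[LLM answer based on prompt of {len(prompt)} chars]"

docs = ["The company changed its strategy in 2023.",
        "Revenue grew by 12 percent."]
answer = generate(augment("changes in strategy",
                          retrieve("changes in strategy", docs)))
```

The point of the sketch is only the shape of the pipeline: retrieval feeds augmentation, and augmentation feeds generation.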
Each of these parts can be improved to get better answers. Today we will look at just one trick: query enrichment.
The query posed by users is generally brief and devoid of detail. For example, a user may type the query 'changes in strategy' while reading an annual report.
The retrieval part takes this query and performs a vector comparison with the items in your database. Imagine that an important part of the strategy appears on one of the pages, but that page barely mentions the term 'strategy'. The vector comparison will fail to fetch this page.
In general, the shorter the query, the lower the chance of correct retrieval. You might get one relevant item but miss others. Thus you get an answer that is correct but lacks detail and nuance.
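This failure mode can be made concrete with a toy example. Word overlap stands in for vector similarity here (a deliberate simplification), and the page text and enrichment terms are invented for illustration.

```python
def score(query: str, passage: str) -> int:
    # Toy similarity: number of words shared by query and passage.
    return len(set(query.lower().split()) & set(passage.lower().split()))

# A page that is clearly about strategy, but never uses the word.
page = "we will enter three new markets and retire legacy products"

short_query = "changes in strategy"
enriched_query = "changes in strategy new markets products growth plan"

print(score(short_query, page))     # zero overlap: the page is never fetched
print(score(enriched_query, page))  # overlap found: enrichment surfaces it
```

The short query scores zero against the page, so retrieval skips it; the enriched query shares several terms with the page and pulls it in.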
Since the problem is caused by a short query, there is a simple solution: make the query bigger. This is called query enrichment.
How can we make the query bigger so that it fetches more relevant items? There are many ways; we will look at two here.
🔸 Use a knowledge graph: Create a knowledge graph of concepts and relationships. From the user query, extract the key terms and fetch related concepts from the KG. Convert the concepts and their connecting relationships into text, and add all of this to the original query before retrieval.
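A minimal sketch of the KG approach. The graph, its relations, and the key-term matching are all illustrative assumptions; a real system might use a graph database and proper entity extraction.

```python
# Hypothetical knowledge graph: concept -> list of (relation, related concept).
knowledge_graph = {
    "strategy": [("includes", "market expansion"),
                 ("includes", "product portfolio"),
                 ("measured by", "revenue growth")],
}

def enrich_with_kg(query: str, kg: dict) -> str:
    # Naive key-term extraction: a concept matches if it appears in the query.
    extras = []
    for term, edges in kg.items():
        if term in query.lower():
            for relation, concept in edges:
                # Convert each edge into a short textual statement.
                extras.append(f"{term} {relation} {concept}")
    return query + ". " + ". ".join(extras) if extras else query

enriched = enrich_with_kg("changes in strategy", knowledge_graph)
# The enriched query now also mentions market expansion, product
# portfolio, and revenue growth, so retrieval can match pages that
# discuss those topics without using the word 'strategy'.
```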
🔸 Use a language model: Prompt another language model to list concepts related to those in the user query. Add its response to the query before retrieval.
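The LM approach can be sketched the same way. `ask_lm` is a stand-in for a real model call (an API request in practice); here it returns a canned response so the sketch is self-contained.

```python
def ask_lm(prompt: str) -> str:
    # Stand-in for a real LM call; the response below is fabricated
    # for illustration only.
    return "market expansion, product launches, cost restructuring"

def enrich_with_lm(query: str) -> str:
    # Ask the helper model for related concepts, then append them.
    prompt = (f"List concepts closely related to: '{query}'. "
              "Reply with a comma-separated list, no explanation.")
    related = ask_lm(prompt)
    return f"{query}. Related concepts: {related}"

enriched = enrich_with_lm("changes in strategy")
```

One design note: this adds an extra model call per query, so latency and cost go up; a smaller, cheaper model is often enough for the enrichment step.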
This technique belongs to the general class of query pre-processing.
Can you think of more ways to enrich the query? Please share them in the comments.
#retrievalaugmentedgeneration #generativeai #genaiinproduction
By Devesh Rajadhyax
Co-Founder, Cere Labs