RAG Technology Research Report

Project Reports - This article is part of a series.

Part 1: Analysis Report on the Note-to-Blog Project

Part 2: Schedule Management Report

Part 3: This Article

Practical Issues

The emergence of any technology is driven by real-world problems. RAG (Retrieval-Augmented Generation) technology was developed primarily to address the following limitations of generative pre-trained language models:

Knowledge Limitations: Once a large language model (LLM) is trained, its weights are fixed, so its knowledge is limited to what it learned during training and goes stale as new information appears.
High Cost of Knowledge Updates: Full-parameter training is extremely expensive, which makes it difficult to fold newly generated data into the model’s parameters quickly.
Hallucination Issues: LLMs generate text by sampling from a probability distribution over tokens, so they can produce fluent but factually incorrect (“hallucinated”) responses.

Therefore, in real-world production environments, LLMs require a technology that can efficiently, accurately, and rapidly update domain-specific knowledge.

Solutions

To address these issues, the academic community has proposed several solutions. These approaches focus on different stages of the LLM development pipeline:

Data-Level Enhancement: Expanding the model’s knowledge sources.
Architecture-Level Improvement: Modifying the Transformer architecture to enable real-time dynamic adjustments to its internal parameters.
Post-Training Optimization: Fine-tuning a pre-trained model on domain-specific data.
Query-Level Augmentation: Retrieving relevant knowledge from external databases based on user queries to enhance responses.
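
As a rough illustration of the query-level approach in the last item, the sketch below ranks a small in-memory knowledge base against a user query. It assumes a simple bag-of-words cosine similarity rather than the learned embeddings a production system would typically use, and the names `tokenize`, `cosine`, `retrieve`, and `KNOWLEDGE_BASE`, along with the sample documents, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents standing in for an external knowledge base.
KNOWLEDGE_BASE = [
    "The 2024 release notes describe the new export feature.",
    "Office hours are Monday to Friday, 9am to 5pm.",
    "The export feature writes notes to Markdown files.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with non-zero similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


if __name__ == "__main__":
    for doc in retrieve("How does the export feature work?", KNOWLEDGE_BASE):
        print(doc)
```

Because the knowledge lives in the document list rather than in model weights, adding a new document to the list makes it retrievable immediately, with no retraining involved.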

The original LLM workflow:

graph LR;  

p1(Data)-->p2(Transformer)-->p4(Generation);  
p3(Query)-->p2;

The improved workflow:

graph LR;  
datatype1(File);datatype2(Image);datatype3(Audio);datatype4(HTML);data(data);  
datatype1-->data;datatype2-->data;datatype3-->data;datatype4-->data;  

architecture1(Transformer);architecture2(Dynamic Transformer);architecture3(Fine-tuned Transformer);  
data-->architecture1;data-->architecture2;  

query(Query);retrieval(Retrieval);  
query-->retrieval-->architecture3;  
retrieval-->architecture2;  

finetuning(Fine-tuning);data2(Specific data);  
architecture1-->finetuning-->architecture3;  
data2-->finetuning;  

generation1(Generation);generation2(Generation);  
architecture3-->generation1;  
architecture2-->generation2;

Academic Definition

RAG (Retrieval-Augmented Generation) is a technique that combines information retrieval with a language generation model. It retrieves relevant information from an external knowledge base and supplies it to the model as context in the prompt, which improves the performance of large language models (LLMs) on knowledge-intensive tasks such as question answering, text summarization, and content generation. The RAG framework was first introduced by Facebook AI Research (FAIR) in 2020 and has since become a widely adopted solution in LLM applications. Original paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
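
The "contextual prompt" step in the definition above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the original paper: the function name `build_augmented_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical, and the retrieved passages are passed in as a plain list so the example stays independent of any particular retriever or model API.

```python
def build_augmented_prompt(question, passages, max_chars=1000):
    """Combine retrieved passages and a user question into a single prompt.

    Passages are numbered and added in order until the (hypothetical)
    character budget max_chars would be exceeded.
    """
    context_lines = []
    used = 0
    for index, passage in enumerate(passages, start=1):
        line = f"[{index}] {passage.strip()}"
        if used + len(line) > max_chars:
            break  # stop before the context grows past the budget
        context_lines.append(line)
        used += len(line)

    context = "\n".join(context_lines) if context_lines else "(no passages retrieved)"
    # The instruction text below is an illustrative template, not a standard.
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = ["The RAG framework was first introduced by FAIR in 2020."]
    print(build_augmented_prompt("When was RAG introduced?", retrieved))
```

The resulting string would then be sent to whichever LLM the application uses. Numbering the passages is a design choice that lets the model, or a downstream check, refer back to the source of each claim.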
