Retrieval-augmented generation (RAG) enhances large language models by incorporating data from external knowledge bases to improve the accuracy and relevance of outputs without retraining. This makes it efficient and adaptable for specific domains.
Retrieval-augmented generation is a term that originated in a 2020 paper lead-authored by Patrick Lewis. In the paper, Lewis and his co-authors introduced a method that significantly expanded the capabilities of generative AI models by integrating them with external knowledge sources. This integration was designed to enhance the models' accuracy and applicability across various contexts, propelling RAG into a rapidly expanding area of research and application.
The term 'retrieval-augmented generation' precisely describes the methodology's core function—augmenting the generative process of AI models through the retrieval of external data. The concept quickly gained traction, leading to widespread adoption in academic and commercial spheres. Today, RAG underpins numerous AI systems in both research environments and real-world applications, signifying a crucial evolution in how generative models are utilised and developed.
Large language models (LLMs) are at the forefront of advancements in artificial intelligence, particularly in natural language processing applications such as intelligent chatbots. These models are designed to understand and generate human-like text with the goal of providing accurate answers in various contexts. However, there are some inherent challenges with LLMs that affect their reliability.
One major issue with LLMs is their tendency to deliver responses that may be inaccurate, outdated or based on non-authoritative sources. Since LLMs operate on fixed datasets, their knowledge is effectively frozen at the point of their last training update.
RAG addresses these challenges by integrating a retrieval mechanism that taps into authoritative, up-to-date external knowledge sources before generating responses. This approach enhances the accuracy and relevance of the information provided by LLMs while ensuring that the responses are grounded in verified data. By doing so, RAG improves user trust and control over the outputs of AI applications.
Retrieval-augmented generation is revolutionising various business functions by enhancing the accuracy and personalisation of AI-driven tasks. Here are some key use cases where RAG is making a significant impact.
RAG technology transforms customer service by powering advanced chatbots and virtual assistants that provide more accurate and contextually relevant responses. By accessing the latest information and data from authoritative sources, these AI systems can offer quick and personalised solutions to customer inquiries. This capability improves response times while increasing customer satisfaction and operational efficiency.
RAG also assists businesses in crafting high-quality and relevant content such as blog posts, articles and product descriptions. By leveraging its ability to pull and integrate data from various external and internal sources, RAG ensures that the content is both engaging and rich with verified information. This saves considerable time and resources in content development processes.
RAG is invaluable for conducting thorough market research by compiling and analysing information from a wide array of online sources including news outlets, industry reports and social media. This enables businesses to stay ahead of market trends and make data-driven decisions that align with current market dynamics and consumer behaviours.
Utilising RAG can greatly enhance the sales process by providing virtual assistants that can access and relay information about products, including specifications and inventory levels. They can answer customer questions in real-time and offer personalised recommendations based on preferences and prior interactions. They can even pull in reviews and feedback from various channels to aid consumers in making informed purchasing decisions.
RAG can improve the employee experience by creating an easily accessible central knowledge hub. Integrating with internal databases, RAG provides employees with accurate, up-to-date information on everything from company policies to operational procedures. This supports a more informed workforce and can streamline internal processes by reducing time spent searching for information.
RAG and semantic search both enhance LLMs, but they serve distinct functions. RAG improves LLMs by integrating them with external knowledge sources, aiding in generating accurate and relevant responses. It is especially useful in applications like customer support or content generation that require precise and current information.
Semantic search, however, focuses on understanding the intent and contextual meaning behind queries. It uses natural language understanding to navigate large databases and retrieve information that aligns semantically with user inquiries.
While RAG leverages external data to enrich LLM outputs, semantic search automates the data retrieval process, handling complexities like word embeddings and document chunking. This reduces manual efforts in data preparation and ensures the relevance and quality of information used by LLMs.
Together, RAG and semantic search enhance the functionality and accuracy of AI applications by improving both the retrieval and generation processes.
RAG relies on several critical components within its architecture to enhance the functionality of LLMs.
- The orchestration layer
This component acts as the central coordinator within the RAG system. It processes the user's input along with any associated metadata, such as conversation history. The orchestration layer directs queries to the LLM and handles the delivery of the generated response. This layer typically integrates various tools and custom scripts, often written in Python, to ensure seamless operation across the system.
- Retrieval tools
These are essential for sourcing the context needed to anchor and inform the LLM's responses. Retrieval tools include databases serving as knowledge bases and API-based systems that pull relevant information. These tools provide the factual backbone to responses, ensuring they are both accurate and contextually relevant.
- LLM
The large language model itself is the core component that generates responses based on the prompts and information retrieved. Whether hosted by a third-party provider like OpenAI or operated internally, the LLM uses vast data-trained parameters to produce nuanced and contextually appropriate outputs.
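To make these components concrete, below is a minimal, self-contained sketch of how they might fit together. Every name in it (KnowledgeBase, Orchestrator, call_llm) is an illustrative stand-in rather than any particular framework's API, and the keyword-overlap retrieval is a deliberately crude placeholder for a real retrieval tool.

```python
# Minimal sketch of the three RAG components described above.
# All names are illustrative stand-ins, not a real framework's API.

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM component (hosted or internal)."""
    return f"[LLM response grounded in prompt: {prompt[:60]}...]"

class KnowledgeBase:
    """Retrieval tool: a toy keyword-matching knowledge store."""
    def __init__(self, documents: list[str]):
        self.documents = documents

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank documents by how many query words they share.
        words = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[:k]

class Orchestrator:
    """Orchestration layer: coordinates retrieval and generation."""
    def __init__(self, kb: KnowledgeBase):
        self.kb = kb
        self.history: list[str] = []  # conversation metadata

    def answer(self, user_input: str) -> str:
        context = self.kb.retrieve(user_input)
        prompt = (
            "Answer using only the context below.\n"
            "Context:\n" + "\n".join(context) +
            f"\nQuestion: {user_input}"
        )
        self.history.append(user_input)
        return call_llm(prompt)

kb = KnowledgeBase([
    "Employees accrue 25 days of annual leave per year.",
    "Expense reports are due by the 5th of each month.",
])
print(Orchestrator(kb).answer("How many days of annual leave do I get?"))
```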
Implementing retrieval-augmented generation comes with a set of challenges that organisations need to navigate. Here are some of the main things to be aware of.
Being a relatively new technology, RAG requires a deep understanding and skilled personnel to implement it effectively. This newness can lead to uncertainties in deployment and integration with existing systems.
The integration of RAG into existing infrastructures often involves upfront investments in both technology and training. Organisations may face significant initial costs as they acquire specific resources and expertise.
Determining the most effective ways to model and structure data for use in a RAG system is crucial. This involves selecting the right data sources and formats that align with both organisational needs and the capabilities of the LLM.
Establishing clear requirements for processes that will utilise RAG is essential. This includes defining the objectives and outcomes expected from implementing RAG-driven applications.
Creating processes to address potential inaccuracies in the outputs generated by RAG systems is vital. This means developing mechanisms to identify, correct and learn from errors to enhance the reliability of responses.
RAG offers several compelling benefits that can significantly enhance the capabilities of AI systems.
- Efficient and cost-effective implementation
RAG allows organisations to leverage existing databases and knowledge sources without the need for extensive retraining of models. This means implementation is both time and cost-efficient.
- Precise and up-to-date information
By retrieving information from real-time, authoritative sources, RAG ensures that the data used in generating responses is accurate and current, which enhances the quality of outputs.
- Enhanced user trust
The accuracy and relevance of information provided by RAG systems help build user trust since responses are more reliable and grounded in verified data.
- More developer control
Developers have greater control over the responses generated by AI systems with RAG. They can specify the sources from which information is retrieved and tailor outputs to specific needs and contexts.
- Reducing inaccurate responses and hallucinations
By grounding responses in factual data, RAG significantly reduces the likelihood of generating incorrect or fabricated responses, commonly referred to as "hallucinations" in AI terminology.
- Providing domain-specific, relevant responses
RAG also excels in delivering tailored responses based on specific industry knowledge or specialised domains. This makes it highly effective for targeted applications.
- Easier to train
RAG models can be more straightforward to train as they utilise existing knowledge bases and data, reducing the complexity and resource intensity of the training process.
Here is a step-by-step explanation of how RAG operates.
RAG starts with gathering data from various sources like websites, databases, or documents. This data is then converted into a format that the model can search and use, creating a sort of external knowledge library.
When a user asks a question, RAG turns this question into a searchable form and finds the most relevant information from its knowledge library. For example, if someone asks about their holiday balance, RAG will find and use the company's holiday policies and the person's own holiday record.
Next, RAG combines the user's original question with the information it just found. This combined information is then given to the LLM, which uses it to give a more accurate and informed answer.
To keep the answers relevant, RAG regularly updates its external data sources. This could be done automatically or at scheduled times, ensuring that the information it uses is always current.
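As a rough illustration of this last step, a scheduled refresh could be structured like the sketch below; fetch_documents and rebuild_index are hypothetical stand-ins for a real ingestion and indexing pipeline.

```python
# Sketch of keeping the knowledge library current. In practice this
# could run on a scheduler (e.g. nightly) or be triggered by events.
# fetch_documents and rebuild_index are hypothetical stand-ins.

def fetch_documents() -> list[str]:
    """Stand-in for pulling the latest content from source systems."""
    return [
        "Holiday policy v2: employees accrue 25 days of annual leave.",
        "IT policy: laptops are refreshed every three years.",
    ]

def rebuild_index(documents: list[str]) -> dict[int, str]:
    """Stand-in for re-embedding and re-indexing the documents."""
    return dict(enumerate(documents))

def refresh_knowledge_library() -> dict[int, str]:
    """One refresh cycle: fetch the latest data, rebuild the index."""
    index = rebuild_index(fetch_documents())
    print(f"Refreshed index with {len(index)} documents")
    return index

refresh_knowledge_library()
```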
Implementing a retrieval-augmented generation system involves several key steps. By following these steps, a RAG system effectively enhances an LLM's ability to generate responses that are not only based on its internal knowledge but are also informed by up-to-date, external data.
The first step is to gather and prepare the data that will be used by the RAG system. The data must then be cleaned and formatted properly to ensure consistency and accuracy. This stage may involve removing duplicates and addressing any data quality issues.
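A minimal cleaning pass, assuming plain-text documents and exact-duplicate removal, might look like the sketch below; production pipelines typically go much further.

```python
# Data-preparation sketch: normalise whitespace, strip empty
# entries, and drop exact duplicates while preserving order.
def prepare_documents(raw_docs: list[str]) -> list[str]:
    seen: set[str] = set()
    cleaned: list[str] = []
    for doc in raw_docs:
        text = " ".join(doc.split())   # collapse whitespace/newlines
        if text and text not in seen:  # skip empties and duplicates
            seen.add(text)
            cleaned.append(text)
    return cleaned

docs = prepare_documents([
    "Annual leave:  25 days.",
    "Annual leave: 25 days.",  # duplicate after normalisation
    "",                        # empty entry is dropped
])
print(docs)  # ['Annual leave: 25 days.']
```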
Once the data is prepared, it needs to be indexed to make it searchable. This means creating a structured format, often in a database or a search index, where each piece of data is tagged with specific keywords or converted into a numerical representation. This indexing process determines how efficiently the system can retrieve relevant information in later stages.
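One common way to do this is to convert each document into an embedding vector. The sketch below assumes the open-source sentence-transformers library and a small public model; keyword tagging or a dedicated vector database would be equally valid choices.

```python
# Indexing sketch: convert each document into a numerical vector
# (an embedding) so it can be searched by semantic similarity.
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, common model

documents = [
    "Employees accrue 25 days of annual leave per year.",
    "Expense reports are due by the 5th of each month.",
]

# Each row of this matrix is one document's embedding.
doc_vectors = model.encode(documents, normalize_embeddings=True)
print(doc_vectors.shape)  # (2, 384) for this model
```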
With the data indexed, the RAG system can now retrieve relevant information based on user queries. This step involves matching the query or certain keywords from the query to the indexed data. Advanced algorithms are used to ensure that the most relevant and accurate data is retrieved for use in generating responses.
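Continuing the embedding approach sketched above, retrieval can be a simple nearest-neighbour search. Cosine similarity on normalised vectors is just one reasonable scoring choice, shown here as a sketch.

```python
# Retrieval sketch: embed the query and rank documents by cosine
# similarity. With normalised vectors, a dot product is the cosine.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Employees accrue 25 days of annual leave per year.",
    "Expense reports are due by the 5th of each month.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q             # cosine similarity per document
    top = np.argsort(scores)[::-1][:k]   # indices of the best matches
    return [documents[i] for i in top]

print(retrieve("How much holiday do I get?"))
```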
Finally, integrate the retrieved data into the LLM's workflow. This step involves configuring the LLM to accept the user input along with the retrieved data as part of its input prompt. The LLM then uses both its pre-trained knowledge and the newly retrieved external data to generate more accurate responses.
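The assembly itself can be as simple as the following sketch; call_llm is a hypothetical placeholder for whichever hosted or internal model the system uses.

```python
# Integration sketch: combine retrieved context with the user's
# question into a single prompt for the LLM. call_llm is a
# hypothetical placeholder, not a specific provider's API.
def call_llm(prompt: str) -> str:
    return "[model output would appear here]"

def answer(question: str, retrieved_docs: list[str]) -> str:
    prompt = (
        "Use ONLY the context below to answer. If the answer is not "
        "in the context, say you don't know.\n\n"
        "Context:\n" + "\n".join(f"- {d}" for d in retrieved_docs) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(answer("How much holiday do I get?",
             ["Employees accrue 25 days of annual leave per year."]))
```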
To improve the performance of a RAG system, consider implementing the following strategies:
Provide high-quality data
Clean and accurate data helps prevent the common "rubbish in, rubbish out" problem. This includes removing irrelevant markup and ensuring the data is current. It also means maintaining its integrity (like preserving important spreadsheet headers). High-quality data improves the LLM's ability to understand and generate relevant responses.
Experiment with different text chunk sizes
The way data is segmented into chunks can significantly affect the performance of your RAG system. Smaller chunks may lack context, while larger ones may be difficult for the model to process efficiently. Testing different chunk sizes can help you find the optimal balance that maintains context without overwhelming the system.
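As a starting point for such experiments, here is one simple fixed-size chunking strategy with overlap; the sizes shown are arbitrary values to test, not recommendations.

```python
# Chunking sketch: split text into fixed-size word chunks with
# overlap so context is preserved across chunk boundaries.
# The numbers here are arbitrary starting points for experiments.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

policy = "word " * 500  # stand-in for a long policy document
for size in (100, 200, 400):  # compare retrieval quality per size
    print(size, "->", len(chunk_text(policy, chunk_size=size)), "chunks")
```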
Update your system prompt
The prompt you use to engage the LLM can influence its output. If the results are not satisfactory, consider revising the prompt to better specify how the model should interpret and utilise the provided data. This might involve clarifying the context or adjusting the phrasing to guide the model's focus.
Filter your vector store results
Filtering the results retrieved from your vector store can enhance relevance and accuracy. For example, you can set filters to exclude or prioritise certain types of documents based on metadata, such as document type or publication date. This helps ensure that the information being retrieved is most relevant to the query.
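A metadata filter applied to retrieved hits might look like the following sketch; the doc_type and published fields are hypothetical examples of metadata a store could carry.

```python
# Metadata-filtering sketch: narrow candidates before (or after)
# similarity ranking. The doc_type and published fields are
# hypothetical examples of metadata a vector store might carry.
from datetime import date

results = [
    {"text": "Leave policy 2024", "doc_type": "policy",
     "published": date(2024, 1, 15), "score": 0.82},
    {"text": "Old leave memo",    "doc_type": "memo",
     "published": date(2019, 6, 1), "score": 0.79},
]

def filter_results(hits, doc_type=None, not_before=None):
    keep = []
    for hit in hits:
        if doc_type and hit["doc_type"] != doc_type:
            continue  # wrong document type
        if not_before and hit["published"] < not_before:
            continue  # too old
        keep.append(hit)
    return keep

print(filter_results(results, doc_type="policy",
                     not_before=date(2023, 1, 1)))
```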
Try different embedding models
Different embedding models can vary in how they process and represent data. Experimenting with various models can help you identify which one best suits your specific needs. Additionally, consider fine-tuning your own embedding models using your dataset to make the model more attuned to the specific terminology and nuances of your domain.
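A quick way to compare candidates is to swap models behind the same retrieval code, as in this sketch (assuming sentence-transformers; both model names are common public checkpoints).

```python
# Model-comparison sketch: run the same query against each candidate
# embedding model and inspect which ranks the right document first.
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer

documents = ["Employees accrue 25 days of annual leave per year.",
             "Expense reports are due by the 5th of each month."]
query = "How much holiday do I get?"

for name in ("all-MiniLM-L6-v2", "all-mpnet-base-v2"):
    model = SentenceTransformer(name)
    doc_vecs = model.encode(documents, normalize_embeddings=True)
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    best = (doc_vecs @ q_vec).argmax()  # highest cosine similarity
    print(f"{name}: best match -> {documents[best]!r}")
```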
By carefully implementing these strategies, you can significantly enhance the efficacy and accuracy of your RAG system for better performance and more reliable outputs.
Retrieval-augmented generation is currently making significant strides in enhancing the capabilities of conversational AI applications by providing more contextually relevant responses. However, the potential future applications of RAG extend far beyond current uses.
Looking ahead, RAG technology could evolve to enable generative AI to not only provide information but also take appropriate actions based on the context of user inputs and external data. For instance, a RAG-enhanced AI could analyse assorted options to find the best holiday rental, book accommodations automatically during specific events, and even handle related travel arrangements—all in response to a user's request.
Of course, RAG could even advance the depth of interaction in more complex informational domains. For example, beyond merely informing an employee about tuition reimbursement policies, RAG could integrate detailed, personalised advice about suitable educational programmes that align with an employee's career goals and prior training. It could also facilitate the application process for these programmes and manage subsequent administrative tasks such as initiating reimbursement requests.
As RAG technology continues to mature, its integration into AI could redefine the boundaries of automated assistance and decision-making support.
RAG is set to enhance the capabilities of AI in a wide range of industries. The ServiceNow Now Platform® integrates AI technologies like machine learning and natural language understanding to streamline operations, automate tasks and enhance decision-making. With RAG systems, ServiceNow can offer even more precise and context-aware AI solutions, boosting productivity and efficiency across various workflows.
For a deeper dive into how ServiceNow can transform your business operations with advanced AI technologies, demo ServiceNow today.