AI Search Retrieval Augmented Generation (RAG)

Release version: Yokohama

Updated July 31, 2025


You can improve the accuracy of your AI Search results by using the AI Search Retrieval Augmented Generation (RAG) application. With RAG, you can limit a large language model's (LLM's) focus to a specific, contextual dataset instead of the broad, general data that it was trained on.

AI Search RAG overview

RAG combines information retrieval with AI text generation. It works in two steps: first it indexes the data to make it searchable, and then it searches that indexed data by using queries.

The effectiveness of AI Search RAG relies on its embedding model. Advanced search methods, such as semantic or vector search, use this model to retrieve context-oriented information from indexed sources. The embedding model generates embeddings that are based on the user's search query. These embeddings are used to find the most relevant indexed content, which an LLM then uses to produce relevant responses. The embedding model is the engine behind RAG: it enables RAG to embed information into a vector map, search and retrieve from it, and pass the results to an LLM.

By default, RAG uses the Embedding (E5) model, but it also supports additional third-party models such as Azure OpenAI Embedding and Google Gemini Embedding. Users can also bring their own custom embedding models from third-party providers to create embeddings for their specific RAG needs.
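The index-then-query flow described above can be illustrated with a minimal Python sketch. It is not the AI Search RAG implementation and does not call any ServiceNow API. The functions embed, build_index, retrieve, and build_prompt are hypothetical names introduced for illustration, and the bag-of-words embed function is only a stand-in for a trained embedding model such as E5.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a word-count vector.

    A real deployment would call a trained embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Step 1: index the data by storing each document with its embedding."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, top_k=2):
    """Step 2: embed the query and return the most similar documents."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(query, passages):
    """Restrict the LLM to the retrieved context by placing it in the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only this context:\n{context}\n\n"
            f"Question: {query}")


# Illustrative data only; these are not real knowledge articles.
documents = [
    "To reset your password, open the self-service portal and choose Reset Password.",
    "VPN access requires an approved request from your manager.",
    "Laptops are refreshed every three years.",
]

index = build_index(documents)
question = "How do I reset my password?"
passages = retrieve(index, question, top_k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In this sketch the prompt would then be sent to an LLM, which is omitted here. Swapping embed for a call to a different embedding model changes the retrieval quality without changing the rest of the flow, which mirrors why the choice of embedding model matters for RAG.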

Activating AI Search RAG

AI Search RAG functionality is provided by the AI Search RAG plugin (sn_ais_rag). This plugin is automatically activated for your instance when you install Generative AI Controller or any Now Assist application.