ReTreever: A semantic tree of layered document representations
Effective question answering (QA) systems depend on retrieving the right information efficiently, allowing large language models (LLMs) to dynamically tap into vast amounts of information without needing exhaustive fine-tuning.
The conventional method of converting documents into high-dimensional embeddings—mathematical representations of data—for similarity search comes with notable limitations, including:
- Large memory footprint due to the high dimensionality of embeddings
- Heavy computation during inference
- Lack of transparency, making it difficult to understand the retrieval process or derive insights from the collection
To address these issues, we designed ReTreever, a tree-based approach that structures documents hierarchically and represents each at various granular levels. This allows for a coarse-to-fine representation, balancing cost and utility and making the retrieval process more transparent.
Why ReTreever is a game changer
ReTreever offers many benefits, including:
- Intuitive document navigation: ReTreever organizes documents into a hierarchical structure, eliminating the need to sift through a giant, flat set of embeddings.
- Zoom features: ReTreever’s coarse-to-fine representations allow for zooming out to a general overview of the topic or zooming in for the nitty-gritty details.
- Transparency: Unlike other methods, ReTreever provides visibility into how documents are organized. It's like having a map of your data's semantic landscape.
- Efficiency: ReTreever is computationally efficient, eliminating the need for expensive LLM calls during both construction and navigation.
- Flexible representation sizes: ReTreever supports shorter embeddings—such as 64 or 128 dimensions—while maintaining competitive performance. This gives you control over the balance between retrieval accuracy and inference speed.
Peeking inside ReTreever
One of the great things about ReTreever is that it displays how it organizes documents (see Figure 1). By exploring the hierarchical tree, you can discover thematic clusters and understand why certain documents are grouped together. It's like uncovering the hidden logic of your data. Even without directly optimizing for semantic coherence, ReTreever naturally arranges documents into meaningful groups.
Figure 1: Visualization of the topics (in bold) and keywords extracted from the contexts assigned to one subtree (in green) rooted at node 5 of a ReTreever tree of depth 10 learned from the Natural Questions dataset. For compactness, we represent only a subset of the nodes and paths, stopping at depth 5. Topics are locally coherent along a path, which indicates that ReTreever naturally groups contexts semantically.
How ReTreever works
ReTreever organizes documents into a learnable perfect binary tree (T) of depth D, where each node represents a cluster of documents. Instead of storing embeddings in a flat list, the tree arranges documents hierarchically: Nodes near the root capture broad topics, whereas deeper nodes represent more specific subtopics.
Each split node in the tree has a routing function, learned jointly across the structure, helping to ensure queries and their reference documents follow similar paths. This directly optimizes retrieval performance by assigning positive query-context pairs to the same leaf node.
The tree uses embeddings from a frozen encoder (E), which converts text into numerical representations. Any standard encoder, such as BGE, BERT, ModernBERT, and Longformer, can be used to generate these embeddings, which guide the hierarchical clustering and retrieval process.
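As a rough illustration of this encoding step, the snippet below embeds a query and a context with a frozen BGE model via the sentence-transformers library. The specific model name and library are our choices for the example; ReTreever itself only assumes some frozen encoder E.

```python
# Minimal sketch of the encoding step: a frozen bi-encoder turns text into
# fixed-size vectors that the tree's routing functions consume.
# Assumes the sentence-transformers library; any comparable encoder works.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")  # frozen encoder E (1024-dim output)

query = "who wrote the declaration of independence"
context = "The Declaration of Independence was drafted by Thomas Jefferson..."

# No gradient updates are applied to E; only the tree's routing functions are trained.
query_emb, context_emb = encoder.encode([query, context], normalize_embeddings=True)
print(query_emb.shape)  # (1024,)
```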
Training ReTreever
ReTreever is designed to retrieve documents that best answer a given query. We measure its performance using standard metrics such as recall@k (whether the correct document appears in the top-k results) and NDCG@k (which rewards ranking relevant documents higher).
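For readers who want the definitions spelled out, here is one standard way to compute both metrics for a single query under binary relevance. The helper functions are ours, written purely for illustration.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / max(len(relevant_ids), 1)

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG: rewards placing relevant documents near the top."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc_id in enumerate(ranked_ids[:k])
              if doc_id in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Example: the correct context (id 3) is ranked second out of three results.
print(recall_at_k([7, 3, 1], {3}, k=2))  # 1.0
print(ndcg_at_k([7, 3, 1], {3}, k=2))    # ~0.63
```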
To achieve strong performance on these metrics, ReTreever is trained end to end with a contrastive loss that ensures each query is routed to the same leaf node as its relevant context (see Figure 2).
Figure 2: During ReTreever’s training, a positive query-context pair is first encoded by the frozen encoder E (used as a bi-encoder). The resulting embeddings pass through a tree of depth D (here, D = 3), where each split node assigns probabilities for routing left or right. These probabilities combine into an assignment embedding, where each element represents the probability of reaching a specific leaf. During inference, leaf assignments serve as fine-grained representations, whereas intermediate-level assignments provide coarse representations.
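The last point of the caption is what makes flexible representation sizes possible: a coarse representation at a given level is just the leaf assignment pooled over subtrees. Below is a minimal sketch, assuming leaves are indexed left to right so that each node at a level owns a contiguous block of leaves; the function name is ours.

```python
import torch

def level_assignment(leaf_assignment: torch.Tensor, level: int) -> torch.Tensor:
    """
    Pool a leaf-assignment vector (length 2**D, summing to 1) into the
    assignment over the 2**level nodes at an intermediate tree level.
    With leaves indexed left to right, each node at the chosen level owns a
    contiguous block of 2**(D - level) leaves, so pooling is a block sum.
    """
    num_leaves = leaf_assignment.shape[-1]          # 2**D
    num_nodes = 2 ** level                          # nodes at the chosen level
    return leaf_assignment.reshape(*leaf_assignment.shape[:-1],
                                   num_nodes, num_leaves // num_nodes).sum(-1)

# A depth-3 tree has 8 leaves; level 1 gives a coarse, 2-dimensional representation.
leaf = torch.tensor([0.05, 0.05, 0.1, 0.2, 0.3, 0.1, 0.1, 0.1])
print(level_assignment(leaf, level=1))  # tensor([0.4000, 0.6000])
```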
Learning methods
Routing functions
Each branching node learns a routing function that decides whether an input moves to its left or right child. These functions can range from simple linear projections to more complex neural networks or cross-attention mechanisms.
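To make this concrete, here are two illustrative routing functions in PyTorch: a linear projection and a small MLP. The class names and hidden size are ours; the richer variants mentioned above, such as cross-attention routers, are not shown here.

```python
import torch
import torch.nn as nn

class LinearRouter(nn.Module):
    """One branching node: maps an encoder embedding to p(go left)."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_left = torch.sigmoid(self.proj(x)).squeeze(-1)  # shape: (batch,)
        return p_left                                     # p(go right) = 1 - p_left

class MLPRouter(nn.Module):
    """A drop-in alternative: a small MLP for more expressive splits."""
    def __init__(self, embed_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x)).squeeze(-1)
```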
Balanced structure
A contrastive learning objective encourages positive query-context pairs to converge on the same leaf and unrelated pairs to diverge to different leaves, avoiding the trivial solution of routing everything into one node.
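One plausible instantiation of such an objective is an in-batch, InfoNCE-style loss over the leaf-assignment vectors: the inner product between a query's and a context's assignments reflects the probability that they land in the same leaf, and the other contexts in the batch act as negatives, which discourages routing everything into one node. Treat the snippet below as an illustrative stand-in rather than the exact loss used in the paper.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_assign: torch.Tensor,
                              context_assign: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    """
    query_assign, context_assign: (batch, num_leaves) soft leaf assignments.
    scores[i, j] reflects how likely query i and context j are to reach the
    same leaf; each query is pushed toward its own context (the diagonal)
    and away from the other contexts in the batch.
    """
    scores = query_assign @ context_assign.T          # (batch, batch) similarities
    targets = torch.arange(query_assign.shape[0])     # positives sit on the diagonal
    return F.cross_entropy(scores / temperature, targets)
```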
Gradient descent
To keep the tree well-structured and differentiable, probabilistic relaxation converts binary splits into continuous probabilities, and tree propagation methods ensure the probabilities at each node remain consistent with its ancestors.
- Probabilistic relaxation: To make the tree learning differentiable, probabilistic relaxations are used to transform hard routes into soft ones, allowing inputs to have soft probabilities of traversing different branches.
- Tree propagation: To ensure the probability of reaching a node is constrained by the probability of reaching its ancestors, a tree propagation mechanism is used. A basic approach multiplies probabilities along the path, whereas more flexible learned methods refine routing further (see the sketch after this list).
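The basic multiply-along-the-path propagation can be sketched as follows; the learned propagation variants refine this, and the helper name and router construction here are ours.

```python
import torch
import torch.nn as nn

def propagate(x: torch.Tensor, routers_by_level) -> torch.Tensor:
    """
    Multiply soft routing decisions along root-to-leaf paths.
    routers_by_level[l] holds the 2**l routing modules at depth l, each mapping
    an embedding to p(go left). Returns a (batch, 2**D) leaf assignment whose
    rows sum to 1: the probability of each input reaching each leaf.
    """
    assign = torch.ones(x.shape[0], 1)                   # everyone starts at the root
    for level_routers in routers_by_level:
        # p(go left) for every node at this level: (batch, 2**l)
        p_left = torch.cat([r(x).reshape(x.shape[0], 1) for r in level_routers], dim=1)
        # Children inherit their parent's probability mass, split between left and right.
        children = torch.stack([assign * p_left, assign * (1 - p_left)], dim=2)
        assign = children.reshape(x.shape[0], -1)        # left/right children in leaf order
    return assign

# Example: a depth-2 tree has 1 + 2 routing nodes and 4 leaves.
routers = [[nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())],
           [nn.Sequential(nn.Linear(8, 1), nn.Sigmoid()),
            nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())]]
leaf_assign = propagate(torch.randn(4, 8), routers)
print(leaf_assign.sum(dim=1))  # each row sums to 1 (up to floating-point error)
```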
Performant coarse representations
To further improve the retrieval performance of coarse representations, a stochastic optimization scheme is used. During training, different tree levels are randomly selected and optimized, ensuring the model remains effective across varying levels of granularity.
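A rough sketch of one training step under this scheme is shown below, reusing the pooling and in-batch contrastive ideas from the earlier snippets. The function name, level sampling, and loss details are our assumptions, not necessarily the paper's exact recipe.

```python
import random
import torch
import torch.nn.functional as F

def stochastic_depth_step(query_leaf: torch.Tensor,
                          context_leaf: torch.Tensor,
                          depth: int,
                          temperature: float = 0.05) -> torch.Tensor:
    """
    Sample a tree level, pool the (batch, 2**depth) leaf assignments to that
    level, and apply the in-batch contrastive loss there, so coarse
    representations are optimized alongside the fine-grained ones.
    """
    level = random.randint(1, depth)                  # randomly chosen granularity
    block = 2 ** (depth - level)                      # leaves per node at that level
    q = query_leaf.reshape(query_leaf.shape[0], -1, block).sum(-1)
    c = context_leaf.reshape(context_leaf.shape[0], -1, block).sum(-1)
    scores = q @ c.T / temperature                    # (batch, batch) similarities
    targets = torch.arange(q.shape[0])                # positives on the diagonal
    return F.cross_entropy(scores, targets)
```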
Evaluation
We evaluate ReTreever’s performance against standard retrieval methods, exploring its effectiveness across various embedding sizes. Figure 3 presents one such result, comparing ReTreever to BGE on the Natural Questions (NQ) dataset.
We observe that ReTreever trained with stochastic depth performs competitively not only at the standard embedding size of 1024, but also at significantly compressed representations as small as 32, 64, or 128. This highlights ReTreever’s ability to maintain strong retrieval performance even with coarse-grained embeddings, making it highly adaptable to different resource constraints.
Figure 3: ReTreever’s performance on the NQ dataset compared to BGE, a standard text encoder, measured by recall@100 and NDCG@100 across various representation sizes. The blue curve represents ReTreever trained with stochastic depth, and the green curve represents BGE fine-tuned with a modified version of the loss from the Matryoshka Representation Learning (MRL) work to improve its shorter representations. ReTreever maintains relatively high performance even at representation lengths of 64 and 128. The yellow curve shows ReTreever trained exclusively for 1024-length representations, and the orange curve shows BGE fine-tuned with a linear adapter. (Note that the x-axis is on a logarithmic scale.)
For more details about ReTreever and the experiments we conducted, see our paper: https://arxiv.org/abs/2502.07971.
Find out more about ServiceNow Research.