ServiceNow Research

Synthetic Data

BiXSE: Improving Dense Retrieval via Probabilistic Graded Relevance Distillation
Neural sentence embedding models for dense retrieval typically rely on binary relevance labels, treating query-document pairs as …

Understanding the Influence of Synthetic Data for Text Embedders
Recent progress in developing general-purpose text embedders has been driven by training on synthetic LLM-generated data. Nonetheless, …

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data …

SEVN: A Sidewalk Simulation Environment for Visual Navigation
Millions of blind and visually-impaired (BVI) people navigate urban environments every day, using smartphones for high-level …

Learning to Remove Rain in Traffic Surveillance by Using Synthetic Data
Rainfall is a problem in automated traffic surveillance. Rain streaks occlude the road users and degrade the overall visibility which …