ServiceNow Research

Efficient Inference

Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Graph databases like Neo4j are gaining popularity for handling complex, interconnected data, over traditional relational databases in …
Unifying Autoregressive and Diffusion-Based Sequence Generation
We take significant steps toward unifying autoregressive and diffusion-based sequence generation by extending the SEDD discrete …
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
We present a simple meta quantization approach that quantizes different layers of a large language model (LLM) at different bit levels, …