ServiceNow AI Research
Publications tagged: Efficient Inference
Apriel-MTP: Multi-Token Prediction for Faster and More Efficient Language
We introduce multi-token prediction (MTP) variants of the Apriel model family, designed to generate multiple tokens per forward pass. …
Raymond Li, Nanda Harishankar Krishna, Oleksiy Ostapenko, Luke Kumar, Torsten Scholak
NOW AI, 2025.
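The abstract's core mechanism, generating multiple tokens per forward pass, is easy to sketch. Below is a minimal, hypothetical PyTorch illustration using k parallel output heads over a final hidden state; the class name, shapes, and head design are assumptions for illustration, not the Apriel architecture.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """k parallel output heads predicting the next k tokens from one
    hidden state. Illustrative only, not the Apriel implementation."""

    def __init__(self, hidden_size: int, vocab_size: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size) for _ in range(k)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_size), the last-position hidden state
        # returns: (batch, k, vocab_size), logits for the next k tokens
        return torch.stack([head(hidden) for head in self.heads], dim=1)

# One forward pass now proposes k tokens instead of one.
head = MultiTokenHead(hidden_size=512, vocab_size=32000, k=4)
draft_tokens = head(torch.randn(1, 512)).argmax(dim=-1)  # shape (1, 4)
```

In practice the k draft tokens would typically be verified against the base model (as in speculative decoding) before being accepted; the sketch shows only the parallel prediction.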
Apriel-SSM: Converting Pre-Trained Transformer LLMs Into Subquadratic Hybrid Models Through Iterative End-to-End Distillation
Large Language Models achieve their success through transformer architectures with attention mechanisms that compute token …
Oleksiy Ostapenko, Shambhavi Mishra, Luke Kumar, Denis Kocetkov, Raymond Li, Joel Lamy Poirier, Sébastien Paquet, Torsten Scholak
NOW AI, 2025.
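The title's iterative end-to-end distillation can be illustrated with a toy logit-matching loop. Everything below, teacher, student, loss, and optimizer, is a stand-in showing the shape of end-to-end distillation rather than the paper's recipe; in the paper the teacher is the pre-trained transformer and the student a subquadratic hybrid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 100, 32

# Toy stand-ins: a frozen "teacher" and a trainable "student".
teacher = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
student = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
tokens = torch.randint(0, vocab, (8, 16))  # a batch of token ids

# End-to-end distillation: match the student's next-token distribution
# to the teacher's at every position via a KL loss on the logits.
loss = F.kl_div(
    F.log_softmax(student(tokens), dim=-1),
    F.softmax(teacher(tokens), dim=-1),
    reduction="batchmean",
)
loss.backward()
opt.step()
```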
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Retrieval-Augmented Generation (RAG) has become ubiquitous when deploying Large Language Models (LLMs), as it can address typical …
Patrice Béchard, Orlando Marquez
Knowledge Discovery and Data Mining, 2025.
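The title suggests one retriever fine-tuned across several domain tasks. Below is a rough sketch of a common recipe for this, in-batch contrastive (InfoNCE) training of a shared encoder on mixed-task (query, passage) batches; the encoder, temperature, and task mixing are assumptions, not necessarily the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(128, 64)  # toy stand-in for an embedding model

def info_nce(q_feats: torch.Tensor, p_feats: torch.Tensor) -> torch.Tensor:
    # In-batch negatives: each query's positive passage sits at the same
    # index; every other passage in the batch serves as a negative.
    q = F.normalize(encoder(q_feats), dim=-1)
    p = F.normalize(encoder(p_feats), dim=-1)
    logits = q @ p.T / 0.05  # temperature 0.05 is an arbitrary choice here
    return F.cross_entropy(logits, torch.arange(len(q)))

# Mixing batches drawn from multiple tasks (e.g., FAQ search, ticket
# routing) is what lets a single retriever serve all of them.
loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
loss.backward()
```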
Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Graph databases like Neo4j are gaining popularity for handling complex, interconnected data, over traditional relational databases in …
Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, Sathwik Tejaswi Madhusudhan
North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
Unifying Autoregressive and Diffusion-Based Sequence Generation
We take significant steps toward unifying autoregressive and diffusion-based sequence generation by extending the SEDD discrete …
Nima Fathi, Torsten Scholak, Pierre-André Noël
Workshop at the International Conference on Learning Representations (ICLR), 2025.
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference …
João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
Workshop at Neural Information Processing Systems (NeurIPS), 2024.
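The caching idea in the title, encode the reference context once and cross-attend to it at every decoding step, can be shown with a toy single-head attention. The dimensions and projections below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn.functional as F

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))

# Encode the reference context once and cache its key/value projections;
# decoding then reuses the cache instead of re-processing a long prompt.
context_hidden = torch.randn(100, d)  # hidden states of the context tokens
cached_k = context_hidden @ W_k
cached_v = context_hidden @ W_v

def decode_step(dec_hidden: torch.Tensor) -> torch.Tensor:
    # Cross-attention from the current decoder state to the cached context.
    q = dec_hidden @ W_q                                 # (1, d)
    attn = F.softmax(q @ cached_k.T / d ** 0.5, dim=-1)  # (1, context_len)
    return attn @ cached_v                               # (1, d)

out = decode_step(torch.randn(1, d))
```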
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
We present a simple meta quantization approach that quantizes different layers of a large language model (LLM) at different bit levels, …
Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu
arXiv, 2024.
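The abstract's mechanism, different bit widths for different layers, reduces to a per-layer schedule over a standard uniform quantizer. The sketch below simulates quantization in floating point; the bit schedule is invented for illustration, not taken from the paper.

```python
import torch

def quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Uniform symmetric quantization to `bits` bits, then dequantization
    # back to float (simulated quantization).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

# Hypothetical schedule: keep more sensitive layers at higher precision.
bit_schedule = [8, 4, 4]
layers = [torch.randn(16, 16) for _ in bit_schedule]
quantized = [quantize(w, b) for w, b in zip(layers, bit_schedule)]
```

The model-wide average here is about 5.33 bits, the kind of fractional, beyond-integer bit-level the title refers to.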