ServiceNow AI Research

Apriel-MTP: Multi-Token Prediction for Faster and More Efficient Language

Abstract

We introduce multi-token prediction (MTP) variants of the Apriel model family, designed to generate multiple tokens per forward pass. MTP modifies the standard next-token objective by training parallel prediction heads that anticipate future tokens (e.g., the 2nd, 3rd, and 4th next tokens), enabling speculative decoding to accept several predicted tokens at once. Compared to standard autoregressive decoding, this architecture reduces the number of required forward passes by 2-3× while maintaining output quality, resulting in measurable latency improvements. We demonstrate that unbatched inference achieves up to a 2.2× speedup using 4 prediction heads, with a tokens-per-forward ratio greater than 3 in favorable conditions. These gains come with modest architectural overhead: each additional head requires only one extra transformer layer. We show that a careful fine-tuning strategy achieves a high acceptance rate without sacrificing benchmark performance. With this approach, we build MTP variants of Apriel-5B, Apriel-15B-Thinker, and Apriel-15B-Thinker-SSM using Fast-LLM, our open training framework optimized for long-context, high-throughput training. This release demonstrates that small, dense models can be made to operate more efficiently through objective modification rather than through architecture alone.
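To make the decoding mechanism described above concrete, here is a minimal toy simulation of greedy self-speculative decoding with MTP heads. It is a sketch under stated assumptions, not the Apriel-MTP or Fast-LLM implementation: `base_next` is a hypothetical stand-in for the main next-token head, `heads_draft` is a hypothetical stand-in for the extra heads (simulated by corrupting the true continuation with probability `error_rate`), and drafts are assumed to be accepted only up to the first mismatch with the main head. The sketch shows why the number of forward passes drops while the output stays identical to plain greedy decoding, and how the tokens-per-forward ratio is computed.

```python
import random

VOCAB = 50  # hypothetical toy vocabulary size


def base_next(seq):
    """Hypothetical stand-in for the main (next-token) head: greedy token after seq."""
    return (31 * seq[-1] + 7 * len(seq)) % VOCAB


def heads_draft(seq, n, error_rate, rng):
    """Hypothetical stand-in for n extra MTP heads proposing the next n tokens.

    Each draft is the true continuation, corrupted with probability error_rate.
    """
    ctx, out = list(seq), []
    for _ in range(n):
        tok = base_next(ctx)
        ctx.append(tok)  # keep rolling out the true continuation
        if rng.random() < error_rate:
            tok = (tok + 1) % VOCAB  # a wrong guess that verification will reject
        out.append(tok)
    return out


def greedy_generate(prompt, num_tokens):
    """Standard autoregressive decoding: one forward pass per generated token."""
    seq = list(prompt)
    for _ in range(num_tokens):
        seq.append(base_next(seq))
    return seq[len(prompt):]


def mtp_generate(prompt, num_tokens, num_heads, error_rate, seed=0):
    """Greedy self-speculative decoding with num_heads prediction heads.

    Returns (generated tokens, number of forward passes).
    """
    rng = random.Random(seed)
    seq, drafts, forwards = list(prompt), [], 0
    while len(seq) - len(prompt) < num_tokens:
        forwards += 1
        # In a real model, one forward pass over seq + drafts yields the main
        # head's greedy prediction at every draft position at once.
        accepted = []
        for d in drafts:
            if d != base_next(seq + accepted):
                break  # first mismatch: discard this and all later drafts
            accepted.append(d)
        bonus = base_next(seq + accepted)  # main head's token after the accepted prefix
        seq.extend(accepted + [bonus])
        # The extra heads (simulated here) propose the tokens that follow.
        drafts = heads_draft(seq, num_heads - 1, error_rate, rng)
    return seq[len(prompt):len(prompt) + num_tokens], forwards


if __name__ == "__main__":
    prompt, n = [3, 14, 15], 200
    out, forwards = mtp_generate(prompt, n, num_heads=4, error_rate=0.2)
    assert out == greedy_generate(prompt, n)  # same output as plain greedy decoding
    print(f"forward passes: {forwards}, tokens per forward: {n / forwards:.2f}")
```

With 4 heads and perfect drafts, each pass after the first emits 4 tokens, so the tokens-per-forward ratio approaches 4; lowering the acceptance rate pushes it toward 1. Wall-clock speedup is typically smaller than this ratio, because each pass also runs the extra head layers and scores the drafted positions.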

Raymond Li
AI Developer, AI Foundation Model, Montreal, QC, Canada

Nanda Harishankar Krishna
Visiting Researcher, AI Foundation Model, Montreal, QC, Canada

Oleksiy Ostapenko
Research Scientist, AI Foundation Model, Montreal, QC, Canada

Luke Kumar
Applied Research Scientist, AI Research Deployment, Toronto, ON, Canada

Torsten Scholak
Research Lead, AI Foundation Model, Montreal, QC, Canada