We introduce multi-token prediction (MTP) variants of the Apriel model family, designed to generate multiple tokens per forward pass. MTP modifies the standard next-token objective by training parallel prediction heads that anticipate future tokens (e.g., the 2nd, 3rd, and 4th-next), enabling speculative decoding to accept batches of predicted tokens at once. Compared to standard autoregressive decoding, this architecture reduces the number of forward passes required by 2-3× while maintaining output quality, resulting in measurable latency improvements. We demonstrate that unbatched inference achieves up to a 2.2× speedup with 4 prediction heads, with a tokens-per-forward-pass ratio above 3 in favorable conditions. These gains come with modest architectural overhead: each additional head requires only one extra transformer layer. We show that a careful fine-tuning strategy makes it possible to realize high acceptance rates without sacrificing benchmark performance. With this approach, we build MTP variants of Apriel-5B, Apriel-15B-Thinker and Apriel-15B-Thinker-SSM using Fast-LLM, our open training framework optimized for long-context, high-throughput training. This release demonstrates that small, dense models can be extended to operate more efficiently through objective modification, rather than architecture alone.
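To make the acceptance mechanism concrete, the following is a minimal, self-contained Python sketch of self-speculative decoding with MTP heads. It is not the Apriel or Fast-LLM implementation: `toy_next_token` and `toy_mtp_heads` are hypothetical stand-ins for the standard next-token head and the extra MTP heads, and the error rates are made up for illustration. It only shows the accept-until-first-mismatch logic and the tokens-per-forward-pass accounting described above.

```python
import random
from typing import List

# Toy vocabulary so the sketch runs as-is; in practice these would be model logits.
VOCAB = list(range(100))


def toy_next_token(context: List[int]) -> int:
    """Stand-in for the standard next-token head (greedy decoding), made
    deterministic in the context so drafting and verification agree."""
    random.seed(sum(context) + len(context))
    return random.choice(VOCAB)


def toy_mtp_heads(context: List[int], num_heads: int) -> List[int]:
    """Stand-in for the extra MTP heads, which predict the 2nd, 3rd, ... next
    tokens from the same forward pass. Deeper heads are less reliable, which
    we mimic with a rising (made-up) error rate."""
    ctx = list(context)
    ctx.append(toy_next_token(ctx))  # the 1st-next token comes from the standard head
    draft = []
    for k in range(num_heads):
        tok = toy_next_token(ctx)
        if random.random() < 0.1 * (k + 1):  # occasional disagreement with the base head
            tok = random.choice(VOCAB)
        draft.append(tok)
        ctx.append(tok)
    return draft


def speculative_step(context: List[int], num_heads: int = 4) -> List[int]:
    """One speculative decoding step: the standard head yields one guaranteed
    token, the MTP heads propose a block of future tokens, and proposals are
    accepted left-to-right until the first mismatch with greedy next-token
    decoding. In the real model the whole block is verified in a single
    forward pass; here we just call the toy head per position."""
    next_tok = toy_next_token(context)              # always accepted
    draft = toy_mtp_heads(context, num_heads)       # proposed 2nd..(num_heads+1)-next tokens
    accepted = [next_tok]
    ctx = context + [next_tok]
    for proposal in draft:
        if proposal == toy_next_token(ctx):         # verification against the base head
            accepted.append(proposal)
            ctx.append(proposal)
        else:
            break                                   # reject the rest of the block
    return accepted


if __name__ == "__main__":
    context = [1, 2, 3]
    total_tokens, forward_passes = 0, 0
    while total_tokens < 256:
        new_tokens = speculative_step(context, num_heads=4)
        context += new_tokens
        total_tokens += len(new_tokens)
        forward_passes += 1
    print(f"tokens per forward pass: {total_tokens / forward_passes:.2f}")
```

Because at least one token is always accepted per step and the draft tokens match greedy decoding whenever they are accepted, the output is identical to standard autoregressive decoding in this greedy setting; the only effect is fewer verification passes per generated token.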