ServiceNow AI Research

Breaking the Bottleneck with DiffuApriel: High-Throughput Diffusion LMs with Mamba Backbone

Abstract

Diffusion-based language models have recently emerged as a promising alternative to autoregressive generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention and KV-cache overhead. In this work, we introduce DiffuApriel, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling. DiffuApriel matches the performance of Transformer-based diffusion models while achieving up to 4.4× higher inference throughput for long sequences with a 1.3B model. We further propose DiffuApriel-H, a hybrid variant that interleaves attention and Mamba layers, offering up to 2.6× throughput improvement with balanced global and local context modeling. Our results demonstrate that bidirectional state-space architectures serve as strong denoisers in masked diffusion LMs, providing a practical and scalable foundation for faster, memory-efficient text generation.
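The core recipe the abstract describes is a masked diffusion objective paired with a bidirectional (non-causal) denoiser. The sketch below is a minimal, illustrative PyTorch training step under assumptions not taken from the paper: the `BidirectionalDenoiser` uses a small Transformer encoder as a stand-in for DiffuApriel's bidirectional Mamba stack, and `VOCAB_SIZE`, `MASK_ID`, and `masked_diffusion_loss` are hypothetical names. The noise-schedule loss weighting used by most masked diffusion formulations is omitted for brevity.

```python
# Minimal masked-diffusion training step (illustrative sketch, not the
# paper's implementation). The denoiser is a small bidirectional
# Transformer encoder as a stand-in; DiffuApriel would use a bidirectional
# Mamba stack, and DiffuApriel-H would interleave attention and Mamba layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 32000       # assumed vocabulary size
MASK_ID = VOCAB_SIZE     # extra [MASK] token appended to the vocabulary
D_MODEL = 512


class BidirectionalDenoiser(nn.Module):
    """Stand-in bidirectional denoiser (no causal mask, no KV cache)."""

    def __init__(self, vocab_size, d_model, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size + 1, d_model)  # +1 for [MASK]
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.lm_head(self.backbone(self.embed(tokens)))


def masked_diffusion_loss(model, x0):
    """One training step: mask a random fraction of tokens at a sampled
    noise level and score the denoiser only on the masked positions."""
    b, l = x0.shape
    t = torch.rand(b, 1, device=x0.device)           # noise level per sample
    mask = torch.rand(b, l, device=x0.device) < t    # positions to corrupt
    xt = torch.where(mask, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)                               # (b, l, vocab)
    if not mask.any():
        return logits.sum() * 0.0
    return F.cross_entropy(logits[mask], x0[mask])


if __name__ == "__main__":
    model = BidirectionalDenoiser(VOCAB_SIZE, D_MODEL)
    x0 = torch.randint(0, VOCAB_SIZE, (2, 128))      # toy batch of token ids
    print(masked_diffusion_loss(model, x0).item())
```

Because the denoiser sees the whole sequence bidirectionally, it needs no KV cache at inference; swapping attention layers for linear-time state-space layers is what drives the throughput gains reported above.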

Publication
arXiv
Vaibhav Singh
Visiting Researcher

Visiting Researcher at AI Foundation Model, located in Montreal, QC, Canada.

Oleksiy Ostapenko
Research Scientist

Research Scientist at AI Foundation Model, located in Montreal, QC, Canada.

Pierre-André Noël
Research Scientist

Research Scientist at Frontier AI Research, located in Montreal, QC, Canada.

Torsten Scholak
Research Lead

Research Lead at AI Foundation Model, located in Montreal, QC, Canada.