The Apriel model family: How we built frontier reasoning at 15B parameters

Eight months ago, we set out to answer a question that many considered settled: Does frontier-level reasoning require frontier-level compute?

The consensus said yes. The best reasoning models demanded 30 billion to 180 billion parameters, multi-GPU clusters, and infrastructure budgets that put advanced AI out of reach for most organizations. We believed there was another path. Today, we're sharing what we learned through building the Apriel model family.

Between January and December 2025, our ServiceNow Language Models Labs (SLAM Labs) team released six models and two training frameworks that challenged conventional assumptions about scale and capability. The latest, Apriel-1.6-15B-Thinker, achieved a score of 57 on the Artificial Analysis Intelligence Index—outperforming Gemini 2.5 Flash, Claude Haiku 4.5, and gpt-oss-20B while matching Qwen3 235B A22B at a fraction of the parameters (see Figure 1).

This wasn't luck or benchmark gaming. It was the result of three technical bets that paid off, made possible through our commitment to open science.

Figure 1: Artificial Analysis Intelligence Index

Three technical bets

Bet 1: Midtraining beats brute-force scaling. Rather than training larger models from scratch, we discovered that strategic continual pretraining on carefully curated reasoning data could substitute for raw parameter count.

Our Apriel-1.5 paper demonstrated that midtraining curriculum design alone, without reinforcement learning or preference optimization, could achieve frontier performance.

Bet 2: Hybrid architectures break the attention bottleneck. Transformer attention scales quadratically with sequence length, creating an inference cost ceiling that limits practical deployment.

Our Apriel-H1 work replaced up to 40 of 48 attention layers with Mamba blocks, achieving throughput improvements in the range of 2.1 to 3.4 times while preserving reasoning quality. We documented the key insight in our Hugging Face technical blog: Successful distillation requires high-quality reasoning traces from the teacher's supervised fine-tuning (SFT) dataset, not pretraining data. Using pretraining data failed dramatically.
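The quadratic-versus-linear tradeoff behind this bet can be made concrete with a toy cost model. The constants and layer shapes below are illustrative assumptions, not Apriel-H1's actual profile:

```python
# Toy per-layer cost model: attention FLOPs grow quadratically with
# sequence length, while a Mamba-style SSM block grows linearly, so
# replacing most attention layers caps the quadratic term.

def layer_cost(seq_len: int, d_model: int, kind: str) -> float:
    """Rough per-layer forward-pass cost in FLOPs (illustrative)."""
    if kind == "attention":
        return 2 * seq_len ** 2 * d_model        # O(L^2 * d)
    if kind == "mamba":
        d_state = 16                             # assumed SSM state size
        return 2 * seq_len * d_model * d_state   # O(L * d * d_state)
    raise ValueError(kind)

def model_cost(seq_len: int, d_model: int = 4096,
               n_layers: int = 48, n_attention: int = 48) -> float:
    attn = n_attention * layer_cost(seq_len, d_model, "attention")
    mamba = (n_layers - n_attention) * layer_cost(seq_len, d_model, "mamba")
    return attn + mamba

# Full attention vs. an 8-attention / 40-Mamba split at 32k context:
full = model_cost(32_768, n_attention=48)
hybrid = model_cost(32_768, n_attention=8)
print(f"hybrid uses {full / hybrid:.1f}x fewer mixer FLOPs at 32k context")
```

Real speedups depend on kernels, memory bandwidth, and the KV cache, which is why the measured gains land at 2.1 to 3.4 times rather than the raw FLOP ratio.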

Bet 3: Open science accelerates everything. We released all four models under MIT license with complete training recipes. The academic community validated our methodology through peer review; the practitioner community stress-tested it through deployment. This dual feedback loop compressed what typically takes years into months.

As research lead Torsten Scholak put it when we launched Apriel 5B: "Training smarter, not bigger, is the future. Apriel 5B proves pretraining efficiency isn't a constraint. It's a design principle and the core of our philosophy."

From lab to production

Technical breakthroughs mean nothing if they don't translate to impact. That's why every Apriel model was designed for production deployment from day 1.

Apriel Nemotron 15B is in production, powering ServiceNow's Now LLM services. Apriel 2.0, with enhanced multimodal reasoning, is expected to be in production by Q1 2026. It targets regulated industries such as financial services, healthcare, and telecom, where audit trails and compliance are nonnegotiable.

The design philosophy emphasizes deployability: Apriel-1.5-15B-Thinker runs on a single GPU while consuming 40% fewer tokens than comparable models such as QwQ-32B. Apriel-1.6 pushed this further, reducing reasoning token usage by more than 30% compared to its predecessor.

This efficiency isn't academic. It directly reduces inference costs and makes advanced reasoning viable for production workloads that would otherwise be cost-prohibitive.

Enterprise integration runs through AI Agent Studio and Workflow Data Fabric, which connects more than 100 enterprise data sources. Models pre-optimized for IT, HR, and customer service workflows mean partners skip months of domain adaptation.


In partnership with NVIDIA

Our collaboration with NVIDIA exemplifies what a strategic partnership can help achieve. For Apriel Nemotron 15B, we combined NVIDIA DGX Cloud infrastructure and high-quality datasets from NVIDIA Nemotron with our domain expertise to build a four-stage reasoning pipeline:

  1. Model upscaling: Expanded Mistral-Nemo-Base from 12 billion to 15 billion parameters via depth upscaling
  2. Continual pretraining: 68 billion tokens weighted toward reasoning (60%) and chain of thought (25%)
  3. Supervised fine-tuning: Specialized models for function calling, retrieval-augmented generation (RAG), and mathematics, then merged
  4. Reinforcement learning: Group Relative Policy Optimization (GRPO) with rule-based rewards across output format, math, coding, and agentic scenarios
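The group-relative scheme in stage 4 can be sketched as follows. The reward rules here are invented stand-ins for illustration; this post does not publish Apriel's actual reward functions:

```python
# Minimal sketch of GRPO's group-relative advantage with rule-based
# rewards. The 0/1-style format and correctness checks below are
# assumed for illustration only.

import statistics

def rule_based_reward(completion: str, answer: str) -> float:
    reward = 0.0
    if completion.strip().endswith("</answer>"):   # format reward
        reward += 0.5
    if answer in completion:                       # correctness reward
        reward += 1.0
    return reward

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes each reward against its rollout group,
    removing the need for a separate learned value/critic model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0        # guard zero variance
    return [(r - mean) / std for r in rewards]

# Four rollouts for the same prompt, scored by the rules above:
completions = ["... 42 </answer>", "... 41 </answer>", "... 42", "no answer"]
rewards = [rule_based_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and are reinforced; those below it are suppressed, all without training a critic.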

NVIDIA President and CEO Jensen Huang joined ServiceNow Chairman and CEO Bill McDermott at Knowledge 2025 to announce the model. NVIDIA's blog called it a "15B-parameter super genius" that punches above its weight class. The collaboration continued with Apriel-1.6, which was trained on NVIDIA DGX Cloud with GB200 Grace Blackwell Superchips.

Our training stack: Fast-LLM and PipelineRL

Every efficiency claim we made was enabled by training infrastructure we built and open-sourced ourselves.

Fast-LLM (Apache 2.0) is our pretraining and midtraining framework. The key innovation is treating attention and Mamba as interchangeable "mixers," enabling rapid architectural experimentation that made Apriel-H1 possible. In Mistral-7B training across 32 NVIDIA H100 GPUs, Fast-LLM sustained 10,350 tokens per second per GPU, a 20% efficiency improvement.
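The interchangeable-mixer idea can be illustrated with a minimal sketch. The class and function names below are hypothetical, not Fast-LLM's real API:

```python
# Hypothetical sketch: a transformer block that is agnostic to whether
# its mixer is attention or a Mamba-style SSM, so swapping architectures
# becomes a config change rather than a code change.

from dataclasses import dataclass

class AttentionMixer:
    def __call__(self, hidden):      # stand-in for self-attention
        return hidden

class MambaMixer:
    def __call__(self, hidden):      # stand-in for an SSM scan
        return hidden

MIXERS = {"attention": AttentionMixer, "mamba": MambaMixer}

@dataclass
class Block:
    mixer_kind: str
    def __post_init__(self):
        self.mixer = MIXERS[self.mixer_kind]()

def build_model(layer_spec: list[str]) -> list[Block]:
    return [Block(kind) for kind in layer_spec]

# An Apriel-H1-style layout: 40 Mamba blocks among 48 layers,
# keeping every sixth layer as full attention.
spec = ["attention" if i % 6 == 0 else "mamba" for i in range(48)]
model = build_model(spec)
print(sum(b.mixer_kind == "mamba" for b in model), "Mamba layers")
```

Because the layer spec is just data, sweeping over attention/Mamba ratios is a loop over configs, which is what made the Apriel-H1 experiments fast to run.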

As Nicolas Chapados, former vice president of research at ServiceNow, noted, that translates to "huge savings in terms of dollars, time, and CO2 footprint" when training runs cost millions.

PipelineRL is our reinforcement learning (RL) framework for efficient policy optimization. It powers the GRPO training stage that gives Apriel models their reasoning capabilities and token efficiency.

Owning our training stack end to end, from pretraining through RL, means we can iterate faster and validate architectural hypotheses without external dependencies. It also means the research community can reproduce our results completely.

2025: A year of releases

January: Fast-LLM
March: Apriel 5B
May: Apriel Nemotron 15B
May: PipelineRL
September: Apriel-1.5-15B-Thinker
November: Apriel-H1
December: Apriel-1.6-15B-Thinker

Lessons learned

Behind the highlights of the project's success, we also learned some important lessons along the way.

Perhaps the most important insight came from Scholak's strategic analysis: Efficiency isn't just an operational concern; it's a capability lever.

Consider that two to three times faster inference means two to three times more RL rollouts during training, which yields better models. It also means chain-of-thought by default rather than as an expensive special mode, and AI agents that maintain 50 tool calls at full context instead of compressing after 10.
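That rollout arithmetic is easy to make concrete; every number below is illustrative, not a measured figure:

```python
# Back-of-the-envelope: at a fixed GPU-hour budget, RL rollout count
# scales linearly with inference throughput.

def rollouts(budget_gpu_hours: float, tokens_per_rollout: int,
             tokens_per_gpu_hour: float) -> int:
    """How many full rollouts fit in the budget (floor division)."""
    return int(budget_gpu_hours * tokens_per_gpu_hour // tokens_per_rollout)

base = rollouts(1_000, 8_192, 1.0e7)   # baseline inference speed
fast = rollouts(1_000, 8_192, 2.5e7)   # 2.5x faster inference
print(f"{fast / base:.2f}x more rollouts for the same budget")
```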

"If your models think slowly, your roadmap does too," Scholak noted.

Why open science?

We open-source our best work because credibility in AI research is earned through reproducibility, not press releases. When we release weights, training recipes, and frameworks under permissive licenses, we're making a verifiable claim: This works, and you can prove it yourself.

Apriel isn't just open weights; it's open methodology. Our arXiv papers document what worked and what didn't. Fast-LLM and PipelineRL let anyone reproduce our training runs. This transparency serves enterprise customers who need to audit production systems, researchers who need to validate claims, and practitioners who need to adapt models to their domains.

Now, with Apriel-1.5 and Apriel-1.6 earning recognition in independent benchmarks such as the Artificial Analysis Intelligence Index—where a 15B model sits alongside offerings 10 times its size—we're seeing that transparency convert to trust.

Finding the right balance between openness and model performance (see Figure 2) largely depends on which components developers are willing to share.

In today's AI economy, the rise of sovereign AI—where nations build their own AI capabilities, including infrastructure, models, and data, rather than relying on foreign systems—has elevated certain ingredients over others. Data sovereignty has become a baseline requirement, while data mix remains closely guarded in high-performing commercial models.

Figure 2: Artificial Analysis Openness Index vs. Artificial Analysis Intelligence Index

“We don’t have frontier-lab compute budgets, but we’ve been betting that you can build competitive models and contribute back to the ecosystem,” Scholak says. “This chart suggests the bet is paying off.”

What's next

We announced Apriel 2.0 at the NVIDIA GTC AI Conference with enhanced reasoning and native multimodal input support, targeting Q1 2026 availability. We predict that hybrid architectures combining efficient and full attention will become the industry standard.

We'll continue publishing methodology, contributing to Fast-LLM and PipelineRL, and engaging the research community. The Apriel family proves that open science and commercial impact are complementary rather than competing objectives. We intend to keep demonstrating that principle.


The team

This work was made possible by the collective efforts of our team:

Technical leadership: Sathwik Tejaswi Madhusudhan and Torsten Scholak

Leadership and management: Nicolas Chapados, Sagar Davasam, Sebastien Paquet, Srinivas Sunkara, and Valérie Bécaert

Training systems and infrastructure: Alexandre Piché, Denis Kocetkov, Dzmitry Bahdanau, Ehsan Kamalloo, Joel Lamy-Poirier, Luke Kumar, Oleksiy Ostapenko, Rafael Pardinas, Raymond Li, Soham Parikh, and Xiaoyin Chen

Model training: Akintunde Oladipo, Akshay Kalkunte, Aman Tiwari, Jash Mehta, Kelechi Ogueji, Masoud Hashemi, Pulkit Pattnaik, Rishabh Maheshwary, Saloni Mittal, Shiva Krishna Reddy Malay, Shruthan Radhakrishna, and Toby Liang

Architecture and optimization: Luke Kumar, Oleksiy Ostapenko, Raymond Li, Shruthan Radhakrishna, and Soham Parikh

Post-training: Anil Turkkan, Gopal Sarda, and Quaizar Vohra

Evaluation and benchmarking: Aanjaneya Shukla, Anil Madamala, Denis Akhiyarov, Dheeraj Vattikonda, Dhruv Jhamb, Hari Subramani, Jishnu S Nair, Kavya Sriram, Massimo Caccia, Nicolas Gontier, Oluwanifemi Bamgbose, Patrice Bechard, Shashank Maiya, Tara Bogavelli, Tayfun Tuna, and Varun Pandey

Data infrastructure: Segan Subramanian and Vipul Mittal

Additional contributors: Ahmed Masry, Anirudh Sreeram, Khyati Mahajan, Sai Rajeswar Mudumba, Shambhavi Mishra, and Vikas Yadav

Explore the models: Apriel Collection on Hugging Face

Read the papers:

Try the frameworks:

Find out more and explore careers at ServiceNow AI Research.