Apriel 5B: Small but mighty enterprise language model

AI-generated image of a green cube with text reading “Apriel 5B” and a rocket ship with the letter A on it.

Generally, the performance of a large language model (LLM) improves as the number of parameters increases to tens or hundreds of billions. But this enhanced performance comes at the cost of greater compute demands, energy consumption, and financial expense.

In contrast, small language models (SLMs) are rapidly gaining traction as faster, more affordable, more computationally efficient, and easier-to-deploy options. These streamlined, accessible solutions are tailored to real-world applications, such as automating support workflows, powering internal knowledge assistants, drafting personalized emails, and enabling fast, domain-specific chatbots for finance, healthcare, and IT service desks.

Gartner predicts that “by 2027, organizations will implement small, task-specific AI models, with usage volume at least three times more than those of general-purpose large language models.”1

High-performing SLM Apriel 5B was developed at a fraction of the cost and resource footprint of traditional LLMs and provides a practical balance between general language understanding and efficiency.

What is Apriel 5B?

Apriel 5B, created by ServiceNow Language Models Labs (SLAM Labs), a joint effort by ServiceNow Research and ServiceNow AI, is a family of 4.8-billion-parameter, decoder-only language models designed for efficiency and strong performance. Trained on 4.5 trillion tokens, Apriel 5B supports general-purpose tasks in natural language and code generation.

The SLM offers flexibility through two key versions:

- Apriel 5B-Base, the pretrained foundation for general language and code tasks
- Apriel 5B-Instruct, the instruction-tuned version for chat, reasoning, and task following

Both versions were released under an MIT license, encouraging developers to adapt, extend, and experiment.

“Training smarter, not bigger, is the future,” says Torsten Scholak, research lead of foundation models at ServiceNow Research. “Apriel 5B proves pretraining efficiency isn’t a constraint. It’s a design principle and the core of our philosophy.”

“Apriel 5B shows how strategic midtraining decisions unlock exceptional performance in small language models,” adds Sathwik Tejaswi Madhusudhan, architect and principal scientist for ServiceNow AI. “Getting the middle right makes all the difference.”

Impressive performance in a small package

Apriel 5B was benchmarked against larger models, including OLMo-2 7B, LLaMA-3.1 8B, and Mistral-Nemo 12B. Despite its smaller size, Apriel 5B delivered impressive performance of around 1,250 tokens per second (see Figure 1).

Figure 1. Apriel 5B balances inference speed and benchmark performance, delivering throughput of around 1,250 tokens per second.

The green region highlights the ideal trade-off curve, where gains in either speed or benchmark accuracy are meaningful and nontrivial. That’s the Pareto frontier, where models push the boundaries of what’s possible. Apriel 5B lands in that zone, combining strong performance with high throughput.
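For context, a throughput figure like the roughly 1,250 tokens per second above is typically computed as generated tokens divided by wall-clock decode time. A minimal sketch of that calculation (the token count and timing below are illustrative, not a measured Apriel 5B run):

```python
def tokens_per_second(num_new_tokens: int, elapsed_sec: float) -> float:
    """Decode throughput: tokens generated divided by wall-clock seconds."""
    return num_new_tokens / elapsed_sec

# Illustrative numbers only, not a measured Apriel 5B run:
# 512 new tokens decoded in 0.41 s of wall-clock time.
print(f"{tokens_per_second(512, 0.41):.0f} tok/s")  # -> 1249 tok/s
```

In practice, `elapsed_sec` would come from timing calls placed around the model's generate step, for example with `time.perf_counter()` before and after.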

Minimal infrastructure needed

Apriel 5B demonstrates that high-quality language models can be built with fewer resources, making it possible for teams to develop high-performing, cutting-edge models without relying on massive infrastructure.

From a training-efficiency standpoint, Apriel 5B is a major leap forward. It uses 2.3 times fewer graphics processing unit (GPU) hours and 31% less compute than OLMo-2 7B, showing that performance and efficiency can truly align. This was made possible by our open-source Fast-LLM training stack.

Apriel 5B is also highly versatile when it comes to deployment. It can run on both powerful data center GPUs and everyday consumer GPUs—and even on some edge devices, such as laptops and high-end smartphones, when optimized.
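A quick back-of-the-envelope calculation shows why a 4.8-billion-parameter model fits on consumer hardware. Weight memory is roughly parameter count times bytes per parameter, so lower-precision formats (quantization is a standard optimization here, not necessarily the specific one used for Apriel 5B) shrink the footprint dramatically:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough footprint of the weights alone: params x bytes per param.

    Ignores activations and the KV cache, which add overhead at inference.
    """
    return num_params * bytes_per_param / 1024**3

PARAMS = 4.8e9  # Apriel 5B parameter count
for label, bytes_pp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bytes_pp):.1f} GB")
# fp16: ~8.9 GB, int8: ~4.5 GB, int4: ~2.2 GB
```

At 4-bit precision, the weights fit comfortably within the memory of a mainstream consumer GPU or a high-end laptop, which is what makes the edge deployments above plausible.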

Punching above its weight

Apriel 5B–Base performs competitively across a wide range of benchmarks, often matching or surpassing larger models (see Table 1).

Table 1. Benchmarks of Apriel 5B-Base versus larger models

Apriel 5B-Instruct performs strongly on a variety of tasks designed to test reasoning, coding, factual accuracy, and instruction following (IFEval; see Table 2).

Table 2. Benchmarks of Apriel 5B-Instruct versus larger models

Apriel 5B-Instruct shines in instruction-tuned performance. With an average benchmark score of 49.64, it surpasses both OLMo-2 7B-Instruct and Mistral-Nemo 12B-Instruct-2407.

Apriel 5B capabilities

Apriel 5B's key capabilities showcase its efficient performance, multilingual and coding proficiency, and flexible deployment options (see Table 3). These features make it a powerful family of models suitable for a wide range of use cases.

Table 3. Key capabilities of Apriel 5B

Special thanks to Lambda

We’d like to extend our gratitude to Lambda for its ongoing support. During the early phases, as we worked through various technical challenges, Lambda’s dependable infrastructure and its responsive and knowledgeable team made all the difference.

The flexible access to compute provided by its GPU Flex Commitments, coupled with the performance of its 1-Click Clusters, allowed us to push forward confidently with experimentation and iteration, without concerns about compute access, downtime, or disruptions. We’re grateful for Lambda’s expertise and availability when we needed it most, and we truly value the reliability it brought to the table.

Getting started with Apriel 5B

SLAM Labs is focused on delivering a steady stream of open-weight small language models, advancing platform innovation, and pushing the boundaries of efficiency through new architectures, refined training data, enhanced mid- and post-training techniques, and improved GPU utilization. Apriel 5B is just the beginning.

We’re releasing Apriel 5B under an MIT license to encourage collaboration and innovation. We want to hear from you—whether you’re exploring new applications, tweaking the model, or experimenting with entirely new architectures. The possibilities are endless.

Experience the power of our model firsthand in the Lambda Playground. Ask questions, run tasks, and see how it stacks up.

Explore and download Apriel 5B.
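As one possible starting point, here is a sketch of loading the instruct variant with the Hugging Face transformers library. The repo id, chat-template usage, and generation settings are assumptions; check the Apriel 5B model card for the exact names and recommended parameters:

```python
def generate_reply(prompt: str,
                   model_id: str = "ServiceNow-AI/Apriel-5B-Instruct",
                   max_new_tokens: int = 128) -> str:
    """Load the instruct model and generate a single chat reply.

    Hedged sketch: the repo id above is an assumption, and the code assumes
    the checkpoint ships a chat template and uses the standard transformers
    causal-LM interface.
    """
    # Imported lazily so defining the function is cheap; the first call
    # downloads the model weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated reply is decoded.
    return tokenizer.decode(output_ids[0][inputs.shape[-1]:],
                            skip_special_tokens=True)
```

A call such as `generate_reply("Summarize this incident in one sentence.")` downloads the weights on first use; on consumer hardware, a quantized build may be the more practical choice.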

1 Gartner, Gartner predicts by 2027, organizations will use small, task-specific AI models three times more than general-purpose large language models, April 9, 2025. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.