At ServiceNow, building enterprise-grade AI goes beyond choosing powerful models. It requires a structured, responsible, and repeatable process that translates research into solutions, minimizing costly workflow downtime and driving real-world business value.
This four-part series takes you inside our AI Software Development Lifecycle (AI SDLC), highlighting the methods and principles that guide how we evaluate, train, align, validate, and deploy GenAI models across the Now Platform.
You’ll learn how we:
- Scope models based on business priorities and platform requirements
- Tune with targeted data to optimize performance and align behavior with platform standards
- Deploy responsibly at scale—monitoring, measuring, and continuously improving
Let’s begin with Part 1…
Part 1: Pick the Right Model Before You Train: How AI Model Development Begins at ServiceNow
As organizations increasingly rely on AI to drive productivity and innovation, it’s critical that the models powering these systems are built with care, clarity, and purpose. At ServiceNow, the AI model development lifecycle begins long before a model is trained or deployed: it starts with rigorous planning, research, and evaluation.
In this first post of our four-part series, we’ll explore how ServiceNow scopes, vets, and prepares candidate models for development, ensuring that the right choices are made before any code is written.
Laying the Groundwork: Why Planning Matters
Developing an AI model isn’t just about picking a high-performing system from a leaderboard. It’s about aligning technical capabilities with business needs, ensuring legal and ethical compliance, and building a foundation for ongoing development. That’s why the first phase of ServiceNow’s lifecycle is entirely focused on research, vetting, and planning.
Landscape Scanning and Model Discovery
Between major model releases, our teams continuously scan the landscape of academic research, open-source projects, and proprietary systems. We evaluate new models based on several key dimensions:
- Reported performance on standard benchmarks
- Model architecture and size
- Context window capabilities
- Licensing and usage constraints
- Relevance to platform and domain-specific needs
Our research team typically screens dozens of models each quarter, but only a select few advance to feasibility testing. This disciplined scanning process ensures we stay ahead of emerging advancements while focusing our resources on the most promising candidates.
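To make those screening dimensions concrete, here is a minimal sketch of what a candidate-filtering step could look like. The `ModelCandidate` fields and the threshold values are illustrative assumptions for this post, not ServiceNow’s actual criteria.

```python
from dataclasses import dataclass

# Illustrative screening record; field names and thresholds are
# hypothetical, not ServiceNow's actual criteria.
@dataclass
class ModelCandidate:
    name: str
    benchmark_avg: float    # mean score across reported public benchmarks
    params_billions: float  # model size
    context_window: int     # context window, in tokens
    license_ok: bool        # enterprise-friendly license (e.g., Apache-2.0)
    domain_relevant: bool   # maps to platform/domain-specific needs

def passes_screen(m: ModelCandidate,
                  min_benchmark: float = 70.0,
                  max_params: float = 70.0,
                  min_context: int = 8192) -> bool:
    """Return True if the candidate advances to feasibility testing."""
    return (m.license_ok
            and m.domain_relevant
            and m.benchmark_avg >= min_benchmark
            and m.params_billions <= max_params
            and m.context_window >= min_context)

candidates = [
    ModelCandidate("model-a", 74.2, 13, 32768, True, True),
    ModelCandidate("model-b", 81.0, 180, 8192, False, True),
]
shortlist = [m.name for m in candidates if passes_screen(m)]
print(shortlist)  # ['model-a'] — model-b fails the license check
```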
Preliminary Evaluation and Feasibility Testing
Once promising models are identified, they undergo preliminary evaluation using a controlled suite of tests designed to validate claimed capabilities on neutral data the models haven’t encountered before. For example, a model may need to achieve a ROUGE-L score of 80 or higher on our ITSM summarization benchmark to advance.
The focus is not on perfect accuracy, but on understanding the model’s baseline abilities, how well they generalize, and whether they show promise for further tuning. Evaluation results are documented and reviewed to inform next steps.
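As a sketch of how a pass/fail gate like the ROUGE-L example above might be implemented, the snippet below scores candidate summaries with the open-source `rouge-score` package and applies an 80-point threshold. The sample data, and the assumption that the score is F-measure scaled to 0–100, are ours for illustration; the actual benchmark is internal.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Hypothetical benchmark pairs: (reference summary, model output).
# A real evaluation would run over a held-out ITSM dataset.
pairs = [
    ("User cannot access VPN after password reset.",
     "User lost VPN access following a password reset."),
]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
fmeasures = [scorer.score(ref, hyp)["rougeL"].fmeasure for ref, hyp in pairs]
avg_score = 100 * sum(fmeasures) / len(fmeasures)  # scale to 0-100

THRESHOLD = 80.0  # advancement bar cited above
verdict = "advance" if avg_score >= THRESHOLD else "hold"
print(f"ROUGE-L: {avg_score:.1f} -> {verdict}")
```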
Initial Legal and Licensing Review
Before any development work begins, models are reviewed to ensure their usage complies with licensing terms and regulatory expectations. This includes:
- Verifying that models are appropriately licensed for enterprise use
- Identifying any restrictions on downstream usage or data handling
- Flagging potential risks related to intellectual property, compliance, or other business factors that could hinder suitability for production use
This legal check allows development to proceed responsibly, while flagging any cases that may require deeper legal involvement later.
Setting Release Goals with Product Teams
AI development at ServiceNow is goal-driven. Technical teams work closely with product management to define what the next model release should achieve, whether that’s unlocking new capabilities, improving quality, or enhancing performance.
For example, in the Yokohama release, the ServiceNow SLM incorporated Text2Flow, which previously ran on a separate model. This consolidation simplified model architecture and reduced complexity in production.
Release goals like these typically focus on:
- New capabilities (e.g., supporting new languages or use cases)
- Improved quality in specific domains or languages
- Performance and efficiency enhancements
This collaboration ensures tight alignment between engineering, product, and design from the very beginning.
Gathering and Prioritizing Requirements
With high-level goals in place, requirements are gathered from across the organization. These may include:
- Dependency requests from feature teams
- Observed gaps in prior model releases
- Emerging customer needs or feedback
Requirements are then formalized and prioritized based on business value, technical feasibility, and alignment with the strategic direction of the platform. This prioritization shapes what models are selected and how development will be focused.
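One common way to operationalize this kind of prioritization is a weighted scoring matrix. The sketch below is a generic illustration of that technique; the criteria weights, 1–5 scores, and requirement entries are invented for the example.

```python
# Hypothetical weighted-scoring sketch for requirement prioritization.
# Weights reflect the three factors named above; all values are invented.
WEIGHTS = {"business_value": 0.5, "feasibility": 0.3, "strategic_fit": 0.2}

requirements = [
    {"name": "Expand language coverage",
     "business_value": 5, "feasibility": 3, "strategic_fit": 4},
    {"name": "Reduce summarization latency",
     "business_value": 4, "feasibility": 4, "strategic_fit": 3},
]

def priority(req: dict) -> float:
    """Weighted sum of a requirement's 1-5 scores."""
    return sum(req[k] * w for k, w in WEIGHTS.items())

# Rank requirements from highest to lowest priority score.
for req in sorted(requirements, key=priority, reverse=True):
    print(f"{priority(req):.2f}  {req['name']}")
```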
Selecting Candidate Models
Selecting candidate models for development is a critical phase in the model lifecycle. This step ensures that promising models are identified and thoroughly evaluated before investment. At a minimum, the selection process includes:
- The incumbent model – the current production model, used as a baseline for comparison
- One or more new candidate models – selected based on eligibility and potential for improvement
This multi-candidate approach supports a competitive, data-driven evaluation process that helps identify the most suitable model for further development.
Assessing Base Abilities
To evaluate candidate models, a comprehensive assessment of their base abilities is performed. This assessment uses a mix of public, academic, industry-standard, and internal ServiceNow-specific benchmarks. These benchmarks include evaluation datasets with structured questions and verifiable answers to measure performance across various dimensions.
Categories of Abilities Evaluated:
- Basic Abilities:
  - Linguistic understanding
  - Reasoning and commonsense
  - General knowledge (breadth and depth)
  - Domain-specific expertise (e.g., math, scientific reasoning, coding)
  - Content moderation, security, and truthfulness
- Advanced Abilities:
  - Complex reasoning
  - Instruction following
  - Conversational fluency
  - Multilingual capabilities
- ServiceNow-Specific Criteria:
  - Chat summarization
  - Case summarization
  - KB article generation
Evaluation Approach:
Model outputs are compared against:
- Ground-truth (verifiable) answers
- Judging models and rubrics designed for qualitative and quantitative analysis
Assessment Objectives:
- Detect Regressions: Ensure there is no decline in performance compared to the incumbent model based on industry benchmarks.
- Validate Against Internal Requirements: Confirm that ServiceNow-specific needs (especially formatting and domain alignment) are met.
- Uncover Gaps and Improvement Potential: Highlight discrepancies between current and desired performance, establishing a baseline and identifying limitations for future improvement.
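As a minimal sketch of the first objective, regression detection, the snippet below compares a candidate’s per-benchmark scores against the incumbent baseline. The benchmark names, scores, and tolerance are placeholders; real numbers would come from the evaluation suites described above.

```python
# Minimal regression-check sketch. All values are placeholders.
TOLERANCE = 1.0  # allowed drop in points before flagging a regression

incumbent = {"linguistic": 78.4, "reasoning": 71.2, "chat_summarization": 82.0}
candidate = {"linguistic": 80.1, "reasoning": 69.5, "chat_summarization": 84.3}

# Collect benchmarks where the candidate falls below the baseline
# by more than the tolerance.
regressions = {
    bench: (incumbent[bench], score)
    for bench, score in candidate.items()
    if score < incumbent[bench] - TOLERANCE
}

if regressions:
    for bench, (old, new) in regressions.items():
        print(f"REGRESSION on {bench}: {old:.1f} -> {new:.1f}")
else:
    print("No regressions vs. incumbent; candidate advances.")
```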
Conclusion: Building with Intention
Before a single line of training code is executed, ServiceNow invests in thorough research, vetting, and goal setting to ensure AI models are built with clarity and purpose. This foundation is what enables us to innovate responsibly and deliver meaningful outcomes to our customers.
In our next post, we’ll walk through how candidate models are tuned, evaluated, and refined through rigorous experimentation.
Coming up next: “Tuning the Core: From Candidate Models to Capable Systems.”
Download our Responsible AI whitepaper to explore our approach in more depth.
I would love to see a guide on moving Now Assist configurations from a dev instance to production: any special considerations, gotchas to be aware of, etc. This would be a great help to many, and I have been unable to find a resource. Thanks.