AI Agent Masterclass • Session 4

Behind the AI Agent

Prompts, Data & Control


Date: 15 January 2026
Duration: 60 minutes
Speakers: Timo Weber, Thomas Geering, Javier Lombana Dominguez
Recording: Coming soon
🎯 Key Takeaways
  • Context Engineering: Providing the right information at the right time – too much context = noise, too little = hallucinations
  • Memory Types: Short-term (conversation), Long-term (preferences), and Episodic (specific events) memory
  • Golden Datasets: Systematic evaluation with representative test cases and expected outcomes
  • Metrics First: Define KPIs BEFORE building, not after launch – both business and quality metrics

From "Prompt Guessing" to "Agent Engineering"

Many teams treat AI agent development like guesswork – trying random prompts, hoping for good results, with no systematic way to measure or improve. This session changes that.

The Problem
  • Trial-and-error prompting
  • No defined metrics
  • Can't prove value
  • Don't know when to stop
The Solution
  • Systematic engineering
  • Defined metrics upfront
  • Structured testing
  • Continuous optimization

Key Insight: "We need to move away from guessing at prompts to systematic engineering of AI agents. Without metrics, you can't prove value or know when to stop."

The AI Agent Factory Framework

Think of agent development as a factory with distinct stations. Each station has specific inputs, processes, and outputs:

  • 🎯 Design – Define purpose & scope
  • 🔧 Build – Implement & configure
  • 🧪 Test – Golden datasets
  • 🔁 Optimize – Improve prompts
  • 📊 Monitor – Continuous observation
  • 🚀 Scale – Roll out & expand

Context Engineering

Context Engineering is the art of providing the AI agent with exactly the right information at the right time to make informed decisions.

The Challenge: LLMs have limited context windows. Too much context creates noise and slower responses. Too little context leads to poor decisions and hallucinations. The solution? Dynamically assemble only relevant context.

Context Engineering Best Practices
  • Filter aggressively: Only include information relevant to the current task
  • Prioritize by relevance: Most important context first, within token limits
  • Use structured formats: JSON, tables, or clear sections help the LLM parse
  • Test context combinations: Different contexts produce different results
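To make these practices concrete, here is a minimal sketch of budget-aware context assembly: it filters out low-relevance snippets, then greedily packs the most relevant ones into a fixed token budget. The names (ContextSnippet, assemble_context), the relevance threshold, and the 4-characters-per-token heuristic are illustrative assumptions, not a ServiceNow API.

from dataclasses import dataclass

@dataclass
class ContextSnippet:
    text: str
    relevance: float  # e.g. a retrieval score between 0 and 1

def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def assemble_context(snippets: list[ContextSnippet],
                     token_budget: int,
                     min_relevance: float = 0.3) -> str:
    """Greedily pack the most relevant snippets until the budget is spent."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            continue  # filter aggressively: irrelevant context is noise
        cost = estimate_tokens(s.text)
        if used + cost > token_budget:
            continue  # respect the context window
        chosen.append(s.text)
        used += cost
    return "\n---\n".join(chosen)

snippets = [
    ContextSnippet("User's open incident: VPN drops every 30 minutes.", 0.95),
    ContextSnippet("Knowledge article: VPN troubleshooting steps.", 0.80),
    ContextSnippet("Company cafeteria menu for this week.", 0.05),
]
print(assemble_context(snippets, token_budget=40))  # cafeteria menu is filtered out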

Memory Types in ServiceNow

ServiceNow provides three types of memory for AI Agents, each serving a different purpose:

💭 Short-Term Memory
  • What: Current conversation context
  • Duration: Single session
  • Example: Chat history within current interaction
🧠 Long-Term Memory
  • What: Persistent user information
  • Duration: Across sessions
  • Example: User preferences, historical patterns
📅 Episodic Memory
  • What: Specific past events
  • Duration: Event-based
  • Example: "Last week user X had issue Y..."

Metrics: Business vs. Quality

Successful AI agents require both business and quality metrics – and they must be defined BEFORE you start building:

💰 Business Metrics
  • ROI / Cost Savings
  • Time Saved
  • User Satisfaction (CSAT)
  • Deflection Rate
  • Tickets Resolved
📊 Quality Metrics
  • Accuracy / Precision
  • Response Time
  • Error Rate
  • Completion Rate
  • Hallucination Rate
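Once evaluation runs are logged, most quality metrics reduce to simple ratios. A minimal sketch, assuming each run is recorded with correct/hallucinated/completed flags and a latency; the record layout is an assumption for illustration.

# Each record represents one logged evaluation run (assumed format).
results = [
    {"correct": True,  "hallucinated": False, "completed": True,  "latency_s": 1.2},
    {"correct": False, "hallucinated": True,  "completed": True,  "latency_s": 2.8},
    {"correct": True,  "hallucinated": False, "completed": False, "latency_s": 0.9},
]

n = len(results)
print(f"Accuracy:           {sum(r['correct'] for r in results) / n:.0%}")
print(f"Hallucination rate: {sum(r['hallucinated'] for r in results) / n:.0%}")
print(f"Completion rate:    {sum(r['completed'] for r in results) / n:.0%}")
print(f"Avg response time:  {sum(r['latency_s'] for r in results) / n:.1f}s")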

Critical Rule: "KPIs must be defined BEFORE building, not after launch. If you don't know what success looks like, you can't measure it."

Evaluation with Golden Datasets

Golden Datasets are curated collections of test cases with known expected outcomes. They're essential for systematic agent evaluation:

The Evaluation Process
  1. Create Golden Dataset – Representative test cases with expected outcomes
  2. Select Agent – Choose which agent configuration to evaluate
  3. Define Metrics – Accuracy, Relevance, Helpfulness, Safety
  4. Run Evaluation – Execute all test cases systematically
  5. Analyze Results – Identify patterns, failures, opportunities
  6. Iterate – Improve agent, re-test, compare results
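Steps 4–6 can start as a simple loop that compares actual outcomes to expected ones. A minimal sketch with a toy dataset format and a stand-in agent callable; neither reflects the Agentic Evaluations feature itself.

golden_dataset = [
    {"input": "Reset my VPN password", "expected": "password_reset"},
    {"input": "asdfgh",                "expected": "clarifying_question"},  # edge case
]

def evaluate(agent, dataset):
    """Run every case, collect mismatches for analysis and iteration."""
    failures = []
    for case in dataset:
        actual = agent(case["input"])
        if actual != case["expected"]:
            failures.append({**case, "actual": actual})
    print(f"{len(dataset) - len(failures)}/{len(dataset)} cases passed")
    return failures

def dummy_agent(text: str) -> str:
    # Stand-in for the agent configuration under test.
    return "password_reset" if "password" in text.lower() else "clarifying_question"

failures = evaluate(dummy_agent, golden_dataset)  # prints: 2/2 cases passed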

Pro Tip: Your Golden Dataset should include edge cases, not just happy paths. Include examples where you expect the agent to fail gracefully, ask clarifying questions, or escalate to humans.

Knowledge Graphs: Connected Context

Knowledge Graphs represent connected knowledge as nodes and edges, showing relationships between entities. This enables context-aware queries that go beyond simple keyword matching:

[User: Max Müller] --works_in--> [Department: IT]
                   --has_role--> [Role: Developer]
                   --uses--> [System: ServiceNow]
                   --reported--> [Incident: INC0012345]
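The same structure can be modeled as (subject, relation, object) triples and traversed directly instead of keyword-matched. A minimal sketch mirroring the diagram above; the triple list is an illustrative assumption, not how ServiceNow stores the graph.

triples = [
    ("Max Müller", "works_in", "IT"),
    ("Max Müller", "has_role", "Developer"),
    ("Max Müller", "uses", "ServiceNow"),
    ("Max Müller", "reported", "INC0012345"),
]

def objects(subject: str, relation: str) -> list[str]:
    # Follow an edge forwards: everything `subject` links to via `relation`.
    return [o for s, r, o in triples if s == subject and r == relation]

def subjects(relation: str, obj: str) -> list[str]:
    # Follow an edge backwards: everyone linked to `obj` via `relation`.
    return [s for s, r, o in triples if r == relation and o == obj]

# Context-aware query: which incidents were reported by someone in IT?
for person in subjects("works_in", "IT"):
    print(person, "->", objects(person, "reported"))  # Max Müller -> ['INC0012345']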

Benefits for AI Agents:

  • Better understanding of relationships between data
  • More precise, contextually grounded answers
  • Fewer hallucinations due to explicit relationship constraints
🚀 Your Next Steps
  • Define Your Metrics: Before building any agent, define what success looks like
  • Create a Golden Dataset: Start with 20-30 representative test cases for your use case
  • Explore Agentic Evaluations: Check out the resources below to set up systematic testing
  • Complete the Journey: Join us for Session 5 on Data, Scale & Governance

Last updated: January 2026
