Behind the AI Agent
Prompts, Data & Control
← Back to Masterclass Overview
Key Takeaways
- Context Engineering: Providing the right information at the right time – too much context = noise, too little = hallucinations
- Memory Types: Short-term (conversation), Long-term (preferences), and Episodic (specific events) memory
- Golden Datasets: Systematic evaluation with representative test cases and expected outcomes
- Metrics First: Define KPIs BEFORE building, not after launch – both business and quality metrics
From "Prompt Guessing" to "Agent Engineering"
Many teams treat AI agent development like guesswork – trying random prompts, hoping for good results, with no systematic way to measure or improve. This session changes that.
| Prompt Guessing | Agent Engineering |
|---|---|
| Trial-and-error prompting | Systematic engineering |
| No defined metrics | Defined metrics upfront |
| Can't prove value | Structured testing |
| Don't know when to stop | Continuous optimization |
Key Insight: "We need to move away from guessing at prompts to systematic engineering of AI agents. Without metrics, you can't prove value or know when to stop."
The AI Agent Factory Framework
Think of agent development as a factory with distinct stations. Each station has specific inputs, processes, and outputs:
Context Engineering
Context Engineering is the art of providing the AI agent with exactly the right information at the right time to make informed decisions.
The Challenge: LLMs have limited context windows. Too much context creates noise and slows responses; too little leads to poor decisions and hallucinations. The solution is to dynamically assemble only the relevant context – one way to do this is sketched after the list below.
- Filter aggressively: Only include information relevant to the current task
- Prioritize by relevance: Most important context first, within token limits
- Use structured formats: JSON, tables, or clear sections help the LLM parse
- Test context combinations: Different contexts produce different results
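A minimal sketch of dynamic context assembly, assuming a simple relevance-scored retrieval setup. The snippet type, threshold, and token heuristic are illustrative assumptions for the example, not a ServiceNow API.

```python
# Illustrative only: filter, prioritize, and assemble context within a token budget.
from dataclasses import dataclass

@dataclass
class ContextSnippet:
    text: str         # candidate piece of context
    relevance: float  # 0..1, e.g. from a retrieval / similarity score

def assemble_context(snippets: list[ContextSnippet],
                     max_tokens: int = 1500,
                     min_relevance: float = 0.4) -> str:
    """Filter aggressively, order by relevance, stop at the token budget."""
    # 1. Filter: drop anything below the relevance threshold
    relevant = [s for s in snippets if s.relevance >= min_relevance]
    # 2. Prioritize: most relevant snippets first
    relevant.sort(key=lambda s: s.relevance, reverse=True)

    # 3. Assemble within the budget (rough "1 token ~ 4 characters" heuristic)
    selected, used = [], 0
    for s in relevant:
        cost = len(s.text) // 4 + 1
        if used + cost > max_tokens:
            break
        selected.append(s.text)
        used += cost

    # 4. Structured format: clear sections help the LLM parse the context
    return "\n\n".join(f"## Context {i + 1}\n{text}"
                       for i, text in enumerate(selected))
```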
Memory Types in ServiceNow
ServiceNow provides three types of memory for AI Agents, each serving a different purpose (a simple illustration follows the list):
- Short-term memory: the current conversation
- Long-term memory: user preferences
- Episodic memory: specific events the agent can recall later
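To make the distinction concrete, here is a small illustrative structure for the three memory scopes. The field names and example values are assumptions made for the sketch; they do not reflect ServiceNow's internal data model.

```python
# Illustrative sketch of the three memory scopes (not the ServiceNow data model).
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Short-term: the current conversation turns
    short_term: list[str] = field(default_factory=list)
    # Long-term: durable user preferences that persist across sessions
    long_term: dict[str, str] = field(default_factory=dict)
    # Episodic: specific past events the agent can recall later
    episodic: list[dict] = field(default_factory=list)

memory = AgentMemory()
memory.short_term.append("User: My VPN keeps disconnecting.")
memory.long_term["preferred_language"] = "en"
memory.episodic.append({"event": "incident_created",
                        "number": "INC0012345",
                        "summary": "VPN drops repeatedly"})
```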
Metrics: Business vs. Quality
Successful AI agents require both business and quality metrics – and they must be defined BEFORE you start building:
| Business Metrics | Quality Metrics |
|---|---|
| ROI / Cost Savings | Accuracy / Precision |
| Time Saved | Response Time |
| User Satisfaction (CSAT) | Error Rate |
| Deflection Rate | Completion Rate |
| Tickets Resolved | Hallucination Rate |
Critical Rule: "KPIs must be defined BEFORE building, not after launch. If you don't know what success looks like, you can't measure it."
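One lightweight way to honor that rule is to write the KPIs down as data before any building starts. The metric names, targets, and the lower-is-better flag below are illustrative assumptions, not prescribed values.

```python
# Hedged sketch: pin down KPIs as data BEFORE building the agent.
KPIS = {
    "business": {
        "deflection_rate":     {"target": 0.30, "unit": "ratio"},
        "time_saved_per_case": {"target": 5.0,  "unit": "minutes"},
        "csat":                {"target": 4.2,  "unit": "score_1_to_5"},
    },
    "quality": {
        "accuracy":            {"target": 0.90, "unit": "ratio"},
        "hallucination_rate":  {"target": 0.02, "unit": "ratio", "direction": "lower_is_better"},
        "p95_response_time":   {"target": 3.0,  "unit": "seconds", "direction": "lower_is_better"},
    },
}

def meets_target(category: str, name: str, measured: float) -> bool:
    """Compare a measured value against the pre-defined target."""
    kpi = KPIS[category][name]
    if kpi.get("direction") == "lower_is_better":
        return measured <= kpi["target"]
    return measured >= kpi["target"]
```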
Evaluation with Golden Datasets
Golden Datasets are curated collections of test cases with known expected outcomes. They're essential for systematic agent evaluation:
| Step | Action |
|---|---|
| 1 | Create Golden Dataset – Representative test cases with expected outcomes |
| 2 | Select Agent – Choose which agent configuration to evaluate |
| 3 | Define Metrics – Accuracy, Relevance, Helpfulness, Safety |
| 4 | Run Evaluation – Execute all test cases systematically |
| 5 | Analyze Results – Identify patterns, failures, opportunities |
| 6 | Iterate – Improve agent, re-test, compare results |
Pro Tip: Your Golden Dataset should include edge cases, not just happy paths. Include examples where you expect the agent to fail gracefully, ask clarifying questions, or escalate to humans.
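As a minimal sketch, here is what a small Golden Dataset and evaluation loop could look like. The case structure, the hypothetical run_agent stand-in, and the scoring are simplified assumptions, not ServiceNow's Agentic Evaluations feature.

```python
# Illustrative golden dataset: happy paths plus edge cases that should
# trigger a clarifying question or escalation.
GOLDEN_DATASET = [
    {"input": "Reset my email password",
     "expected": {"action": "password_reset"}},
    # Edge case: ambiguous request -> agent should ask a clarifying question
    {"input": "It is broken",
     "expected": {"action": "clarify"}},
    # Edge case: out of scope -> agent should escalate to a human
    {"input": "I want to dispute my salary",
     "expected": {"action": "escalate"}},
]

def run_agent(prompt: str) -> dict:
    """Stand-in for the agent under test; replace with a real agent call."""
    return {"action": "clarify"}

def evaluate(dataset: list[dict]) -> float:
    """Run every test case and report the share that matched expectations."""
    passed = 0
    for case in dataset:
        result = run_agent(case["input"])
        if result.get("action") == case["expected"]["action"]:
            passed += 1
    return passed / len(dataset)

print(f"Pass rate: {evaluate(GOLDEN_DATASET):.0%}")
```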
Knowledge Graphs: Connected Context
Knowledge Graphs represent connected knowledge as nodes and edges, showing relationships between entities. This enables context-aware queries that go beyond simple keyword matching:
[User] --has_role--> [Role: Developer]
[User] --uses--> [System: ServiceNow]
[User] --reported--> [Incident: INC0012345]
Benefits for AI Agents:
- Better understanding of relationships between data
- More precise, contextually grounded answers
- Fewer hallucinations due to explicit relationship constraints
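As a small illustration of querying connected context, here is a sketch using the networkx library – an assumption made purely for the example, not how ServiceNow stores or queries its knowledge graph. Node names mirror the diagram above.

```python
# Illustrative knowledge graph: nodes are entities, edges carry relationships.
import networkx as nx

g = nx.DiGraph()
g.add_edge("User", "Role: Developer", relation="has_role")
g.add_edge("User", "System: ServiceNow", relation="uses")
g.add_edge("User", "Incident: INC0012345", relation="reported")

# Context-aware query: follow explicit relationships from the user node
# instead of matching keywords across flat documents.
for _, target, data in g.out_edges("User", data=True):
    print(f"{data['relation']:>10} -> {target}")
```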
Next Steps
- Define Your Metrics: Before building any agent, define what success looks like
- Create a Golden Dataset: Start with 20-30 representative test cases for your use case
- Explore Agentic Evaluations: Check out the resources below to set up systematic testing
- Complete the Journey: Join us for Session 5 on Data, Scale & Governance
Last updated: January 2026
