TapeAgents: An innovative framework for agentic workflows

TapeAgents: an AI-generated image of blue tape surrounded by AI agents in a room full of computers Image generated with Microsoft Copilot

The landscape of enterprise automation has evolved from rigid, manually scripted systems offering little real automation to more intelligent solutions.

Initially dominated by manual scripting and robotic process automation (RPA), the field has progressed to include classical AI workflows using machine learning (ML) and vision models, conversational systems enhanced by generative AI, and agentic workflows that promise significant automation potential.

Automation in enterprise workflows - from lower left to upper right: Scripted workflows, RPA workflows, AI workflows, Conversational workflows, Agentic workflows

In the Gartner® 2025 Top Strategic Technology Trends, Agentic AI sits at the top of the list. According to Gartner, “by 2028, at least 15% of day-to-day work decisions will be made autonomously through Agentic AI, up from 0% in 2024.”1 We believe this will effectively assist humans in the workplace.

Introducing TapeAgents

ServiceNow Research has made significant strides in the field with the development and optimization of Agentic AI frameworks and systems. We created and open-sourced TapeAgents, a holistic framework designed to support AI developers throughout the entire Agentic AI workflow development lifecycle.

TapeAgents is built around a structured, granular log called a “tape,” which serves as the AI agent's state and facilitates various stages of development.

A primer on AI agents

Large language models (LLMs) enable the creation of agents that can effectively address multiple low-volume tasks, such as scheduling tweets and organizing meetings, that are difficult to automate with traditional methods. Taken together, these smaller, unique tasks collectively represent a substantial workload that LLM agents can help streamline.

As illustrated below, a typical LLM agent architecture uses an LLM agent capable of triggering actions, which are then sent to an environment that returns observations. This LLM agent uses short-term memory to track previous actions and responses, enabling it to make informed decisions.

Additionally, it incorporates planning processes, allowing for various strategies, such as ReAct and chain of thought. To complete its tasks, the agent requires a range of tools. These tools include simple calculations, code execution, and web search, among others. More advanced agents may even synthesize new tools, storing them in long-term memory for future use.
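The architecture above can be sketched in a few lines of Python. This is a minimal illustration, not TapeAgents code: the tool functions, the `SimpleAgent` class, and the rule-based `decide` method (a stand-in for the LLM planning step) are all hypothetical.

```python
from dataclasses import dataclass, field

def calculate(expression: str) -> str:
    """A 'calculator' tool: evaluates a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

def web_search(query: str) -> str:
    """A stub 'web search' tool returning a canned observation."""
    return f"Top result for '{query}'"

TOOLS = {"calculate": calculate, "web_search": web_search}

@dataclass
class SimpleAgent:
    """Single agent with short-term memory of (tool, input, observation) triples."""
    memory: list = field(default_factory=list)

    def decide(self, task: str):
        # Stand-in for the LLM planning step (e.g., ReAct-style reasoning):
        # here we simply route arithmetic-looking tasks to the calculator.
        if any(ch.isdigit() for ch in task):
            return "calculate", task
        return "web_search", task

    def run(self, task: str) -> str:
        tool_name, tool_input = self.decide(task)
        observation = TOOLS[tool_name](tool_input)
        # Short-term memory lets the agent consult previous actions later.
        self.memory.append((tool_name, tool_input, observation))
        return observation

agent = SimpleAgent()
print(agent.run("2 + 3 * 4"))           # -> 14
print(agent.run("LLM agent frameworks"))
```

A real agent would replace `decide` with an LLM call and carry the memory into each prompt; the loop structure stays the same.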

LLM-based single agents: Typical architecture

AI agent development and optimization

To develop and debug LLM agents, developers need an effective software framework. TapeAgents, an innovative approach to agent development, simultaneously facilitates the creation, engineering, and debugging of LLM agents while establishing the necessary primitives and concepts for effective prompt optimization and fine-tuning of the underlying LLM.

TapeAgents builds on ideas from the AutoGen and LangGraph agent frameworks, which emphasize multi-agent collaboration, modularity, and persistence.

Effective prompt engineering is essential for optimizing LLM agents. While some optimization can be performed manually, automated machine optimization typically produces superior results.

Inspired by DSPy optimizers, TapeAgents enables data-driven agent optimization using tapes. In certain instances, it may be necessary to optimize the LLM itself to create highly effective agents. TapeAgents also helps generate training data from tapes.

Tape, a unifying concept

The unifying concept that underlies TapeAgents is the “tape,” which serves as a comprehensive log of all thoughts and actions from multiple agents within the system. Each agent records its activities on the tape. Any outcomes from interactions with the environment are also logged.

While it may seem like a simple log, the tape serves as a rich data structure. It contains valuable metadata that allows other agents to use it as input for further processing.
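To make the idea concrete, here is an illustrative sketch of a tape as a typed, metadata-rich log. The `Step` and `Tape` classes and the `view` method are assumptions for illustration; the actual TapeAgents classes differ, but the principle is the same: each step records its author and kind, and computed views let agents keep steps private.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Step:
    author: str   # which agent (or the environment) produced the step
    kind: str     # "thought" | "action" | "observation"
    content: Any

@dataclass
class Tape:
    steps: list = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)

    def view(self, for_agent: str) -> list:
        # Computed view: an agent sees its own steps plus all observations,
        # but not other agents' private thoughts.
        return [s for s in self.steps
                if s.author == for_agent or s.kind == "observation"]

tape = Tape()
tape.append(Step("agent_a", "thought", "I should search the web."))
tape.append(Step("agent_a", "action", {"tool": "search", "query": "rates"}))
tape.append(Step("env", "observation", "Search returned 3 results."))
tape.append(Step("agent_b", "thought", "Private reasoning of agent B."))

print([s.kind for s in tape.view("agent_a")])  # A's steps + observations
```

Because every step is structured data rather than free text, downstream algorithms can filter, audit, or transform the tape programmatically.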

The figure below illustrates the relationship between tape metadata, agent configurations, and agent interactions with the environment. Tape metadata links to elements such as prompt templates, LLM configurations, nodes, and subagents. Notably, the computed views from a tape allow agents to keep their steps private. Agents choose which information to share when calling subagents or when responding to the parent agent.

The ability to treat the tape as a comprehensive data structure, rich in content and usable by downstream algorithms, represents the true power of TapeAgents.

The relationship between tape metadata, agent configurations, and agent interactions within an environment

How TapeAgents works

As illustrated below, Agent B accesses the entire history of the tape and selects which of its internal nodes to execute at any given time. These nodes determine how to prompt the LLM, and developers can choose the number and structure of nodes as they see fit.

The nodes call the LLM, which may suggest thoughts or actions that are added to the tape. An orchestrator then executes any unprocessed actions in the environment and retrieves the responses, continuing the loop.
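The loop described above can be sketched as follows. Every name here (`Node`, `Agent`, `orchestrate`, the stubbed `llm` and `environment` functions) is hypothetical and deliberately simplified; it is not the actual TapeAgents API, only the control flow: select a node, let it prompt the LLM and append steps, then execute unprocessed actions in the environment.

```python
def llm(prompt):
    # Stand-in for a real LLM call: a planning prompt yields an action.
    if "plan" in prompt:
        return [("thought", "Need to look something up"), ("action", "search: rates")]
    return [("thought", "Done")]

class Node:
    """A node decides how to prompt the LLM given the tape so far."""
    def __init__(self, name, prompt_template):
        self.name = name
        self.prompt_template = prompt_template

    def run(self, tape):
        prompt = self.prompt_template.format(history=tape)
        return llm(prompt)

class Agent:
    def __init__(self, nodes):
        self.nodes = nodes

    def select_node(self, tape):
        # Simplest possible policy: plan on an empty tape, act otherwise.
        return self.nodes[0] if not tape else self.nodes[1]

def environment(action):
    return ("observation", f"result of {action}")

def orchestrate(agent, max_turns=3):
    tape = []
    for _ in range(max_turns):
        node = agent.select_node(tape)
        new_steps = node.run(tape)
        tape.extend(new_steps)
        # Execute any unprocessed actions and append the observations.
        for kind, content in new_steps:
            if kind == "action":
                tape.append(environment(content))
    return tape

agent = Agent([Node("plan", "plan: {history}"), Node("act", "act: {history}")])
tape = orchestrate(agent)
```

Developers choose how many nodes an agent has and how each one builds its prompt; the orchestrator only cares about the thought/action/observation steps on the tape.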

How TapeAgents works: Agent B accesses the entire history of the tape and selects which of its internal nodes to execute at any given time.

This execution model may seem straightforward, but it has significant versatility: It supports multiple agents, each capable of delegating tasks to subagents (Agent C and Agent D). This design creates a highly modular, adaptable structure that enhances the flexibility of the agentic system.

The "tape as data" concept offers some remarkable advantages. For instance, it enables comprehensive auditing and debugging features, giving full transparency into each agent’s decision-making process. Additionally, it gives us the ability to develop algorithms that use the tape as input, iterating through cycles of prompt optimization to refine results continuously.

Furthermore, this model supports creating specialized agents that distill the outputs of larger, more complex "teacher" agents. By doing so, it enables us to build leaner, cost-effective "student" agents, optimizing the underlying LLM for efficiency.
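One way such distillation could work is to replay a teacher's tape and turn each LLM-produced step into a supervised fine-tuning pair: the history up to that point becomes the prompt, and the step itself becomes the target. The function and data format below are illustrative assumptions, not the framework's actual distillation pipeline.

```python
def tape_to_training_pairs(tape):
    """Convert a teacher tape into (prompt, completion) fine-tuning pairs."""
    pairs = []
    history = []
    for step in tape:
        if step["kind"] in ("thought", "action"):
            # The student learns to emit this step given the history so far.
            prompt = "\n".join(f"{s['kind']}: {s['content']}" for s in history)
            pairs.append({"prompt": prompt,
                          "completion": f"{step['kind']}: {step['content']}"})
        history.append(step)
    return pairs

teacher_tape = [
    {"kind": "observation", "content": "User asks about vacation policy"},
    {"kind": "thought", "content": "Search the knowledge base"},
    {"kind": "action", "content": "search('vacation policy')"},
    {"kind": "observation", "content": "Policy: 20 days per year"},
    {"kind": "thought", "content": "Answer: 20 days per year"},
]
pairs = tape_to_training_pairs(teacher_tape)
```

Each pair can then feed a standard fine-tuning run for the smaller student model.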

Benefits of TapeAgents tapes: auditing, debugging, data reuse for evaluation, data-driven agent improvement, agent optimization, and fine-tuning

A financial agentic system example

As a simple example, we built a two-agent system: an agent that answers financial inquiries and a subagent that conducts web searches. This straightforward setup, combined with a detailed execution loop, generates comprehensive records of interactions, including internal dialogues and reasoning processes.

Although the resulting tapes can be extensive, they provide valuable insights into agent behavior and environmental responses. You can easily replicate this example using the introductory notebook, making it accessible for anyone. This framework allows agents to process these tapes effectively, facilitating optimization.
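A toy version of this two-agent setup might look like the following. The function names and the stubbed search result are hypothetical, not the introductory notebook's actual code; the point is that the delegating agent and the subagent both record their steps on one shared tape.

```python
def search_subagent(query, tape):
    """Subagent: performs a (stubbed) web search, logging onto the shared tape."""
    tape.append(("search_agent", "action", f"web_search({query!r})"))
    observation = "S&P 500 closed up 1.2%"   # stubbed search result
    tape.append(("env", "observation", observation))
    return observation

def financial_agent(question, tape):
    """Root agent: answers financial questions by delegating the search."""
    tape.append(("financial_agent", "thought", "Delegate to the search subagent"))
    result = search_subagent(question, tape)
    answer = f"Based on a web search: {result}"
    tape.append(("financial_agent", "action", f"respond({answer!r})"))
    return answer

tape = []
answer = financial_agent("How did the S&P 500 do today?", tape)
```

Even in this toy run, the tape captures the full delegation chain: the root agent's reasoning, the subagent's action, the environment's observation, and the final response.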

Agent reasoning loop: Example

Cost-effective AI agents

Our goal is to create sophisticated agents that deliver high-quality user interactions while remaining cost-effective. In a case study, we developed an AI agent that helps users fill out complex forms related to their daily work activities, then distilled that complex agent into a simpler, more efficient one.

This approach enables the creation of cost-effective conversational agents with desirable attributes, such as being grounded, responsive, accurate, disciplined, transparent, and helpful. Each attribute can be measured independently, letting us collectively optimize for them. See our detailed technical report for further information.

We simulated a company knowledge base with user-agent interactions to train and test our model's effectiveness in out-of-distribution scenarios. Our methodology involved a complex, high-parameter-count agent using Llama 405B models, which was costly to run and generated extensive data recorded as discrete execution traces. We then fine-tuned a Llama 8B model using this data to match the performance of the larger model.

The results are compelling. The graph below illustrates the cost of running 1 million conversation turns (x-axis) against a quality score (y-axis). Unsurprisingly, the largest agents, such as the Llama 405B models and the GPT-4o model, achieved high scores but at a steep cost of $30,000 to $40,000 per million conversation turns.

In contrast, our approach of fine-tuning on recorded tapes allowed us to achieve similar performance with an 8-billion-parameter model at more than 300 times lower cost, an impressive gain in efficiency.
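The arithmetic behind that claim is worth spelling out. Taking a midpoint of the stated $30K-$40K range as an assumption:

```python
# Rough back-of-envelope check of the 300x cost-reduction claim
# (the $35K midpoint is an assumption from the stated $30K-$40K range).
large_model_cost = 35_000        # dollars per million conversation turns
cost_reduction_factor = 300
student_cost = large_model_cost / cost_reduction_factor
print(f"~${student_cost:.0f} per million conversation turns")
```

That puts the fine-tuned 8B student agent in the low hundreds of dollars per million conversation turns, versus tens of thousands for the largest models.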

The cost of running 1 million conversation turns (X axis) against a quality score (Y axis). The largest agents achieved high scores but at a cost of $30K-$40K per million conversational turns. Recorded tapes achieved similar performance at 300 times less cost.

Empowering agentic systems creators

TapeAgents represents a significant advancement in the development and optimization of AI agents. By providing a structured, granular log and supporting various stages of the development lifecycle, TapeAgents empowers AI practitioners to build, debug, and optimize agents effectively. With TapeAgents, developers can create sophisticated agents that deliver high-quality user interactions while remaining cost-effective.

In conclusion, TapeAgents advances AI agent development to a new level, aligning with ServiceNow AI Agents.

Call for collaborators

We welcome researchers and developers to try the TapeAgents tutorial and read the TapeAgents paper. If you’re interested in contributing to the framework, please visit our GitHub page and open an issue.

Find out more about ServiceNow Research.

1 Gartner ebook, Gartner Top 10 Strategic Technology Trends for 2025, Oct. 21, 2024, https://www.gartner.com/en/articles/top-technology-trends-2025.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.