AI Desktop Action - Screens, Anchors, and Steps: The Building Blocks of Desktop Action (UI blocks)

Divya78

Overview

When the Agentic Desktop Recorder captures your desktop interactions, it organises everything into three building blocks: screens, anchors, and steps. Understanding how these three relate to each other is the foundation for building automations that run reliably every time.

The Hierarchy: Screen › Anchor › Step

Think of it as a strict chain of ownership. A screen is the parent. Each screen contains one or more anchors. Each anchor owns the steps that execute relative to it. Actions are carried out step by step within defined anchors under a single application screen. Each anchor must complete successfully before the workflow moves to the next. If any anchor fails, the entire workflow stops. This is intentional: it ensures dependent execution stays predictable and errors surface immediately rather than cascading silently.
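The ownership chain and the stop-on-failure rule can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the class names, the `locate` and `action` callables, and `run_screen` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # hypothetical: performs one click or keystroke


@dataclass
class Anchor:
    name: str
    locate: Callable[[], bool]  # hypothetical: True if the anchor was found on screen
    steps: List[Step] = field(default_factory=list)


@dataclass
class Screen:
    name: str
    anchors: List[Anchor] = field(default_factory=list)


class AnchorFailed(Exception):
    """Raised when an anchor cannot be located; the workflow stops here."""


def run_screen(screen: Screen) -> List[str]:
    """Run anchors in order and return the names of the steps that executed."""
    executed: List[str] = []
    for anchor in screen.anchors:
        if not anchor.locate():
            # Fail fast: anchors after this one never run.
            raise AnchorFailed(f"{screen.name}/{anchor.name}")
        for step in anchor.steps:
            step.action()
            executed.append(f"{anchor.name}.{step.name}")
    return executed


log = []
workflow = Screen("MainWindow", [
    Anchor("Toolbar", lambda: True, [Step("ClickSave", lambda: log.append("save"))]),
    Anchor("Dialog", lambda: False, [Step("ClickOk", lambda: log.append("ok"))]),
])
try:
    run_screen(workflow)
except AnchorFailed as exc:
    print("stopped at", exc)  # prints: stopped at MainWindow/Dialog
```

In this sketch the first anchor's step runs, the second anchor is not found, and nothing after it executes, which mirrors the "errors surface immediately" behaviour described above.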

Definitions

SCREEN

Application state snapshot

A screenshot of the application window captured at a specific point during recording. Each screen is the parent container for the anchors and steps that belong to that application. Names must be alphanumeric, with no spaces or special characters, and unique within the desktop action.

ANCHOR

Visual landmark for Computer Vision

A reference image of a stable UI element that the Computer Vision engine uses to locate the correct position on screen at execution time. It acts as a spatial landmark: the engine finds the anchor first, then calculates where to perform the step relative to it. During recording, the system simply captures your mouse clicks and keystrokes in the background.

Anchors are created only afterwards, when the system processes the recording and generates the workflow. Anchor positioning happens automatically in the background once recording is finished; however, it can be fine-tuned in the Design workspace.

STEP

Single recorded interaction

One captured action the automation will replay at execution time: a click, keystroke, selection, or data entry. Steps live under their parent anchor and together form the complete playbook the AI agent follows during execution. Names must be alphanumeric and unique within the parent anchor.
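The naming rules for screens and steps can be checked before you rename anything in bulk. The helper below is a minimal sketch, not part of the product: `name_problems` is a hypothetical function, and it assumes "alphanumeric" means ASCII letters and digits and that uniqueness is case-sensitive, neither of which is confirmed here.

```python
import re
from typing import Iterable, List

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def name_problems(names: Iterable[str]) -> List[str]:
    """Return one message per invalid or duplicate name within a single scope.

    The scope is the desktop action for screen names, or the parent anchor
    for step names.
    """
    problems: List[str] = []
    seen = set()
    for name in names:
        if not _ALNUM.fullmatch(name):
            problems.append(f"{name!r}: must be alphanumeric, no spaces or special characters")
        elif name in seen:
            problems.append(f"{name!r}: duplicate within this scope")
        seen.add(name)
    return problems


print(name_problems(["LoginScreen", "Login Screen", "LoginScreen"]))
```

Run it once per scope: once over all screen names in the desktop action, and once per anchor over that anchor's step names.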

Anchor Review Principles: Refining Anchors After Recording

During recording, the system identifies UI elements using the Windows Accessibility API (UI Automation). It reads the element name, control type, and bounds directly from the application.

Anchors are generated automatically from your interactions, so you cannot create them by hand; you can only review and refine them. The principles below therefore apply after recording, in the Design workspace. Choosing the right anchor is what separates a brittle automation from a resilient one.

1. Choose Anchors That Stay Stable

Avoid anything that changes between runs

An anchor is only useful if it looks the same every time the automation runs. Dynamic content like timestamps, notification badges, counters, or data grid values will look different on the next execution and cause matching failures. Prefer static, structural elements: toolbar icons, application logos, static labels, and menu bar items.

✓ Good Anchors

• Application icons and logos

• Toolbar and menu bar buttons

• Static form labels

• Fixed UI section headers

✗ Avoid These

• Timestamps and date fields

• Notification badge counts

• Animated or loading elements

• Data grid cell values
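One rough way to reason about stability is to compare the same region in two captures of the screen taken at different times. The sketch below is purely illustrative and says nothing about how the Computer Vision engine actually matches anchors: the `Image` and `Box` types and `changed_fraction` are hypothetical, and images are modelled as plain lists of grayscale values.

```python
from typing import List, Tuple

Image = List[List[int]]          # hypothetical: rows of 0-255 grayscale pixel values
Box = Tuple[int, int, int, int]  # left, top, width, height


def changed_fraction(first: Image, second: Image, box: Box) -> float:
    """Fraction of pixels inside the box that differ between two captures."""
    left, top, width, height = box
    changed = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            if first[y][x] != second[y][x]:
                changed += 1
    return changed / (width * height)


# Two 4x4 captures: the left half is a static label, the right half holds a counter.
run_one = [[10, 10, 50, 60] for _ in range(4)]
run_two = [[10, 10, 70, 80] for _ in range(4)]

print(changed_fraction(run_one, run_two, (0, 0, 2, 4)))  # static label region
print(changed_fraction(run_one, run_two, (2, 0, 2, 4)))  # counter region
```

A region that barely changes between runs is a better anchor candidate than one that changes every time, which is the distinction the two lists above draw.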

2. Keep the Anchor Close to the Target

Consistent relative positioning = reliable offsets

CV automation calculates where to click based on the distance between the anchor and the action target. If that distance can shift because of collapsible panels, responsive layouts, or dynamic content, the automation will land in the wrong place. Place your anchor as close to the target element as possible, and avoid anchoring to elements that may move relative to the target.

Why it matters: If the relative position shifts due to layout changes or dynamic panels, the automation may perform actions at incorrect coordinates.
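The arithmetic behind this is simple vector addition. The sketch below assumes the engine stores a fixed pixel offset from anchor to target, which is a simplification for illustration; `Point`, `record_offset`, and `click_point` are hypothetical names.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def record_offset(anchor_at: Point, target_at: Point) -> Point:
    """Offset from anchor to target as seen at recording time."""
    return Point(target_at.x - anchor_at.x, target_at.y - anchor_at.y)


def click_point(anchor_found_at: Point, offset: Point) -> Point:
    """Click position at execution time: found anchor position plus recorded offset."""
    return Point(anchor_found_at.x + offset.x, anchor_found_at.y + offset.y)


# Recording: anchor label at (100, 200), target field at (180, 200).
offset = record_offset(Point(100, 200), Point(180, 200))

# Run 1: the whole window moved by (50, 30). Anchor and target moved together,
# so the click still lands on the target, now at (230, 230).
moved = click_point(Point(150, 230), offset)

# Run 2: a panel between anchor and target collapsed. The anchor is still at
# (100, 200) but the target is now at (140, 200), so the click misses it.
shifted = click_point(Point(100, 200), offset)
miss_px = shifted.x - 140

print(offset, moved, shifted, miss_px)
```

The closer the anchor sits to the target, the fewer layout elements lie between them that could change the offset in the second way.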

3. Plan Alternate Anchors for UI Variations

Build resilience into your automation from the start

Application updates, theme changes, and resolution differences can shift the appearance of UI elements. Where possible, define alternate anchors so your automation has options if the primary anchor fails. This is especially important for desktop actions that need to run across multiple machines or environments.
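Conceptually, alternate anchors behave like an ordered fallback list: try the primary, then each alternate, and fail only when none matches. The following is a minimal sketch under that assumption; `locate_with_alternates` and the locator callables are hypothetical and do not represent the product's API.

```python
from typing import Callable, Optional, Sequence, Tuple

Location = Tuple[int, int]
Locator = Callable[[], Optional[Location]]  # hypothetical: returns a position, or None if not found


def locate_with_alternates(candidates: Sequence[Tuple[str, Locator]]) -> Tuple[str, Location]:
    """Try the primary anchor first, then each alternate in order."""
    tried = []
    for name, locate in candidates:
        found = locate()
        if found is not None:
            return name, found
        tried.append(name)
    raise LookupError("no anchor matched; tried: " + ", ".join(tried))


# The primary was captured in a light theme; this machine runs a dark theme.
result = locate_with_alternates([
    ("SaveIconLight", lambda: None),
    ("SaveIconDark", lambda: (40, 12)),
])
print(result)  # prints: ('SaveIconDark', (40, 12))
```

Listing the tried anchors in the error message makes it easier to see which variation is missing when a run fails on a new machine.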

Before You Publish - Readiness Checklist

Before publishing any desktop action, verify the following:

Anchor contains distinct, recognizable visual features
Anchor represents a stable UI element that won’t change between runs
Distance between anchor and step target is consistent
Anchor is unlikely to change after application or theme updates
Red overlay was interpreted correctly: it appears only during mouse interactions, not for keyboard steps (SendKeys). It shows the accessibility element detected at the time of the click and is a real-time indicator during recording, not a confirmation that an anchor was generated.
Target window was in focus before each captured action

A few extra seconds of care during anchor review pay off in automations that run reliably across machines and AI agent sessions, without needing constant rework.

For more details on screen, anchor, and step properties, refer to the Product Documentation.
