Michael Hansen
ServiceNow Employee

It's 2 AM and something is wrong. You know this because an alert fired, your phone woke you up, and now you're staring at a screen trying to figure out what "wrong" actually means.

The first few minutes follow a familiar pattern. You open the Service Observability dashboard. You scan the application metrics. Latency is elevated but not catastrophic; error rates are creeping. You check the infrastructure. Still not definitive. Then someone on the call asks the question that derails the next forty minutes: "Is this a network problem?"

The application team says the app looks fine. The network team says the network looks fine. Meanwhile the incident ages, the error budget drains, and the SRE on call is manually rolling through five separate dashboards trying to assemble a picture that no single tool is giving them.

That's the moment this release is built around.


One View, Across Everything

The most consistent ask from customers heading into this quarter was broader coverage: business context and network visibility alongside the application performance data that Service Observability already surfaces.

Splunk log ingestion now brings business metrics directly into Service Observability dashboards. When an incident fires, teams can correlate application performance with customer experience indicators and KPIs without pivoting to a separate tool.

On the network side, integrations with SolarWinds NPM, Zabbix, and Cisco ThousandEyes add firewall, router, and switch telemetry right alongside existing infrastructure signals. The network question gets answered in the same workspace, in the same investigation, without a handoff.

Teams can rule network in or out in minutes rather than hours, and escalate to network operations only when there's actual reason to.

[Screenshot: network device telemetry displayed alongside infrastructure signals in a Service Observability dashboard]

The 2 AM Question, Answered Automatically

Even with unified data, someone still has to read it. At 2 AM that someone is usually exhausted, context-switching between charts, making judgment calls on incomplete information.

Gen AI Health Analysis changes that. Now Assist reads across the observability dashboards automatically, identifies the anomalies that actually require attention, and surfaces them with suggested next steps. CPU at 82%, network at 68%, four anomalies flagged across the overview and host dashboards. The analysis takes roughly 30 seconds, and the insight connects directly to platform workflows, so an observation can immediately become an incident or alert.

This isn't a reporting feature. It's the difference between a specialist spending forty minutes assembling a picture and having that picture ready when they open the service.

[Screenshot: Now Assist health analysis flagging anomalies with suggested next steps]

Honoring What Customers Already Have

Dynatrace has been going through a significant platform shift from Classic to Grail, and many enterprise customers are still running Dynatrace Managed on-premises. Previously, which deployment model a customer was on determined what ServiceNow could see. This release closes that gap across all three deployment models at once.

Classic, Grail, and on-premises Dynatrace Managed are all now supported within Service Observability. No migration required, no workarounds. The result is a unified host view pulling from both Grail and Classic in the same widget. Whether a customer migrated to Grail last year or is still running Managed on-prem, they get the same experience.


After the Incident: Closing the Gap

Here's something that happens after almost every resolved incident: the team fixes the problem, closes the ticket, and moves on without adding any monitoring to detect it next time. Not because anyone decided against it. Just because adding synthetic coverage is manual, it's outside the incident flow, and there's always something more urgent.

Synthetic Monitoring's post-incident monitor creation is designed to close that gap. When an incident references a service with HTTP endpoint relationships, the system suggests appropriate synthetic monitors, pre-selects the relevant endpoints, and maps them back to the service automatically. The action is embedded directly in the incident's recommended actions, with alerts surfacing in Express List and the Service Operations Workspace where operators are already working.

For teams managing hundreds of endpoints, the new external API support extends the same capability to enterprise scale: programmatic creation, tagging, updating, and retiring of synthetic monitors, integrated with CI/CD and service registration workflows so coverage stays synchronized as services evolve.
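
To make that concrete, here's a rough sketch of what the programmatic flow could look like from a deploy pipeline. The endpoint path, payload fields, and environment variables below are illustrative assumptions, not the documented API contract:

```python
# Hypothetical sketch: registering a synthetic monitor from a CI/CD
# deploy step. The endpoint path and payload fields are assumptions
# for illustration, not ServiceNow's documented API contract.
import os

import requests

INSTANCE = os.environ["SN_INSTANCE"]   # e.g. "example.service-now.com" (assumed env var)
TOKEN = os.environ["SN_TOKEN"]         # OAuth bearer token (assumed env var)

def create_http_monitor(name: str, endpoint: str, service: str, tags: list[str]) -> str:
    """Create an HTTP synthetic monitor and map it back to its service."""
    resp = requests.post(
        f"https://{INSTANCE}/api/hypothetical/synthetic-monitors",  # assumed path
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "name": name,
            "endpoint": endpoint,   # URL the monitor will probe
            "service": service,     # keeps the monitor mapped to the service
            "tags": tags,           # e.g. team and release identifiers
            "interval_sec": 300,    # probe every five minutes
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["sys_id"]

# Run as a deploy step so coverage stays synchronized as the service evolves.
monitor_id = create_http_monitor(
    name="checkout-api-health",
    endpoint="https://shop.example.com/api/health",
    service="Checkout",
    tags=["team:payments", "release:2025-10"],
)
print(f"registered synthetic monitor {monitor_id}")
```

The same pattern extends to updating and retiring monitors as endpoints change, which is what keeps coverage from drifting out of date between releases.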

[Screenshot: suggested synthetic monitors surfaced in an incident's recommended actions]

The Error Budget You Didn't Have to Configure

SLOs tell you whether a service is performing within its reliability commitments, and error budgets track how much of that tolerance has been consumed. They're foundational to SRM. They're also notoriously painful to set up. Every service needs objectives, indicators, and thresholds configured, and for teams managing dozens or hundreds of services, that setup is usually the reason SRM adoption stalls.
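
For anyone newer to error budgets, the arithmetic itself is simple. Here's a minimal sketch; the 99.9% target, 30-day window, and downtime figure are example values, not ServiceNow defaults:

```python
# Illustrative error-budget arithmetic; the target, window, and
# consumed downtime are example values, not ServiceNow defaults.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability implied by an SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes

budget = error_budget_minutes(0.999)   # 99.9% availability over 30 days
print(f"budget: {budget:.1f} min")     # -> budget: 43.2 min

downtime_so_far = 12.0                 # minutes consumed this window
remaining = budget - downtime_so_far
print(f"remaining: {remaining:.1f} min ({remaining / budget:.0%} left)")
# -> remaining: 31.2 min (72% left)
```

The arithmetic was never the hard part; repeating the configuration work across dozens or hundreds of services is.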

The Autonomous SLO Creator Agent eliminates that barrier. It analyzes observed service behavior, including alert history, incident patterns, severity distribution, and business criticality, then generates a best-practice SLO configuration without requiring manual input. With SRM enabled, it runs on a default two-week cycle, continuously generating and maintaining SLO coverage across the service catalog.

The practical effect is that every service gets an error budget. Not just the ones someone had time to configure.

[Screenshot: SLO configuration generated by the Autonomous SLO Creator Agent]

Back to 2 AM

The 2 AM question doesn't go away because you have better tools. But it gets answered faster. And with synthetic coverage that closes the gaps incidents reveal and SLOs that reflect how your services actually behave, fewer incidents make it to 2 AM in the first place.

Broader visibility into what's happening. AI that reads the picture so you don't have to start from scratch. Monitoring that keeps up with a changing service catalog. Reliability objectives that cover the whole portfolio.

The goal isn't a perfect incident. It's a shorter one, and eventually, one that never starts.