Exploring Service Observability
Summary of Exploring Service Observability
Service Observability assists operations teams in managing incidents within complex production systems by integrating telemetry from external monitoring systems with data from the Configuration Management Database (CMDB). It consolidates application, infrastructure, and network health metrics in the Service Operations Workspace (SOW), enabling users to assess service health effectively.
Key Features
- Integration with Observability Vendors: Supports various vendors such as Amazon CloudWatch, Datadog, Dynatrace, and others, allowing for comprehensive service monitoring.
- Data Mapping: Users can map CMDB services to observability metrics using tags, facilitating a unified view of performance data.
- User Roles: System admins manage configurations and connections, while operators monitor service health and investigate incidents.
- Detailed Metrics Access: Operators can delve into specific metrics and related incidents through the Observability tab, enhancing incident triage processes.
- Dashboard Customization: Admins can tailor dashboard templates to display relevant metrics effectively.
Key Outcomes
- Improved Incident Management: Operators can quickly identify and address service issues, reducing the mean time to resolution (MTTR).
- Comprehensive Service Insights: Users gain a full-stack view of service health by consolidating data from multiple monitoring tools.
- Enhanced Analysis: Generative AI tools help operators analyze metrics for deeper insights into service health and incident response.
- Streamlined Workflows: Integrating Service Observability data into incident management workflows enhances the overall user experience and efficiency.
Service Observability helps operations teams triage and manage incidents in a complex, distributed production system. It combines telemetry from external observability and monitoring systems with related data from the Configuration Management Database (CMDB) and displays both in a single workflow in the Service Operations Workspace (SOW).
Service Observability overview
Service Observability displays application, infrastructure, and network health metrics in the SOW related to a given service. Metrics can be ingested from an external observability vendor (application, network, and cloud monitors) and displayed alongside information for related configuration items in the CMDB.
Service Observability supports the following observability vendors:
- Amazon CloudWatch
- AppDynamics
- Cisco ThousandEyes synthetic tests
- Datadog
- Dynatrace SaaS and on-premise (both Classic and Grail environments)
- Microsoft Azure Monitor
- New Relic
- Prometheus on-premise
- SolarWinds on-premise
- Splunk Observability and logs from Splunk Enterprise
- Zabbix on-premise
- MySQL
- PostgreSQL (not supported with Splunk)
- RDS (Relational Database Service) for Amazon CloudWatch
After connecting an observability vendor to Service Observability, you map services in the CMDB to observability metrics using existing tags.
For example, say you use Dynatrace to monitor your checkout service, databases, and hosts, and that metrics from all these entities use the tag checkout-service to denote requests coming from that service.
When you map the checkout service CI to the Dynatrace data tagged with checkout-service, Service Observability retrieves metrics for those databases and hosts, along with the CIs related to the service, and displays them together. Operators can pinpoint issues on entities related to the service and narrow down the mitigation process without having to leave the SOW.
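The tag-based mapping described above can be pictured with a small sketch. This is an illustration only, not the product's implementation or API: the `MetricSeries` shape, the `TAG_MAPPING` dictionary, the CI name, and all entity and metric names are hypothetical. It shows how a single tag shared by a service, its databases, and its hosts is enough to collect their metrics into one view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSeries:
    """One metric reported by a vendor for a monitored entity (hypothetical shape)."""
    entity: str        # name of the monitored entity, e.g. a host or database
    entity_type: str   # "service", "host", "database", ...
    name: str          # metric name
    tags: frozenset    # vendor tags attached to the metric


def metrics_for_service(tag_mapping, service_ci, series):
    """Group metric names by entity for every series carrying the tag mapped to the CI."""
    tag = tag_mapping[service_ci]
    grouped = {}
    for s in series:
        if tag in s.tags:
            grouped.setdefault(s.entity, []).append(s.name)
    return grouped


# Hypothetical mapping from a CMDB service CI to a vendor tag.
TAG_MAPPING = {"Checkout Service": "checkout-service"}

# Hypothetical vendor data: three entities share the tag, one does not.
SERIES = [
    MetricSeries("checkout-api", "service", "request_latency_ms",
                 frozenset({"checkout-service"})),
    MetricSeries("orders-db", "database", "query_time_ms",
                 frozenset({"checkout-service", "prod"})),
    MetricSeries("host-17", "host", "cpu_percent",
                 frozenset({"checkout-service"})),
    MetricSeries("search-db", "database", "query_time_ms",
                 frozenset({"search-service"})),
]

result = metrics_for_service(TAG_MAPPING, "Checkout Service", SERIES)
print(result)
```

In this sketch, `search-db` is left out because it does not carry the mapped tag, which mirrors why consistent tagging in the vendor tool matters before mapping services.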
Service Observability users
| User | Description |
|---|---|
| System admin | Version 1.5 only. System admins configure users and teams, register services to be monitored, connect Service Observability to observability vendors, and then map those services to that data. They can also view the data in the SOW. |
| Service Observability admin | Version 1.6.x and later. Service Observability admins can configure users and teams, connect Service Observability to observability vendors, and then map services to that data. They can also view the data in the SOW. Admins can also customize dashboard templates used to display metrics and related information. |
| Operator/operations manager (Note: These users must belong to an srm group type to see all data.) | Operators use Service Observability when triaging incidents in the SOW. They can view basic health metrics for a service, along with related incidents, alerts, and changes. They can get more detailed information by navigating to the Observability tab to view additional service metrics, along with metrics from related entities, such as hosts, networks, or databases. |
Service Observability workflow
Admins configure Service Observability by creating a connection to an observability vendor and then mapping CI services to that data. Operators use Service Observability to determine if another related entity is causing issues surfaced by the service's performance.
As an admin, you:
- Determine the services to be monitored by Service Observability based on business criticality.
- Connect existing observability vendor instances to Service Observability.
- Map services to observability metric data using vendor-based tags attached to that data.
- Customize the templates used to display metric charts.
As an operator or manager, you:
- Spot an issue with a service while working in the SOW, for example, from an alert, the Service dashboard, or Express List, then navigate to the Service Details page.
- View overall health metrics for the service, along with related incidents, alerts, and changes. If one of the metrics seems unhealthy, navigate to the Observability tab.
- View more detailed service metrics, as well as information from related entities, to start root cause investigation. If the issue turns out to lie further down the system's stack, identify the owner of that entity to start remediation.
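The operator steps above amount to scanning the related entities for unhealthy metrics and finding who owns them. The following is a rough sketch of that reasoning, not how Service Observability works internally: the `EntityHealth` fields, the thresholds, and the team names are all hypothetical, and thresholds are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class EntityHealth:
    """Latest reading for one entity related to a service (hypothetical shape)."""
    name: str
    owner: str        # owning team, assumed to be recorded alongside the entity
    metric: str
    value: float
    threshold: float  # readings above this positive value count as unhealthy


def unhealthy_entities(entities):
    """Return (entity, owner, metric) for each entity over its threshold, worst first."""
    over = [e for e in entities if e.value > e.threshold]
    # Rank by how far the reading exceeds its threshold, relative to the threshold.
    over.sort(key=lambda e: e.value / e.threshold, reverse=True)
    return [(e.name, e.owner, e.metric) for e in over]


# Hypothetical readings for a service and two entities further down the stack.
RELATED = [
    EntityHealth("checkout-api", "payments-team", "p95_latency_ms", 900.0, 500.0),
    EntityHealth("orders-db", "database-team", "query_time_ms", 1200.0, 200.0),
    EntityHealth("host-17", "infra-team", "cpu_percent", 45.0, 85.0),
]

suspects = unhealthy_entities(RELATED)
for name, owner, metric in suspects:
    print(f"{name}: {metric} over threshold, owned by {owner}")
```

Here the database ranks ahead of the service itself because its reading exceeds its threshold by the largest factor, which is the kind of signal that points an operator further down the stack and toward the owning team.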
Service Observability benefits
| Benefit | Feature | Users |
|---|---|---|
| Consolidate data from existing monitoring tools, network health tools, cloud providers, ServiceNow agents, and third-party tools for a full-stack view of service health. | | Admins |
| Increase efficiency and reduce mean time to resolution (MTTR). View combined metrics from entities associated with a service to begin to determine blast radius and ownership of an incident. | View service health metrics | Operators |
| See related changes to the system and alerts associated with a service in one place. | View overall service health. | Operators |
| Use generative AI to analyze metric data and find insights to help determine service health. | | Operators |
| See Service Observability data as part of Incident Management workflows | Digital End-User Experience and Service Observability UI experience on investigate tab | Operators |
| Customize dashboard templates. | Customize Service Observability dashboard templates | Admins |