Visualizations in the Service reliability dashboard

  • Release version: Zurich
  • Updated July 31, 2025

    This page lists the visualizations and options available on the Service reliability dashboard in Service Reliability Management (SRM).

    Service state charts

    Top-level charts show the number of services in critical, at-risk, and stable states. Their states are based on the error budget remaining on their service level objectives (SLOs). You can select the charts to view service names, adjust the time range, and access additional chart options.

    Note: An error budget is the amount of failure a service can experience before breaching its SLO.

    Each visualization also includes a trend line showing changes in service count over the past 12 months. Smaller figures indicate how the count has changed compared to a week ago. For example, ↓25 (22%) since Jun 11.
    Chart: Critical
    What it is: Displays the number of services in a critical state. Critical services have 0% error budget remaining on their SLOs.
    How to use it: View how many services have consumed their error budgets and identify the services needing immediate attention.

    Chart: At risk
    What it is: Displays the number of services at risk. At-risk services have <= 25% error budget remaining on their SLOs.
    How to use it: Monitor how many services are approaching critical thresholds and find issues early.

    Chart: Stable
    What it is: Displays the number of stable services. Stable services have more than 25% error budget remaining on their SLOs.
    How to use it: Get insights into overall service health and identify if services are staying reliable over time.
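
    The thresholds in the charts above can be expressed as a small classification function. The following Python sketch is illustrative only: `ReliabilityState`, `classify_state`, and `AT_RISK_THRESHOLD_PERCENT` are hypothetical names, not part of SRM, and the sketch assumes that an overspent (negative) budget counts as critical. It classifies a single remaining-budget percentage and does not model how SRM rolls several SLOs up into one service state.

```python
from enum import Enum


class ReliabilityState(Enum):
    CRITICAL = "critical"
    AT_RISK = "at risk"
    STABLE = "stable"


# Threshold from the chart descriptions: at risk means <= 25% budget remaining.
AT_RISK_THRESHOLD_PERCENT = 25.0


def classify_state(error_budget_remaining_percent: float) -> ReliabilityState:
    """Map a remaining error budget percentage to a reliability state."""
    # Assumption: a negative (overspent) budget is treated the same as 0%.
    if error_budget_remaining_percent <= 0:
        return ReliabilityState.CRITICAL
    if error_budget_remaining_percent <= AT_RISK_THRESHOLD_PERCENT:
        return ReliabilityState.AT_RISK
    return ReliabilityState.STABLE


if __name__ == "__main__":
    for remaining in (0.0, 25.0, 25.1, 80.0):
        print(f"{remaining:5.1f}% remaining -> {classify_state(remaining).value}")
```

    Note that exactly 25% remaining falls in the at-risk band, because the stable band starts above 25%.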

    Risk trends over time

    Line charts track the number of SLOs with high burn rates and low error budget remaining over the past 12 months. You can use them to find recurring patterns and potential reliability risks.
    Chart: High burn rate (>=1)
    What it is: Shows the number of SLOs with a burn rate >= 1 over time. A high burn rate indicates that the service linked to the SLO is likely to breach its error budget before the compliance period ends. For example, if a service has 30 days to meet its SLO but is using up its error budget in 15 days, the burn rate is 2.
    How to use it:
    • Find risks early by seeing when services begin consuming error budgets too quickly.
    • Identify emerging or recurring reliability issues by tracking burn rates over time.
    • Point to the chart to see the number and percentage of SLOs with a high burn rate at that time.
    • Select the chart to view SLO details, adjust the time range, and access additional chart options.

    Chart: Low budget remaining (<=25%)
    What it is: Shows the number of SLOs with low or no error budget remaining over time.
    How to use it:
    • Monitor how many services are nearing or have breached their SLOs.
    • Track rising trends, which might indicate declining reliability or recurring issues that need investigation.
    • Point to the chart to see the number and percentage of SLOs with little or no error budget remaining at that time.
    • Select the chart to view SLO details, adjust the time range, and access additional chart options.
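
    The burn rate example above (a 30-day compliance period with the error budget used up in 15 days) works out as 30 / 15 = 2. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and the sketch assumes that burn rate equals the compliance period divided by the time in which the budget would be exhausted. That assumption matches the example but may not be the exact calculation SRM performs.

```python
# Threshold from the "High burn rate (>=1)" chart.
HIGH_BURN_RATE_THRESHOLD = 1.0


def burn_rate(compliance_period_days: float, days_to_exhaust_budget: float) -> float:
    """Return how many times faster than 'exactly on budget' the budget is being spent."""
    if compliance_period_days <= 0 or days_to_exhaust_budget <= 0:
        raise ValueError("both durations must be positive")
    return compliance_period_days / days_to_exhaust_budget


def is_high_burn_rate(rate: float) -> bool:
    """A rate of 1 or more means the budget runs out at or before the period ends."""
    return rate >= HIGH_BURN_RATE_THRESHOLD


if __name__ == "__main__":
    rate = burn_rate(compliance_period_days=30, days_to_exhaust_budget=15)
    print(rate)                     # 2.0
    print(is_high_burn_rate(rate))  # True
```

    A rate below 1, such as a budget that would last 60 days in a 30-day period, would not appear in the high burn rate chart under this reading.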

    Service level objectives (SLOs) table

    The SLOs table lists the SLOs defined in Service Reliability Management (SRM), sorted by SLO name by default. Use the table to monitor overall reliability, identify services at risk, and find the assigned teams.

    The SLOs table includes the following columns by default. To customize the columns shown, select the gear icon.
    • Name - Name of the SLO. You can select the arrow to sort the table by SLO name, and you can select the name to view the SLO record.
    • Reliability - Current state of the SLO. For example, stable, at risk, or critical.
    • Measured reliability - Percentage showing the actual performance of the service. For example, if your SLO is 99.9% success, and the actual performance for the month is 99.7%, the measured reliability is 99.7%.
    • Objective (percentage) - Target SLO value.
    • Burn rate - Numeric value showing how quickly the service is consuming its error budget.
    • % Error budget remaining - Percentage of the error budget still available in the current compliance period.
    • Service - Name of the service associated with the SLO. You can select the service name to view the service record.
    • Assigned - Team responsible for the service.
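
    To see how the Objective, Measured reliability, and % Error budget remaining columns can relate to each other, the Python sketch below uses the common ratio-based definition of an error budget: the allowed failure is 100% minus the objective, and the budget consumed is the actual failure divided by the allowed failure. This is an assumption for illustration. The function name is hypothetical, and the calculation SRM performs is not described here and may differ.

```python
def error_budget_remaining_percent(objective_percent: float, measured_percent: float) -> float:
    """Estimate the percentage of error budget left, clamped at 0."""
    allowed_failure = 100.0 - objective_percent  # the error budget, in percentage points
    if allowed_failure <= 0:
        raise ValueError("an objective of 100% or more leaves no error budget")
    actual_failure = 100.0 - measured_percent
    consumed_fraction = actual_failure / allowed_failure
    remaining = (1.0 - consumed_fraction) * 100.0
    # Assumption: an overspent budget is reported as 0% rather than a negative value.
    return max(0.0, remaining)


if __name__ == "__main__":
    # Objective 99.9%, measured 99.7%: actual failure is about three times the budget.
    print(error_budget_remaining_percent(99.9, 99.7))
    # Objective 99%, measured 99.5%: about half of the budget is still available.
    print(error_budget_remaining_percent(99.0, 99.5))
```

    Under this reading, the 99.9% objective with 99.7% measured reliability from the column description has already used more than its whole budget, so the SLO would show no error budget remaining.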

    Dashboard filters and actions

    The Service reliability dashboard is built with Platform Analytics and includes standard dashboard features. For details on customizing, duplicating, or sharing dashboards, see the Dashboards in Platform Analytics documentation.
    Note: Changes to the Service reliability dashboard affect all SRM users on your instance. To create a personalized version, either build a new dashboard or duplicate the existing dashboard and edit it. Learn more in Create a dashboard with the in-line editor and Duplicate a Platform Analytics dashboard.