Get started with Service Reliability Management
Summary of Get started with Service Reliability Management
Service Reliability Management (SRM) in the Zurich release enables IT Operations and DevOps teams to monitor service health through service level objectives (SLOs) and streamline incident resolution. SRM helps teams ensure agility, performance, and uptime by providing a comprehensive interface to manage services, teams, and alerts in a single platform.
Key Features
- Home Page: Centralized view highlighting critical alerts, incidents, and assigned work for quick prioritization.
- Team and Service Setup: Define and add teams responsible for specific services, including creating on-call schedules and escalation policies to ensure timely incident response.
- Service Management: Configure service instances and technology management services to define operational parameters and behaviors.
- Service Relationships: Visualize dependencies using a map canvas to understand how child and parent services impact each other.
- Third-Party Integrations: Integrate with external monitoring tools such as Datadog or ServiceNow Cloud Observability to centralize alert management within SRM.
- Service Level Objectives: Establish SLOs, Service Level Indicators (SLIs), and error budgets to set performance goals and allowable failure thresholds.
- Alert Automations: (Requires Alert Automations application) Define alert conditions and automate notifications from Application Performance Monitoring (APM) tools to improve response times.
- Role-Based Interface: The SRM interface adapts based on user roles and permissions, ensuring access to relevant features.
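On-call schedules and escalation policies are configured in the SRM interface, not in code. To illustrate the underlying idea only, the following Python sketch models a simple rotation (who is on call at a given time) and a time-based escalation policy (who has been paged while an incident stays unacknowledged). The `Rotation` and `EscalationLevel` classes, the `targets_to_page` function, and the names and timings are hypothetical. They do not reflect how SRM stores or evaluates schedules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical model for illustration only; not an SRM API.

@dataclass
class Rotation:
    members: list[str]   # team members in rotation order
    start: datetime      # when the first member's shift begins
    shift: timedelta     # length of each shift

    def on_call(self, at: datetime) -> str:
        # Count the whole shifts that have elapsed since the rotation started,
        # then wrap around the member list.
        shifts_elapsed = (at - self.start) // self.shift
        return self.members[shifts_elapsed % len(self.members)]


@dataclass
class EscalationLevel:
    after: timedelta  # how long an incident stays unacknowledged before this level is paged
    notify: str       # who this level pages


def targets_to_page(levels: list[EscalationLevel], unacknowledged_for: timedelta) -> list[str]:
    # Every level whose delay has already passed has been paged.
    return [level.notify for level in levels if unacknowledged_for >= level.after]


rotation = Rotation(["alice", "bob", "carol"], datetime(2025, 1, 6, 9), timedelta(days=1))
print(rotation.on_call(datetime(2025, 1, 7, 10)))  # bob

policy = [
    EscalationLevel(timedelta(0), "primary on-call"),
    EscalationLevel(timedelta(minutes=15), "secondary on-call"),
    EscalationLevel(timedelta(minutes=30), "team manager"),
]
print(targets_to_page(policy, timedelta(minutes=20)))  # ['primary on-call', 'secondary on-call']
```

The first escalation level fires immediately, so at least one person is always engaged. Later levels widen the response only if nobody acknowledges the incident.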
Practical Use for ServiceNow Customers
By adopting SRM, ServiceNow customers can centralize service reliability efforts, improve collaboration between operations and development teams, and proactively manage incidents aligned with business-level objectives. Setting up teams, services, and integrations allows for comprehensive monitoring and faster issue resolution. The ability to define on-call rotations and escalation policies ensures accountability and responsiveness. The service map helps teams understand service dependencies, which is critical for impact analysis and root cause identification.
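Impact analysis on a service map amounts to walking the dependency graph. The Python sketch below assumes that each service maps to the parent services that depend on it, so a failing child service can affect every ancestor. The `impacted_services` function and the service names are hypothetical and are not part of SRM.

```python
from collections import deque

# Hypothetical dependency data for illustration: each service maps to the
# parent services that depend on it (child -> parents).
PARENTS_OF: dict[str, list[str]] = {
    "db": ["checkout-api", "inventory-api"],
    "checkout-api": ["web-store"],
    "inventory-api": ["web-store"],
}


def impacted_services(parents_of: dict[str, list[str]], failing: str) -> set[str]:
    """Return every service that transitively depends on `failing` (breadth-first walk)."""
    impacted: set[str] = set()
    queue = deque([failing])
    while queue:
        service = queue.popleft()
        for parent in parents_of.get(service, []):
            if parent not in impacted:  # visit each service once, even with shared parents
                impacted.add(parent)
                queue.append(parent)
    impacted.discard(failing)  # a dependency cycle could lead back to the failing service
    return impacted


print(sorted(impacted_services(PARENTS_OF, "db")))
# ['checkout-api', 'inventory-api', 'web-store']
```

Starting from `db`, the walk reaches both APIs and then `web-store`. Walking the same graph in the opposite direction, from a degraded parent toward its children, narrows the candidates for root cause.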
Establishing SLOs and error budgets enables customers to measure and enforce service quality commitments, helping to prioritize reliability improvements. Integrating alerts from third-party tools into SRM consolidates monitoring, reducing noise and enhancing situational awareness.
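An error budget is the complement of the SLO target over a measurement window: error budget = (1 - SLO target) × window. The Python sketch below applies this general formula to a time-based availability SLO. The function names are hypothetical, and SRM may compute SLIs and budgets differently, for example from counts of good and bad events rather than elapsed time.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed 'bad' time in the window for a time-based availability SLO.

    Uses the general formula: error budget = (1 - SLO target) * window.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return (1 - slo_target) * window


def remaining_budget(slo_target: float, window: timedelta, bad_time: timedelta) -> timedelta:
    """Budget left after `bad_time` of failure; a negative result means the SLO is breached."""
    return error_budget(slo_target, window) - bad_time


window = timedelta(days=30)
print(error_budget(0.999, window))                             # 0:43:12
print(remaining_budget(0.999, window, timedelta(minutes=30)))  # 0:13:12
```

For a 99.9% SLO over 30 days, the budget is 43 minutes 12 seconds. After 30 minutes of downtime, 13 minutes 12 seconds remain. A shrinking remainder is the signal to prioritize reliability work over new changes.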
Overall, SRM equips your organization with the tools to maintain high service availability and performance, aligned with defined business objectives, resulting in improved customer satisfaction and operational efficiency.
Service Reliability Management (SRM) accelerates your path to viewing service health in the context of service level objectives and incident resolution. It helps IT Operations and DevOps teams deliver on the promise of agility, performance, and uptime.
Get started with SRM to understand the different sections in the SRM interface.
For more information on roles, see SRM roles and responsibilities.
Basic SRM tasks
| Step | Description | Reference |
|---|---|---|
| Set up teams and services | Setup guides for the Home, Services, and Teams pages show how to add a team or service to SRM. | Add an SRM team |
| Visit and learn about your Home page | The Home page shows what matters most to you, such as the services with critical alerts and incidents and the work assigned to you and your team. | SRM Home page |
| Learn how to navigate SRM | Get familiar with the different sections and elements of the SRM interface. These sections and elements are referenced throughout the documentation. | SRM interface |
SRM helps you create and administer teams, services, and integrations.
| Step | Description | Reference |
|---|---|---|
| Manage a service instance or technology management service in SRM | Define the basic tasks and parameters that make up your service and how it should behave. | Add a service to SRM |
| Set up an SRM team | Set up a team. Teams are responsible for the issues that occur in their associated services. | Add an SRM team |
| Set up on-call schedules and escalation policies | Create an on-call schedule for your team to make sure team members are available to resolve issues. You can also set up an escalation policy to make sure at least one team member is engaged in incident response. | Create an SRM on-call schedule |
| Configure service relationships | Use a map canvas to add, configure, and arrange services. You can add child services that depend on parent services. | View impact of child service on parent service |
| Integrate services with third-party monitoring tools | Set up a third-party integration, such as Datadog or ServiceNow Cloud Observability, with SRM so that alerts are available to your teams within SRM. | Working with integrations in SRM |
| Establish SLOs, SLIs, and error budgets for services | Establish goals for how well your service should operate. Also specify the error budget: the maximum amount of time that the service can fail without contractual consequences. | Service Level Objective Management |
| Set up alert automations (Note: This functionality is only available if you have installed the Alert Automations application.) | Alert automations enable you to define alert conditions. Set up alert rules for each APM tool to define the conditions under which the APM tool sends notifications to SRM. | The Alert Automations application is available from the ServiceNow Store. |
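Alert rules are configured in the Alert Automations application, not in code. To illustrate what "defining the conditions under which an APM tool sends notifications" means, the Python sketch below evaluates a list of threshold conditions against an incoming metrics payload and notifies only when all of them hold. The `AlertRule` class, the `should_notify` function, the AND-combination of conditions, and the payload field names (`error_rate`, `p95_latency_ms`) are hypothetical. They do not represent SRM's rule format or any APM tool's payload schema.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (hypothetical rule format, not SRM's).
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class AlertRule:
    metric: str       # field name in the incoming payload (hypothetical)
    op: str           # one of the keys in _OPS
    threshold: float

    def matches(self, payload: dict) -> bool:
        value = payload.get(self.metric)
        if value is None:  # a missing metric never satisfies the condition
            return False
        return _OPS[self.op](value, self.threshold)


def should_notify(rules: list[AlertRule], payload: dict) -> bool:
    # Notify only when there is at least one rule and every condition holds (logical AND).
    return bool(rules) and all(rule.matches(payload) for rule in rules)


rules = [AlertRule("error_rate", ">", 0.05), AlertRule("p95_latency_ms", ">=", 800)]
print(should_notify(rules, {"error_rate": 0.08, "p95_latency_ms": 950}))  # True
print(should_notify(rules, {"error_rate": 0.02, "p95_latency_ms": 950}))  # False
```

Requiring every condition to match keeps a single noisy metric from paging the team on its own. An OR-combination would trade that noise reduction for earlier detection.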