darius_koohmare
ServiceNow Employee

Site reliability engineering (SRE) practices are proliferating in most organizations, as the delivery of application services to employees, customers, and partners has grown as part of digital transformation initiatives both inside and outside of IT. If you're newer to site reliability engineering, you can read more about how SRE as a practice can help your organization with incident response here. At the same time, the need to support and empower teams to self-implement has grown, because the pace of digital transformation and the needs of lines of business are outpacing the capacity of central IT departments.

With the introduction of Service Reliability Management (SRM), generally available August 2024 (Washington), we aim to help organizations perform incident response using team-based SRE to improve the reliability, availability, and health of digital applications and technical services.

When discussing SRE-driven incident response, some of the pain points we've heard from customers include:

  • SRE/DevOps/IT Ops/Networking teams are bottlenecked by a central sys admin/IT team that is governing the configurations of the systems (like ServiceNow) that they rely on. These teams want the autonomy and independence to configure the systems with business logic and data specific to their teams, while central teams still want some level of governance. 

  • Because teams want to self-serve, the product UI needs to be easier to use and more guided. The many modules, lists, and forms involved in event management, setting up alert connectors, or getting an on-call schedule working were too difficult to use in their current state.
  • Core SRE capabilities like Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets were absent from ServiceNow's CMDB and data models. Additionally, the ITOM alert and health features were not deeply integrated with the ITSM on-call and major incident response features.

For most organizations deploying digital services, there is a common "Plan, Build, Run" set of activities. Specific to SRE, the focus is on identifying the performance goals of the service (SLO definition), implementing monitoring, logging, and tracing for the service, and finally ensuring that resources are available on call in case an issue is detected with the service in production.
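For readers newer to SRE, a small worked example may help explain how an SLO, an SLI, and an error budget relate to each other. The Python sketch below is illustrative only. It assumes a ratio-based SLI (good events divided by total events) over a single measurement window. The `SloWindow` class and its methods are hypothetical and are not part of the SRM app or any ServiceNow API.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """One SLO measurement window (hypothetical model, not a ServiceNow API)."""

    target: float      # objective, e.g. 0.999 means 99.9% of events must be good
    good_events: int   # events that met the SLI threshold (e.g. fast, successful requests)
    total_events: int  # all events observed in the window

    def sli(self) -> float:
        """Ratio-based SLI: good events / total events (1.0 when there is no traffic)."""
        if self.total_events == 0:
            return 1.0
        return self.good_events / self.total_events

    def error_budget(self) -> float:
        """Number of bad events the objective tolerates in this window."""
        return (1.0 - self.target) * self.total_events

    def budget_remaining(self) -> float:
        """Fraction of the error budget left: 1.0 = untouched, <= 0 = exhausted."""
        bad = self.total_events - self.good_events
        budget = self.error_budget()
        if budget <= 0:
            # No tolerance (100% target or no traffic): any bad event exhausts it.
            return 1.0 if bad == 0 else 0.0
        return 1.0 - bad / budget

    def slo_met(self) -> bool:
        """True while the measured SLI is at or above the objective."""
        return self.sli() >= self.target


if __name__ == "__main__":
    # Example: a 99.9% availability objective over 100,000 requests.
    window = SloWindow(target=0.999, good_events=99_975, total_events=100_000)
    print(f"SLI: {window.sli():.5f}")
    print(f"Error budget: {window.error_budget():.0f} bad events")
    print(f"Budget remaining: {window.budget_remaining():.0%}")
    print(f"SLO met: {window.slo_met()}")
```

With a 99.9% objective over 100,000 requests, the window tolerates roughly 100 bad requests. The 25 failures in the example consume a quarter of that budget, leaving about 75%. Treating the SLI as a ratio of good to total events is one common modeling choice; time-based availability SLIs are another. Nothing here is meant to describe how the SRM app computes its own SLIs.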

 

While ServiceNow has many features aligned to modern SRE practices, the SRM app aims to help teams with the "run" portion of their services, providing on-call alert and incident response alongside business visibility into SLO performance. For ServiceNow customers with teams modernizing into, or already practicing, site reliability engineering, the SRM app provides a self-serve, guided experience for teams to autonomously manage the health of their technical services using SRE. The core capabilities the app will provide within the Service Operations Workspace include:

[Image: core capabilities of the SRM app in the Service Operations Workspace]

Watch a demo of the Washington GA features of Service Reliability Management here:

The planned lifecycle of the features, from setup and alert ingestion to remediation, is shown below. You can click through a demo of this lifecycle here (use the right arrow key).

[Image: planned SRM feature lifecycle, from setup and alert ingestion to remediation]

While the above image represents a comprehensive view of what we have planned for the SRM application, we will deliver the following set of capabilities for our Feb 2025 store release: 

Planned capabilities post-GA:

  • GenAI / AIOps
  • SLI/O enhancements (Ratio-based, CI Rollups, Outage support, SLO Dashboard, SLO Templates)
  • Embedded Observability Context
  • Native synthetic monitoring
  • Operational readiness for services

Note that this application will replace two existing store applications built on the older Agent Workspace technology: Site Reliability Operations and Site Reliability Metrics. To invest in the future, the new SRM app is built within the Service Operations Workspace using UI Builder (UIB) technology, which allows for additional configurability.

If you're interested in trying the application out, it is generally available to customers with the appropriate ITOM Operator Pro/ITSM entitlements. If you have a Washington instance, you can request access from the ServiceNow app store here.

I look forward to your feedback,

Darius K 

Product Management 

ITOM - Foundational Reliability

Forward-looking statement disclaimer: Any statement that is not purely historical is considered a forward-looking statement. Forward-looking statements included in this repository are based on information available to ServiceNow as of the date they are made, and ServiceNow assumes no obligation to update any forward-looking statements. The forward-looking product roadmap does not represent a commitment, guarantee, obligation or promise to deliver any product or feature, or to deliver any product and feature by any particular date, and is intended to outline the general development plans. Customers should not rely on this roadmap to make any purchasing decision.

Comments
michaeljames12
Giga Contributor

Thank you for sharing detailed insights into the upcoming Service Reliability Management (SRM) application and its planned capabilities. The information about the challenges faced by teams in incident response and the features to address those pain points is highly appreciated. The emphasis on autonomy for SRE/DevOps/IT Ops teams, the user-friendly interface, and the integration of core SRE capabilities within ServiceNow's CMDB are indeed promising. The outlined roadmap, especially the early access features and planned post-GA enhancements, provides a comprehensive understanding of what to expect. I am particularly interested in exploring the Admin center for setup/governance and the upcoming Alert Automation capabilities. The commitment to replacing older applications with newer technology demonstrates a forward-thinking approach. I look forward to trying out the application and providing feedback as it aligns with our organization's focus on digital service reliability.

 

Kevin Burck
ServiceNow Employee

Great write-up!

Last updated: 04-22-2025 09:01 AM