Integrating proactive monitoring with customer service

April 29, 2020

Mastering proactive monitoring

Everyone knows the existential question: If a tree falls in the forest, and nobody is there to hear it, does it make a sound? Leading-edge customer service management today has produced a corollary: If a network problem is fixed before the customer even suspects there’s an issue, did it even happen?

For Dominic Walton, senior director of site reliability engineering at ServiceNow, the answer is a firm no. Walton leads a global team of three dozen engineers who have spent the last five years honing the craft of pre-emptive monitoring: scanning real-time analytics for indications that customers might be affected by some future IT event. Trained to spot issues early, the team prevents a majority of them before customers are aware of any impact.

Walton doesn’t linger long on the philosophical implications. “Proving the negative,” Walton says with a chuckle. “It’s actually one of the things we struggle with when we report on what we do. Say we resolve an issue. If left alone, the customer might have suffered. But we intervened, so it didn’t. How do we say that it would have?”

Signal vs. noise
Proactive monitoring starts with the ability of the Now Platform® to integrate a variety of cloud monitoring tools. “The beauty is that because of our platform, all of these elements are connected,” says Brooke Hendricks, director of business process management for the ServiceNow customer support portal. “Your alerts are in the same system as your events; your events are in the same system as your customer incidents or cases; your cases are all in the same system as your change tickets. We can see how they all affect each other and mitigate any risks.”

Two applications on the platform have been key to the progress: event management for monitoring, and analytics dashboards that bring all that disparate data into a single view. As the team gained experience, it fine-tuned the dashboards, experimented with how thresholds should be set for creating various kinds of alerts, and looked for patterns that indicated the likelihood of common issues. Once captured, many of these lessons were automated into workflows, significantly improving the signal-to-noise ratio. A better signal-to-noise ratio translates to higher-quality alerts.

Walton has found that about half of what used to trigger service alerts qualified as noise, not requiring remediation. And more than half of what remained could be addressed immediately and solved without involving customers. Only about one in six potential issues resulted in his team opening a service ticket and informing a customer that the team was working on an issue that the customer hadn’t even known about before then. Only in those cases are customers made aware of an issue, and that is when customer satisfaction becomes a primary factor.
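To make those proportions concrete, here is a minimal Python sketch of the arithmetic. The function name `triage_funnel` and the figure of 600 alerts are hypothetical, chosen only so the fractions divide evenly; the article gives approximate shares ("about half," "about one in six"), not exact counts.

```python
from fractions import Fraction


def triage_funnel(total_alerts, noise_share, ticket_share):
    """Split a batch of alerts using the rough shares quoted in the article.

    noise_share and ticket_share are both fractions of the original total.
    All names and inputs here are illustrative assumptions.
    """
    noise = total_alerts * noise_share        # alerts needing no remediation
    actionable = total_alerts - noise         # what remains after filtering noise
    tickets = total_alerts * ticket_share     # issues that reach the customer
    silent_fixes = actionable - tickets       # resolved without customer contact
    return {
        "noise": noise,
        "actionable": actionable,
        "tickets": tickets,
        "silent_fixes": silent_fixes,
        "silent_share_of_actionable": silent_fixes / actionable,
    }


# Hypothetical batch of 600 alerts: half noise, one in six becomes a ticket.
result = triage_funnel(600, Fraction(1, 2), Fraction(1, 6))
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumed inputs, 300 alerts are noise, 100 become customer-facing tickets, and the remaining 200 are fixed quietly. That is two-thirds of the actionable alerts, which is consistent with "more than half of what remained."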

Natural disasters are one example of how his team may get ahead of issues. In the case of facilities potentially being struck by hurricanes, he says, “Where necessary, we may proactively decide to temporarily move the services away from those data centers. This failover would be seamless from the customer perspective.”

Leaner, smarter team
By moving the point of resolution to the point of detection, Walton’s team has made the traditional network operations center redundant. In the past, a company the size of ServiceNow might have operated with engineers organized into tiers who used scripts to triage, escalate, and respond to events within a ticketing system.

Walton has instead created a smaller but highly skilled global team, without tiers. Engineers can take responsibility for issues as they arise and see them through to resolution. Team members are located around the world and provide follow-the-sun coverage without the need for graveyard shifts. His team is staffed by “highly experienced technical engineers who've been there, seen it, done it.” He hires the best multi-discipline engineers he can find so the team can resolve the varied scenarios that arise.

Their methods have been designed to not only address the immediate need of restoring service, but to drive toward platform-level solutions that increase reliability by solving underlying causes. He invokes the 80/20 rule to describe the division of focus.

“Eighty percent of the events we've seen before so we can plan for them and automate. Twenty percent of things that happen we have never seen before, and that’s where we need to be well prepared and rely on our experience and calm,” he adds. “Altogether, we should be that team that no one thinks about. We want our customers to go about their day unaware that we are helping to protect their ServiceNow instances from the issues that might arise.”

For more information on how ServiceNow uses its own technology to run its operations, visit Now on Now, our website that is chock-full of webinars, case studies, and other information on the Now Platform.

© 2020 ServiceNow, Inc. All rights reserved. ServiceNow, the ServiceNow logo, Now, and other ServiceNow marks are trademarks and/or registered trademarks of ServiceNow, Inc. in the United States and/or other countries.
