Hakan Isik
ServiceNow Employee

Answers make the world go around

 

We all have questions we need to answer before we can decide what to do next, in both our personal and professional lives.

They can be as simple and mundane as these: How are you? Coffee or tea? Will it rain today? Lunch? What should I wear to the meeting? How will traffic be today?

Or as complex and multi-dimensional as these: Are we progressing according to our original plan? Will stock prices go back up? How can I reduce my costs? Why is internet banking down?

In any case, we need the right data in the right context to be able to answer our questions and take the next step.

 

 

The situation is no different for companies

 

Well, it is actually much more difficult for them, given ever-increasing user expectations, market pressure, data volumes and technological complexity.

They now need to find answers to their questions, and act on them, faster than ever in order to connect their business with their employees, partners and customers, and to keep them all happy.

Luckily, most enterprises have by now realized that they have a secret best friend who is willing to help them with these challenges: their IT.

They know there is no way to achieve the visibility and agility they need without the support of their IT, and they have accepted that "IT is business and business is IT".

 


 

 

IT knew it all along but had its own challenges

 

For its part, IT had always known that data was the essential ingredient for the required visibility and agility.

That's why IT kept collecting not only technical but also business events, metrics, logs and more, coming from many different sources in many different shapes.

It also tried to:

  • Put all that data in a business-relevant context
  • Make it accessible to all required parties
  • Automate the analysis process so that questions could be answered as quickly as possible

 


 

Of course, that was easier said than done!

First of all, getting the data from various sources, normalizing it and putting it into a single source of truth wasn’t an easy job in traditional silo-based environments.

There were lots of integrations to maintain and lots of departmental silos to bring together, all of which required a great deal of time and effort.

On top of that, IT had limited analytical and machine learning capabilities, which made it even harder to automate key processes such as Root Cause Analysis.

As a result, IT got stuck in reactive fire-fighting mode, where even answering a basic question like "Why are users experiencing slowness in our application?" was a significant challenge.

 

 

BTW, seriously, what is the root cause of that slowness?

 

Now, among many other things, that is something ServiceNow can help you with.

But before getting into the "how", there is one thing I want to emphasize: the Now Platform!

The Now Platform allows us to eliminate silos by bringing together processes, systems, people, and data across the enterprise within a single data model, without requiring internal integrations.

The capabilities I'm about to talk about are all platform features: they work on that single data model and with any application built on the Now Platform, whether by ServiceNow or by its users.

 

 

Just fix the problem already!

 

OK! We need two things to solve the issue and keep our users happy:

  1. The right level of visibility, to know what is going on within our business and whether our users are happy.
  2. Agility, to act fast enough to solve problems before they negatively impact our users and our business, and to prevent them wherever possible.

Let's see how ServiceNow can help us with these two:

 

#1: Business-aware discovery:

 

Let's start with real-time visibility.

Driving real-time visibility starts with understanding our operational landscape.

Once we have a clear view of our business processes, supporting business services and underlying applications and infrastructure, we can accurately answer questions in many different domains such as Root Cause Analysis, Incident Management, Change Management, Asset Management, SecOps, etc.

There are many potential sources that can help us identify and map our environment, and using those sources, the Now Platform gives us the ability to both discover and map our services in the context of the business.

The ultimate goal is to bring all of these sources together to provide the most accurate, real-time view of our business services. We call this our business-aware data layer. This layer allows operational decisions to be made in the context of business services.
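To make the "business-aware" idea a little more concrete, here is a minimal, hypothetical sketch in plain Python; it is not ServiceNow's data model or API. It assumes the service map is a simple dictionary from each item to the items it depends on (all service and CI names are invented) and walks that map in reverse to answer one typical question: which business services are affected if a given CI has a problem?

```python
from collections import defaultdict, deque

# Hypothetical service map: each item lists the items it depends on.
# Business services sit at the top, infrastructure at the bottom.
DEPENDS_ON = {
    "Online Banking": ["web-app"],
    "Payments": ["payment-api"],
    "web-app": ["app-server-1", "db-primary"],
    "payment-api": ["app-server-2", "db-primary"],
    "app-server-1": [],
    "app-server-2": [],
    "db-primary": ["storage-array"],
    "storage-array": [],
}
BUSINESS_SERVICES = {"Online Banking", "Payments"}


def impacted_services(ci, depends_on=DEPENDS_ON, services=BUSINESS_SERVICES):
    """Return the business services that depend on `ci`, directly or indirectly."""
    # Invert the map so we can walk upward from a CI to whatever relies on it.
    used_by = defaultdict(list)
    for node, deps in depends_on.items():
        for dep in deps:
            used_by[dep].append(node)
    # Breadth-first search upward from the affected CI.
    seen, queue = {ci}, deque([ci])
    while queue:
        for parent in used_by[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & services)


print(impacted_services("storage-array"))  # ['Online Banking', 'Payments']
print(impacted_services("app-server-1"))   # ['Online Banking']
```

The same map, walked downward from a service, is what allows questions about root causes to be asked within the context of that service.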


 

#2: Event collection & management:

 

Now that we have a clear view of our business, it is time to bring that view to life.

For this, we need data!

ServiceNow allows us to ingest data in the form of events from multiple monitoring sources up and down the stack, bringing together a unified view of health in the context of our business services.

This helps us fill in visibility gaps while protecting our current investments.

It also provides event management capabilities that help us identify health issues across the datacenter from a single management console, allowing us to focus on what matters for our business.
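Because every monitoring tool describes events in its own way, ingestion starts by mapping each payload to one common shape. The sketch below illustrates that normalization step only; "tool_a", "tool_b", their payload fields and the 1 (critical) to 5 (info) severity scale are all assumptions made up for the example, not ServiceNow's event schema or any real tool's format.

```python
from dataclasses import dataclass


# Common event shape that every source is mapped to (illustrative fields only).
@dataclass(frozen=True)
class Event:
    source: str
    node: str
    type: str
    resource: str
    severity: int  # assumed scale: 1 = critical ... 5 = info
    description: str


# Hypothetical per-source severity vocabularies mapped onto the common scale.
SEVERITY_MAP = {
    "tool_a": {"CRITICAL": 1, "MAJOR": 2, "MINOR": 3, "WARNING": 4, "OK": 5},
    "tool_b": {5: 1, 4: 2, 3: 3, 2: 4, 1: 5},  # tool_b counts upward
}


def normalize_tool_a(raw):
    """Map a payload from hypothetical monitoring tool A to the common Event."""
    return Event(
        source="tool_a",
        node=raw["host"].lower(),
        type=raw["check"],
        resource=raw.get("component", ""),
        severity=SEVERITY_MAP["tool_a"][raw["state"]],
        description=raw["message"],
    )


def normalize_tool_b(raw):
    """Map a payload from hypothetical monitoring tool B to the common Event."""
    return Event(
        source="tool_b",
        node=raw["target"]["name"].lower(),
        type=raw["metric"],
        resource=raw["target"].get("instance", ""),
        severity=SEVERITY_MAP["tool_b"][raw["level"]],
        description=raw["text"],
    )


a = normalize_tool_a({"host": "DB-PRIMARY", "check": "disk_latency",
                      "component": "/dev/sdb", "state": "MAJOR",
                      "message": "Disk latency above 200 ms"})
b = normalize_tool_b({"target": {"name": "db-primary", "instance": "/dev/sdb"},
                      "metric": "disk_latency", "level": 4,
                      "text": "High I/O wait"})
print(a.node == b.node, a.severity == b.severity)  # True True
```

Once both tools speak the same shape, the two payloads are recognizably about the same node and the same kind of problem, which is what later correlation depends on.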


 

#3: Alert correlation:

 

One of the most important of those event management capabilities is "Automated Alert Correlation".

The first thing ServiceNow does when it receives technical events is turn them into meaningful, business-relevant alerts.

That not only significantly reduces the number of things we need to keep an eye on, but also shows us the right things to focus on.

Then, using machine learning techniques, it learns patterns in those alerts in order to automatically create alert groups that highlight cause-and-effect pairs as primary and secondary alerts.

Now, instead of dealing with thousands of technical events, we're looking at only a handful of alerts that give us meaningful hints about the potential root causes of the issues we're having.
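The two ideas in this step, collapsing repeated events into alerts and grouping related alerts, can be sketched in a few lines. This is a hypothetical illustration, not ServiceNow's algorithm: events are deduplicated on an assumed (node, type, resource) key, and a plain time-window rule (the earliest alert in a 120-second window is labeled primary) stands in for the machine learning the platform uses to learn alert patterns.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    key: tuple          # (node, type, resource) identifies "the same problem"
    first_seen: float   # timestamp of the first event, in seconds
    last_seen: float
    count: int = 1      # number of events folded into this alert


def events_to_alerts(events):
    """Deduplicate raw events: repeated events with the same key update one alert."""
    alerts = {}
    for ts, node, etype, resource in sorted(events):
        key = (node, etype, resource)
        if key in alerts:
            alerts[key].count += 1
            alerts[key].last_seen = ts
        else:
            alerts[key] = Alert(key, ts, ts)
    return list(alerts.values())


def group_alerts(alerts, window=120.0):
    """Group alerts that first appear within `window` seconds of a group's first
    alert; the earliest is primary, the rest secondary. A crude time-proximity
    heuristic, not a learned model."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.first_seen):
        if groups and alert.first_seen - groups[-1]["primary"].first_seen <= window:
            groups[-1]["secondary"].append(alert)
        else:
            groups.append({"primary": alert, "secondary": []})
    return groups


raw_events = [
    (0, "db-primary", "disk_latency", "/dev/sdb"),
    (30, "db-primary", "disk_latency", "/dev/sdb"),
    (45, "app-server-1", "response_time", "checkout"),
    (60, "db-primary", "disk_latency", "/dev/sdb"),
    (90, "web-app", "http_5xx", "frontend"),
    (900, "app-server-2", "cpu", "cpu0"),
]
alerts = events_to_alerts(raw_events)
groups = group_alerts(alerts)
print(len(raw_events), "events ->", len(alerts), "alerts ->", len(groups), "groups")
# 6 events -> 4 alerts -> 2 groups
```

A time window alone cannot tell cause from coincidence; learning which alerts repeatedly occur together, and in which order, is what makes the primary/secondary labels trustworthy.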


 

#4: Anomaly detection:

 

But events aren't the only type of data we can leverage to achieve complete visibility.

Metrics can also give us great insight into the trends of our KPIs and the root causes of our problems.

And ServiceNow allows us to ingest raw metrics from various monitoring sources too.

The main issue with metrics, though, is that they behave differently from system to system, application to application, architecture to architecture and enterprise to enterprise. If we try to handle them with traditional methods like static thresholds, they can create a lot of noise in the form of false alarms.

Here, once again, machine learning capabilities come to the rescue.


The platform automatically learns from historical metric data and builds standard statistical models to project expected metric values along with upper and lower bounds.

It then uses these projections to detect statistical outliers and to calculate anomaly scores that indicate anomalous CI behavior which events may not capture.

As a result, we get anomaly alerts that can be promoted to regular alerts, which helps us get rid of false alarms while capturing the real issues.

Once promoted, those alerts contribute to the alert correlation mechanism like any other alert, further enhancing our visibility.
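The post does not specify which statistical models the platform builds, so the sketch below swaps in one of the simplest: a rolling mean plus or minus k standard deviations computed from the preceding window of values. The anomaly score is the distance from the expected value in standard deviations, and only strong outliers (an assumed threshold of 5) are promoted. Window size, k, the threshold and the sample data are all illustrative assumptions.

```python
import statistics


def anomaly_scores(values, window=20, k=3.0):
    """Project expected bounds for each point from the preceding `window` points
    (mean +/- k * population stdev) and score how unusual the point is."""
    results = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history) or 1e-9  # guard against a perfectly flat history
        lower, upper = mean - k * std, mean + k * std
        results.append({
            "index": i,
            "value": values[i],
            "lower": lower,
            "upper": upper,
            "score": abs(values[i] - mean) / std,  # distance from expectation in stdevs
            "outlier": not (lower <= values[i] <= upper),
        })
    return results


def alerts_to_promote(results, promote_at=5.0):
    """Only strong outliers become regular alerts; weaker ones stay anomaly alerts."""
    return [r["index"] for r in results if r["outlier"] and r["score"] >= promote_at]


# Response times (ms) with a regular small wobble, then one spike.
response_ms = [100, 102, 98, 101, 99] * 5 + [160, 100, 101]
scores = anomaly_scores(response_ms)
print(alerts_to_promote(scores))  # [25]
```

Because the bounds are learned from each metric's own history, the same code adapts to a noisy metric and a quiet one, which is exactly where a single static threshold tends to produce false alarms.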

 

#5: Alert-CI binding:

 

Then Event Management uses event rules and other mechanisms to automatically bind alerts to CI information from the CMDB.

This simplifies and speeds up diagnosis and remediation by revealing the impacted CIs within their business service context.
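Binding an alert to a CI boils down to matching something the alert carries (a host name, an IP address, a fully qualified domain name) against CI attributes. The sketch below is hypothetical: the CMDB records, identifiers and the rule order (name, then IP, then FQDN) are invented for illustration and do not reproduce ServiceNow's event rules or CI identification logic.

```python
# Hypothetical CMDB extract with a few attributes an alert might be matched on.
CMDB = [
    {"ci_id": "CI001", "name": "db-primary", "ip": "10.0.0.15",
     "fqdn": "db-primary.corp.example"},
    {"ci_id": "CI002", "name": "app-server-1", "ip": "10.0.0.21",
     "fqdn": "app-server-1.corp.example"},
]


def bind_to_ci(alert_node, cmdb=CMDB):
    """Try each matching rule in turn; return (ci_id, rule) or (None, None)."""
    node = alert_node.strip().lower()
    rules = [
        ("name", lambda ci: ci["name"] == node),
        ("ip", lambda ci: ci["ip"] == node),
        ("fqdn", lambda ci: ci["fqdn"] == node),
    ]
    for rule_name, matches in rules:
        for ci in cmdb:
            if matches(ci):
                return ci["ci_id"], rule_name
    return None, None  # unbound alert: nothing in the CMDB matched


print(bind_to_ci("DB-PRIMARY"))                 # ('CI001', 'name')
print(bind_to_ci("10.0.0.21"))                  # ('CI002', 'ip')
print(bind_to_ci("app-server-1.corp.example"))  # ('CI002', 'fqdn')
print(bind_to_ci("unknown-host"))               # (None, None)
```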

  

 

#6: Collaboration of processes:

 

Yes, our main goal is to figure out why we're seeing slow transactions in our application, but before moving forward with that, I'd like to take a quick detour into the bigger picture.

Our business consists of multiple moving parts, and those parts run on various tightly coupled processes.


As we look for answers within our business, most of the time we need to move back and forth between those processes, using overlapping subsets of our data.

Let's take our case as an example:

We're trying to answer one very simple question: "Why are users experiencing slowness in our application?"

While doing that, there are multiple dimensions that we have to take into account:

  • Which CIs have contributed to it, and how are they related?
  • Have we already created an incident and assigned it to the right people/group?
  • Have any recent changes caused the slowness?
  • Is it something we've seen before? Should we treat it as a problem?
  • Do we already have a solution to it in our knowledge base?
  • How are we doing with our SLAs?
  • Do we need to be concerned from a security standpoint?
  • What is its impact on our business?

 

Interestingly enough, even though these questions look somewhat independent of each other, they rely on highly correlated datasets, so the answer to each one contributes significantly to the answers of the others.

Given how closely these processes work together, it doesn't really make sense to think about them, or position them, separately.

And that's the point where the Now Platform steps in and provides its biggest value.

It allows us to eliminate silos by bringing together processes, systems, people, and data across the enterprise within a single data model, without requiring internal integrations.

 

#7: Root cause analysis:

 

At this point, we have a good view of our business services, and they clearly reflect their current state in real time.

We're actually in a good position to diagnose the issue.

But ServiceNow provides one more machine learning capability that fully automates the root cause analysis process, giving us the agility we're after.


Event Management applies root cause analysis algorithms to identify root cause CIs within the impacted services and allows users to drill down and pinpoint which CIs and alerts are affecting business service and application health.

This information helps us dramatically reduce Mean Time To Repair and move from our current reactive state to a more proactive one.
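The post does not name the root cause algorithms, so the following is a swapped-in, widely used topology heuristic rather than ServiceNow's method: within an impacted service, an alerting CI is a root cause candidate only if nothing it depends on is also alerting; otherwise its alert is probably a downstream symptom. The dependency map and CI names are hypothetical.

```python
# Hypothetical dependency map (who depends on whom) for one impacted service.
DEPENDS_ON = {
    "web-app": ["app-server-1", "db-primary"],
    "app-server-1": ["db-primary"],
    "db-primary": ["storage-array"],
    "storage-array": [],
}


def dependencies(ci, depends_on):
    """All CIs that `ci` depends on, directly or indirectly (iterative DFS)."""
    seen, stack = set(), list(depends_on.get(ci, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def root_cause_candidates(alerting_cis, depends_on=DEPENDS_ON):
    """Keep alerting CIs none of whose dependencies are also alerting."""
    alerting = set(alerting_cis)
    return sorted(ci for ci in alerting
                  if not (dependencies(ci, depends_on) & alerting))


print(root_cause_candidates(["web-app", "app-server-1", "db-primary"]))  # ['db-primary']
print(root_cause_candidates(["web-app", "app-server-1"]))               # ['app-server-1']
```

Real environments add complications this sketch ignores, such as missing or stale relationships and CIs that fail without raising any alert, which is why learned patterns are useful on top of topology.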

 

#8: Predictive alerts:

 

ServiceNow can take this even further.

Remember, Event Management gives us the ability to group alerts by capturing patterns in our alert flow.

It can also highlight the causal alerts and associate the related ones with them.


As an extension of this machine learning capability, when ServiceNow Event Management detects a pattern it has seen before, it already knows the potential consequences, so it creates a predictive alert to give us a heads-up.

This is invaluable, as it lets us go beyond being proactive and prevent issues before they even happen.
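One simple way to picture a predictive alert is as a conditional frequency learned from alert history: if alert B has followed alert A within some window often enough in the past, seeing A again is a reason to warn about B. The sketch below uses that counting rule as a stand-in for the platform's machine learning; the alert type names, the 600-second window and the 0.7 probability threshold are assumptions.

```python
from collections import Counter, defaultdict


def learn_sequences(history, window=600):
    """From (timestamp, alert_type) pairs, estimate how often each alert type is
    followed by another type within `window` seconds."""
    occurrences = Counter(alert for _, alert in history)
    followed_by = defaultdict(Counter)
    for i, (t_i, a) in enumerate(history):
        # Distinct alert types seen within `window` seconds after this occurrence.
        later = {b for t_j, b in history[i + 1:] if 0 <= t_j - t_i <= window and b != a}
        for b in later:
            followed_by[a][b] += 1
    # Conditional frequency: P(b follows within window | a occurred).
    return {a: {b: n / occurrences[a] for b, n in nxt.items()}
            for a, nxt in followed_by.items()}


def predict(alert_type, model, min_prob=0.7):
    """Consequences that historically followed `alert_type` often enough to warn about."""
    return sorted(b for b, p in model.get(alert_type, {}).items() if p >= min_prob)


history = [
    (0, "disk_latency"), (120, "db_slow_queries"), (300, "app_response_time"),
    (5000, "disk_latency"), (5100, "db_slow_queries"), (5400, "app_response_time"),
    (10000, "disk_latency"), (10200, "db_slow_queries"),
    (20000, "cpu_high"),
]
model = learn_sequences(history)
print(predict("disk_latency", model))  # ['db_slow_queries']
```

With a lower threshold the model would also warn about slow application responses, which followed disk latency two times out of three; the threshold trades early warnings against false ones.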

 

#9: Automated remediation:

 

Once we determine the root cause of events, with either real-time or predictive capabilities, it is critical to remediate them as quickly as possible.

The Now Platform helps us close the loop by providing automation capabilities that drive remediation-related actions.


Those actions can take the form of:

  • Attaching a knowledge article
  • Recommending a potential solution
  • Creating a change request
  • Triggering an automated workflow

 

Regardless of whether remediation is manual or automated, the goal is to resolve issues before they impact users and to continuously improve the knowledge base so it can drive more insightful recommendations in future cases.
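Choosing between the remediation actions listed in this step can be pictured as a lookup from the root cause alert to an ordered playbook, split into steps that may run automatically and steps that need a person. Everything here is hypothetical: the playbook entries, the action names and the assumption that change requests always need human approval are illustrative, not ServiceNow's remediation configuration.

```python
# Hypothetical playbook: alert type -> ordered remediation actions.
PLAYBOOK = {
    "disk_latency": [("attach_kb", "KB-STORAGE-LATENCY"),
                     ("run_workflow", "failover-db-replica")],
    "cpu_high": [("recommend", "Scale out the application tier"),
                 ("create_change", "add-app-node")],
}
# Used when no playbook entry exists for an alert type.
FALLBACK = [("recommend", "No known fix: route to the assignment group")]
# Assumption: these action kinds may run without a human approving them.
AUTO_APPROVED = frozenset({"attach_kb", "recommend", "run_workflow"})


def plan_remediation(alert_type, auto_approved=AUTO_APPROVED):
    """Split the playbook steps for an alert into automatic and manual lists."""
    steps = PLAYBOOK.get(alert_type, FALLBACK)
    automatic = [s for s in steps if s[0] in auto_approved]
    manual = [s for s in steps if s[0] not in auto_approved]
    return automatic, manual


auto, manual = plan_remediation("cpu_high")
print(auto)    # [('recommend', 'Scale out the application tier')]
print(manual)  # [('create_change', 'add-app-node')]
```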

 

Healthy services, happy users

 

In summary, ServiceNow can help us reduce Mean Time To Repair and increase Mean Time Between Failures by automating these basic operational steps:

 

>>> DETECT >>> PRIORITIZE >>> ASSIGN >>> DIAGNOSE >>> RESOLVE

 

And this ultimately leads us to better-quality services and happy users.
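For reference, the two metrics in this summary have standard definitions: MTTR is the total time spent restoring service divided by the number of failures, and MTBF is the total operating time divided by the number of failures. The worked example below applies those formulas to an invented 30-day period with three outages of one, two and three hours.

```python
from datetime import datetime, timedelta


def mttr_and_mtbf(outages, period_start, period_end):
    """outages: list of (down_at, restored_at) datetimes inside the period.
    MTTR = total downtime / number of failures
    MTBF = total uptime   / number of failures"""
    if not outages:
        raise ValueError("no failures in the period: MTTR/MTBF are undefined")
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    n = len(outages)
    return downtime / n, uptime / n


period_start = datetime(2024, 1, 1)
period_end = period_start + timedelta(days=30)
outages = [
    (period_start + timedelta(days=2), period_start + timedelta(days=2, hours=1)),
    (period_start + timedelta(days=10), period_start + timedelta(days=10, hours=2)),
    (period_start + timedelta(days=20), period_start + timedelta(days=20, hours=3)),
]
mttr, mtbf = mttr_and_mtbf(outages, period_start, period_end)
print("MTTR =", mttr, "MTBF =", mtbf)  # MTTR = 2:00:00 MTBF = 9 days, 22:00:00
```

Faster detection, diagnosis and resolution shrink the downtime term, which lowers MTTR and, over the same period, raises MTBF as well.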
