How to automate recognition of performance issues?

Brett Zipkin1 · ‎01-28-2022

I've been tasked with determining how to baseline and check instance performance over time to maintain its health, especially if we turn on a new feature or integration and want to understand its impact.

I've read through the entire Performance Management section of the docs and they provide a lot of information about reactive ways to research performance problems when you know you have something to look into, possibly based on user complaints. I know there are a number of charts in the ServiceNow Performance Dashboards to give some insights.

I've been reading through the KB article found here: Recommendations for Optimal Instance Performance which seems to cover best practices and manual steps to perform daily/weekly/monthly to look for issues that may be occurring.

What I'm not seeing anywhere is how to set reasonable thresholds on metrics to warn the administrative team that there is something they SHOULD look into. And not just how to set them but to know what a reasonable threshold is for a given statistic. Ex. If I'm looking at Semaphore use and in a given week it fluctuates on each node from 2-3 up to 10-15 is that a bad thing? A normal thing? Should I only be concerned if it hits 100? I'm not seeing any guidance anywhere on these types of questions and those unknowns apply to almost every metric that you can trend on practically every graph set. The only thing I found was if session counts hit 10,000 that can cause performance issues. Thankfully that's not an issue but at least I know that if my session counts are closer to 600 on a given node it's a non-issue.

It could practically be a full-time job to have someone tasked to go look for problems, and it would be a huge waste of time trying to solve a problem that really isn't a problem at all. Without knowing how to properly recognize and automatically flag bad things, that would be a massive waste of administrative resources.

What methods do other administrators use to automate these things? What is the best way to determine when you need to look more closely at certain aspects of the instance performance without relying on a complaint?

guythatusesserv · ‎01-28-2022

For reference, I am no expert on performance, but we've been poking away at it for a couple years now. I'll share what we've done, but I'm not sure how helpful it will be.

ServiceNow does have a lot of their own monitors and metrics that you don't have direct access to. You can sometimes ask product support for a demo. I've seen them and they are fairly impressive. Having said that, it still leaves a lot of gaps.
We have created several monitors (scheduled jobs) of our own to watch various queues and transactions. This includes the Active Transactions (all) queue, Event Log, Today's scheduled jobs and others.
We have created performance analytics reports to trend performance over time. This has been one of our best tools to see the effect a change has had on performance. These reports mostly show performance on form views, list views, and background transaction averages for various forms. We query the transactions table.
Our ability to load test the system is extremely rudimentary. We use an AutoIt script and leave it running for a day or two against our test instance. Not a very good solution.
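
As a rough illustration of the trending idea above (comparing transaction averages before and after a change), here is a minimal Python sketch. It assumes the transaction rows have already been exported from the instance into plain dictionaries. The keys `type`, `created`, and `response_ms` are hypothetical names chosen for this example, not the actual column names of the transactions table, and the function name is made up as well.

```python
from datetime import datetime
from statistics import mean


def compare_before_after(rows, change_time):
    """Compare average response time per transaction type before and
    after change_time.

    rows: iterable of dicts with hypothetical keys
          "type" (str), "created" (datetime), "response_ms" (number).
    Types that lack samples on either side of the change are skipped.
    """
    buckets = {}
    for row in rows:
        side = "before" if row["created"] < change_time else "after"
        sides = buckets.setdefault(row["type"], {"before": [], "after": []})
        sides[side].append(row["response_ms"])

    report = {}
    for tx_type, sides in buckets.items():
        if not sides["before"] or not sides["after"]:
            continue  # nothing to compare against
        before = mean(sides["before"])
        after = mean(sides["after"])
        # Percentage change relative to the "before" average.
        change_pct = (after - before) / before * 100.0 if before else None
        report[tx_type] = {
            "before_ms": before,
            "after_ms": after,
            "change_pct": change_pct,
        }
    return report


# Example with made-up numbers: form loads double after the change.
change = datetime(2022, 1, 15)
sample_rows = [
    {"type": "form", "created": datetime(2022, 1, 10), "response_ms": 100},
    {"type": "form", "created": datetime(2022, 1, 12), "response_ms": 200},
    {"type": "form", "created": datetime(2022, 1, 20), "response_ms": 300},
    {"type": "list", "created": datetime(2022, 1, 11), "response_ms": 400},
]
report = compare_before_after(sample_rows, change)
```

A plain average is easy to explain but is sensitive to a few very slow transactions, so a median or a high percentile may be a better fit depending on the data.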

xavier_robert · ‎01-31-2022

Hi Brett,

By installing Application Insights you will have a centralized view of your instance's metrics. It seems to be great for monitoring the system.

https://docs.servicenow.com/bundle/rome-platform-administration/page/administer/platform-performance...

But there is a dependency on MetricBase, so it is not clear to me whether it can be used with our subscription.

Brett Zipkin1 · ‎01-31-2022

Thank you both for the feedback.

@xavier.robert ,

I'm going to install Application Insights in one of our sub-prod environments to test it out. According to the store, MetricBase is free IF you are installing it for the sole purpose of Application Insights, and in that case it has to be installed by support. They put the process in the Store App description. I watched the demo video and it seems like it might be a helpful tool. I like that you can set thresholds and tie them to Flows to take action (even if it's just an email).

Disappointed that there is still no guidance on what thresholds are reasonable. At least not yet. They rely on you to know your own system and what is normal. They implied it's a future feature that will be coming so that's at least something to look forward to. One of my concerns, coming into this after having been live for a year, is that what is normal for our system may not necessarily be efficient. So a threshold based on normal use may falsely hide an issue. But I guess I'll find out more once I start playing with it.
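
To make the "threshold based on normal use" idea concrete, here is a minimal Python sketch that derives an alert limit from historical samples of a single metric (for example, peak semaphore use per node). It is not how Application Insights computes anything. The median plus `k` times the median absolute deviation is simply one common robust baseline, and the function names, the multiplier `k=3.0`, and the sample numbers are all assumptions made for illustration.

```python
from statistics import median


def baseline_threshold(samples, k=3.0):
    """Return median + k * MAD for the historical samples.

    MAD is the median absolute deviation from the median, which is
    less affected by a few extreme readings than a standard deviation.
    """
    med = median(samples)
    mad = median([abs(x - med) for x in samples])
    return med + k * mad


def flag_outliers(samples, new_values, k=3.0):
    """Return the new readings that exceed the baseline threshold."""
    limit = baseline_threshold(samples, k)
    return [v for v in new_values if v > limit]


# Made-up weekly peaks for one node, ranging from 2 to 15.
history = [2, 3, 5, 8, 10, 12, 15]
limit = baseline_threshold(history)          # 8 + 3.0 * 4 = 20.0
alerts = flag_outliers(history, [18, 25, 100])  # [25, 100]
```

A limit derived this way only describes what has been normal for the instance. If the history already contains an inefficiency, it is baked into the baseline, which is exactly the concern raised above.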

I am curious if anyone else is using any other methods or possibly anything in conjunction with the above.

-Brett