Discovery scan duration impacted by ServiceNow instance

ronlucas
Kilo Contributor

Running Istanbul...

Yesterday, when I had several discovery jobs running, I tried to perform another discovery of a single IP address of a Windows computer. This usually takes less than 2 minutes, but it took over an hour to complete. Then I noticed that the "Scheduler queue length" shown in System Diagnostics was as high as 180 on 4 of our 8 ServiceNow cluster nodes. After the scheduled discovery scans were complete, I ran discovery of the single IP address again, and it finished within 2 minutes as expected.

So it would appear that the total duration of a discovery job is affected not only by settings on the MID servers (threads, Java memory, etc.), but also by the load on the ServiceNow instance.

Can someone confirm this?  

Thanks.

5 Replies

ronlucas
Kilo Contributor

I forgot to mention that once those discovery jobs finished, the "Scheduler queue length" on all ServiceNow cluster nodes dropped back to 0.
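A rough way to cross-check that number outside System Diagnostics is a background script against the sys_trigger table, something like the sketch below. This is only a sketch: it assumes the standard sys_trigger schema where state 0 means Ready, and it counts the whole instance, whereas System Diagnostics reports the queue length per node.

// Background script sketch: approximate scheduler backlog (assumes state 0 = Ready).
var ga = new GlideAggregate('sys_trigger');
ga.addQuery('state', 0);                                 // jobs in the Ready state
ga.addQuery('next_action', '<=', new GlideDateTime());   // already due to run
ga.addAggregate('COUNT');
ga.query();
if (ga.next())
    gs.print('Scheduled jobs waiting to run: ' + ga.getAggregate('COUNT'));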


It definitely is. In fact, the instance nodes tend to become the bottleneck.



Most of the probes return data fairly quickly. Even when the MID server is doing post-probe processing, those scripts are generally very fast. The long pole is bringing those payloads back into the instance and processing the rest of the sensor there. In particular, payloads like running processes cause a ton of additional work for the nodes.
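One way to watch that backlog build up is to count how many Discovery inputs are still sitting in the 'ready' state in the ECC queue, grouped by probe. A rough background-script sketch, assuming the standard ecc_queue schema where the 'name' field on inputs carries the probe name:

// Sketch: count unprocessed probe results by probe name.
var ga = new GlideAggregate('ecc_queue');
ga.addQuery('queue', 'input');     // results coming back from MID servers
ga.addQuery('state', 'ready');     // not yet picked up by a sensor worker
ga.addAggregate('COUNT');
ga.groupBy('name');                // probe name, e.g. "Windows - Classify"
ga.query();
while (ga.next())
    gs.print(ga.getValue('name') + ': ' + ga.getAggregate('COUNT'));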



A good indication is to take a schedule that a single MID server is running and add that server to a cluster with another. The probes will be split fairly evenly between the two servers in the cluster, but you might only see a 10% improvement (if you're lucky) in total discovery time.


Hi Ron,



One way to check this is the ServiceNow Performance graphs on the dashboard, if you have access (you will probably need the admin role). If you select the Discovery set, there are two graphs I tend to use more than the others: Probe Run Time and Sensor Queue Time.



If probe run time is long, the bottleneck is probably on the MID server; if sensor queue time is long, it means that probe results coming back to the instance are sitting in a queue for a long time before they are processed.



If you don't have access to this dashboard, another way is to look at the ECC queue. If probes are staying in the 'ready' state for a long time during discovery, the MID server is probably busy. Also compare the creation time of a probe result input with the time it is processed. For small payloads like Windows - Classify results this should be quick, so the difference between the 'Created' and 'Processed' times will be very small on an instance where the workers are not busy. If that difference is a few minutes or more, the workers are probably busy and the probe results are queuing up.
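If you want to script that check rather than eyeball it, a background script along these lines will print the wait for recent Windows - Classify results. This is just a sketch and assumes the standard ecc_queue fields, with 'processed' holding the time the input was handled:

// Sketch: how long did recent probe results wait between creation and processing?
var gr = new GlideRecord('ecc_queue');
gr.addQuery('queue', 'input');
gr.addQuery('state', 'processed');
gr.addQuery('name', 'Windows - Classify');   // small payload, should be quick
gr.orderByDesc('sys_created_on');
gr.setLimit(20);
gr.query();
while (gr.next()) {
    var created = new GlideDateTime(gr.getValue('sys_created_on'));
    var processed = new GlideDateTime(gr.getValue('processed'));
    var waitSecs = (processed.getNumericValue() - created.getNumericValue()) / 1000;
    gs.print(gr.getValue('source') + ' waited ' + waitSecs + ' s before processing');
}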



Regards,



Dave


Hi David.



I am able to access Performance graphs and see the following:



Discovery Probe Run Time


Discovery Sensor Queue Time


Discovery Sensor Run Time



In your reply, I'm not sure what you mean by "If you select the discovery set...". What I'm able to do from a dashboard is click "Add content", then "Performance Graphs", then pick, for example, "Discovery Probe Run Time". It then gives me 8 choices, which I believe are the nodes of our instance, and if I pick one of those I see a graph.



Am I doing this as you suggested?



Thanks,


Ron