Setting to limit discovery duration for single CI within a schedule

EricCfromAZ · ‎08-12-2024

Hello experts!

My customer has a very large network and CMDB. More often than not, Discovery hangs waiting for a handful of CIs to finish, which in turn causes the job to cancel. The offending CIs are not consistent, and when we troubleshoot them with ad hoc scans, they scan just fine and complete without issue. Is there a way to "limit total execution time for a single CI" within a schedule of thousands of discovered devices? The customer has been looking for some time for a setting that limits the duration of a single discovered CI, with no luck. Although there are plenty of timeout settings available, I cannot find one that matches this requirement.

Thank you!

Community Alums · ‎08-13-2024

Hi @EricCfromAZ,

This is an interesting use case. Is Discovery hanging during a particular phase (Shazzam, classification, identification, or exploration)? Knowing exactly where Discovery spends the most time will help determine whether there is a recurring pattern to this issue.

In the Related Links of a Discovery schedule, there is a UI action called "Discovery timeline". Please look at your Discovery timeline to see whether any particular discovery is taking a long time to respond.
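If the timeline data can be copied out or exported, a short script can help spot the entries that run long. This is only a minimal Python sketch, not a ServiceNow API: the record fields (ci, phase, started, ended), the sample values, and the 30-minute threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical records; the field names are illustrative, not a ServiceNow schema.
records = [
    {"ci": "10.0.0.5", "phase": "exploration",
     "started": "2024-08-12 01:00:00", "ended": "2024-08-12 01:04:10"},
    {"ci": "10.0.0.9", "phase": "identification",
     "started": "2024-08-12 01:00:30", "ended": "2024-08-12 02:15:00"},
]


def slow_entries(records, threshold):
    """Return (ci, phase, duration) for entries longer than threshold, slowest first."""
    slow = []
    for r in records:
        duration = datetime.strptime(r["ended"], FMT) - datetime.strptime(r["started"], FMT)
        if duration > threshold:
            slow.append((r["ci"], r["phase"], duration))
    return sorted(slow, key=lambda item: item[2], reverse=True)


for ci, phase, duration in slow_entries(records, timedelta(minutes=30)):
    print(ci, phase, duration)
```

Grouping the results by phase across several runs should show whether the slow entries cluster in one phase, even when the individual CIs change from run to run.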

Regards,

Srinija

Selva Arun · ‎08-14-2024

Hi Eric,

Please check why your schedule is getting stuck. We had similar issues, followed the process below, and documented it for future reference:

Scenario Details

Schedule Frequency: Weekly
Number of MID Servers: 2 (used in a cluster)
Scope: Large schedule scanning approximately 32K IP addresses

Root Causes

Large Discovery Scope: The single schedule covered a large number of IP addresses, leading to extended execution times.
Insufficient MID Server Resources: The MID Server JVM memory allocation was inadequate, causing performance issues.

Solutions Implemented

Splitting Discovery Scopes

We divided the large discovery scope into smaller segments.
Each schedule now handles approximately 16K IP addresses.
Smaller scopes allowed for more efficient processing and reduced the risk of schedules getting stuck.
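As a rough illustration of the split, here is a minimal Python sketch that uses only the standard ipaddress module. The function name split_scope, the 10.0.0.0/17 sample range, and the greedy grouping are assumptions for illustration; this is not how we generated our schedules, and it assumes all ranges are IPv4.

```python
import ipaddress


def split_scope(cidrs, max_addresses):
    """Group networks into buckets holding at most max_addresses addresses each.

    Networks larger than max_addresses are halved until each piece fits.
    """
    pieces = []
    pending = [ipaddress.ip_network(c) for c in cidrs]
    while pending:
        net = pending.pop()
        if net.num_addresses > max_addresses and net.prefixlen < net.max_prefixlen:
            # Too large for one schedule: split into two equal subnets.
            pending.extend(net.subnets(prefixlen_diff=1))
        else:
            pieces.append(net)
    pieces.sort()  # keep ranges in address order

    buckets, current, size = [], [], 0
    for net in pieces:
        if current and size + net.num_addresses > max_addresses:
            buckets.append(current)
            current, size = [], 0
        current.append(net)
        size += net.num_addresses
    if current:
        buckets.append(current)
    return buckets


# A /17 holds 32,768 addresses (about 32K); cap each schedule at 16,384.
for i, bucket in enumerate(split_scope(["10.0.0.0/17"], 16384), start=1):
    print(f"schedule {i}:", ", ".join(str(n) for n in bucket))
```

Each bucket would then be entered as the IP ranges of its own Discovery schedule.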

Increasing MID Server Resources

Memory Allocation: We increased the MID Server JVM memory allocation to 6 GB. The setting lives in the \agent\conf\wrapper-override.conf file in the MID Server installation directory. We made the change on the MID Servers for all lower instances and restarted the MID Server services.
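For anyone scripting this change across several MID Servers, the following Python sketch rewrites the memory line in the file's text. It assumes the setting is the Java Service Wrapper property wrapper.java.maxmemory, expressed in megabytes (so 6 GB is 6144); please confirm the property name against the "Set the MID Server JVM memory size" documentation before relying on it. The function name set_max_memory and the sample text are hypothetical.

```python
import re

# Assumption: the JVM heap limit is set by this property, in megabytes.
PROPERTY = "wrapper.java.maxmemory"


def set_max_memory(conf_text, megabytes):
    """Return conf_text with the max-memory property set to the given value."""
    line = f"{PROPERTY}={megabytes}"
    pattern = re.compile(rf"^[ \t]*{re.escape(PROPERTY)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        # Replace the first uncommented occurrence in place.
        return pattern.sub(line, conf_text, count=1)
    # Property not present: append it on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


sample = "# MID Server overrides\nwrapper.java.maxmemory=1024\n"
print(set_max_memory(sample, 6144))
```

The function only transforms text; reading and writing the actual file, and restarting the MID Server service afterwards, are left to whatever tooling you already use.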

CPU and Memory Reallocation: Additionally, we reallocated more CPU and memory resources to the MID Servers and restarted the MID Server services.

These adjustments significantly improved performance.

Implementation Results

After implementing the solutions, the issue was resolved.
Discovery schedules now complete successfully without getting stuck.
We achieved this without allocating more MID Servers to our lower instances.

Conclusion

By splitting discovery scopes, optimizing MID Server resources, and reallocating additional CPU and memory, we successfully resolved the issue. Our proactive approach ensured smoother discovery processes and improved overall system performance.

Please use the above implementation if your discovery schedules get stuck.

For more details, please check Set the MID Server JVM memory size (servicenow.com) & Discovery IP address configuration (servicenow.com)

Also, please set a Max Runtime on your schedule so that a stuck Discovery is cancelled automatically:

Discovery Schedules can have a Max Runtime set, so that they will eventually be automatically cancelled if:

it runs for longer than expected and may run outside of its allowed time window, affecting performance of the MID Server or the instance, or preventing the next 'run after' schedule in the sequence from starting on time.
it gets stuck and will never complete, perhaps because a Sensor crashed, so the Started/Completed numbers will never add up and trigger the completion.
Please check this article: https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB1522270
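To make the two cases concrete, here is a simplified Python model of why Max Runtime acts as a backstop. It is a sketch of the idea only, not ServiceNow's actual implementation: the function schedule_state, its parameters, and the state names are all hypothetical.

```python
from datetime import datetime, timedelta


def schedule_state(started_at, now, max_runtime, started, completed):
    """Simplified model: a run completes when the counters match, else times out."""
    if started > 0 and started == completed:
        return "completed"
    if now - started_at > max_runtime:
        # Counters never matched (e.g. a Sensor crashed), so the limit cancels the run.
        return "cancel"
    return "running"


start = datetime(2024, 8, 12, 1, 0, 0)
limit = timedelta(hours=8)

print(schedule_state(start, start + timedelta(hours=2), limit, 5000, 5000))
print(schedule_state(start, start + timedelta(hours=2), limit, 5000, 4997))
print(schedule_state(start, start + timedelta(hours=9), limit, 5000, 4997))
```

Note that in this model the limit applies to the whole schedule run, not to an individual CI, so it bounds the damage from a stuck device rather than skipping that device.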

If you believe the solution provided has adequately addressed your query, could you please mark it as 'Helpful' and 'Accept it as a Solution'? This will help other community members who might have the same question find the answer more easily.

Thank you for your consideration.

Selva