Setting to limit discovery duration for single CI within a schedule
08-12-2024 05:58 AM
Hello experts!
My customer has a very large network and CMDB. More often than not, Discovery hangs waiting for a handful of CIs to finish, which in turn causes the job to cancel. The offending CIs are not consistent, and when we troubleshoot them with ad hoc scans, they scan fine and complete without issue. Is there a way to limit the total execution time for a single CI within a schedule of thousands of discovered devices? The customer has been looking for such a setting for some time now with no luck. Although there are plenty of timeout settings available, I cannot find one that matches this requirement.
Thank you!
08-13-2024 04:11 AM
Hi @EricCfromAZ,
This is an interesting use case. Is Discovery hanging during a particular phase (Shazzam, classification, identification, or exploration)? Knowing exactly where Discovery spends the most time will help determine whether there is a recurring pattern to this issue.
In the Related Links of a Discovery schedule, there is a UI action called "Discovery timeline". Please look at your Discovery timeline to see whether any particular discovery is taking a long time to respond.
Regards,
Srinija
08-14-2024 10:21 AM
Hi Eric,
Please check why your schedule is getting stuck. We had a similar issue, followed the process below, and documented it for future reference:
Scenario Details
- Schedule Frequency: Weekly
- Number of MID Servers: 2 (used in a cluster)
- Scope: Large schedule scanning approximately 32K IP addresses
Root Causes
- Large Discovery Scope: The single schedule covered a large number of IP addresses, leading to extended execution times.
- Insufficient MID Server Resources: The MID Server JVM memory allocation was inadequate, causing performance issues.
Solutions Implemented
- Splitting Discovery Scopes
  - We divided the large discovery scope into smaller segments.
  - Each schedule now handles approximately 16K IP addresses.
  - Smaller scopes allowed for more efficient processing and reduced the risk of schedules getting stuck.
- Increasing MID Server Resources
  - Memory Allocation: We increased the MID Server JVM memory allocation to 6 GB. The setting lives in the \agent\conf\wrapper-override.conf file in the MID Server installation directory. We made the change on all lower instances and restarted the MID Server services.
  - CPU and Memory Reallocation: Additionally, we allocated more CPU and memory to the MID Server hosts and restarted the MID Server services.
  - These adjustments significantly improved performance.
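To illustrate the scope-splitting step, here is a minimal Python sketch that groups CIDR blocks into schedule-sized buckets using only the standard `ipaddress` module. It is not a ServiceNow API: the function name `split_scope` and the `max_addresses` cap are hypothetical, and the resulting ranges would still need to be entered into separate Discovery schedules by hand or by whatever import process you already use.

```python
import ipaddress

def split_scope(cidrs, max_addresses):
    """Group CIDR blocks into buckets of at most max_addresses IPs each.

    Hypothetical helper for planning schedules; it does not talk to ServiceNow.
    """
    # Halve any block that is larger than the cap until every piece fits.
    pending = [ipaddress.ip_network(c) for c in cidrs]
    pieces = []
    while pending:
        net = pending.pop()
        if net.num_addresses > max_addresses and net.prefixlen < net.max_prefixlen:
            pending.extend(net.subnets(prefixlen_diff=1))
        else:
            pieces.append(net)

    # Largest blocks first, ties broken by address, for a stable result.
    pieces.sort(key=lambda n: (-n.num_addresses, int(n.network_address)))

    # Greedy first-fit packing: put each piece in the first bucket with room.
    buckets = []
    for net in pieces:
        for bucket in buckets:
            if bucket["total"] + net.num_addresses <= max_addresses:
                bucket["nets"].append(net)
                bucket["total"] += net.num_addresses
                break
        else:
            buckets.append({"total": net.num_addresses, "nets": [net]})

    return [[str(n) for n in b["nets"]] for b in buckets]

# A 32K-address block split into schedules of at most 16K addresses each.
schedules = split_scope(["10.0.0.0/17"], 16384)
```

In this example the /17 block is halved into two /18 blocks, one per schedule, which mirrors the 32K to 16K split described above. The cap is a planning number only; the right size depends on how long your MID Servers take per address.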
Implementation Results
- After implementing the solutions, the issue was resolved.
- Discovery schedules now complete successfully without getting stuck.
- We achieved this without allocating more MID Servers to our lower instances.
Conclusion
By splitting discovery scopes, optimizing MID Server resources, and reallocating additional CPU and memory, we successfully resolved the issue. Our proactive approach ensured smoother discovery processes and improved overall system performance.
Please use the above implementation if your discovery schedules get stuck.
For more details, please check "Set the MID Server JVM memory size" (servicenow.com) and "Discovery IP address configuration" (servicenow.com).
Also, please set the max run time on your schedule so that a stuck discovery is cancelled:
Discovery schedules can have a max run time set, so that they are automatically cancelled if:
- the schedule runs longer than expected and may run outside its allowed time window, affecting the performance of the MID Server or the instance, or preventing the next 'run after' schedule in the sequence from starting on time.
- the schedule gets stuck and will never complete, perhaps because a sensor crashed, so the Started/Completed counts never match and completion is never triggered.
Please check this article: https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB1522270
If you believe the solution provided has adequately addressed your query, could you please **mark it as 'Helpful'** and **'Accept it as a Solution'**? This will help other community members who might have the same question find the answer more easily.
Thank you for your consideration.
Selva