Resolving Discovery Schedule Stuck in Active State: A Case Study

Selva Arun · 04-17-2024

Our organization faced a critical issue: two of our Discovery schedules were getting stuck in an active state for prolonged periods, which impaired our ability to discover and manage configuration items (CIs) efficiently. Specifically, we encountered the following challenges:

Stuck Schedules: Discovery schedules remained in an active state without progressing, and we eventually had to cancel them.
Resource Constraints: We had a limited number of MID Servers in our lower instances and didn’t want to allocate additional servers.

Scenario Details

Schedule Frequency: Weekly
Number of MID Servers: 2 (used in a cluster)
Scope: Large schedule scanning approximately 32K IP addresses

Root Causes

Large Discovery Scope: The single schedule covered a large number of IP addresses, leading to extended execution times.
Insufficient MID Server Resources: The MID Server JVM memory allocation was inadequate, causing performance issues.

Solutions Implemented

Splitting Discovery Scopes

We divided the large discovery scope into smaller segments, each with its own schedule.
Each schedule now handles approximately 16K IP addresses, about half of the original 32K.
Smaller scopes allowed for more efficient processing and reduced the risk of schedules getting stuck.
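
As a rough illustration of the splitting step, the sketch below uses Python's standard ipaddress module to cut one large range into equal blocks of at most 16K addresses, which could then be entered as separate Discovery schedules. The 10.0.0.0/17 range and the split_network helper are hypothetical and only for illustration; our actual ranges were different, and the schedules themselves are still created in the instance.

```python
import ipaddress


def split_network(cidr: str, max_addresses: int) -> list:
    """Split a CIDR block into equal subnets of at most max_addresses each."""
    network = ipaddress.ip_network(cidr)
    prefix = network.prefixlen
    # Each extra prefix bit halves the subnet size, so lengthen the
    # prefix until one subnet fits within the limit.
    while (prefix < network.max_prefixlen
           and 2 ** (network.max_prefixlen - prefix) > max_addresses):
        prefix += 1
    return list(network.subnets(new_prefix=prefix))


if __name__ == "__main__":
    # Hypothetical ~32K range (a /17 holds 32,768 addresses).
    for subnet in split_network("10.0.0.0/17", 16 * 1024):
        print(subnet, subnet.num_addresses)
    # prints:
    # 10.0.0.0/18 16384
    # 10.0.64.0/18 16384
```

Splitting on prefix boundaries keeps every segment a clean CIDR block, which is usually easier to maintain than arbitrary start and end addresses.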

Increasing MID Server Resources

Memory Allocation: We increased the MID Server JVM memory allocation to 6 GB by editing the \agent\conf\wrapper-override.conf file in the MID Server installation directory. We made this change on all lower instances and then restarted the MID Server services.

CPU and Memory Reallocation: Additionally, we allocated more CPU and memory to the MID Servers and restarted the MID Server services.

These adjustments significantly improved performance.
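
For the JVM memory change, here is a minimal sketch of the edit to wrapper-override.conf. It assumes the maximum heap is controlled by the wrapper.java.maxmemory property, with the value in megabytes, as in the Java Service Wrapper; please confirm the property name against the ServiceNow documentation for your MID Server version. The set_max_memory helper is hypothetical, works only on the file's text, and does not restart the service.

```python
import re

# Assumption: the JVM heap ceiling is set by this property, in MB.
PROPERTY = "wrapper.java.maxmemory"


def set_max_memory(conf_text: str, megabytes: int) -> str:
    """Return conf_text with the max-memory property set to `megabytes`."""
    line = f"{PROPERTY}={megabytes}"
    pattern = re.compile(
        rf"^[ \t]*{re.escape(PROPERTY)}[ \t]*=.*$", re.MULTILINE
    )
    if pattern.search(conf_text):
        # Replace the first active (uncommented) setting in place.
        return pattern.sub(line, conf_text, count=1)
    # Otherwise append the setting on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"


if __name__ == "__main__":
    original = "wrapper.java.maxmemory=1024\n"
    print(set_max_memory(original, 6 * 1024), end="")
    # prints: wrapper.java.maxmemory=6144
```

6 GB corresponds to 6 × 1024 = 6144 MB. The host needs enough free physical memory to back this value, and the change takes effect only after the MID Server service is restarted.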

Implementation Results

After implementing the solutions, the issue was resolved.
Discovery schedules now complete successfully without getting stuck.
We achieved this without allocating more MID Servers to our lower instances.

Conclusion

By splitting discovery scopes, increasing the MID Server JVM memory allocation, and allocating additional CPU and memory, we resolved the issue. Discovery now runs more smoothly, and overall system performance has improved.

If your Discovery schedules get stuck in a similar way, please consider trying the approach above.

For more details, please see "Set the MID Server JVM memory size" and "Discovery IP address configuration" in the ServiceNow documentation (servicenow.com).