Memory contention caused by 'ASYNC: Discovery - Sensors VMWarevCenter' scheduled jobs

Mitali Sahoo · 12-27-2020

Issues faced:

Instances can experience outages due to memory contention on some or all UI nodes.

Heap dumps show multiple worker threads executing 'ASYNC: Discovery - Sensors VMWarevCenter' scheduled jobs, with each thread consuming around 300 MB of memory.

A quick look at the heap dump shows that each thread holds a huge array on the heap. The array appears to contain many objects, each with elements named host_group and vm_group that hold GlideElement() (i.e. full GlideRecord()) objects from the cmdb_ci_vcenter_host_group and cmdb_ci_vcenter_vm_group tables.

Resolution:

For VMWarevCenterClusterDRSProbe, create the probe parameter 'page_size' and set it to 1 or 2.

Reasons for changes:

- page_size defines how many clusters VMWarevCenterClusterDRSProbe discovers at a time.
- In discovery, when too many clusters are passed to VMWarevCenterClusterDRSProbe at once (5 or 6 in the customer's case), the probe times out.
- Setting page_size to 1 or 2 for VMWarevCenterClusterDRSProbe ensures that at most 2 clusters are discovered for DRS at a time, so the probe fetches less information than in the previous scenario, where it fetched information for 5 clusters.
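
The effect of page_size on batch size can be illustrated with a small sketch. This is not the probe's actual implementation: the paginate helper and the cluster names are hypothetical, and the code only assumes that page_size caps how many clusters are handled together, as described above.

```python
from typing import Iterator, List, Sequence


def paginate(clusters: Sequence[str], page_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most page_size clusters.

    Hypothetical helper for illustration; it is not part of Discovery.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(clusters), page_size):
        yield list(clusters[start:start + page_size])


# Hypothetical cluster names, standing in for the customer's 5 clusters.
clusters = ["cluster-1", "cluster-2", "cluster-3", "cluster-4", "cluster-5"]

for page_size in (5, 2, 1):
    batches = list(paginate(clusters, page_size))
    largest = max(len(batch) for batch in batches)
    print(f"page_size={page_size}: {len(batches)} batch(es), "
          f"at most {largest} cluster(s) each")
```

A smaller page_size means more batches, but each batch covers fewer clusters, so each one has less data to fetch and hold at a time.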