MID Server Threads and relationship between discovery and event management

Shambo · ‎02-27-2019

Hi,

I am looking for recommendations on MID Server threads and how they relate to Discovery and Event Management.

Can anyone explain how thread counts relate to the number of discovered hosts or event management sources, and whether any sizing figures are available, such as thread count versus discovered hosts or event sources?

Regards,

Shambo Maitra

tim_broberg · ‎02-27-2019

threads.max sets the number of worker threads dedicated to handling probes in the standard queue (priority = 2). The default is 25 threads. This is perhaps the most commonly tuned mid server parameter in practice. Many people set it to 100. Plenty of very serious users leave it at 25. There are separate controls for the other priority levels: threads.interactive.max for priority 0 (e.g. Discover Now) and threads.expedited.max for priority 1 (e.g. Service Mapping).

If you start discovery, you'll see a few Shazzam probes doing port scans. Then you'll see a classify start per IP address, a few probes at once. Then an identify probe per IP address - a few more. This may or may not saturate the worker threads. (Note that these multiprobes consume only a single thread, despite containing multiple probes - the subprobes execute serially.)

Then the floodgates open, with like a dozen probes per device. The mid(s) load up a few hundred probes into the internal queue, and start working on threads.max probes at a time. Then those probes get completed and flow back over the ecc_queue to the instance where sensor processing gets scheduled.

These threads get held for the whole duration of the probe. If your SSHCommand probe takes 5 minutes to time out, that thread sits waiting for the server to respond for the whole 5 minutes.
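To make the effect of held threads concrete, here is a rough back-of-the-envelope sketch. It assumes probes run in waves of at most threads.max, that every probe takes about the same time, and it ignores queueing and instance-side sensor processing. The function names and the numbers in the usage note are made up for illustration, not measurements from a real MID Server.

```python
import math


def estimate_wall_clock_seconds(probe_count, avg_probe_seconds, threads_max):
    """Rough estimate: probes run in waves of at most threads_max at a time.

    Simplifying assumption: every probe takes avg_probe_seconds.
    """
    waves = math.ceil(probe_count / threads_max)
    return waves * avg_probe_seconds


def effective_threads(threads_max, threads_waiting_on_timeout):
    """Threads left for useful work while some sit waiting on a timeout."""
    return max(threads_max - threads_waiting_on_timeout, 0)


if __name__ == "__main__":
    # Hypothetical workload: 1200 probes at roughly 10 seconds each.
    for threads in (25, 100):
        seconds = estimate_wall_clock_seconds(1200, 10, threads)
        print(f"threads.max={threads}: about {seconds} seconds")
    # Five probes stuck on a long SSH timeout leave 20 of 25 threads usable.
    print(effective_threads(25, 5))
```

Under these assumptions, raising threads.max from 25 to 100 cuts the number of waves by four, which is why the parameter has such a visible effect on discovery duration.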

So, the higher threads.max is...

the faster the MID Server processes discovery probes. Too few threads results in long discovery durations. If you start seeing output records on ecc_queue processed and updated long after they are created, you may need more MID Servers or more threads.
the more memory the MID Server consumes. Too many threads results in excessive MID Server memory usage.
the more concentrated the sensor processing load is on the worker threads of the instance nodes. Too many threads results in an unresponsive instance.

If you go to the threads tab on your MID Server, you will see threads named "Worker - Standard: <task>". There will be up to threads.max of these. Worker - Interactive is the pool for priority 0, with up to threads.interactive.max threads, and Worker - Expedited is the pool for priority 1, with up to threads.expedited.max threads.

The best way to see this at work is to look in the log/agent0.log.* files and search for "LogStatusMonitor.60". This dumps out the status of all the worker thread pools and queues and the state of the heap every minute, so you can see if you're hitting max threads, which priorities are how busy, how deep the queue is, and how heavily loaded the heap is.

Event handling is done by a separate web server whose thread(s?) are not included in threads.max.
- Tim.

FLP1 · ‎10-03-2023

I would like to know how the values here are interpreted. If you look at the MID Server logs, you can see the value stats threads: 105, but we have set max threads to 25 on the MID Server. How is this number (105) determined? I was expecting to see a number between 0 and 25 threads...

(LogStatusMonitor.60) [LogStatusMonitor:49] 2023-10-03T17:35:49.444Z, stats threads: 105, memory max: 910.0mb, allocated: 434.0mb, used: 91.0mb, standard.queued: 0 probes, standard.processing: 0 probes, expedited.queued: 0 probes, expedited.processing: 0 probes, interactive.queued: 0 probes, interactive.processing: 0 probes

tim_broberg · ‎10-04-2023

Confusing, I know. When that parameter name was created, there was only one thread pool for processing probes, and it controlled that.

Now it controls the standard processing pool. If you set max threads to 50, you should never see standard.processing go above 50.
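As an illustration, the log line quoted above can be pulled apart with a small script to check the standard pool against threads.max. This is a minimal sketch that assumes the line format is exactly what appears in this thread; the function names are hypothetical, and the format may differ between MID Server versions.

```python
import re

# The sample line posted earlier in this thread.
SAMPLE = (
    "(LogStatusMonitor.60) [LogStatusMonitor:49] 2023-10-03T17:35:49.444Z, "
    "stats threads: 105, memory max: 910.0mb, allocated: 434.0mb, used: 91.0mb, "
    "standard.queued: 0 probes, standard.processing: 0 probes, "
    "expedited.queued: 0 probes, expedited.processing: 0 probes, "
    "interactive.queued: 0 probes, interactive.processing: 0 probes"
)


def parse_status_line(line):
    """Return a dict of values from one status line, or None if not a match."""
    if "LogStatusMonitor.60" not in line:
        return None
    stats = {}
    threads = re.search(r"stats threads: (\d+)", line)
    if threads:
        # Total threads in the MID Server process, not just the worker pools.
        stats["threads"] = int(threads.group(1))
    for key, value in re.findall(r"\b(max|allocated|used): ([\d.]+)mb", line):
        stats[f"memory_{key}_mb"] = float(value)
    pool_pattern = r"(standard|expedited|interactive)\.(queued|processing): (\d+) probes"
    for pool, state, count in re.findall(pool_pattern, line):
        stats[f"{pool}.{state}"] = int(count)
    return stats


def is_standard_pool_saturated(stats, threads_max):
    """True when every standard worker thread is busy."""
    return stats["standard.processing"] >= threads_max


if __name__ == "__main__":
    parsed = parse_status_line(SAMPLE)
    print(parsed)
    print("saturated:", is_standard_pool_saturated(parsed, threads_max=25))
    print("heap used fraction:", parsed["memory_used_mb"] / parsed["memory_max_mb"])
```

Running the same function over every matching line in the agent0.log.* files would show how often standard.processing reaches threads.max and how deep standard.queued gets at those moments.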

There are separate pools for expedited and interactive, as well as individual threads for each "monitor" on the MID Server that does stuff like read in commands from ecc_queue, write results back, check for updates hourly, even one to print the log message you showed (LogStatusMonitor.60).

There is a related list in the MID server (ecc_agent) form to look at a recent capture of the threads so you can look at exactly what they all are.
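To see why the total thread count can exceed threads.max, one option is to group the captured thread names by pool. The sketch below assumes the "Worker - Standard", "Worker - Interactive" and "Worker - Expedited" name prefixes mentioned earlier in this thread; the sample names in the list, apart from LogStatusMonitor.60, are hypothetical stand-ins for whatever your own thread capture shows.

```python
from collections import Counter

# Name prefixes for the three probe worker pools, as described in this thread.
POOL_PREFIXES = {
    "Worker - Standard": "standard",
    "Worker - Interactive": "interactive",
    "Worker - Expedited": "expedited",
}


def classify(thread_name):
    """Map a thread name to its worker pool, or 'other' for everything else."""
    for prefix, pool in POOL_PREFIXES.items():
        if thread_name.startswith(prefix):
            return pool
    return "other"


def summarize(thread_names):
    """Count threads per pool."""
    return Counter(classify(name) for name in thread_names)


# Hypothetical capture; real names will differ on your MID Server.
SAMPLE_THREADS = [
    "Worker - Standard: example task A",
    "Worker - Standard: example task B",
    "Worker - Interactive: example task C",
    "Worker - Expedited: example task D",
    "LogStatusMonitor.60",
    "ExampleQueueMonitor.1",
]

if __name__ == "__main__":
    for pool, count in sorted(summarize(SAMPLE_THREADS).items()):
        print(f"{pool}: {count}")
```

Only the "standard" count is bounded by threads.max; the other pools and the "other" bucket (monitors and housekeeping threads) add to the total reported as stats threads.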

FLP1 · ‎10-06-2023

Thank you for the answer. However, it is still difficult to find a good explanation of how to interpret these values; they are confusing. Do you know of any document discussing the LogStatusMonitor.60 variables? I have spent days searching and haven't found any...

thank you!