
Agent Client Collector (Times out eventually)

Rod Cristian Ll
Tera Contributor

I have noticed through monitoring and reporting that on a certain MID Server, the number of agents reporting "Up" decreases over time. Even during the hours when our operations are active, the number of machines reporting Up is quite low. I tested this on my own PC, which has ACC installed and points to the problematic MID Server: at first it reported Up, but the next day it showed Down even though the service was active and the MID Server was online. Ping and telnet tests to the MID Server all succeeded, yet my computer still did not report Up even after restarting both the service and the computer. I checked other PCs connected to the same MID Server and they behave the same way.

Based on the logs, I saw a common error message:
2024-04-15T15:21:14.41 [ERROR] [agent] [read tcp 192.168.1.6:59203->XX.XX.XX.XX:8084: i/o timeout] reconnection attempt failed to the url: wss://XX.XX.com:8084/ws/events, using api-key authentication failed

When I rebooted the MID Server itself, that was when all the workstations connected to it started reporting Up again.
I have asked ServiceNow for support, but I'm not happy with the steps they asked me to perform.
Hopefully others in this community have experienced the same issue as us.


dougw
Tera Expert

I realize I am replying over a year later, but am replying in case others stumble across this thread.

We have our ACC MIDs running on Kubernetes and learned that ACC likes a persistent connection and it checks in every 60 seconds.

Kubernetes (or rather the Octavia load balancer involved), by default, drops connections after 50 seconds. We upped that to 70 seconds and our "down agents" problem stopped.

Hi @dougw ,

Thanks for this suggestion. I noticed the same thing: ACC likes to have a persistent connection. How were you able to modify or configure that? Is it done on the ACC endpoint side, in the YML, or on the MID Server?

Rgds,

The modification was done in the kubernetes load balancer config.
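For anyone looking to apply the same fix: if the MID Server pods are exposed through a Kubernetes `Service` of type `LoadBalancer` backed by OpenStack Octavia, the idle timeouts can typically be raised with annotations from the cloud-provider-openstack integration. This is only a sketch based on dougw's description, not the actual config from this thread; the Service name, selector, and the 70-second (70000 ms) value are assumptions.

```yaml
# Hypothetical Service exposing the ACC MID Server WebSocket endpoint.
# Annotation names come from cloud-provider-openstack's Octavia
# integration; timeout values are in milliseconds.
apiVersion: v1
kind: Service
metadata:
  name: acc-mid-server          # assumed name
  annotations:
    # Raise the idle timeouts above ACC's 60-second check-in interval.
    # Octavia's default (50 s) is what was dropping the persistent
    # wss:// connection between check-ins.
    loadbalancer.openstack.org/timeout-client-data: "70000"
    loadbalancer.openstack.org/timeout-member-data: "70000"
spec:
  type: LoadBalancer
  selector:
    app: acc-mid-server         # assumed label
  ports:
    - name: acc-websocket
      port: 8084
      targetPort: 8084
```

The key point is simply that whatever sits between the agents and the MID Server must keep idle connections open longer than the agent check-in interval; if you use a different ingress or load balancer, look for its equivalent idle-timeout setting.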