Performance issues across the entire system for all users

Katie A · ‎03-16-2016

We are having some very impactful performance issues in our instance. We are concerned because we are on the verge of populating our CMDB with multiple data sources -- but we aren't comfortable proceeding considering the system performance is already terrible. Most of the time pages are just very slow to load, but other times the page does not load at all and the browser crashes.

The issue affects ALL users in ALL areas of the system. The issue is affecting production as well as dev and test instances.

We are seeing client transaction times around 15 seconds, some as high as 20 seconds.

I opened a ticket with support but they gave us the run-around with insufficient answers, such as checking homepage refresh times. They were dismissive and tried to mark the incident as resolved without giving us any real solutions.

They checked out the health of our nodes and explained that everything seemed normal. We moved data centers about a month ago and we have requested moving back to the old data center to see if that might have been the cause of the issue.

Still, we are skeptical and extremely unhappy.

The issue cannot be homepage refresh times since the same issue is impacting both Dev and Test, where there is only ONE user working on a daily basis. We are a small company and we only have about 40 users in the system. There is no way that homepage refresh times are causing such response issues.

I checked all of the performance graphs on CPU, JVM memory, SQL transactions, etc. I don't see anything obvious in those graphs to indicate a problem.

Attached is a screenshot of the client response times, which are very slow.

We are on Fuji Patch 10. The issue started about a month ago and has gotten worse in the past few weeks.

Has anyone experienced similar issues? We are not sure how to proceed considering the lack of real help from support.

Graham18 · ‎01-31-2017

Hi Jan,

I have looked at the logs and nothing screams out at me as an issue.

I have filtered the logs for glide.memory.watcher and the following warning is present:

*** WARNING *** Should have slept for 1000ms, but slept for 2401 ms instead. May incdicate garbage collection pauses.

This recurs every so often. Memory usage, however, is only around 20-30% of max memory.

JC Moller · ‎01-31-2017

Hi,

You do not have to worry about those warnings.

So you have looked at all the EXCESSIVE rows in the log? If you have a transaction that takes 10 seconds to execute, where is most of the time spent: SQL operations, business rules, or network? Is there any pattern to these slow transactions?

2017-01-31 13:03:49 (499) Default-thread-17 3CAD0153DD603240BBDEC4979E9967F4 *** End #2,282,138, path: /cmdb_ci_linux_server.do, user: xxxxxxx@xxxxx, EXCESSIVE total transaction time: 0:00:08.550, transaction processing time: 0:00:08.550, network: 0:00:00.006, chars: 257,888, uncompressed chars: 1,341,149, SQL time: 4,737 (count: 2,033), business rule: 57 (count: 11), phase 1 form length: 2,180,834, largest chunk written: 32,768, request parms size: 944, largest input read: 0
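To see where the time goes in a line like the one above, a small script can split it into its parts. This is only a rough sketch in Python, not an official tool. It assumes the "SQL time" and "business rule" figures are in milliseconds (the log line does not state the unit), and the function names `duration_ms`, `to_int` and `breakdown` are made up for this example. The sample line is shortened to the fields the script reads.

```python
import re

# Sample EXCESSIVE line from this thread, shortened to the fields used below.
LINE = (
    "2017-01-31 13:03:49 (499) Default-thread-17 *** End #2,282,138, "
    "path: /cmdb_ci_linux_server.do, user: xxxxxxx@xxxxx, "
    "EXCESSIVE total transaction time: 0:00:08.550, "
    "transaction processing time: 0:00:08.550, network: 0:00:00.006, "
    "chars: 257,888, uncompressed chars: 1,341,149, "
    "SQL time: 4,737 (count: 2,033), business rule: 57 (count: 11)"
)


def duration_ms(text):
    """Convert an h:mm:ss.mmm duration string to milliseconds."""
    hours, minutes, seconds = text.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def to_int(text):
    """Parse a number written with thousands separators, e.g. '4,737'."""
    return int(text.replace(",", ""))


def breakdown(line):
    """Split one EXCESSIVE log line into its timing components."""
    path = re.search(r"path: (\S+),", line).group(1)
    processing = duration_ms(
        re.search(r"transaction processing time: ([\d:.]+)", line).group(1)
    )
    network = duration_ms(re.search(r"network: ([\d:.]+)", line).group(1))
    sql = re.search(r"SQL time: ([\d,]+) \(count: ([\d,]+)\)", line)
    rules = re.search(r"business rule: ([\d,]+) \(count: ([\d,]+)\)", line)

    # Assumption: SQL time and business rule time are reported in milliseconds.
    sql_ms, sql_count = to_int(sql.group(1)), to_int(sql.group(2))
    rule_ms, rule_count = to_int(rules.group(1)), to_int(rules.group(2))

    return {
        "path": path,
        "processing_ms": processing,
        "network_ms": network,
        "sql_ms": sql_ms,
        "sql_count": sql_count,
        "business_rule_ms": rule_ms,
        "business_rule_count": rule_count,
        # Processing time not attributed to SQL or business rules.
        "other_ms": processing - sql_ms - rule_ms,
        "sql_share": sql_ms / processing,
    }


result = breakdown(LINE)
for key, value in result.items():
    print(f"{key}: {value}")
```

Under that assumption, the sample line spends a bit more than half of its processing time in SQL, spread over roughly two thousand queries, while business rules and network are negligible. Running the same breakdown over many EXCESSIVE rows is one way to look for a pattern.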

Have you activated User Presence? We had some issues with the volume and frequency of the data polling it does, so we changed the default 15-second polling interval to 60 seconds (glide.ui.presence.interval).

- Jan

JC Moller · ‎06-07-2016

Hi,

Also verify that you are not having issues with crashing MID servers, which have recently been reported by several clients using the Eureka, Fuji and Geneva versions.

More info:

MID Server socket timeout (HTTP 202) errors

https://hi.service-now.com/kb_view.do?sysparm_article=KB0594709

We recently upgraded our on-premises version and started to see this on our instance. It can cause millions of warning rows per day to be written to the system log, thereby affecting the overall performance of your instance.

- Jan

JC Moller · ‎06-08-2016

Hi,

I would also recommend these KB-articles:

"Instance Maintenance Recommended Practices" - Admin tips for daily, weekly and monthly activities

https://hi.service-now.com/kb_view.do?sysparm_article=KB0552943

"Troubleshooting inbound integrations performance"

https://hi.service-now.com/kb_view.do?sysparm_article=KB0564204

- Jan

denis_corja · ‎01-26-2017

Hello -

Are you using Chat Social, Connect or Live Feed? In our case Live Feed was the cause of the slow client response. Live Feed data attached to the users would grow really fast (exponentially), and every time that data was loaded we would notice the lag.

Denis