Performance issues across entire system, all users

Katie A
Mega Guru

We are having severe performance issues in our instance. We are concerned because we are on the verge of populating our CMDB from multiple data sources -- but we aren't comfortable proceeding while system performance is already this poor. Most of the time pages are just very slow to load, but sometimes a page does not load at all and the browser crashes.

The issue affects ALL users in ALL areas of the system. The issue is affecting production as well as dev and test instances.

We are seeing client transaction times around 15 seconds, some as high as 20 seconds.

I opened a ticket with support but they gave us the run-around with insufficient answers, such as checking homepage refresh times. They were dismissive and tried to mark the incident as resolved without giving us any real solutions.

They checked out the health of our nodes and explained that everything seemed normal. We moved data centers about a month ago and we have requested moving back to the old data center to see if that might have been the cause of the issue.

Still, we are skeptical and extremely unhappy.

The issue cannot be homepage refresh times, since the same issue is impacting both Dev and Test, where only ONE user works on a daily basis. We are a small company and we only have about 40 users in the system. There is no way that homepage refresh times are causing such response issues.

I checked all of the performance graphs on CPU, JVM memory, SQL transactions, etc. I don't see anything obvious in those graphs to indicate a problem.

Attached is a screenshot of the client response times, which are very slow.

We are on Fuji Patch 10. The issue started about a month ago and has gotten worse in the past few weeks.

Has anyone experienced similar issues? We are not sure how to proceed considering the lack of real help from support.

33 REPLIES

Hello Greg,



With respect to your logic about setting the assignment_group to Service Desk if the assignment group is null, I have a question for you.


Why not do this in a "before" rule? You could skip the current.update() altogether in that case. But when you update the record in an "after" rule, unless you've gone out of your way to skip business rules, all the BRs will run again for that update too. If one of those rules results in another current.update(), the BRs could be invoked a third time. I suppose it's possible for "infinite recursion" to occur here; I've not actually experienced it.



In scheduled jobs, I disable the running of business rules with "record.setWorkflow(false)" prior to the update.


You might try "current.setWorkflow(false)" and see if that helps.
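For what it's worth, here is a minimal sketch of what that "before" rule could look like. The table, condition, and "Service Desk" lookup are illustrative assumptions, not your actual config:

```javascript
// Sketch of a "before" business rule (When: before, Insert/Update: true).
// In a before rule, field changes are saved as part of the same write --
// no current.update() is needed, so no second cascade of business rules.
(function executeRule(current, previous) {
    if (current.assignment_group.nil()) {
        // Hypothetical default group -- substitute your own sys_id or lookup.
        current.assignment_group.setDisplayValue('Service Desk');
    }
})(current, previous);
```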


koliva
Tera Expert

We have had the same issue since approximately April 22, when we upgraded from Fuji to Geneva. We have gotten the same runaround about the homepages being the cause and needing to be refreshed.


Steve McCarty
Mega Guru

As Steve Driscoll said, you will definitely need to sleuth around. Other places to check are the Active Transactions module and the Transaction logs; see if any currently running jobs are eating up system resources. You can sort the Transaction logs by response time to see the jobs taking the longest.



We've had some performance issues related to some import sets that were taking a ridiculous amount of time to run.   It turns out that those were due to the coalesce fields not being indexed.



Another culprit can be scheduled jobs that are running long.   We had some trouble with the software counter scripts getting hung up.   We had inadvertently created some departments that had themselves listed as a parent.   When the software counters tried to group the details by department they got stuck in an infinite loop trying to find the top level department for the grouping.   So every night when the software counter scheduled job kicked off it would slowly eat up a ton of resources and slowed everything else to a crawl.
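That self-parent trap is easy to check for outside the platform. The walk-up logic the software counters perform can be sketched in plain JavaScript (the department map and function name here are illustrative, not ServiceNow API):

```javascript
// Minimal sketch: find departments that can never resolve to a top-level
// parent (self-parented or cyclic) -- the condition that hung the counters.
// Departments are modeled as a map of name -> parent name (null = top level).
function findCyclicDepartments(parents) {
    var cyclic = [];
    for (var dept in parents) {
        var seen = {};
        var cur = dept;
        while (cur !== null && cur !== undefined) {
            if (seen[cur]) {        // revisited a node: we're in a loop
                cyclic.push(dept);
                break;
            }
            seen[cur] = true;
            cur = parents[cur];     // walk up to the parent
        }
    }
    return cyclic;
}

// Example: "IT" lists itself as its own parent.
var depts = { IT: 'IT', HR: 'Corp', Corp: null };
console.log(findCyclicDepartments(depts)); // -> [ 'IT' ]
```

A scheduled script doing the equivalent walk against cmn_department would surface these bad rows before a nightly job trips over them.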



The best thing is to keep poking around and see if there are patterns to the problems and see what else could be running and keeping the system busy.



Good Luck!


-Steve


Community Alums
Not applicable

Check the slow job log and the slow queries list, look at the logs, and look for errors.




I would look into the homepages like HI said, as well. If there is a poorly optimized report running on a very large table, and it's been made into a gauge, and it's been set to refresh regularly (and automatically), then if enough people use that report, you could experience issues.



We eliminated the "contains" operator from reports wherever possible, and we noticed a marked decrease in page load times. It actually was a relevant concern.
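For context on why "contains" is so costly: it translates to a leading-wildcard SQL LIKE ('%term%'), which cannot use a column index, while "starts with" ('term%') can. A sketch in a server-side script -- the operators are standard GlideRecord ones, but the table and search term are just examples:

```javascript
var gr = new GlideRecord('incident');

// "contains" -> WHERE short_description LIKE '%printer%'
// The leading wildcard forces a scan; no index on the column can help.
gr.addQuery('short_description', 'CONTAINS', 'printer');

// "starts with" -> WHERE short_description LIKE 'printer%'
// An index on short_description can satisfy this.
// gr.addQuery('short_description', 'STARTSWITH', 'printer');

gr.query();
```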


JC Moller
Giga Sage

Hi,



Have you checked the system log or localhost log for slow running business rules? Export some logs from your server and grep for the 'Slow business rule' text string. This could give you some indication of the root cause for the performance issues.



2015-12-16 10:19:37 (871) Default-thread-10 DB0AAE6669DC9200076D6EF80856E4A6 Slow business rule 'Calc SLAs on Display' on incident:INC2127522, time was: 0:00:01.088


2015-12-16 10:19:38 (059) SOAPProcessorThread11ff72e26910d200076d6ef80856e4e6 71CFFAA26910D200076D6EF80856E4AE Slow business rule 'Run SLAs -Service offering' on incident:INC2132168, time was: 0:00:02.603
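If you have the exported log files on disk, a one-liner can rank those entries by elapsed time, slowest first. The file names here are an assumption -- match whatever your export actually produced:

```shell
# Rank "Slow business rule" log entries by their "time was:" value.
grep -h "Slow business rule" localhost_log*.txt \
  | awk -F'time was: ' '{print $2 "\t" $0}' \
  | sort -r \
  | cut -f2- \
  | head -20
```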



Have you checked the Slow Queries table for any new queries that have recently popped up on the list? Maybe a missing index is causing the slowness? Analyze anything that takes more than 3000 ms and has a high execution count.



Also check the data in the Table IOstats table. Analyze the "Selects over 1s" and "Selects over 10s" columns. Are there any especially high volumes present? This could narrow the issue down to a specific table.



Regards,



Jan