Troubleshooting Stuck Default Semaphores

nidhivijay · 11-21-2017

A semaphore is used on the platform to limit the number of transactions that can run on a node at one time, which protects the node's resources. When a user submits a request, it is routed to a semaphore pool, where it joins a queue. If the pool has a free semaphore, the request takes it and begins processing. When the transaction completes, the semaphore is released and becomes available to the next request in the queue.
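The queue-and-release behavior described above can be sketched with a small Python model. This is only an illustration of the general semaphore pattern, not the platform's implementation; the class name `SemaphorePool`, the `simulate` helper, the pool size, and the sleep used as a stand-in for transaction work are all assumptions made for the example.

```python
import threading
import time


class SemaphorePool:
    """Toy model of a semaphore pool: at most `size` transactions run at once."""

    def __init__(self, size):
        self._semaphore = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self._running = 0
        self.peak_running = 0  # highest concurrency observed
        self.completed = []    # request ids in completion order

    def process(self, request_id, work_seconds):
        # Blocks (the request "waits in the queue") until a semaphore is free.
        with self._semaphore:
            with self._lock:
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
            time.sleep(work_seconds)  # stand-in for the real transaction
            with self._lock:
                self._running -= 1
                self.completed.append(request_id)
        # Leaving the outer `with` block releases the semaphore to the next waiter.


def simulate(pool_size, request_count):
    """Submit `request_count` hypothetical requests to a pool of `pool_size`."""
    pool = SemaphorePool(pool_size)
    threads = [
        threading.Thread(target=pool.process, args=(i, 0.01))
        for i in range(request_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return pool
```

In this model, a transaction that never leaves the `with` block never returns its semaphore, which is the "stuck semaphore" situation the rest of this article deals with.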

The default semaphore pool processes all UI transactions (for example, homepages, Service Portal pages, and the catalog). Unless a request is explicitly assigned to a different semaphore pool, it ends up in the default pool, which makes this the most critical pool for performance. If the default pool fills up and requests begin to wait for a semaphore, or are rejected because the queue depth limit has been reached, users will see an immediate impact.

Why do default semaphores get stuck?

Here are some of the things that can cause a default semaphore to get stuck:

- Heavy load on the server is delaying processes from releasing their semaphores.
- Two transactions holding semaphores are each waiting for a resource the other holds, and neither can proceed (a deadlock).
- There is an infinite JavaScript loop in a scripted business rule.
- The transaction being processed is very large and takes a long time to complete.

To verify whether a semaphore is stuck, go to stats.do and check whether one or more semaphores have been running for a long time. As a rule of thumb, a long time is more than 2 hours. If a transaction has been running longer than that, it is highly likely that the semaphore is stuck.

How to troubleshoot a stuck semaphore

Identify the stuck semaphore

Go to stats.do and locate the long-running semaphore(s) by looking at the total execution time (the value in the last set of parentheses in the example below).
0:EA04429355F83D00B029B0D9928DBEE3 #1556314 /u_hr_employee_import.do (Default-thread-633) (22:00:39.504)
0:786F3143DB7FB680BB6BF7951D9619BE #2925841 /service_catalog.do (Default-thread-10) (29:30:07.809)
Click the link to the thread (shown in blue on the stats.do page) to see its stack trace. The stack trace tells you what code is running and taking so long.
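If you have copied semaphore entries from stats.do into a text file, a short Python script can flag the ones that exceed the 2-hour guideline. This is a sketch that assumes entries look exactly like the two examples above and that the trailing value reads as hours:minutes:seconds.milliseconds; the regular expression and the function names `parse_entry` and `likely_stuck` are illustrative, not part of the platform.

```python
import re
from datetime import timedelta

# Assumed layout, based only on the two example entries:
#   <id> #<number> <path> (<thread name>) (<H:MM:SS.mmm>)
ENTRY = re.compile(
    r"^\S+\s+#\d+\s+(?P<path>\S+)\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\((?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3})\)$"
)

# Rule of thumb from this article: more than 2 hours is suspicious.
STUCK_AFTER = timedelta(hours=2)


def parse_entry(line):
    """Return path, thread, and elapsed time for one entry, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    elapsed = timedelta(
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
        milliseconds=int(match["ms"]),
    )
    return {"path": match["path"], "thread": match["thread"], "elapsed": elapsed}


def likely_stuck(lines, threshold=STUCK_AFTER):
    """Return the parsed entries whose elapsed time exceeds the threshold."""
    parsed = (parse_entry(line) for line in lines)
    return [entry for entry in parsed if entry and entry["elapsed"] > threshold]
```

Passing the two example lines to `likely_stuck` returns both entries, since each has been running for well over 2 hours. Lines that do not match the assumed layout are skipped rather than guessed at.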

Investigate the issue

It is important to understand what the thread is doing before charting the next steps.
- If you refresh the page, does the stack change or stay the same? If it stays exactly the same, then it is probably stuck.
- If the top part of the stack changes but the bottom part of the stack is always the same, then you've encountered either an endless loop (recursive logic) or a very slow loop.
- If the stack trace includes a line with sys_script_include.<sys_id>, a Script Include is being executed; sys_script.<sys_id> indicates a Business Rule. If you see either in the stack trace, the script should be reviewed for optimization. If it is a custom script, review it yourself; if it is a ServiceNow script, I recommend opening a HI incident for review.
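
The checks above can be partly automated if you save the stack trace as text on each refresh. The following Python sketch assumes each snapshot is a list of frame strings with the top of the stack first and that a sys_id is 32 hexadecimal characters; the function names and the frame text used to try it out are hypothetical. It only applies the heuristics from the list above and does not replace reading the stack trace.

```python
import re

# Matches sys_script_include.<sys_id> or sys_script.<sys_id> inside a frame.
SCRIPT_REF = re.compile(r"\b(sys_script_include|sys_script)\.([0-9a-fA-F]{32})\b")
KIND = {"sys_script_include": "Script Include", "sys_script": "Business Rule"}


def compare_snapshots(first, second):
    """Compare two stack snapshots (lists of frames, top of the stack first)."""
    if first == second:
        return "unchanged"  # probably stuck
    shared_bottom = 0
    for frame_a, frame_b in zip(reversed(first), reversed(second)):
        if frame_a != frame_b:
            break
        shared_bottom += 1
    # A changing top over a constant bottom suggests an endless or very slow loop.
    return "top changed" if shared_bottom > 0 else "different"


def script_references(frames):
    """List (kind, sys_id) pairs for scripts named in the frames, in order, without repeats."""
    found = []
    for frame in frames:
        for table, sys_id in SCRIPT_REF.findall(frame):
            reference = (KIND[table], sys_id)
            if reference not in found:
                found.append(reference)
    return found
```

Compare more than two snapshots before drawing a conclusion, because a single pair only shows that the bottom frames matched at those two moments.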

Path of relief
- If the thread is not causing severe performance issues and will eventually complete, then just let it run.
- If the thread is causing severe performance issues, is stuck, or is in an endless loop, then you will need to terminate it.

Kill a long-running transaction

Do not kill a long-running transaction without confirming that it is acceptable and safe.

This warning applies especially to long-running sys.scripts.do transactions. sys.scripts.do is the Scripts - Background module, which is often used for large operations that are known to take a long time to complete. To terminate a stuck transaction, follow these instructions:

Role required: admin (if high security is enabled, elevate privileges to security_admin)

Log in to the instance that is experiencing performance issues.
Navigate to the appropriate Active Transactions module.
- To view and kill transactions on the current node for your instance, navigate to User Administration > Active Transactions.
- To view and kill transactions on all nodes for your instance, navigate to System Diagnostics > Active Transactions (All Nodes).
Kill the transaction by either right-clicking on the record and selecting Kill, or by selecting the check box next to the record and selecting Kill from the Actions on selected rows choice list. DO NOT select Delete as that will only remove the Active Transaction record and will not stop the transaction itself. This process may take a few minutes to complete.
If the above steps don't resolve the problem, open a HI incident and schedule a restart of the node.
