Are you scared of debugging ServiceNow bugs? Do you feel overwhelmed, not knowing where to even start looking?
I used to feel exactly the same way. I built this framework for myself. Whenever I asked AI to help me debug something, it solved the issue in a way I couldn't match, and I wanted to learn how. Before this, I was a guesser. When something broke, I couldn't pinpoint where the bug actually lived: the UI, the application layer, or the database? My mind would run through so many possibilities at once, and I'd waste time scrolling through forms, hunting in the wrong place. What shocked me was watching AI go straight to the answer, “it's in the business layer,” with no guessing at all. It wasn't smarter than me. It was just thinking systematically. That's when I decided I wanted to think like AI too.
What AI was doing that I wasn't
What struck me was how calm it was. It didn't panic the way I did. Instead, it asked me a series of questions: Did you reproduce the issue? When do you see the error? Where does the bug show up? It was systematic, and I realized I wasn't following any of that. I was reacting, not working through it in an architectural way. It also recognized the pattern instantly. It would say, “I recognize this pattern, it usually comes from here,” because it had seen so many ServiceNow issues before. The pattern was right there, and I had never even thought to look for it.
So I broke down what it was doing into five methods I could actually follow myself. I presented this framework at ServiceNow Knowledge 2026 with my dear friend Isela Phelps, and the response from other practitioners told me it wasn't just useful to me. Here it is.
The five methods
1. See the Problem
The first thing I tell myself now is that I have to see the problem for myself. Before anything else, I reproduce the issue. I want to see it with my own eyes. And I reproduce it the way the affected user experiences it, not as an admin. This matters more than people realize: admins bypass ACLs, so if I test as myself I might never see a permission problem at all. I impersonate the user who reported it and try to recreate the exact steps. Then I ask: when did this last work, and what changed since then? What broke this time? I'll check recent update sets, deployments, and platform upgrades, because the answer is very often hiding in something that changed.
2. Recognize the Pattern
Then I recognize the pattern: have I seen this problem before? If yes, what was it, and how did I fix it last time? Over time you build a small library of recurring failure shapes. A few I see again and again:
- Permission issues: it works for you, not for them
- Silent failures: no error, but the result is wrong
- Inconsistent behavior: works sometimes, fails other times
- Performance issues: correct, but slow
- Unexpected defaults: the system filled in a value you didn't expect
Naming the pattern tells you where to look next. Here's the permission one in action. On one instance, regular users suddenly couldn't see work notes or comments in the Service Portal, but admins could. That “works for admin, not for others” shape told me immediately it was an access-control problem, not a business rule or data issue. This is a diagnostic I lean on constantly: if it works for an admin but not for a user, suspect an ACL, because admins bypass ACLs. If it fails even for an admin, it is almost never a permission problem; it is a business rule or the data itself. That one distinction saves hours. I traced this one to a single ACL on the journal field table that had been left active with no roles, no script, and no condition, which silently denied access to everyone except admins. Adding the right role fixed it. The point is that recognizing the shape of the problem sent me straight to ACLs instead of letting me wander through business rules for an hour.
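The admin-versus-user diagnostic above is simple enough to write down as a tiny decision helper. This is only a sketch of the rule of thumb in plain JavaScript: the function name `suspectFromAccessPattern` and the labels it returns are hypothetical, not anything ServiceNow provides.

```javascript
// Rule-of-thumb triage: compare what an admin sees with what the affected user sees.
// The function name and the returned labels are illustrative, not a ServiceNow API.
function suspectFromAccessPattern(worksForAdmin, worksForUser) {
    if (!worksForAdmin) {
        // Fails even for an admin: almost never a permission problem.
        return 'business rule or data';
    }
    if (!worksForUser) {
        // Works for admin but not for the user: admins bypass ACLs, so suspect access control.
        return 'ACL';
    }
    // Works for both: the issue has not been reproduced yet.
    return 'not reproduced';
}

console.log(suspectFromAccessPattern(true, false));
console.log(suspectFromAccessPattern(false, false));
```

The first call corresponds to the work notes case: fine for admins, broken for everyone else, so the first place to look is ACLs.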
3. Identify the Layer
Then I tell my brain: okay, this is a problem, I need to fix it, and I'm going to approach it the way AI does. I ask myself where the bug lives. Which layer is it on?
ServiceNow has three layers, and almost every bug lives in one of them:
- UI layer: what the user sees and interacts with. Forms, fields, UI policies, client scripts, UI actions.
- Application / Business layer: the logic that runs when something happens. Business rules, ACLs, script includes, flows, scheduled jobs.
- Database layer: where the data actually lives. Tables, columns, and records.
The important part is understanding how data moves between them. When a user saves a record, the data flows down: from the UI layer, through the application/business layer, and into the database. Then it flows back up: the database returns the saved value, through the business layer, back to the UI for the user to see. A bug can break that flow at any point on the way down or on the way back up. That is why a problem that looks like a UI issue so often starts a layer or two deeper. In the round-robin bug I walk through later in this post, the field looked fine in the UI, but the save was being blocked down in the business layer before it ever reached the database, so the UI showed the old value again on the way back up.
It also helps to know the order things run in when a record is saved, because that order is your map. On the client, client scripts and UI policies run first. Once the record is submitted to the server, the sequence is: before business rules, then the database write, then after business rules, and finally async business rules, which run in the background. ACLs are checked whenever a record is read or written. So if a value never makes it into the database, I look at client scripts, UI policies, and before business rules. If the data is saved correctly but looks wrong to the user, I look at after business rules, display business rules, and ACLs on read. Knowing the sequence tells me which handful of things to check instead of all of them.
Figuring out the layer first is what stops me from searching everywhere at random.
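To make the save order usable as a map, here is a minimal sketch that turns it into a lookup in plain JavaScript. The symptom keys (`valueNeverSaved`, `savedButLooksWrong`) and the function name `whereToLook` are hypothetical labels for the two cases described above, and the lists simply restate the text.

```javascript
// Order in which things run when a record is saved, as described in the text.
const SAVE_ORDER = [
    'client scripts and UI policies',
    'before business rules',
    'database write',
    'after business rules',
    'async business rules'
];

// Symptom keys are hypothetical labels for the two cases discussed above.
const CHECKLIST = {
    valueNeverSaved: ['client scripts', 'UI policies', 'before business rules'],
    savedButLooksWrong: ['after business rules', 'display business rules', 'ACLs on read']
};

// Return the short list of things to check for a given symptom.
function whereToLook(symptom) {
    if (!Object.prototype.hasOwnProperty.call(CHECKLIST, symptom)) {
        throw new Error('Unknown symptom: ' + symptom);
    }
    return CHECKLIST[symptom];
}

console.log(SAVE_ORDER.join(' -> '));
console.log(whereToLook('valueNeverSaved').join(', '));
```

A silent failure where the value never sticks maps to the first key, which is why before business rules are on the list of suspects.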
4. Investigate the Layer
Once I know the likely layer, I investigate it with evidence instead of guessing, and ServiceNow gives you the tools to do that. For server-side logic, the first thing I do is simulate the bug using a background script, run as read-only on the record, because the script output shows me the truth about what is actually happening. Is a before business rule stopping the insert or silently aborting the action? One quick tell: gr.update() returns the record's sys_id on success and null when the save is blocked, so an empty result is a red flag. Beyond background scripts, I turn on session debugging for the layer I suspect: Debug Business Rule shows me every rule firing in order, and Debug Security Rules shows me exactly which ACL granted or denied access, which is how I confirmed the work notes problem earlier. For data, I use Show XML on the record to see the actual stored values rather than what the form is displaying, since the two are not always the same. For client-side issues, the browser console and the JavaScript debugger tell me what client scripts and UI policies are doing. And the system logs almost always have the warning or error that names the culprit. I don't change anything until one of these confirms the cause.
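The gr.update() tell can be wrapped in a small probe. The sketch below assumes a helper I am calling `probeUpdate` (a hypothetical name) that accepts any object with an update() method. In a background script that would be a GlideRecord you have already loaded and modified. The two stand-in objects exist only so the sketch runs outside an instance. Keep in mind that on a real record a successful update() does write the change, so try this on a sub-production instance.

```javascript
// Hypothetical helper: call update() and report whether the save went through.
// On a real instance `record` would be a loaded, modified GlideRecord.
function probeUpdate(record) {
    var result = record.update();
    if (result) {
        // update() handed back the sys_id, so the save reached the database.
        return { saved: true, sysId: String(result) };
    }
    // An empty result means something blocked the save, for example a
    // before business rule calling setAbortAction(true).
    return { saved: false, sysId: null };
}

// Stand-in objects so the sketch runs outside ServiceNow.
var blocked = { update: function () { return null; } };
var allowed = { update: function () { return 'abc123'; } };

console.log(JSON.stringify(probeUpdate(blocked)));
console.log(JSON.stringify(probeUpdate(allowed)));
```

When the probe reports `saved: false`, the next stop is the script output and the system log, which usually name the rule that aborted the operation.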
5. Fix and Learn
Once I know the cause, I fix it on the dev instance first, test thoroughly to confirm it's really resolved, and then document it, so the next time this bug shows up, I already know the answer.
A real bug I solved this way
Here's a real production bug that runs through all five methods.
In our round-robin assignment setup, users could exclude themselves from getting tickets using a true/false field. When someone was excluded, a business rule would skip them and pass the ticket to the next person. The problem: one person started receiving every ticket, because everyone else appeared to be excluded.
Here's what made it confusing. Users edited their exclusion setting through the Group Members tab, and the form said “Saved.” No error, nothing on the record. But the change silently wasn't sticking.
A colleague had opened a ServiceNow support case for this, and it sat for two days. The ServiceNow analysts were working on it, but they couldn't figure out the cause. That's when I decided to take it into my own hands.
I sat down that Saturday and worked through it with these methods:
- See the Problem. I reproduced it myself. The field said “Saved” but didn't hold.
- Recognize the Pattern. This was a classic silent failure: “Saved,” no error, but the wrong result. That told me to stop staring at the field and look for something aborting the save.
- Identify the Layer. It wasn't the UI and it wasn't the field. The real action was in the application layer, on save.
- Investigate the Layer. I didn't guess. I simulated the change directly on the user record with a background script:
var gr = new GlideRecord('sys_user');
gr.get('<user_sys_id>');
gr.u_round_robin_excluded = false;
gr.update();
Instead of saving, the script output told me the truth:
Attempt to insert/update an invalid value for sys_user.country: USA
Operation against file 'sys_user' was aborted by Business Rule
'Prevent invalid country code'
There it was. A business rule named “Prevent invalid country code” was silently aborting the entire update. When I opened it, the logic was simple, and that was the problem:
var country = current.country;
var coreCountryGR = new GlideRecord("core_country");
coreCountryGR.addQuery("iso3166_2", country);
coreCountryGR.query();
if (coreCountryGR.getRowCount() == 0) {
    gs.warn("Attempt to insert/update an invalid value for " +
        "sys_user.country: " + country);
    current.setAbortAction(true);
}
The rule validated each user's country against the core_country table, and hundreds of users had "USA" stored instead of the valid ISO code "US". The trap is current.setAbortAction(true): it doesn't reject just the bad field, it aborts the entire record save. So any update on those users, including the round-robin change, silently rolled back with no error shown to anyone.
Two things made this rule dangerous, and they are the real lessons for anyone writing validation. First, it aborted without calling gs.addErrorMessage(), so the user got no feedback at all, which is exactly why the failure was silent and so hard to trace. Second, it ran on every update instead of guarding with current.country.changes(), so it punished records that weren't even touching the country field. A validation rule should fail loudly and only run when the field it cares about actually changes.
- Fix and Learn. After my architect reviewed and approved the fix, we applied it the following Monday. I asked ServiceNow to close the case, and confirmed with them that the rule was an out-of-the-box one tied to a security patch, and that it was safe to disable. We also corrected the bad country data at the source, updating the affected users from "USA" to "US". Then I documented the pattern: when a save silently reverts with no error, suspect an abort-on-save in the business layer before you ever touch the field itself. That note means the next person never loses two days to it.
The ServiceNow case had been open for two days with no answer. Using these methods, I found the root cause in a single afternoon.
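As a footnote to the two validation lessons above (fail loudly, and only run when the field changes), here is a rough sketch of what such a check could look like. It is not the fix we applied, which was to disable the rule and clean the data. The calls current.country.changes(), gs.addErrorMessage(), and current.setAbortAction(true) are the ones named in this post. The function name `validateCountry`, the `isValidCountryCode` callback (a stand-in for the core_country lookup), and the stub objects are assumptions added so the sketch runs outside an instance.

```javascript
// Sketch of a "fail loudly, only when relevant" validation.
// `current` and `gs` mirror the objects a business rule receives;
// `isValidCountryCode` is a hypothetical stand-in for the core_country lookup.
function validateCountry(current, gs, isValidCountryCode) {
    // Guard: skip records whose country field is not being changed.
    if (!current.country.changes()) {
        return;
    }
    var country = String(current.country);
    if (!isValidCountryCode(country)) {
        // Tell the user why the save is rejected, then abort it.
        gs.addErrorMessage('Invalid country code: ' + country);
        current.setAbortAction(true);
    }
}

// Minimal stubs so the sketch can run outside ServiceNow.
function makeCurrent(value, changed) {
    return {
        aborted: false,
        country: {
            changes: function () { return changed; },
            toString: function () { return value; }
        },
        setAbortAction: function (flag) { this.aborted = flag; }
    };
}
var messages = [];
var gsStub = { addErrorMessage: function (msg) { messages.push(msg); } };
var isIso = function (code) { return code === 'US' || code === 'CA'; };

// Country not touched: the rule stays out of the way, even with bad stored data.
var untouched = makeCurrent('USA', false);
validateCountry(untouched, gsStub, isIso);

// Country changed to an invalid value: the save is aborted with a visible message.
var edited = makeCurrent('USA', true);
validateCountry(edited, gsStub, isIso);

console.log(untouched.aborted, edited.aborted, messages);
```

With a guard like this, the round-robin update would have gone through, because that save never touched the country field.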
Why this matters
My hope is that you walk away knowing that debugging ServiceNow doesn't have to feel overwhelming. Even the most complex bugs can be cracked with these methods, and you can fix them in a few hours, the way I did. The more you use this, the more patterns you'll recognize, and the more you'll be able to help others on your team. We don't need to fear AI. We can learn to think systematically like it does, and use AI as a partner for the harder design and implementation work. That's the whole idea: think like AI, work with AI.