andrew_och
ServiceNow Employee
ServiceNow Employee

At Servicenow we use the SLA data collector script output to give us an overview of all of the SLA engine setup as well as giving us some common mis-configurations and general pointers to what is causing an issue for a particular running SLA. While this output is primarily geared to both support and development teams, in looking at its output I hope that it will give a deeper understanding of SLA configuration.

 

This post is not a generic guide to trouble shooting or debugging SLA issue per se, rather shows the first steps Support and Development teams take when working on SLA issues.

 

I present an example output of the script and talk about each section to give a bit more understanding of what the output is and how it helps us trouble shoot SLA issues. The entire script (which is attached) is cut and paste and run in System Definition  Scripts - Background.

 

The first piece of information is really for the authors of the script itself to know what version was run:
 
*** Script: SLA DataCollector version: 1.1.43
 
The next block of information collates all customization of logic that make up the SLA core feature. I have set this up to show that a customization was made to the script include: TaskSLAController sysId: 24a759e30a0a2c3960e024ad3b60d9e8. This enables the support and development teams to look at these customizations to see if they might have some impact on the issue being investigated.
 
*** Script: Customized SLA Script Includes: 
TaskSLAController-24a759e30a0a2c3960e024ad3b60d9e8
 
*** Script: Customized SLA Business Rules: 
 
 
*** Script: Customized SLA UI Actions: 
 
 
*** Script: Customized SLA Dictionary Records: 
 
This block gives us a quick understanding of the running versions and database name the instance is using and would give us the patch and hotfix level, however I ran this on my local instance running the current master release of our codebase.
 
*** Script: Build properties: 
glide.buildname[HEAD (Master)]
glide.builddate[Thu_Jul_06_21_10_22_BST_2023]
instance_name[glide_db_dump]
 
The SLA setup and engine being used differs based on instance level properties and this section therefore tells us what SLA engine is running and how it is configured to run. Some of these properties geared to providing a continuity of past behaviours even when these behaviours are seen as no longer desirable. Again the focus is to see the state of play of an instance rather than to tweak things.
 
*** Script: SLA properties: 
com.snc.sla.engine.version[2011]
com.snc.sla.schedule.source[null]
com.snc.sla.timezone.source[null]
com.snc.sla.calculation.percentage[1000]
com.snc.sla.engine.async[false]
com.snc.sla.compatibility.breach[false]
com.snc.sla.default_conditionclass[SLAConditionBase]
com.snc.sla.retroactive_pause[null]
com.snc.sla.condition.case_sensitive[false]
com.snc.sla.workflow.run_for_breached[true]
com.glideapp.workflow.duration.script_uses_schedule[null]
com.glideapp.workflow.duration.relative_uses_schedule[null]
com.snc.sla.calculate_planned_end_time_after_breach[false]
com.snc.sla.calculation.use_time_left[false]
glide.sla.calculate_on_display[true]
com.snc.sla.always_populate_business_fields[true]
com.snc.sla.get_task_from_db[sc_req_item]
glide.sys.audit_inserts[false]
com.snc.sla.async.job.limit[null]
com.snc.sla.async.job.priority[null]
com.snc.sla.async.job.claim_limit.min[null]
com.snc.sla.async.job.claim_limit.max[null]
com.snc.sla.async.deactivating[null]
com.snc.sla.contract.tables[null]
 
When investigating an issue we often ask for various parts of the SLA Engine to be set to debug level. Here we show the state of play of all the logging properties on a running instance:
 
*** Script: SLA Logging properties: 
com.snc.sla.task_sla_controller.log[notice]
com.snc.sla.task_sla.log[notice]
com.snc.sla.condition.log[notice]
com.snc.sla.flow.log[notice]
com.snc.sla.workflow.log[notice]
com.snc.sla.calculatorng.log[notice]
com.snc.sla.log.destination[db]
com.snc.sla.repair.log[notice]
com.snc.sla.breakdown_processor.log[notice]
glide.workflow.log.debug[false]
glide.escalation.verbose[false]
com.glide.relative_duration.log[null]
com.snc.sla.async.delegator.log[null]
com.snc.sla.time_line.log[null]
 
SLA Repair is a feature used to "fix" broken task_sla records. The feature has some properties to govern its behaviour and again here we want to see the state of play of how SLA repair is setup to run to give us an indication of it configuration.
 
*** Script: SLA Repair properties: 
com.snc.sla.repair.enabled[true]
com.snc.sla.repair.use_repair_workflow[false]
com.snc.sla.repair.workflow[Default SLA Repair workflow]
MSG[SLA is using Flow. See Flow section]
 
For customers that run with domain separation, this section gives us information on if it is enabled and how the SLA Definition, the task and the task_sla record relate i.e. are they all in the same domain.
 
*** Script: SLA Domain setup: 
domains_active[false]
 
Here is an example with domains active:
 
*** Script: SLA Domain setup: 
domains_active[true]
contract_sla[TOP/Acme/Acme EMEA/Finance]
task[TOP/Acme/Acme EMEA/Finance]
task_sla[TOP/Acme/Acme EMEA/Finance]
 
Contract Management integration is another way to group SLA definitions together, at the same time if mis-configured can prevent SLAs from attaching. When not used it is empty, but below I give an example of it being used.
 
*** Script: SLA Contract Management Integration setup: 
active[true]
contract[123397ff379b92402565304643990321]
contract_SlaDef[true]
 
OOB we do not audit task_sla records. These records are numerous and temporal (time specific) in nature. However auditing can be enabled and when it is, allows us to look at the history of what the SLA engine (or other things) did to the record.
 
*** Script: Task SLA audit: 
active[false]
 
SLAs use Schedules heavily and here is where a great deal of misconfiguration occurs. Altering a schedule or its spans "breaks" in flight task_sla records. As I added a Schedule to my SLA and include the results. It highlights the fact that the schedule was modified after the task_sla attached as well as giving us ideas on what else is configured. e.g. the timezone of a schedule:
 
*** Script: SLA Schedule: 
sysId[08fcd0830a0a0b2600079f56b1adb9ae]
scheduleSource[sla_definition]
timezoneSource[task.caller_id.time_zone]
now["2023-07-07 11:29:44"]
whenNext[409]
whenNextGdt["2023-07-07 11:29:44"]
modifiedAfterTaskSlaAttached[{"08fcd0830a0a0b2600079f56b1adb9ae":[]}]
 
Relative durations are a dynamic way of calculating SLAs. Here we see details on if they are used and how.
 
*** Script: SLA Relative Duration: 
isRelativeDurationUsed[false]
duration[1 Hour]
 
SLA Breakdowns show which user or group used the most amount of time of an SLA. Here we focus on their basic configuration:
 
*** Script: SLA Breakdown Definition: 
taskTable: incident
breakdownTable: sla_breakdown_by_assignment
breakdownFields: 0[{"sourceFieldName":"assignment_group","breakdownFieldName":"assignment_group"}]
1[{"sourceFieldName":"assigned_to","breakdownFieldName":"assigned_to"}]
 
Domain workflow and flow setup can be misconfigured and so we retrieve there setup.
 
*** Script: SLA Domain Worflow setup: 
 
 
*** Script: SLA Flow setup: 
name[SLA notification and escalation flow]
 
Just as when schedules are altered and break in-progress task_sla records, so too does modifying the SLA Defintion such as changing any of the conditions: start, stop, pause, etc
 
*** Script: SLA in progress modification: 
taskSlaCreated["2023-07-07 10:43:18"]
slaDefUpdated["2019-09-11 01:54:49"]
modifiedAfterTaskSlaAttached[false]
modifiedViaUpdateSet[false]
 
If an SLA is configured never to be cancelled, and yet it is in cancelled state we highlight it here:
 
*** Script: SLA cancellation invalid: 
 
OOB box we have tested extensively any and all Business Rules on the SLA Definitions, the task_sla and the associated task. Here is where the majority of the time an SLA issue's root cause is to be found. My instance is OOB and therefore empty, but compare it your instance and see the additional logic that is not therefore tested by Servicenow:
 
*** Script: SLA Business Rules: 
contract_sla[]
task_sla[]
incident[]
task[]
 
Calculation of SLA breach times are something that causes a lot of issues, the most common issue being something breached when it felt it should not. This section gives a comparison of the task_sla as is (before), compared with if we repaired it (after). However this recalcution has data it relies on and we have a summary section of that.
 
"hasAuditData: true", if there is no audit data, there can be no repair
"isAuditModInSync: true" does the number of updates to the task match the sys_mod_count, if not its broken
"auditCount: 6" and "modCount: 6" are in sync, when the above is false we know we are broken, but how broken. "auditCount: 3" and "modCount: 158" would indicate very broken indeed. 
"hasBadHistoryUpdate: false" What is bad history? Well in addition to the above, missing history, there are two more types of bad history data. Updates with differnt internal checkpoints and updates with different dates. Both reflect split transactions and will mean that the SLA Repair feature will behave differently to the SLA Engine when it ran in real time. More on this is a future blog.
 
*** Script: SLA recalculation: 
Task: INC0010001
SLA: Priority 1 resolution (1 hour)
hasAuditData: true
isAuditModInSync: true
auditCount: 6
modCount: 6
hasBadHistoryUpdate: false
===
Stage (before): Completed
Stage (after): Completed
Start Time (before): 2023-07-07 10:43:18
Start Time (after): 2023-07-07 10:43:18
Pause Duration (before): 00:00:43
Pause Duration (after): 00:00:43
Business Pause Duration(before): 00:00:43
Business Pause Duration(after): 00:00:43
Business Percentage(before): 2.19
Business Percentage(after): 2.19
Breach Time (before): 2023-07-07 11:44:01
Breach Time (after): 2023-07-07 11:44:01
End Time (before): 2023-07-07 10:45:20
End Time (after): 2023-07-07 10:45:20
Elapsed (before): 00:01:19
Elapsed (after): 00:01:19
Business Elapsed (before): 00:01:19
Business Elapsed (after): 00:01:19
Has Breached (before): false
Has Breached (after): false
Active (before): false
Active (after): false
===
 
Example of bad history:
 
hasBadHistoryUpdate: {
  "different_dates": {
    "-1": "2023-04-13 08:21:48-2023-04-13 08:21:47",
    "4": "2023-04-13 09:29:02-2023-04-13 08:53:04",
    "6": "2023-04-13 10:13:27-2023-04-13 10:13:28",
    "215": "2023-06-22 15:18:31-2023-06-22 15:18:22"
  },
  "different_internal_checkpoints": {
    "4": "0-12349d17ca50000001",
    "6": "1234a1b12590000001-1234a1b18510000002"
  },
  "hasBadUpdates": true
}
 
Finally the output in data format of the SLA Timeline, a feature that is very similar to the SLA Repair feature, doing an in-memory repair of a task_sla (in effect showing what would happen if the task_sla record was repaired). Its output is useful when trying to identify missing pause times or when an SLA should have gone out of schedule etc. Its output is in effect an more in depth version of the above recalculation.

Below is an example of a complete result set:
*** Script: SLA DataCollector version: 1.1.43

*** Script: Customized SLA Script Includes: 
TaskSLAController-24a759e30a0a2c3960e024ad3b60d9e8

*** Script: Customized SLA Business Rules: 


*** Script: Customized SLA UI Actions: 


*** Script: Customized SLA Dictionary Records: 


*** Script: Build properties: 
glide.buildname[HEAD (Master)]
glide.builddate[Thu_Jul_06_21_10_22_BST_2023]
instance_name[glide_db_dump]

*** Script: SLA properties: 
com.snc.sla.engine.version[2011]
com.snc.sla.schedule.source[null]
com.snc.sla.timezone.source[null]
com.snc.sla.calculation.percentage[1000]
com.snc.sla.engine.async[false]
com.snc.sla.compatibility.breach[false]
com.snc.sla.default_conditionclass[SLAConditionBase]
com.snc.sla.retroactive_pause[null]
com.snc.sla.condition.case_sensitive[false]
com.snc.sla.workflow.run_for_breached[true]
com.glideapp.workflow.duration.script_uses_schedule[null]
com.glideapp.workflow.duration.relative_uses_schedule[null]
com.snc.sla.calculate_planned_end_time_after_breach[false]
com.snc.sla.calculation.use_time_left[false]
glide.sla.calculate_on_display[true]
com.snc.sla.always_populate_business_fields[true]
com.snc.sla.get_task_from_db[sc_req_item]
glide.sys.audit_inserts[false]
com.snc.sla.async.job.limit[null]
com.snc.sla.async.job.priority[null]
com.snc.sla.async.job.claim_limit.min[null]
com.snc.sla.async.job.claim_limit.max[null]
com.snc.sla.async.deactivating[null]
com.snc.sla.contract.tables[null]

*** Script: SLA Logging properties: 
com.snc.sla.task_sla_controller.log[notice]
com.snc.sla.task_sla.log[notice]
com.snc.sla.condition.log[notice]
com.snc.sla.flow.log[notice]
com.snc.sla.workflow.log[notice]
com.snc.sla.calculatorng.log[notice]
com.snc.sla.log.destination[db]
com.snc.sla.repair.log[notice]
com.snc.sla.breakdown_processor.log[notice]
glide.workflow.log.debug[false]
glide.escalation.verbose[false]
com.glide.relative_duration.log[null]
com.snc.sla.async.delegator.log[null]
com.snc.sla.time_line.log[null]

*** Script: SLA Repair properties: 
com.snc.sla.repair.enabled[true]
com.snc.sla.repair.use_repair_workflow[false]
com.snc.sla.repair.workflow[Default SLA Repair workflow]
MSG[SLA is using Flow. See Flow section]

*** Script: SLA Domain setup: 
domains_active[false]

*** Script: SLA Contract Management Integration setup: 
active[true]

*** Script: Task SLA audit: 
active[false]

*** Script: SLA Conditions: 
start[Active = true .and. Priority = 1 - Critical]
pause[Incident state in On Hold .and. On hold reason = Awaiting Caller .and. or Incident state = Resolved]
stop[Incident state = Closed]
reset[]
cancel[]
whenToCancel[Start conditions are not met]
retroactiveStart[true]
retroactivePause[true]
retroactiveStartColumn[sys_created_on]
retroactiveValue[2023-07-07 10:43:18]

*** Script: SLA Contracts: 


*** Script: SLA Schedule: 
sysId[]
scheduleSource[no_schedule]

*** Script: SLA Relative Duration: 
isRelativeDurationUsed[false]
duration[1 Hour]

*** Script: SLA Breakdown Definition: 
taskTable: incident
breakdownTable: sla_breakdown_by_assignment
breakdownFields: 0[{"sourceFieldName":"assignment_group","breakdownFieldName":"assignment_group"}]
1[{"sourceFieldName":"assigned_to","breakdownFieldName":"assigned_to"}]


*** Script: SLA Domain Worflow setup: 


*** Script: SLA Flow setup: 
name[SLA notification and escalation flow]

*** Script: SLA in progress modification: 
taskSlaCreated["2023-07-07 10:43:18"]
slaDefUpdated["2019-09-11 01:54:49"]
modifiedAfterTaskSlaAttached[false]
modifiedViaUpdateSet[false]

*** Script: SLA cancellation invalid: 


*** Script: SLA Business Rules: 
contract_sla[]
task_sla[]
incident[]
task[]

*** Script: SLA recalculation: 
Task:	INC0010001
SLA:	Priority 1 resolution (1 hour)
	hasAuditData: true
	isAuditModInSync: true
	auditCount: 6
	modCount: 6
	hasBadHistoryUpdate: false
	===
	Stage (before):						Completed
	Stage (after):						Completed
	Start Time (before):				2023-07-07 10:43:18
	Start Time (after):					2023-07-07 10:43:18
	Pause Duration (before):			00:00:43
	Pause Duration (after):				00:00:43
	Business Pause Duration(before):	00:00:43
	Business Pause Duration(after):		00:00:43
	Business Percentage(before):		2.19
	Business Percentage(after):			2.19
	Breach Time (before):				2023-07-07 11:44:01
	Breach Time (after):				2023-07-07 11:44:01
	End Time (before):					2023-07-07 10:45:20
	End Time (after):					2023-07-07 10:45:20
	Elapsed (before):					00:01:19
	Elapsed (after):					00:01:19
	Business Elapsed (before):			00:01:19
	Business Elapsed (after):			00:01:19
	Has Breached (before):				false
	Has Breached (after):				false
	Active (before):					false
	Active (after):						false
	===

*** Script: SLA timeline: 
Stage-1
	(start): 2023-07-07 10:43:18
	(end): 2023-07-07 10:43:18
	active: true
	business_duration: 0 Seconds
	business_pause_duration: 0 Seconds
	business_percentage: 0
	business_time_left: 1 Hour
	duration: 0 Seconds
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 0 Seconds
	pause_time: 
	percentage: 0
	planned_end_time: 2023-07-07 11:43:18
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: In progress
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 1 Hour
	timezone: Europe/London
	variables: 
Stage-2
	(start): 2023-07-07 10:43:18
	(end): 2023-07-07 10:44:11
	active: true
	business_duration: 53 Seconds
	business_pause_duration: 0 Seconds
	business_percentage: 1.47
	business_time_left: 59 Minutes
	duration: 53 Seconds
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 0 Seconds
	pause_time: 
	percentage: 1.47
	planned_end_time: 2023-07-07 11:43:18
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: In progress
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 59 Minutes
	timezone: Europe/London
	variables: 
Stage-3
	(start): 2023-07-07 10:44:11
	(end): 2023-07-07 10:44:26
	active: true
	business_duration: 1 Minute
	business_pause_duration: 0 Seconds
	business_percentage: 1.89
	business_time_left: 58 Minutes
	duration: 1 Minute
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 0 Seconds
	pause_time: 2023-07-07 10:44:26
	percentage: 1.89
	planned_end_time: 2023-07-07 11:43:18
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: Paused
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 58 Minutes
	timezone: Europe/London
	variables: 
Stage-4
	(start): 2023-07-07 10:44:26
	(end): 2023-07-07 10:44:46
	active: true
	business_duration: 1 Minute
	business_pause_duration: 0 Seconds
	business_percentage: 1.89
	business_time_left: 58 Minutes
	duration: 1 Minute
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 0 Seconds
	pause_time: 2023-07-07 10:44:26
	percentage: 1.89
	planned_end_time: 2023-07-07 11:43:18
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: Paused
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 58 Minutes
	timezone: Europe/London
	variables: 
Stage-5
	(start): 2023-07-07 10:44:46
	(end): 2023-07-07 10:44:59
	active: true
	business_duration: 1 Minute
	business_pause_duration: 33 Seconds
	business_percentage: 1.89
	business_time_left: 58 Minutes
	duration: 1 Minute
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 33 Seconds
	pause_time: 
	percentage: 1.89
	planned_end_time: 2023-07-07 11:43:51
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: In progress
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 58 Minutes
	timezone: Europe/London
	variables: 
Stage-6
	(start): 2023-07-07 10:44:59
	(end): 2023-07-07 10:45:10
	active: true
	business_duration: 1 Minute
	business_pause_duration: 33 Seconds
	business_percentage: 2.19
	business_time_left: 58 Minutes
	duration: 1 Minute
	end_time: 
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 33 Seconds
	pause_time: 2023-07-07 10:45:10
	percentage: 2.19
	planned_end_time: 2023-07-07 11:43:51
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: Paused
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 58 Minutes
	timezone: Europe/London
	variables: 
Stage-7
	(start): 2023-07-07 10:45:10
	(end): 2023-07-07 10:45:20
	active: false
	business_duration: 1 Minute
	business_pause_duration: 43 Seconds
	business_percentage: 2.19
	business_time_left: 58 Minutes
	duration: 1 Minute
	end_time: 2023-07-07 10:45:20
	has_breached: false
	original_breach_time: 2023-07-07 11:43:18
	pause_duration: 43 Seconds
	pause_time: 
	percentage: 2.19
	planned_end_time: 2023-07-07 11:44:01
	read_allowed: true
	schedule: 
	sla: Priority 1 resolution (1 hour)
	stage: Completed
	start_time: 2023-07-07 10:43:18
	sys_created_by: 
	sys_created_on: 2023-07-07 10:43:18
	sys_domain: global
	sys_id: cf3a7447537b2110610bddeeff7b12f9
	sys_mod_count: 
	sys_tags: 
	sys_updated_by: 
	sys_updated_on: 2023-07-07 10:43:18
	task: INC0010001
	time_left: 58 Minutes
	timezone: Europe/London
	variables: 

 

2 Comments