Advice: SLA on Incident Updates (in-flight updates)

Marek Malinowsk
Kilo Contributor

Hey All!

First of all: nice to "meet" you all!

I'm looking for advice / guidance. Hopefully this is a good place to post!

Scenario: An incident is created by a Service Desk agent. Depending on its priority, we have defined "in-flight" updates (definition: the maximum permissible interval between progress updates on an issue until formal closure). We would like to measure SLA compliance for those updates, but we want to do it in a smart way 🙂

********

Example:

Incident P2, in-flight update (INF): 3 days

Day 1: Ticket created (in-flight count started) (INF: 0%)

Day 2: No update (INF: 33%)

Day 3: No update (INF: 66%)

Day 4: Updates to customer (reset of INF, INF: 0%)

Day 5: No update (INF: 33%)

...

Day 7: No update (INF: 100%)

Day 8: No update (breach of INF: 133%)

Day 9: Update and closure

********
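For anyone who wants to play with the numbers, the timeline above can be reproduced with a few lines of Python. This is only a rough sketch outside of ServiceNow: the function name `inf_percentages` and its parameters are made up for illustration, and it assumes the count starts on day 1 and restarts on every update day.

```python
def inf_percentages(update_days, last_day, interval_days=3):
    """Map each day to the percentage of the in-flight interval used so far.

    Hypothetical helper: the count starts on day 1 (ticket creation) and
    restarts from zero on every day listed in update_days.
    """
    percentages = {}
    last_reset = 1
    for day in range(1, last_day + 1):
        if day in update_days:
            last_reset = day  # an update to the customer resets the count
        # Integer division truncates, matching the 33% / 66% in the example.
        percentages[day] = 100 * (day - last_reset) // interval_days
    return percentages


timeline = inf_percentages(update_days={4, 9}, last_day=9)
for day, pct in timeline.items():
    print(f"Day {day}: INF {pct}%")
```

With updates on days 4 and 9, day 8 comes out at 133%, which is the breach shown in the example.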

Now, what is the INF for the whole ticket? 50% (2 updates: 1 in time, 1 delayed)?

Does it make sense?

Do you have similar SLAs defined?

We want to avoid the situation where, once the first update is breached, there is no incentive to work on the rest: even if the 2nd, 3rd, etc. updates are on time, the SLA is still breached.

Any suggestions, reading, guidelines appreciated.

Thanks // Marek

1 ACCEPTED SOLUTION

Tony Chatfield1
Kilo Patron

Hi, OOB the SLA percentage notifications are 50%, 75%, 100%, but there is no reason why you could not create a custom SLA workflow based on the default version and update your SLA configuration to include 33%, 66%, etc. You just need to weigh the ROI against the cost and technical debt.
(This would not have to be for all SLAs.)
I also use an SLA workflow to set a 'range' field on task_sla to 0, 50, 75, 90, 100 so that I can report on SLAs in 'bands'.
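To illustrate the 'bands' idea, here is a rough Python sketch of how an elapsed percentage could be mapped onto the 0, 50, 75, 90, 100 values. It is not ServiceNow code, and the rule used (take the highest threshold already reached) is an assumption about how the range field is populated; `sla_band` is a hypothetical name.

```python
from bisect import bisect_right

# Band thresholds mentioned in the post.
BANDS = [0, 50, 75, 90, 100]


def sla_band(percent_elapsed):
    """Return the highest band threshold that the elapsed percentage has reached.

    Assumption: anything past 100% stays in the 100 band, and negative
    input falls back to the 0 band.
    """
    index = bisect_right(BANDS, percent_elapsed) - 1
    return BANDS[max(index, 0)]


for pct in (33, 66, 75, 133):
    print(pct, "->", sla_band(pct))
```

A single band value per record makes it easy to group and count records in a report instead of working with raw percentages.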

 

A recurring SLA does not give a cumulative result: each (3-day) SLA would be either met or breached, and OOB you will not get an overall SLA value. So if you wanted a cumulative total, custom reporting would be required.

  • One option could be to create a metric record when the task is closed, using a script to calculate the overall update SLA.
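The calculation such a script would do at closure might look roughly like the Python sketch below. It is only the arithmetic, not a ServiceNow metric script: `update_sla_summary` and its inputs are hypothetical, days are plain integers, and each gap between consecutive updates is treated as one interval that is either met or breached.

```python
def update_sla_summary(created_day, update_days, interval_days=3):
    """Summarise the update intervals of one ticket at closure.

    Hypothetical helper: update_days includes the closing update, and an
    interval is met when the gap since the previous update (or creation)
    is no longer than interval_days.
    """
    outcomes = []
    previous = created_day
    for day in sorted(update_days):
        outcomes.append((day - previous) <= interval_days)
        previous = day
    met = sum(outcomes)
    total = len(outcomes)
    return {
        "met": met,
        "breached": total - met,
        "compliance_pct": 100 * met / total if total else None,
        "any_breach": met < total,  # the strict "a breach is a breach" view
    }


print(update_sla_summary(created_day=1, update_days=[4, 9]))
```

For the example in the question (created on day 1, updates on days 4 and 9) this gives 1 met, 1 breached, 50% compliance, and `any_breach` set to True, so both ways of reading the result are available from the same record.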

 

But from my perspective a cumulative SLA result adds no value, as an individual SLA is either met or breached. So if 1 update breaches, the result is an SLA breach, not a 66% success because 2 other updates were met.
As a result, I think this sort of scenario would confuse both client and support teams, and over time real SLAs may be devalued as people misinterpret results based on the cumulative update SLA.

 

With regard to an SLA breaching and there then being no need to work on the task as it is already breached:

This is a process issue. In a managed services operational environment, if an SLA was breached, I would expect the task to be escalated immediately to higher-level management (breach notifications etc.). Rather than getting less attention, the task and the staff involved would receive additional focus: first to ensure that an appropriate level of service was delivered, and second to investigate why the SLA had breached.

3 REPLIES

Hi Tony!

Thanks for taking the time to reply with such a detailed answer!

I agree that an SLA breach is an SLA breach, no matter whether it happened on the first, second, or third update, and that it's a management / process challenge rather than a matter of tool configuration!

You also have a point about the cumulative SLA and the fact that it could be confusing. That's something to reconsider.

One more thing I would like to ask: is it possible to set up an SLA that restarts once action has been taken? That is, once an update has been sent, INF starts "counting" from zero again, since another 3 days are given. Could you please direct me to the documentation covering this topic?

Hi, you can add a repeat condition to an SLA definition, which results in a new task_sla record every time the 'repeat' condition is met, but it sounds like you want to reset a single task_sla record?
It may be possible to do this with some scripting, but I have never actually tried it and cannot be sure of the complexity (I think it would be high).

https://docs.servicenow.com/bundle/sandiego-it-service-management/page/product/service-level-management/concept/c_SLAConditions.html
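To make the difference between the two approaches concrete, here is a toy Python model. It does not use any ServiceNow API: `SlaRecord`, `repeat_style` and `reset_style` are hypothetical names, and the model only assumes what is described above (a repeat produces a new record per update, a reset keeps one record and restarts it).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaRecord:
    """Toy stand-in for one SLA record; not the real task_sla schema."""
    start_day: int
    end_day: Optional[int] = None  # None while the record is still running


def repeat_style(created_day: int, update_days: List[int]) -> List[SlaRecord]:
    """Close the running record and open a new one on every update."""
    records = [SlaRecord(start_day=created_day)]
    for day in sorted(update_days):
        records[-1].end_day = day
        records.append(SlaRecord(start_day=day))
    return records


def reset_style(created_day: int, update_days: List[int]) -> List[SlaRecord]:
    """Keep a single record and move its start forward on every update."""
    record = SlaRecord(start_day=created_day)
    for day in sorted(update_days):
        record.start_day = day
    return [record]


print(repeat_style(1, [4]))
print(reset_style(1, [4]))
```

In this model the repeat style keeps a history (one closed record per interval, each of which can be judged as met or breached), while the reset style leaves only the latest interval, so earlier breaches would have to be captured somewhere else.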