05-02-2024 12:54 PM
I'm looking for a way to handle outages where an offering is only partially impacted. For example, an outage lasted 1 hour, but post-incident analysis showed that 20% of transactions failed during that time. The team wants to adjust the end time of the outage so that only 20% of the minutes count, but that is fuzzy math. Has anyone come across this?
For the past year I have been suggesting that we define the level of service that, when breached, constitutes an outage; anything under that level is a degradation. So you define this in your offering and get agreement from both Tech and Business that a failure rate of up to 10% is acceptable and will be measured as a degradation, while anything over that is considered a disruption of service and thus an unplanned outage impacting availability.
This line of reasoning has not been popular. However, I don't like adjusting minutes based on transactions, because it is not an apples-to-apples comparison. Normalization doesn't seem right. Thoughts?
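To make the two approaches concrete, here is a minimal Python sketch that compares them on the 1-hour, 20%-failure example. The `Incident` class, the function names, and the transaction counts are hypothetical and only for illustration; the 10% threshold is the figure suggested above and would be whatever Tech and Business agree on.

```python
from dataclasses import dataclass

# Assumed threshold: the 10% failure rate proposed in the post.
OUTAGE_THRESHOLD = 0.10


@dataclass
class Incident:
    """Hypothetical record of one incident on an offering."""
    duration_minutes: float
    total_transactions: int
    failed_transactions: int

    @property
    def failure_rate(self) -> float:
        # Avoid division by zero when no transactions were attempted.
        if self.total_transactions == 0:
            return 0.0
        return self.failed_transactions / self.total_transactions


def classify(incident: Incident, threshold: float = OUTAGE_THRESHOLD) -> str:
    """Threshold approach: the failure rate decides the category."""
    return "outage" if incident.failure_rate > threshold else "degradation"


def outage_minutes(incident: Incident, threshold: float = OUTAGE_THRESHOLD) -> float:
    """Threshold approach: an outage counts its full duration, a degradation counts none."""
    if classify(incident, threshold) == "outage":
        return incident.duration_minutes
    return 0.0


def normalized_minutes(incident: Incident) -> float:
    """Normalization approach (the one being questioned): scale minutes by failure rate."""
    return incident.duration_minutes * incident.failure_rate


# The example from the post: 1 hour, 20% of transactions failed.
# The transaction counts are made up to produce a 20% failure rate.
incident = Incident(duration_minutes=60, total_transactions=1000, failed_transactions=200)
print(classify(incident))
print(outage_minutes(incident))
print(normalized_minutes(incident))
```

With the threshold approach the full 60 minutes count against availability, while normalization would report roughly 12 minutes for the same incident, which is the mismatch between minutes and transactions described above.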
05-02-2024 01:41 PM
Hi, I do not think it is appropriate to adjust a time metric based on the percentage of transactions that failed within the time window. For example, if 20% of transactions failed within an hour, with the first failing at 00:00.001 and the last at 59:59.999, you still effectively had an hour in which transactions failed.
Also, I agree with you on the need to define the service clearly based on impact, so that the service is either degraded or in an unplanned outage. It would be wrong to try to reduce the outage time window based on values that are not clearly identified in that definition.
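As a rough illustration of the point about the time window, the sketch below measures the impact window from the first and last failure timestamps. The function name `impact_window` and the date are made up for the example; only the two offsets (00:00.001 and 59:59.999) come from the scenario above.

```python
from datetime import datetime, timedelta


def impact_window(failure_times: list[datetime]) -> timedelta:
    """Span between the first and last failed transaction (zero if there were none)."""
    if not failure_times:
        return timedelta(0)
    return max(failure_times) - min(failure_times)


# Hypothetical hour starting at 12:00 on an arbitrary date.
start = datetime(2024, 5, 2, 12, 0, 0)
failures = [
    start + timedelta(milliseconds=1),                          # first failure at 00:00.001
    start + timedelta(minutes=30),                              # some failure in between
    start + timedelta(minutes=59, seconds=59, milliseconds=999) # last failure at 59:59.999
]

print(impact_window(failures))
```

However few transactions failed in between, the window during which users could hit a failure is still essentially the full hour.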
05-08-2024 07:54 AM
Thanks for validating my line of reasoning. I just wondered whether other Product Managers are getting requests to normalize outage times based on the percentage of transactions.