Standard Change: You’re doing it wrong!

ericledyard · ‎02-21-2020

Change management is one of the biggest hurdles in modern DevOps transformations. Traditionally, it is a very manual process that requires in-person meetings of a Change Advisory Board (CAB), which reviews all changes and decides whether to approve them into Production. This manual process slows down development teams and takes them away from writing applications or features that are valuable to their customers.

By ITIL standards, a “standard change” is a pre-authorized change that is low risk, relatively common, and follows a specified procedure or work instruction. This was great when organizations were still using manual change processes to approve work. However, modern customers are adopting automated change management, which uses technology platforms to gather data from the tools themselves, make risk calculations in real time, understand dependencies and change windows, and automate the approval workflow so that a change flows through the system as quickly as possible. This automated change management is the key to being able to dynamically throttle risk in SRE governance models.

Many development teams have operationalized bad practices in the form of pre-approved changes that they can send through without normal change approval governance. On the surface, bypassing the lengthy change process might appear to be a good idea. In reality, this practice can keep companies from reaching the pinnacle of modern operational models for IT.

If you look at how the industry is transforming, you will see that DevOps is the beginning of an overall transformation effort. The most mature organizations are reaching a state that incorporates Site Reliability Engineering (SRE) as the primary model for IT Operations. In this model, it is important to implement dynamic governance so that you can throttle your risk exposure when needed.

The team at 451 Research posted this image, which we feel accurately summarizes the path of operating models for companies in 2020 and beyond:

Here, you can see that Modern and Future DevOps teams are utilizing SRE concepts as part of their primary motions. One of the main concepts of SRE is the concept of “error budgets.”

Google defines error budgets as follows:

Error budgets are the tool SRE uses to balance service reliability with the pace of innovation. Changes are a major source of instability, representing roughly 70% of our outages, and development work for features competes with development work for stability. The error budget forms a control mechanism for diverting attention to stability as needed.

An error budget is 1 minus the SLO of the service. A 99.9% SLO service has a 0.1% error budget. If our service receives 1,000,000 requests in four weeks, a 99.9% availability SLO gives us a budget of 1,000 errors over that period.
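The arithmetic in Google's definition can be sketched in a few lines of Python. This is only an illustration: the function names `error_budget` and `budget_remaining`, and the figure of 650 observed errors, are hypothetical and do not come from Google or from any particular tool.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Number of failed requests the SLO tolerates over the period.

    The budget is (1 - SLO) * requests, rounded to a whole request.
    """
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return round((1 - slo) * total_requests)


def budget_remaining(allowed_errors: int, observed_errors: int) -> float:
    """Fraction of the error budget still unspent (0.0 once exhausted)."""
    if allowed_errors <= 0:
        raise ValueError("a 100% SLO leaves no error budget to track")
    return max(allowed_errors - observed_errors, 0) / allowed_errors


# The example from the quote: 99.9% SLO, 1,000,000 requests in four weeks.
allowed = error_budget(0.999, 1_000_000)
print(allowed)                         # 1000

# Hypothetical: 650 errors observed so far in the period.
print(budget_remaining(allowed, 650))  # 0.35
```

The remaining fraction is the number a change policy would act on: the closer it gets to zero, the less risk the service can absorb.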

In order to implement this type of environment, where you can dynamically adapt to the error budget that you are operating with at any given time, you need to be able to throttle your risk appetite.

I think of this in the same way that I view riding in a motorized vehicle: you adjust the amount of throttle (hitting the gas) you use based on the level of risk acceptable in the environment (speed limit, environmental variables, traffic, etc.).

In the picture above, you can see that the error budget is 35%. When that occurs, we need to dynamically change the policy that we have for accepting changes into production. Using Change Automation technologies, we can set policies that either allow or disallow changes into production based on criteria that we define. The way we use these conditions is to say something like:

“If error budget > X%, then allow innovation and experiment changes to flow unhindered into Production.

Else, if error budget = 0%, then allow resiliency changes only: no innovation or experiment changes.”
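As a rough sketch, the policy above could be expressed in code like this. The names `ChangeType`, `Decision`, and `evaluate_change` are hypothetical and not taken from any specific change automation product. The rule as stated also does not say what happens when the budget sits between 0% and X%; routing those changes to normal review is an assumption made here for illustration.

```python
from enum import Enum


class ChangeType(Enum):
    INNOVATION = "innovation"
    EXPERIMENT = "experiment"
    RESILIENCY = "resiliency"


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    NEEDS_REVIEW = "needs_review"  # assumption: not specified in the policy text
    BLOCKED = "blocked"


def evaluate_change(change: ChangeType, budget_remaining: float,
                    threshold: float) -> Decision:
    """Apply the error-budget policy to a single change request.

    budget_remaining and threshold are fractions (0.35 means 35%).
    """
    # Resiliency work is always allowed: it is how the budget recovers.
    if change is ChangeType.RESILIENCY:
        return Decision.AUTO_APPROVE
    # Budget above X%: innovation and experiment changes flow unhindered.
    if budget_remaining > threshold:
        return Decision.AUTO_APPROVE
    # Budget exhausted: resiliency changes only.
    if budget_remaining <= 0:
        return Decision.BLOCKED
    # Assumed middle band: budget is low but not spent, so ask a human.
    return Decision.NEEDS_REVIEW


# Hypothetical threshold X = 20%.
print(evaluate_change(ChangeType.INNOVATION, 0.35, 0.20).value)  # auto_approve
print(evaluate_change(ChangeType.EXPERIMENT, 0.0, 0.20).value)   # blocked
print(evaluate_change(ChangeType.RESILIENCY, 0.0, 0.20).value)   # auto_approve
```

Because the threshold is a parameter rather than a hard-coded value, the same check can be tightened or loosened as the error budget moves, which is the throttling behavior described here.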

A worrying trend in the industry is that Dev or DevOps teams at many companies are coming to Change Managers and asking to be allowed to circumvent the Normal change process by implementing something typically referred to as “standard change”.

Standard Change was a workaround for an old, manual process that no longer belongs in today’s modern DevOps environment. Automated and Dynamic governance is the answer. We commonly say companies should be: “Doing Normal changes at the speed of Standard Changes.”

Another trick that Developers and DevOps teams try is to say: “We don’t need change management; we just want to do Change Registration.” This may be true for a handful of organizations that are not regulated at all, but most regulated organizations will never be able to eliminate governance, risk, and compliance. For these companies, Dynamic DevOps is the future and is what they should be trying to implement. To put themselves on the correct path, they need to implement a change automation solution that leverages data gathering, risk calculation, and automated approvals instead of just change registration.

It's all about control and the ability to enable the SRE concept. The SRE must have the ability to slow things down or shut things down if they need to.

The future of change is to be customer-centric, as with most things in the digital transformation age. The ability to meet or exceed customer SLOs and SLAs by dynamically throttling your risk levels throughout the year is the future we should all strive to attain.