Risk management in the age of digital transformation means resilience, not avoidance

Risk is an inevitable part of the modern business landscape. That does not mean every business invites risk; it means that no business can eliminate risk with 100% certainty.

The two biggest tech companies in the world have, just recently, had to fend off major zero-day exploits in their software. A recent ransomware attack shut down a major part of the U.S. energy infrastructure, creating ripple effects that impacted the lives of consumers. And, most notably, the COVID-19 pandemic showed that risks can come from unexpected places, like public health, causing conditions that require companies to undergo dramatic shifts in how they operate.

These events show that total control of risk is an illusion. Organizations should, instead, be structured in a way that allows them to recognize objective risks, respond to them, and evolve alongside the shifting dynamics. 

When enterprises become more resilient, they discover that managing risk in digital transformation becomes much more feasible. After acknowledging the inevitability of risk, modern organizations undergoing digital transformation should take steps to:

  • Monitor and quantify risks
  • Put systems and processes in place to respond quickly to emergent risks
  • Develop a culture of adaptability, so that risk management can become a baked-in part of continuously changing processes, practices, and products

To be high velocity, organizations must be responsive to risks

Risk doesn’t come only from attackers or disruptive events. Both pose serious threats, but risk can come from anywhere. One of the best ways to depict this multifaceted, omnipresent risk is the “VUCA” model (volatility, uncertainty, complexity, and ambiguity):
 

[Figure: the VUCA model]

Because of VUCA, a fast deployment cycle time or a high deployment frequency alone isn’t enough to maintain a “high velocity” digital organization. Like a certain blue hedgehog speeding towards the exit of a video game level, you may suddenly encounter unexpected threats that halt progress.

Maintaining velocity requires a generalized blanket of risk management. This blanket is made up of diligent, continuous investments combined with continuously revised best practices. When risk management is built into the infrastructure of an organization through both assets and practices, the result is that the organization can be more resilient in response to unanticipated threats. IT service velocity improves, lowering the mean time to restore (MTTR) any time threats or incidents emerge. IT can even begin to anticipate major incidents using machine learning and historical data, reducing the potential impact and sometimes preventing an incident altogether.

Resilience is also represented by collaboration between departments and within cross-functional teams. Broader knowledge and coordination among multiple disciplines reduce blind spots to risk while accelerating overall responsiveness.

Finally, organizations must recognize the persistent presence of risk and proactively manage it on a day-to-day basis. Assured conformance through automated compliance testing and shift-left practices in development incorporates risk management into the development cycle without slowing velocity or creating gated approval bottlenecks.

[Figure: IT objectives]

Invest in analytics to monitor and quantify risks

One of the biggest factors in resilient organizations is that they actively quantify risks. While knowing all risks is impossible, machine learning and AI-based modeling can leverage historical data to determine the primary drivers of risk within an individual organization. In other words, one of the most potent weapons organizations have in combating risk is their own data.

Analytics can be used to model threats based on historical data from past incidents. This threat model reveals the main drivers of incident risk and monitors for their presence. The effect is similar to a tornado prediction model. Not every tornado warning is followed by a tornado, but warnings are issued when conditions are ripe for one to occur. Similarly, an incident prediction model can alert IT leaders when conditions are present for an incident, allowing teams to isolate likely areas of impact and prepare to minimize disruptions. In some cases, they may even be able to avert an incident entirely.

Analytics can also allow teams to quantify risks and respond appropriately. A change risk prediction model, for example, scores expected risks based on their likelihood and impact. It also highlights the main drivers of risk, allowing teams to specifically address the factors contributing to a potential failed change. 

Using such a model, IT leaders can isolate recurring vulnerabilities and sources of change defects based on metadata traits. Depending on the product environment, risks could be coming from particular feature areas, particular configuration items (CIs), or even particular teams and individuals. Representing risk as a score allows change approval teams like change advisory boards (CABs) to quickly assess risks, then drill down to see more information. A risk scoring system can be used to create a rubric for appropriate responses:

  • Minimal risks with low possible impact can be remediated through automation or ignored
  • Medium-level risks can be explored through better testing coverage and addressed with minimal delays to CI/CD
  • High-level risks can call for a more thorough CAB review and a change freeze until the risk level is lowered
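The scoring and rubric described above can be sketched in a few lines of Python. This is only an illustration, not a real change risk prediction model: the `ChangeRisk` fields, the 1-to-5 impact scale, the thresholds, and the change IDs are all hypothetical, and in practice the likelihood would come from a model trained on an organization's own change history.

```python
from dataclasses import dataclass


@dataclass
class ChangeRisk:
    change_id: str     # hypothetical identifier for a proposed change
    likelihood: float  # estimated probability of a failed change, 0.0 to 1.0
    impact: int        # assumed scale: 1 (minor) to 5 (severe)


def risk_score(change: ChangeRisk) -> float:
    """Score a change as likelihood multiplied by impact."""
    return change.likelihood * change.impact


def recommended_response(score: float) -> str:
    """Map a score to a response tier. Thresholds are illustrative only."""
    if score < 1.0:
        return "minimal: remediate through automation or ignore"
    if score < 2.5:
        return "medium: expand testing coverage, minimal CI/CD delay"
    return "high: thorough CAB review and change freeze"


changes = [
    ChangeRisk("CHG-001", likelihood=0.05, impact=2),
    ChangeRisk("CHG-002", likelihood=0.40, impact=4),
    ChangeRisk("CHG-003", likelihood=0.80, impact=5),
]

for change in changes:
    score = risk_score(change)
    print(f"{change.change_id}: score={score:.2f} -> {recommended_response(score)}")
```

Keeping the thresholds in one small function makes the rubric easy to review and adjust as the organization learns which score ranges actually correspond to failed changes.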

Shift left to make processes innately resistant to assurance risks

Making security, compliance, and governance (SCaG) priorities a single, gated step in the development process slows down change releases while often failing to provide the needed level of risk recognition and reduction. Instead, organizations can shift SCaG efforts left, to earlier in the development process, baking compliance and other initiatives into the process itself.

As it stands now for many organizations, assurance/compliance reviews can halt a deployment, leading to significant delays. These reviews may also become a source of frustration for developers, contributing to unplanned work if the review uncovers a potential flaw that needs addressing.

Shifting left reduces the need for a single evaluation or gated approval stage and bakes SCaG goals into the DevOps process. Using cross-functional teams that include SCaG experts allows key engineering teams to gain core competency in the most important practices and disciplines of risk control and responsiveness. Product and release planning ends up better reflecting SCaG needs, preventing, for example, a situation where a new feature is delayed for weeks or months because of a risk-generating oversight. Diffusing SCaG expertise throughout the development cycle also prevents release builds from being continually kicked back by assurance, iterated upon, and then having SCaG controls implemented on top of (rather than within) the product or feature design.
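One way to picture shift-left compliance is as an automated check that runs with every build instead of a manual review at the end. The sketch below is a hypothetical example only: the change descriptor fields (`peer_reviewed`, `secrets_in_config`, `test_coverage`) and the 80% coverage floor are assumptions made for illustration, not rules taken from any particular compliance framework or tool.

```python
from typing import Dict, List

MIN_COVERAGE = 0.8  # assumed policy threshold, for illustration only


def compliance_violations(change: Dict) -> List[str]:
    """Return the list of policy violations for a proposed change.

    An empty list means the change passes every automated check.
    """
    violations = []
    if not change.get("peer_reviewed", False):
        violations.append("change has not been peer reviewed")
    if change.get("secrets_in_config", False):
        violations.append("configuration contains hard-coded secrets")
    if change.get("test_coverage", 0.0) < MIN_COVERAGE:
        violations.append("test coverage is below the required minimum")
    return violations


# Hypothetical change descriptor, as a pipeline step might assemble it.
proposed_change = {
    "peer_reviewed": True,
    "secrets_in_config": False,
    "test_coverage": 0.65,
}

for problem in compliance_violations(proposed_change):
    print(f"Compliance check failed: {problem}")
```

Because the check returns specific, readable violations, developers get feedback while they are still working on the change rather than weeks later at a review gate.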

Accelerate IT service delivery to improve threat responsiveness

Analytics combined with continuous service improvement (CSI) initiatives can allow IT operations to respond nimbly to emergent threat situations. Proactive service improvement can be used to reduce drivers of risk as well as the sources of disruptions and persistent internal/external stakeholder dissatisfaction.

The use of a Service Delivery Friction Index (SDFI) provides one example of IT service delivery improvement in action. An SDFI is created by multiplying a specific incident type’s number of appearances by that incident type’s MTTR. Because this calculation accounts for both how often a ticket type occurs and how long it takes to resolve, IT leaders are given an impact scope that quantifies the overall “pain” experienced from a user perspective. IT leaders can then identify “low-hanging fruit” for incident categories with a high SDFI factor. Resolving these incident types will dramatically improve service delivery while freeing up resources for innovation and proactive incident management.
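The SDFI calculation described above is simple enough to sketch in Python. The incident records here are invented sample data, and the function name `sdfi_by_type` is a hypothetical helper, not part of any product; restoration times are assumed to be in hours.

```python
from collections import defaultdict

# Invented sample data: (incident type, hours to restore service)
incidents = [
    ("password reset", 0.5),
    ("password reset", 0.5),
    ("password reset", 0.5),
    ("password reset", 0.5),
    ("vpn outage", 4.0),
    ("vpn outage", 6.0),
    ("disk full", 2.0),
]


def sdfi_by_type(records):
    """Compute SDFI per incident type: number of occurrences times MTTR."""
    durations = defaultdict(list)
    for incident_type, hours in records:
        durations[incident_type].append(hours)

    scores = {}
    for incident_type, hours_list in durations.items():
        count = len(hours_list)
        mttr = sum(hours_list) / count  # mean time to restore for this type
        scores[incident_type] = count * mttr
    return scores


sdfi = sdfi_by_type(incidents)

# Rank incident types from most to least friction.
ranked = sorted(sdfi.items(), key=lambda item: item[1], reverse=True)
for incident_type, score in ranked:
    print(f"{incident_type}: SDFI = {score:.1f}")
```

Note that multiplying the occurrence count by the mean restoration time gives the total restoration time spent on that incident type, which is why the index reads as a measure of cumulative user "pain".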

AIOps can be used to model incident root causes and cluster incidents by similar qualities, allowing for more comprehensive responses to incident threats. Incidents that appeared unconnected reveal a shared pattern of cause and effect, allowing IT teams to resolve them once and for all. Development engineering teams can also use this data to reduce technical debt and to inform improved engineering practices for more resilient products overall. Together, development and operations can collaborate to pursue improved metrics, such as a lower escape defect ratio.

The main goal of leveraging analytics and AI is to identify threats quickly, isolate them, and respond fast enough to minimize impact, all while informing improved practices that eliminate root causes.

Seek continuous improvements to manage VUCA (since it cannot be avoided entirely)

Modern events have shown us that there is no such thing as avoiding VUCA. Individuals may bemoan the lack of protections for an energy pipeline’s digital infrastructure, for example, but there’s no taking these systems offline now. The fact that the Colonial Pipeline hackers were initially paid their ransom shows that some threats can’t be managed through traditional means.

Looking back at the birth of the modern industrialized movement, the goal of early 20th-century assembly lines was to add predictability and repeatable results to enterprise production. But now, a locked-in assembly system is too slow and unresponsive to realistically address the risks posed to the world. Even the most cautious, risk-averse businesses would have been caught off guard by something like COVID-19, forcing them to move at least some work to remote and some business to digital value streams. In other words, risk finds you, no matter what.

Since we cannot be oracles of risk, we must be prepared for it. Our organizations must be ready to adapt to emerging situations quickly. They must also be capable of monitoring risk, quantifying it, and giving it context. They must integrate risk management practices throughout the value stream, ensuring that compliance, security, and governance are neither an afterthought nor a source of delays. Finally, they must be capable of reflecting upon historical data to evolve, improve, and position the organization to meet risk head-on better than before.

Taking these steps positions a business to be better prepared for risks that lie ahead. After all, those businesses that are equipped to handle risk as it comes are more likely to survive the next time risk comes knocking on their door.

Analytics coupled with AI can play a powerful role in identifying risk and shielding your organization from it. Find out more by watching our recent webinar: “How to Achieve Resilient & High-Velocity IT Operations Through AI-Powered Analytics”