Overcoming the liability and risk of software-based infrastructure management

In the previous installment of this series, I discussed why software-based control of applications and infrastructure was becoming a necessity. Now, let’s take a look at some of the risks of this approach, and what skills will be required to mitigate them.

Developing software engineering expertise for infrastructure

In a software-controlled infrastructure environment, sophisticated software engineering skills are critical for IT engineers, mainly because of the risk exposure. At any large organization, infrastructure management departments typically purchase managed services, which makes them extremely sensitive to:

  • Cost
  • Security
  • Availability and performance

Software-based control unleashes the power to manage IT at scale, but with great power comes great responsibility. It also increases the security risk from human error: for example, misconfigurations that expose the system to threats, or software simply programmed with the wrong instructions (bugs).

A small software bug can significantly disrupt a large digital infrastructure, exposing organizations to new liabilities. A simple mistake, like adding an extra "0" in a script, can provision 10x more servers than needed, with direct cost implications in the range of thousands of euros.
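To make the order of magnitude concrete, here is a minimal sketch of how one misplaced "0" multiplies the bill tenfold. The provision_servers function and the hourly rate are hypothetical placeholders for illustration, not calls to any real cloud SDK.

```python
# A minimal sketch of how one misplaced "0" inflates an infrastructure bill.
# provision_servers() and the hourly rate are hypothetical, for illustration only;
# real provisioning would go through a cloud provider's SDK.

HOURLY_RATE_EUR = 0.50  # assumed cost per server per hour

def provision_servers(count: int) -> float:
    """Pretend to create `count` servers and return the daily cost in euros."""
    return count * HOURLY_RATE_EUR * 24

intended = provision_servers(50)   # what the engineer meant to deploy
actual = provision_servers(500)    # the same line with one extra "0"

print(f"Intended daily cost: {intended:.2f} EUR")  # 600.00 EUR
print(f"Actual daily cost:   {actual:.2f} EUR")    # 6000.00 EUR
```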

According to an IDC survey, the top three cloud security threats are security misconfiguration, lack of visibility into the production environment, and improper identity and access management (IAM) and permission configurations.

The countermeasure for these is again software-driven. Predictable configuration pipelines and continuous, real-time policy enforcement are the most likely way to prevent mishaps from happening. Essentially, we have software (computers) controlling what other software (computers) does to manage digital assets (computers), and the liability of failure is enormous.
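As a hedged illustration of what such policy enforcement can look like in practice, the sketch below checks a planned configuration against two simple rules before anything is deployed. The resource format and the rules are assumptions made for this example, not the API of any specific policy engine.

```python
# A minimal sketch of policy-as-code enforcement: software checking the
# configuration that other software is about to apply. The resource format
# and the rules are assumptions for illustration, not a specific tool's API.

def violates_policy(resource: dict) -> list[str]:
    """Return a list of policy violations for a planned resource."""
    violations = []
    if resource.get("type") == "storage_bucket" and resource.get("public_access"):
        violations.append("storage buckets must not allow public access")
    if resource.get("type") == "firewall_rule" and "0.0.0.0/0" in resource.get("allowed_sources", []):
        violations.append("firewall rules must not be open to the whole internet")
    return violations

planned = [
    {"name": "logs", "type": "storage_bucket", "public_access": True},
    {"name": "ssh-in", "type": "firewall_rule", "allowed_sources": ["0.0.0.0/0"]},
]

for resource in planned:
    for violation in violates_policy(resource):
        print(f"BLOCKED {resource['name']}: {violation}")  # fail the pipeline before deployment
```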

It should be clear by now that this deep dependency on software to implement controls over the software that controls digital assets amplifies the liability exposure to software quality defects. According to classic risk management practice, the calculated impact of an issue is its likelihood of happening multiplied by its severity. The severity increases immensely when digital infrastructure is managed by a stack of controlling software. There is no software without bugs, but a single bug can now bring down hundreds of customer applications or servers.
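As a purely illustrative application of that formula, with invented numbers: the same likelihood of a defect produces a far larger expected impact once a single bug can affect hundreds of servers instead of one application.

```python
# Illustrative only: impact = likelihood x severity, with invented numbers.
likelihood = 0.02                  # assumed chance that a defect slips into a release
severity_single_app = 10_000       # assumed loss (EUR) when a bug hits one application
severity_whole_stack = 1_000_000   # assumed loss (EUR) when the same bug hits hundreds of servers

print(likelihood * severity_single_app)   # 200.0   -> expected impact in EUR
print(likelihood * severity_whole_stack)  # 20000.0 -> expected impact in EUR
```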

We need a mission-critical class of software, so let’s focus a bit on what influences software quality:

Known risks in software engineering

The history of software engineering has taught us the common root causes of software quality issues, such as:

  • Lack of effective testing or rigorous version control
  • Poorly defined requirements
  • Software bugs
  • Poorly designed software that lacks appropriate handling of unexpected behaviors and errors
  • Software not designed for supportability
  • Undocumented software that cannot be transferred to other teams
  • Uncontrolled growth of poorly engineered software, leading to a loss of control
  • Communication gaps and difficulty distributing work across teams and geographies

Software expertise as a key success factor

The software that manages infrastructure does not need the same quality level as the autopilot of a space shuttle. However, given its exposure to error, it will require sophisticated and mature software engineering techniques to ensure the quality and predictability of the solution. Key success factors will be experience and expertise. Bad software engineering practices plagued the early days of the discipline (and still do sometimes), until the development community got real help from the technology itself, thanks to modern programming languages, object orientation, design patterns and automated code validation.

To minimize infrastructure management risks, the workforce must have the skills to write robust, professional, world-class software.

The good news is that all the techniques for writing good-quality software are known and directly applicable in the infrastructure management business, provided the workforce has the skills to write professional, world-class, robust software. Getting there is a matter of change management.

First, understand the criticality of the problem; second, make sure the teams master the required technology. Finally, business management will need to approve the time and cost needed to deliver at a given level of quality.

How to overcome infrastructure management risks

The software engineering business has the following elements deeply entrenched in its teams' DNA and operating model. The same must now apply to the infrastructure and application service management domain.

  • Continuous integration and continuous delivery (CI/CD): The use of continuous integration is growing, as is delivery orchestration to manage incremental software version deployments. If infrastructure becomes code, it must be managed as code, which is now done on a CI/CD platform. This impacts the governance of release management: release approval boards can become software engineering code reviewers during a CI/CD branch merge and promotion process. Release control is then performed by different actors with varied knowledge, each controlling different aspects of the release, which differs from the classic ITSM change approval boards (CAB).
  • Software asset modularity and reusability: Developing complex, scalable software requires architecture patterns that promote modularity (the independence of modules) and the production and maintenance of a reusable catalog of software assets. A complete software development kit with libraries of reusable modules enables complex deployments to be decomposed into smaller blocks, all pre-tested and qualified earlier. The complexity and growth of the software are managed by encapsulation into black-box modules, like Russian dolls, so the human mind can grasp it at every level. Architectural patterns well known to application software engineers (like model-view-controller and its variations) are highly recommended, because we know that software cannot grow without a proper separation of concerns between modules. Software growth is a direct result of team structure and discipline. In the infrastructure management domain, the risks include out-of-control software that results in hundreds of infrastructure issues, leading to thousands of angry customers.
  • Dependency injection and digital assurance testing: As per the ISO 9000 quality framework, qualifying the process by which you produce a result is more important than the result itself. If the process is qualified and guaranteed, the resulting product will also be guaranteed. That is almost the only way to qualify the system, since the number of combinations becomes too large to test exhaustively. This is nothing new to software engineering teams: testing is part of their DNA, and test-driven development (TDD) practices and automated testing are recommended. However, this requires mastering concepts like mocking frameworks and dependency injection, to name a few. Dependency injection enables testing by injecting a mock-up module behind a contracted interface (such as an interface in C# or a protocol in Swift). Automated assertions can then trigger, test and verify that the code produces the expected outcomes. Thanks to such a mock framework, this is verified without having to actually execute the actions on the cloud or generate the actual cost, as illustrated in the sketch after this list.
  • APIs: Enterprises are following the same technological trend curve, and their evolution to a software-based world is happening simultaneously. In the past, it was acceptable to have some form of batch file data transfer or rudimentary request/response interactions. However, with the advent of secure APIs, the demand is shifting to a much finer level of integration. To support event-driven and/or asynchronous integrations, providers must expose secure APIs and events to their customers. These programmatic interfaces are well known to software engineers, who have been using them for many years, but are new to the infrastructure service provider’s workforce.
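As a hedged sketch of the dependency-injection and mocking approach described in the list above: a hypothetical CloudProvider interface is injected into the deployment logic, and a mock implementation records calls so that automated assertions can verify the outcome without provisioning anything in a real cloud. All names (CloudProvider, FakeCloud, deploy_fleet) are invented for this example.

```python
# Minimal sketch of dependency injection plus a mock for digital assurance testing.
# CloudProvider, FakeCloud and deploy_fleet are hypothetical names for illustration;
# they are not part of any specific cloud SDK.
from typing import Protocol

class CloudProvider(Protocol):
    """Contracted interface the deployment code depends on (injected, never hard-coded)."""
    def create_server(self, name: str) -> None: ...

def deploy_fleet(cloud: CloudProvider, count: int) -> None:
    """Logic under test: provisions `count` servers through the injected provider."""
    for i in range(count):
        cloud.create_server(f"web-{i}")

class FakeCloud:
    """Mock implementation: records calls instead of creating real (billable) servers."""
    def __init__(self) -> None:
        self.created: list[str] = []
    def create_server(self, name: str) -> None:
        self.created.append(name)

# Automated assertion: verify the expected outcome without touching a real cloud.
fake = FakeCloud()
deploy_fleet(fake, count=3)
assert fake.created == ["web-0", "web-1", "web-2"]
print("deployment logic verified without provisioning anything")
```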

In the last chapter of this series, I will explain how enterprises need to help IT teams reshape their infrastructure management skill profiles to support a software-driven model.

Posted on: January 12, 2023
