How to overcome the liability and risk impacts of software-based infrastructure management
In the previous installment of this series, I discussed why software-based control of applications and infrastructure was becoming a necessity. Now, let’s take a look at some of the risks of this approach, and what skills will be required to mitigate them.
Software engineering expertise
In a software-controlled infrastructure environment, sophisticated engineering skills are critical for IT engineers, mainly because of the risk exposure.
The development of risk exposure
At any large organization, the infrastructure management departments typically purchase managed services, which makes them extremely sensitive to:
- Availability and performance
Software-based control unleashes the power to manage IT at scale, but with great power comes great responsibility. It also increases the exposure to human error: misconfigurations that expose the system to security threats, or software programmed with the wrong instructions (bugs).
The potentially significant disruption to a large digital infrastructure by a small software bug exposes organizations to new liabilities. A simple mistake like adding an extra 0 in a script can generate 10x more servers than needed, with direct cost implications in the range of thousands of Euros.
According to a recent IDC survey, the top three cloud security threats are security misconfiguration, lack of visibility into the production environment, and improper identity and access management (IAM) and permission configurations.
The countermeasure for these is again software-driven. Predictable configuration pipelines and continuous, real-time policy enforcement are the most likely ways to prevent mishaps. Essentially, we have software (computers) controlling what other software (computers) does to manage digital assets (computers), and the liability of failure is enormous.
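As a minimal sketch of what policy enforcement in a configuration pipeline could look like, the Python snippet below checks a requested change against a few rules before anything is deployed. The `DeploymentRequest` type, the `check_policies` function and the limit of 50 servers are all hypothetical names and values chosen for illustration; they do not come from any real provider API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeploymentRequest:
    """Hypothetical description of a requested change, not a real provider API."""
    service: str
    server_count: int
    public_access: bool


# Hypothetical limit; a real value would come from the organization's governance rules.
MAX_SERVERS_PER_REQUEST = 50


def check_policies(request: DeploymentRequest) -> List[str]:
    """Return the policy violations found; an empty list means the request may proceed."""
    violations = []
    if request.server_count < 1:
        violations.append("server_count must be at least 1")
    if request.server_count > MAX_SERVERS_PER_REQUEST:
        violations.append(
            f"server_count {request.server_count} exceeds limit {MAX_SERVERS_PER_REQUEST}"
        )
    if request.public_access:
        violations.append("public access requires an explicit exception")
    return violations


intended = DeploymentRequest(service="billing", server_count=20, public_access=False)
typo = DeploymentRequest(service="billing", server_count=200, public_access=False)

print(check_policies(intended))  # the intended request passes every rule
print(check_policies(typo))      # the extra zero is reported before any server is created
```

Run as a pipeline step, a check of this kind would stop the "extra 0" mistake at review time rather than on the invoice.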
It should be clear by now that this deeply recursive dependency, with software implementing controls over the software that controls digital assets, amplifies the liability exposure created by software quality defects. According to classic risk management practices, the risk exposure of an issue is its likelihood multiplied by its severity. The severity increases immensely if digital infrastructure is managed by a recursive stack of controlling software. There is no software without bugs, but a single bug can now bring down hundreds of customer applications or servers.
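To make the likelihood-times-severity reasoning concrete, here is a small Python sketch. The probability, the cost per server and the number of affected servers are illustrative assumptions, not measurements, and `risk_exposure` is a hypothetical helper name.

```python
def risk_exposure(likelihood: float, severity: float) -> float:
    """Classic risk score: probability of the issue times its cost if it happens."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be a probability between 0 and 1")
    return likelihood * severity


# Illustrative numbers only (assumptions for this sketch).
LIKELIHOOD = 0.01                 # assumed chance that a change contains a defect
COST_PER_AFFECTED_SERVER = 100.0  # assumed cost in Euros per affected server

# The same defect, applied by hand to one server or by automation to 500 servers.
manual_change = risk_exposure(LIKELIHOOD, COST_PER_AFFECTED_SERVER * 1)
automated_change = risk_exposure(LIKELIHOOD, COST_PER_AFFECTED_SERVER * 500)

print(manual_change, automated_change)
```

With the likelihood held constant, the exposure grows in proportion to the number of assets a single piece of controlling software can touch.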
We need a mission-critical class of software, so let’s focus a bit on what influences software quality:
Known caveats in software engineering
The history of software engineering has taught us several root causes of software quality issues, such as:
- Lack of effective testing or rigorous version control
- Poorly defined requirements
- Software bugs
- Poorly designed software, with lack of appropriate reaction on unexpected behaviors and errors
- Software not designed for supportability
- Undocumented software which is untransferable to other teams
- Uncontrolled growth of poorly engineered software, leading to a loss of control
- Communication and difficulty distributing work across teams and geographies
Software expertise as a key success factor
The software that manages infrastructure does not need the same level of quality as the autopilot of a space shuttle. However, given its exposure to error, it still requires sophisticated and mature software engineering techniques to ensure the quality and predictability of the solution. Key success factors for that will be experience and expertise. Bad practices plagued the early days of software engineering (and sometimes still do), until the development community got real help from technology in the form of modern programming languages, object orientation, design patterns, and automated code validation.
The good news is that all the techniques for writing good quality software are known and directly applicable in the infrastructure management business, provided the workforce has the skills to write professional, world-class, robust software. It is about change management.
The first step is understanding the criticality of the problem; the second is making sure the teams master the required technology. Finally, business management will need to approve the time and cost needed to deliver a given level of quality.
Some concrete examples
The software engineering business lives with the following elements deeply entrenched in its teams’ DNA and operating model. The same now applies to the infrastructure and application service management domain.
- Continuous integration and continuous delivery (CI/CD): The use of continuous integration is growing, as is the use of delivery orchestration to manage incremental software version deployments. If infrastructure becomes code, it must be managed as code, which is now done by a CI/CD platform. This affects release management governance, as release approval boards can now become code reviewers during a CI/CD branch merge and promotion process. The review is performed by different actors with varied knowledge, each controlling different aspects of the release, which differs from the classic ITSM change approval board (CAB).
- Software asset modularity and reusability: Developing complex, scalable software implies architectural patterns that promote modularity (the independence of modules) and the production and maintenance of a reusable catalog of software assets. A complete software development kit with libraries of reusable modules enables complex deployments to be decomposed into smaller blocks, all pre-tested and qualified earlier. The complexity and growth of the software are managed by encapsulation into black box modules nested like Russian dolls, so the human mind can grasp the system at every level. Architectural patterns well known to application software engineers (like model-view-controller or its variations) are highly recommended, because we know that software cannot grow without a proper separation of concerns between modules. Software growth is a direct result of team structure and discipline. In the infrastructure management domain, out-of-control software will result in infrastructure issues by the hundreds, leading to thousands of angry customers.
- Dependency injection and digital assurance testing: As per the ISO 9000 quality framework, qualifying the process by which you produce a result is more important than the result. If the process is qualified and guaranteed, the resulting product will also be guaranteed. That is almost the only way to qualify the system, since the number of combinations becomes too large to test exhaustively. This is nothing new to software engineering teams: testing is part of their DNA, and test-driven development (TDD) practices and automated testing are recommended. However, this requires mastering concepts like mocking frameworks and dependency injection, to name a few. Dependency injection enables testing by injecting a mock-up module for a contracted interface (such as an interface in C# or a protocol in Swift). Automated assertions can then trigger, test and verify that the code produces the expected outcomes. With such a framework, this is verified without having to actually execute the actions on the cloud or incur the actual cost.
- APIs: Customers are following the same technological trend curve, and their evolution to a software-based world is happening simultaneously. In the past, it was acceptable to have some form of batch file data transfer or rudimentary request/response interactions. However, with the advent of secure APIs, the demand is shifting to a much finer level of integration. To support event-driven and/or asynchronous integrations, providers must expose secure APIs and events to their customers. These programmatic interfaces are well known to software engineers, who have been using them for many years, but are new to the infrastructure service provider’s workforce.
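The dependency injection and mocking idea from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `CloudProvisioner` and `scale_out` are hypothetical names, and the `Protocol` class plays the role that an interface in C# or a protocol in Swift would play. The mock comes from Python's standard `unittest.mock` module.

```python
from typing import Protocol
from unittest.mock import Mock


class CloudProvisioner(Protocol):
    """Contracted interface (hypothetical); stands in for a real cloud SDK."""

    def create_servers(self, count: int) -> None: ...


def scale_out(provisioner: CloudProvisioner, current: int, target: int) -> int:
    """Create only the missing servers and return how many were requested."""
    missing = max(target - current, 0)
    if missing:
        provisioner.create_servers(missing)
    return missing


# In a test, a mock is injected instead of the real provisioner,
# so no cloud action is executed and no cost is generated.
fake_cloud = Mock(spec=["create_servers"])
created = scale_out(fake_cloud, current=3, target=5)
fake_cloud.create_servers.assert_called_once_with(2)

# Already at the target size: nothing must be created.
idle_cloud = Mock(spec=["create_servers"])
scale_out(idle_cloud, current=5, target=5)
idle_cloud.create_servers.assert_not_called()
```

Because `scale_out` receives its provisioner as a parameter rather than constructing it, the same code can run against the real cloud in production and against the mock in an automated test.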
Tune in next week for our last chapter, which will explain how enterprises need to re-shape the skills profile of their IT management teams to support a software-driven model.