Each year, DevOps Research and Assessment (DORA) within Google Cloud publishes the excellent State of DevOps report. The 2023 report, published in Q4, was as good as ever and in particular documented some significant, research-based implications for security.
The overall conclusions, though not security specific, are worth summarizing up front because, as I suspect you all know, they are vital for secure software development as well as for overall quality and reliability.
Culture. A culture of technical improvement, excellence, and continuous learning increases performance (30% higher).
User Focus. Building with the end user in mind, while an obvious goal, can sometimes be overtaken by other goals. Teams that keep the user at the center of their efforts have better performance (40% higher).
Code Reviews. Speeding up and otherwise improving code reviews is an essential part of improving delivery performance (speed, quality, security). Teams with faster code reviews have better delivery performance (50% higher).
Improve Documentation. High-quality documentation increases organizational performance, especially on deeply technical matters of system construction and maintenance (12x+ better in trunk-based development).
Utilize Cloud. Increased flexibility from cloud deployment models, from infrastructure-as-code to the innate tooling in cloud environments, drives delivery performance (30% higher).
Balance. Harmonize speed, operational performance, and user focus. Balancing development and operations is essential.
Distribute Work. Taking a fair approach to distributing work, to make the most of people’s skills, experience, and relative workloads, is a solid indicator of high-performing teams.
This short post is no replacement for the benefit and insight you will gain from reading the full report, but here are some of the findings I found most interesting when looking through a security lens.
User focus. DevOps started off as improving alignment between software development and operations practices. Now it’s just as much an imperative to bring the end user into sharper focus, and the research confirms that doing so has dramatic effects on delivery performance, mostly because what is delivered is more likely to match expectations. Security in the development cycle is often focused extensively on reducing the risk of vulnerabilities. When user focus is prioritized, that attention more naturally extends into a broader range of control considerations, the neglect of which could have just as many consequences as failing to identify and eliminate a classic security vulnerability. For example:
Ensuring an appropriate authorization architecture is built into the service, including appropriate means of calling out to external authorization systems that adequately model permission needs, both RBAC and ABAC, including any required separation-of-duties permissions. Imagine failing to focus on the user’s needs in, say, a high-value payments system, or failing to model the need for dual control in safety-critical control systems.
Ensuring control-flow and integrity checks are built in to monitor and reconcile invariants, whether in a financial application, an inventory / stock system, or indeed anything where invariants need to be maintained for security, accuracy, or auditing.
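To make the authorization example above concrete, here is a minimal sketch of an RBAC-plus-ABAC check combined with a separation-of-duties (dual control) rule for a payments flow. The policy model, role names, and permission strings are all hypothetical, for illustration only.

```python
from dataclasses import dataclass

# Hypothetical policy model: role-based grants (RBAC) plus an
# attribute-based rule (ABAC), with a separation-of-duties check
# enforcing dual control on payment approval.

ROLE_PERMISSIONS = {
    "payments_initiator": {"payment:create"},
    "payments_approver": {"payment:approve"},
}

@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    department: str  # attribute used by the ABAC rule

def can(user: User, permission: str, resource_department: str) -> bool:
    """RBAC check plus an ABAC rule: users may only act on
    resources belonging to their own department."""
    rbac_ok = any(permission in ROLE_PERMISSIONS.get(r, set()) for r in user.roles)
    abac_ok = user.department == resource_department
    return rbac_ok and abac_ok

def approve_payment(initiator: User, approver: User, department: str) -> bool:
    """Dual control: the approver must hold the approve permission
    and must not be the same person who initiated the payment."""
    if initiator.name == approver.name:
        return False  # separation of duties violated
    return (can(initiator, "payment:create", department)
            and can(approver, "payment:approve", department))

alice = User("alice", frozenset({"payments_initiator"}), "treasury")
bob = User("bob", frozenset({"payments_approver"}), "treasury")

print(approve_payment(alice, bob, "treasury"))    # True: dual control satisfied
print(approve_payment(alice, alice, "treasury"))  # False: same person on both sides
```

The point of the sketch is that these rules are part of the service’s design, not an afterthought: if dual control is never modeled, no vulnerability scanner will flag its absence.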
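The invariants example above can be sketched the same way: a toy double-entry ledger that enforces its balance invariant at write time and then independently reconciles the stored data. The data model is hypothetical.

```python
# Hypothetical double-entry ledger: the invariant is that every
# transaction's debits and credits sum to zero, and a reconciliation
# job re-checks the whole ledger rather than trusting each write path.

ledger = []  # list of transactions; each is a list of (account, amount) entries

def post_transaction(entries):
    """Reject any transaction that would break the invariant at write time."""
    if sum(amount for _, amount in entries) != 0:
        raise ValueError("unbalanced transaction: debits != credits")
    ledger.append(entries)

def reconcile():
    """Independent check: recompute the invariant over all stored data.
    Catches corruption or writes that bypassed post_transaction."""
    total = sum(amount for tx in ledger for _, amount in tx)
    return total == 0

post_transaction([("cash", -100), ("inventory", 100)])
print(reconcile())  # True: the invariant holds across the ledger
```

The reconciliation pass deliberately duplicates the write-time check: the value of a reconciliation control is precisely that it does not trust the path that wrote the data.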
Loose Coupling. Loosely coupled designs enable individual teams to test, build, and deploy without other teams becoming a bottleneck. Loosely coupled architecture leads to loosely coupled teams, which permits the kind of flexibility that can increase change throughput and reduce operating risk. This improves security not only because changes can be made quickly, but also because more changes will be accepted: the risk of making a change becomes lower than the risk of tolerating the vulnerability it fixes. Loose coupling thus shifts the risk-tolerance calculus in security’s favor. It also permits more efficiently scoped design reviews, letting the security team scale and achieve higher review coverage than would otherwise be possible. There are significant caveats here, depending on how well-architected the services are, how logical their composability is, and the quality and security of the APIs between services, including how consistently security properties (typically permissions) flow through those structures. Loose coupling also permits and amplifies the effectiveness of continuous integration and more efficient code reviews.
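One of the caveats above, that security properties should flow consistently through the APIs between services, can be sketched as follows. The services and scope names are hypothetical; the point is that each service re-checks the propagated caller context rather than trusting its upstream caller.

```python
from dataclasses import dataclass

# Hypothetical sketch: a caller's permissions travel across the
# service boundary as an explicit context, and each service enforces
# them independently instead of running as a service-level superuser.

@dataclass(frozen=True)
class AuthContext:
    subject: str
    scopes: frozenset  # permissions carried with every cross-service call

class InventoryService:
    def reserve(self, ctx: AuthContext, item: str) -> None:
        # Re-checked here, even though the caller already checked:
        if "inventory:reserve" not in ctx.scopes:
            raise PermissionError("caller lacks inventory:reserve")

class OrderService:
    def __init__(self, inventory: InventoryService):
        self.inventory = inventory

    def place_order(self, ctx: AuthContext, item: str) -> str:
        if "order:create" not in ctx.scopes:
            raise PermissionError("caller lacks order:create")
        # The same context is forwarded, not a privileged service identity:
        self.inventory.reserve(ctx, item)
        return f"order placed for {item}"

ctx = AuthContext("alice", frozenset({"order:create", "inventory:reserve"}))
svc = OrderService(InventoryService())
print(svc.place_order(ctx, "widget"))  # order placed for widget
```

When this pattern is consistent across services, loose coupling does not dilute the permission model as calls fan out.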
Releasability. This is a fundamental tenet of continuous delivery: constantly keep software in a “golden,” releasable state. Doing so requires constant, fast feedback that is acted upon to sustain the releasability goal, so it is important to keep changes tightly scoped enough that the goal is likely to be met. A key security criterion is deciding which security properties gate releasability, including passing appropriate tests. These tests might not just check for vulnerabilities using a variety of techniques but, arguably more effectively, might also include regression tests to ensure specific libraries are used (e.g., for crypto, authentication, authorization, logging) and that essential control logic is tested.
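As an illustration of such a regression gate, here is a toy check that blocks a release when source code strays from approved libraries. The banned patterns and their rationales are purely illustrative, not a complete or recommended list.

```python
import re

# Hypothetical releasability gate: fail the build when source code
# uses constructs the security team has banned in favor of approved
# libraries. Patterns here are illustrative examples only.

BANNED_PATTERNS = {
    r"\bhashlib\.md5\b": "use the approved hashing wrapper instead of MD5",
    r"\bimport\s+pickle\b": "pickle is banned for untrusted data",
}

def check_source(source: str):
    """Return a list of (pattern, reason) violations found in source text."""
    return [(pattern, reason) for pattern, reason in BANNED_PATTERNS.items()
            if re.search(pattern, source)]

good = "import hashlib\nh = hashlib.sha256(b'x')\n"
bad = "import hashlib\nh = hashlib.md5(b'x')\n"

print(check_source(good))  # [] -- the gate passes
print(len(check_source(bad)))  # 1 -- the gate blocks the release
```

A check like this runs in CI alongside vulnerability scanning, so a regression in control logic breaks releasability just as a failing unit test would.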
Artificial Intelligence. There hasn’t, yet, been the expected (perhaps over-hyped) increase in performance associated with the use of AI tooling. I think we all expect this to come, and there’s some debate as to whether the benefits seen so far accrue at the individual developer level vs. the overall team level. Anecdotally, from my experience, there have been productivity gains in generating “boilerplate” for software and, more recently, in generating infrastructure-as-code configurations. Perhaps the team-level gain hasn’t appeared yet because well-managed teams that pay close attention to loose coupling and continuous delivery / releasability already get significant automation benefit from IDE plugins, APIs, and team-specific libraries / patterns, so AI trained on more generic data doesn’t give them such a boost. As AI develops, we’ll see significant benefit in training / fine-tuning on one’s own team’s code base, tooling, and practices, as well as a general increase in broader security, architecture, and control-pattern logic, both for software and for controls-as-code. In other words: marginal use now, but significant benefit coming fast. The DORA report has more detail on all of this.
Documentation. This probably isn’t high on everyone’s natural list of priorities for improving software delivery performance, but the research results are quite clear that it helps. There are a few reasons for this. The very act of documenting system designs, invariants, and other key properties forces clarity of thinking, which in turn improves the design. Documentation also acts as a repository of knowledge, so newcomers to a team can come up to speed more quickly and make fewer errors on their learning curve. Looking through the security lens, one of the big opportunities here is making sure a system-specific threat model is included as part of the overall system documentation and that it is used to drive additional recorded invariants and principles. Again, while the documentation is valuable as is, I’ve often found the real value in producing a threat model is developing a shared understanding of goals between the development team (and their embedded security engineers, if they have them) and the security team itself.
Reliability. We can all intuit the intimate connection between reliability and security, and there are many opportunities to align these disciplines across performance measurement and reliability feedback improvements. This can drive improvements in automation, including reducing operational privilege levels, which cuts not only insider risk but also reliability issues arising from erroneous or negligent changes. In other words, automation reduces the scope for human error and human malevolence, a significant security benefit. Adopting SRE practices as part of this is a vital underpinning and brings security benefits from reducing toil and embracing the positivity of blameless postmortems.
Infrastructure Flexibility. The ability of teams to develop in, and deploy to, infrastructure that is responsive to their needs would also seem an intuitive driver of team performance, and again the results show it is. Typically this implies cloud usage, but it can also mean on-premises cloud-like infrastructure. Additionally, the findings show that how a team uses the cloud is a stronger predictor of performance than simply whether they use the cloud. Using the cloud well (in a way that supports flexibility) is a boost; for organizations naively “lifting and shifting,” it seems to be less so. To quote the report directly:
“To maximize your potential for benefit, you must rethink how you build, test, deploy, and monitor your applications. A big part of this rethinking revolves around taking advantage of the five characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.”
From a security perspective, the benefit of cloud or cloud-like on-premises infrastructure is clear, especially the ability to update rapidly, to define and maintain declarative infrastructure (infrastructure-as-code), and to take advantage of secure-by-design features such as end-to-end encryption and, with certain application architectures, least-privilege constructs like service meshes, binary authorization, and zero-touch production management.
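One security payoff of declarative infrastructure is that the desired state is just data, so it can be checked against policy before anything is deployed. A minimal sketch, assuming a hypothetical config shape and illustrative rules:

```python
# Hypothetical desired-state config for storage buckets. The schema
# and the policy rules below are illustrative, not any real cloud API.

desired_state = {
    "buckets": [
        {"name": "payments-data", "encryption": "cmek", "public": False},
        {"name": "static-site", "encryption": "default", "public": True},
    ],
}

def policy_violations(state):
    """Flag buckets that are public or lack customer-managed encryption,
    before the declarative state is ever applied."""
    problems = []
    for bucket in state["buckets"]:
        if bucket["public"]:
            problems.append(f"{bucket['name']}: publicly accessible")
        if bucket["encryption"] != "cmek":
            problems.append(f"{bucket['name']}: not using customer-managed keys")
    return problems

for issue in policy_violations(desired_state):
    print(issue)  # flags both problems with the static-site bucket
```

Because the check runs against the declared state rather than the live environment, it can gate the deployment pipeline in the same way the releasability tests discussed earlier gate the code.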
Bottom line: effective DevOps practices drive increased organizational performance and have significant adjacent benefits for security teams that align with this work. Indeed, such practices not only improve security; even better, the right approach by the security team can also benefit DevOps.