Measuring & Monitoring CI/CD Performance

Continuous improvement is one of the cornerstones of the DevOps philosophy.

It extends to every aspect of software development, from the product or service that you’re building to your organization’s culture and processes.

Continuous improvement involves collecting and analyzing feedback on what you’ve built or how you’re working in order to understand what is performing well and what could be improved. Having applied those insights, you collect further feedback to see if the changes you made moved the needle in the right direction, and then continue to adjust as needed.

A CI/CD pipeline plays a central role in enabling continuous improvement of your software. By shortening the time from development to deployment, you can release changes to users more frequently and so get feedback from use in production, which informs what you prioritize next. Likewise, the rapid feedback provided from each stage of automated testing makes it easier to address bugs and helps you to maintain the quality of your software.

But continuous improvement does not stop there. By applying the same techniques to the CI/CD pipeline itself, you can refine the process of building, testing and releasing your software, which amplifies the feedback loops you use to improve your product.

Understanding pipeline metrics

“You can’t manage what you don’t measure”, as the saying goes.

Metrics are an essential tool for improving system performance – they help to identify where you can add value and offer a baseline against which to measure the impact of any improvements you make.

Performance in CI/CD encompasses both speed and quality; there should not be a trade-off between deploying changes quickly and delivering a robust, reliable product – a high-performing CI/CD pipeline will allow you to deliver on both.

By measuring and monitoring the speed of activities, the quality of your software, and the extent to which you’re using automation, you can identify areas for improvement and then confirm whether the changes you’ve made have had a positive effect.

Top-level DevOps performance metrics

The following four metrics have been identified by DevOps Research and Assessment (DORA) as high-level metrics that provide an accurate indication of how well an organization is performing in the context of software development.

You can learn more about the research that informed these choices in the book Accelerate.

Lead time

Although lead time (also known as time to delivery or time to market) can be measured as the time from when a feature is first raised until it is released to users, the time involved in ideation, user research and prototyping tends to be highly variable.

For this reason, the approach taken by DORA is to measure the time from code being committed to deployment, which allows you to focus just on the stages within the scope of your CI/CD pipeline.
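
As a rough sketch, lead time can be derived from two timestamps your pipeline already has: when a change was committed and when it reached production. The record format below is hypothetical rather than any particular CI server's API:

```python
from datetime import datetime, timezone
from statistics import median

# Hypothetical records: one entry per change, with the commit time
# and the time the change was deployed to production.
changes = [
    {"committed": datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
     "deployed": datetime(2024, 5, 1, 15, 45, tzinfo=timezone.utc)},
    {"committed": datetime(2024, 5, 2, 11, 0, tzinfo=timezone.utc),
     "deployed": datetime(2024, 5, 4, 10, 15, tzinfo=timezone.utc)},
]

# Lead time per change, in hours.
lead_times = [(c["deployed"] - c["committed"]).total_seconds() / 3600
              for c in changes]

# The median is less skewed by the occasional outlier than the mean.
print(f"Median lead time: {median(lead_times):.1f} hours")
```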

A long lead time means that you’re not getting code changes in front of users regularly and therefore not benefitting from feedback to refine what you’re building. That can be due to several factors.

A release pipeline that involves manual steps, such as large numbers of manual tests, risk assessments or change review boards, can add days or weeks to the process, undermining the advantages of frequent releases.

While investing in automated testing will address the manual tests, streamlining risk assessments and change review boards requires engagement with stakeholders to understand how their needs can be met more efficiently. Alternatively, if the automated steps are slow or unreliable, build duration metrics can be used to identify the stages taking the most time.

Deployment frequency

Deployment frequency records the number of times you use your CI/CD pipeline to deploy to production. DORA selected it as a proxy for batch size, since a high deployment frequency implies fewer changes per deployment.

Deploying a small number of changes frequently lowers the risk associated with releasing (because there are fewer variables that can combine into unexpected results), and provides feedback sooner.
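
Measuring deployment frequency can be as simple as counting production deployments per period. A minimal sketch, assuming you can export deployment dates from your pipeline's history:

```python
from collections import Counter
from datetime import date

# Hypothetical list of production deployment dates.
deployments = [date(2024, 5, 1), date(2024, 5, 2), date(2024, 5, 2),
               date(2024, 5, 9), date(2024, 5, 10)]

# Group by ISO week to see how often you ship in a typical week.
per_week = Counter(d.isocalendar()[:2] for d in deployments)
for (year, week), count in sorted(per_week.items()):
    print(f"{year}-W{week:02}: {count} deployment(s)")
```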

A low deployment frequency can signify that the pipeline is not being fed with regular commits, perhaps because tasks are not being broken down, or it can be the result of batching changes up into larger releases.

When changes need to be batched up for business reasons (for example, due to user expectations), measuring the frequency of deployments to staging sites instead will allow you to monitor batch size and assess whether you’re reaping the benefits of working in small increments.

Change failure rate

Change failure rate refers to the proportion of changes deployed to production that result in a failure, such as an outage or a bug that requires either a rollback or a hotfix. The advantage of this metric is that it puts failed deployments in the context of the volume of changes made.
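
The calculation itself is straightforward; the harder part is agreeing on what counts as a failure. A minimal sketch, with a hypothetical deployment log:

```python
# Hypothetical log: one entry per production deployment, flagged True
# if it led to an outage, a rollback, or an urgent hotfix.
outcomes = [False, False, True, False, False, False, True, False]

change_failure_rate = sum(outcomes) / len(outcomes)
print(f"Change failure rate: {change_failure_rate:.0%}")  # 25% here
```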

A low change failure rate should give you confidence in your pipeline; it indicates that the earlier stages of the pipeline are doing their job and catching most defects before your code is deployed to production.

Mean time to recovery

Mean time to recovery or resolution (MTTR) measures the time it takes to address a production failure. MTTR recognizes that, in a complex system with many variables, some failures in production are inevitable. Rather than aiming for perfection (and therefore slowing down releases and forfeiting the benefits of frequent releases), it’s more important to respond to issues quickly.

Keeping your MTTR low requires both proactive monitoring of your system in production to alert you to problems as they emerge, and the ability to either roll back changes or deploy a fix rapidly via the pipeline.

A related metric, mean time to detection (MTTD), measures the time between a change being deployed and your monitoring system detecting an issue introduced by that change. By comparing MTTD and build duration, you can determine if either area would benefit from some investment to reduce your MTTR.
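
Both averages fall out of the same incident records. A sketch, assuming you log when the offending change shipped, when monitoring detected the problem, and when it was resolved (teams vary on whether MTTR is measured from detection or from deployment; detection is used here):

```python
from datetime import datetime

# Hypothetical incident records.
incidents = [
    {"deployed": datetime(2024, 5, 1, 10, 0),
     "detected": datetime(2024, 5, 1, 10, 20),
     "resolved": datetime(2024, 5, 1, 11, 5)},
    {"deployed": datetime(2024, 5, 7, 14, 0),
     "detected": datetime(2024, 5, 7, 16, 30),
     "resolved": datetime(2024, 5, 7, 17, 0)},
]

def mean_minutes(deltas):
    # Average a list of timedeltas, expressed in minutes.
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60

mttd = mean_minutes([i["detected"] - i["deployed"] for i in incidents])
mttr = mean_minutes([i["resolved"] - i["detected"] for i in incidents])
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```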

CI & operational metrics

In addition to these high-level measures, there is a range of operational metrics that you can use to better understand how your pipeline is performing and identify areas where you might be able to improve your process.

Code coverage

In a CI/CD pipeline, automated tests should provide the majority of your test coverage, freeing up your QA engineers to focus on exploratory testing and defining new test cases. The first layer of automated tests should be unit tests, as these are the quickest to run and provide the most immediate feedback.

Code coverage is a metric provided by most CI servers that calculates the proportion of your code covered by unit tests. It’s worth monitoring this metric to ensure that you’re maintaining adequate test coverage as you write more code. If your code coverage is trending downwards over time, it’s time to invest some effort in this first line of feedback.
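
Because most CI servers expose the coverage figure per build, spotting a downward trend can itself be automated. A minimal sketch with illustrative numbers:

```python
# Hypothetical coverage percentages for recent builds, oldest first.
coverage_history = [84.2, 83.9, 83.5, 82.8, 82.1]

# Compare the latest build against the average of the earlier builds.
baseline = sum(coverage_history[:-1]) / len(coverage_history[:-1])
latest = coverage_history[-1]

# The one-point threshold is arbitrary; tune it to your codebase.
if latest < baseline - 1.0:
    print(f"Coverage slipping: {latest:.1f}% vs {baseline:.1f}% baseline")
```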

Build duration

Build duration or build time measures the time taken to complete the various stages of the automated pipeline. Looking at the time spent at each stage of the process is useful for spotting pain points or bottlenecks that might be slowing down the overall time it takes to get feedback from tests or deploy to live.
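
With per-stage timings (which most CI servers report), ranking stages by their share of the total quickly surfaces the bottleneck. A sketch with hypothetical figures:

```python
# Hypothetical stage timings, in seconds, for one pipeline run.
stage_durations = {
    "compile": 120,
    "unit tests": 340,
    "integration tests": 910,
    "deploy to staging": 75,
}

total = sum(stage_durations.values())
# Longest stage first, with its share of the overall build time.
for stage, secs in sorted(stage_durations.items(), key=lambda kv: -kv[1]):
    print(f"{stage:>20}: {secs:4d}s ({secs / total:.0%})")
```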

Test pass rate

Test pass rate is the percentage of test cases that passed successfully for a given build. As long as you have a reasonable level of automated tests, it provides a good indication of each build’s quality. You can use this metric to understand how often code changes are resulting in failed tests.

While catching failures with automated tests is preferable to relying on manual tests or discovering issues in production, if a particular set of automated tests is regularly failing, it might be worth looking at the root cause of those failures.
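
One way to find those repeat offenders is to count, per test, how many recent builds it failed in. A sketch, assuming you can export the failed test names for each build:

```python
from collections import Counter

# Hypothetical history: the names of tests that failed in each build.
failures_per_build = [
    ["test_checkout_total"],
    [],
    ["test_checkout_total", "test_inventory_sync"],
    ["test_checkout_total"],
]

# Tests that fail repeatedly deserve a root-cause investigation,
# whether the problem is unstable code or a flaky test.
offenders = Counter(name for build in failures_per_build for name in build)
for name, count in offenders.most_common():
    if count > 1:
        print(f"{name} failed in {count} of {len(failures_per_build)} builds")
```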

Time to fix tests

Time to fix tests is the time between a build reporting a failed test and the same test passing on a subsequent build. This metric gives you an indication of how quickly you’re able to respond to issues identified in the pipeline.
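
This can be computed from build history by recording when each test first failed and matching that against its next passing build. A sketch with a hypothetical history:

```python
from datetime import datetime

# Hypothetical build history: (build time, {test name: passed?}).
history = [
    (datetime(2024, 5, 1, 9, 0),   {"test_login": True, "test_search": False}),
    (datetime(2024, 5, 1, 11, 30), {"test_login": True, "test_search": False}),
    (datetime(2024, 5, 1, 15, 0),  {"test_login": True, "test_search": True}),
]

first_failed = {}  # test name -> time of the first build where it failed
for when, results in history:
    for test, passed in results.items():
        if not passed:
            first_failed.setdefault(test, when)
        elif test in first_failed:
            print(f"{test} fixed after {when - first_failed.pop(test)}")
```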

A low resolution time shows you’re using your pipeline to best effect; by dealing with issues as soon as they are found, you can work more efficiently (as the changes are still fresh in your mind), and you avoid building more functionality on top of unstable code.

Failed deployments

Failed deployments are those that result in unintended downtime, require the deployment to be rolled back, or require a fix to be released urgently. The count of failed deployments is used to calculate the change failure rate (discussed above).

Monitoring the proportion of failures out of the total number of deployments helps measure your performance against SLAs.
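
A minimal check of that proportion against an agreed target (the figures and the 10% target below are purely illustrative):

```python
# Hypothetical monthly figures.
failed, total = 3, 60
sla_target = 0.10  # illustrative: at most 10% of deployments may fail

failure_rate = failed / total
status = "within" if failure_rate <= sla_target else "breaching"
print(f"Failure rate {failure_rate:.1%} is {status} the {sla_target:.0%} target")
```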

However, bear in mind that a target of zero (or very few) failed deployments is not necessarily realistic, and can instead encourage teams to prioritize certainty. Doing so results in longer lead times and larger deployments as changes are batched together, which actually increases the likelihood of failures in production (as there are more variables) and makes them harder to fix (as there are more changes to wade through).

Defect count

Unlike failures, a defect count refers to the number of open tickets in your backlog classified as bugs. It can be further broken down by issues found in testing or staging and issues found in production.
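
If your issue tracker can export tickets with a type and a found-in field, the breakdown is a simple count. A sketch with hypothetical ticket data:

```python
from collections import Counter

# Hypothetical backlog export.
tickets = [
    {"type": "bug", "found_in": "production"},
    {"type": "bug", "found_in": "staging"},
    {"type": "feature", "found_in": None},
    {"type": "bug", "found_in": "production"},
]

open_bugs = [t for t in tickets if t["type"] == "bug"]
by_environment = Counter(t["found_in"] for t in open_bugs)

print(f"Open defects: {len(open_bugs)}")  # the headline trend to watch
print(dict(by_environment))               # {'production': 2, 'staging': 1}
```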

Like code coverage, monitoring the number of defects is useful for alerting you to a general upward trend, which can indicate that bugs are getting out of hand. Keep in mind, however, that making this metric a target can lead to your team focusing more on classifying tickets than on fixing them.

Deployment size

As a complement to deployment frequency (see above), deployment size – as measured by the number of story points included in a build or release – can be used to monitor batch size within a particular team.
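
A sketch of the calculation, assuming each deployment can be mapped back to the story points of the work items it contains:

```python
from statistics import mean

# Hypothetical deployments: story points of each work item included.
deployments = [[3, 2, 1], [5, 3], [2]]

sizes = [sum(points) for points in deployments]
print(f"Deployment sizes: {sizes}, average: {mean(sizes):.1f} points")
```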

Keeping deployments small shows that your team is committing regularly, with all the benefits that entails. However, as story estimates are not comparable across development teams, this metric should not be used to measure overall deployment size.

Conclusion

Monitoring these metrics allows you to better understand how well your CI/CD pipeline performs and whether you are on an upward or downward trend.

You can use metrics to identify areas of your process that would merit further attention. Once you’ve made a change, it’s good practice to keep monitoring the relevant metrics to verify whether they had the intended effect.

However, while metrics can provide useful indicators of performance, it’s important to read the numbers in context and to consider which behaviors might be incentivized by focusing on a particular metric. Bear in mind that the goal is not the numbers themselves, but keeping your pipeline fast and reliable so that you can keep delivering value to users.