Agile Software Metrics

The rise of dynamic software development methodologies such as Extreme Programming or Agile Programming reflects the inherent dynamism of modern software design.  The malleability of software, the rapid evolution of consumer- and technology-driven requirements, the difficulty of writing accurate specifications given all the unknowns, and the sheer complexity of the software ecosystem itself make the ancient development waterfall, from specification through execution and QA to release, a hazardous and mostly futile affair.

Most software development projects fail.  While the situation has improved over the last decade, this remains mostly true today.  Fewer than a third of all software projects meet their objectives in approximately the time expected.  Over 10% of all projects fail without delivering anything, and most of the rest under-deliver, are terribly late, or are way over budget.

This blog post is a thinking-out-loud exploration of how modern Agile methods address these problems, and of how my thinking is evolving with regard to how success is defined and how the probability of success can be maximized.

Historically, this stunning failure rate was mostly uncorrelated with the choice of methodology and only loosely correlated with team makeup.  The biggest gain has come over the past 10 years with the rise of different methodologies for rapid iteration:  build a working prototype, clarify objectives, plan the next concrete iteration, update long-term budget forecasts and timeline, and repeat.  Why does this help?  The biggest benefits of rapid iteration are clarifying constraints that are ill-defined at the outset, improving the dialog between project owner and developers, and pulling the plug early on things that aren’t going to work.  Secondary benefits include a greater willingness to rewrite code that was initially ill-conceived, something harder to justify in pre-planned projects, and ongoing cost optimization: as development proceeds, some features we thought were easy turn out to be expensive yet add less value than expected.  As we proceed we can adjust our investments to trade off value creation against cost.

There are a wide variety of ways to structure and manage a rapid iteration process so we can make these tradeoffs.  A recent project got me thinking about updating my views on metrics, which formed in the context of embedded software almost 15 years ago.  Early in my career, one of the executives I worked with, Jim Miller, was fond of saying “you get what you measure”, which is now a well-known aphorism in the community.  A variation on this theme is that you can only regulate systems you can model.  Regardless of the clever ways we talk about it, development efficiency relies on the useful transparency of a system; useful transparency means features and metrics that help us understand how a system is working, and only enough of them to ensure we are meeting our goals.  The benefits of a culture of transparency apply equally to technology, team, and business planning.

Another good rule of thumb in managing is: measure outputs, not inputs.  Just about any process input, when measured and reinforced, encourages optimization of that measure.  If you measure lines of code, people will implement the same feature with more code, on average.  If you measure hours of work, you get more hours per unit of functionality.  If you measure prediction accuracy, you get conservative, guaranteed-not-to-miss predictions.  If you measure test coverage, you get more tests.  All of these optimizations reduce quality, efficiency, and everyone’s satisfaction.

For example, if I write some tests on day 1, sit in a hammock for 3 days, and on the 5th day write 200 lines of code that implements an extremely tricky algorithm, passes all tests, and meets all business goals, then that hammock time should be heavily rewarded.  Many metrics penalize the formation of better ideas in the name of artificial “productivity”.  Better ideas pay off not immediately, but over the lifetime of a project in reduced bug rate, decreased future development time, and less testing infrastructure.

If we have working software delivered every 2-4 weeks, we can track progress by measuring output over multiple iterations of a project instead of relying on the ineffective tracking of inputs.  We can now separate the “what”, which is what we should be caring about, from the “how” which can and should vary widely among individuals, teams, and projects.  (Caveat: the first phase of a project does require some traditional design thinking including interviews, UX, architecture, strategy, etc.  Fodder for a future post.)

Dude, what metrics should we use?

At the highest level, the goal of a software development project is to deliver value to customers, with high quality, in a timely manner, with good efficiency.  Secondarily, we want to have a rolling forecast for the overall budget and timeline against the business value desired.  These are all facets of the output of a software development process, but not all of them can be easily or directly measured.

Value to Customers

How do we measure value to customers?  One method is to simply poll the project owners for how much value they think a given story/feature will have for their customers.  This can be in terms of revenue unlocked, if such forecasting is available, or simply a 1-5 rating scale.  To make tradeoffs during and between iterations, these subjective scores are needed anyway to help triage.  Developers at the same time are estimating the costs of implementing a story, so we can evaluate the cost vs. value empirically and shake loose any mis-estimated numbers that way.
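As a rough sketch of that cost vs. value comparison, the snippet below ranks stories by the owner's value rating divided by the developers' cost estimate.  The `Story` type, its field names, and the sample stories are all made up for illustration, and it assumes every cost estimate is greater than zero.  Stories at the very top or bottom of the ranking are the ones I would re-examine first for mis-estimated numbers.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    value: int    # project owner's subjective rating, 1-5 (hypothetical field)
    cost: float   # developers' estimate in points, assumed > 0 (hypothetical field)


def rank_by_value_per_cost(stories):
    """Return stories sorted from best to worst value-per-point ratio."""
    return sorted(stories, key=lambda s: s.value / s.cost, reverse=True)


# Illustrative data only; not taken from any real project.
stories = [
    Story("export to CSV", value=2, cost=8),
    Story("one-click signup", value=5, cost=3),
    Story("dark mode", value=3, cost=3),
]

for story in rank_by_value_per_cost(stories):
    print(f"{story.name}: {story.value / story.cost:.2f} value per point")
```

A ratio is only one way to combine the two numbers; a team could just as well plot value against cost and eyeball the outliers.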

Another method is to measure the satisfaction of various engaged stakeholders with the results of a sprint.  If it’s a manager, it’s how well did this iteration meet their expectations?  If a user, then how close is the product to something they would use?  Is it delightful?  Is it easy to use?  The questions asked are driven by the goals of the project.

High Quality

How do we measure the quality of a software deliverable?  I consider quality to be something along the lines of “works as advertised or implied”.  This means that the software does what it appears to be designed to do, and generates only a modest number of functional (I clicked send and nothing happened) or usability (I had to click send 3 times to get it to send) bugs.  Good agile deliverables will prompt a flood of feature requests and enhancements and these should not be considered bugs, but opportunities to focus the planning of the next iteration.

Internally, the team should be writing automated test suites or manual test plans to ensure that what is released is actually “done” and free of bugs.  Tracking the ongoing defect rate is helpful for evaluating the overall quality of the system and team.

  • New bug reporting rate.  Complex products or features may have a higher than normal bug rate that should decline over iterations as the design is improved.  The total new bug rate should decline as the project approaches completion.
  • Average bug longevity.  The quality of the software architecture and team is indirectly measured by the length of time bugs remain open and/or the ratio of bugs opened vs. bugs closed.  If the bugs are small “oops” bugs, they can be fixed in near real-time so that the open bug rate is near zero.  If the bugs are serious design flaws, then they will remain open longer and can take an iteration or longer to resolve.
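Both numbers are cheap to compute from a bug tracker export.  Here is a minimal sketch, assuming each bug record carries an opened date and an optional closed date; the `Bug` type, the function names, and the sample dates are hypothetical.  Bugs that are still open are counted as open up to `today`, which is one possible convention among several.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bug:
    opened: date
    closed: Optional[date] = None  # None means the bug is still open


def new_bugs_per_iteration(bugs, iteration_starts):
    """Count bugs opened in each iteration.

    Iteration i spans [iteration_starts[i], iteration_starts[i + 1]).
    """
    counts = [0] * (len(iteration_starts) - 1)
    for bug in bugs:
        for i in range(len(counts)):
            if iteration_starts[i] <= bug.opened < iteration_starts[i + 1]:
                counts[i] += 1
                break
    return counts


def average_longevity_days(bugs, today):
    """Mean number of days bugs stayed open; open bugs are measured up to today."""
    if not bugs:
        return 0.0
    total = sum(((bug.closed or today) - bug.opened).days for bug in bugs)
    return total / len(bugs)


# Illustrative data only: two-week iterations starting 1 January 2024.
starts = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 1, 29)]
bugs = [
    Bug(date(2024, 1, 2), date(2024, 1, 3)),    # a quick "oops" fix
    Bug(date(2024, 1, 10), date(2024, 1, 20)),  # took ten days
    Bug(date(2024, 1, 16)),                     # still open
]

print(new_bugs_per_iteration(bugs, starts))               # [2, 1]
print(average_longevity_days(bugs, date(2024, 1, 26)))    # 7.0
```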


Timeliness

The timeliness metric ensures that the team as a whole is credited for keeping the Agile iteration process going.  It’s easy to choose perfection or feature completion over timeliness, but developing trust between business owners and developers requires maintaining the transparency of delivered software on a regular schedule.  This means that features are dropped or held back from a release when that is what it takes to keep the schedule.

Is software released with most of the planned iteration features at the targeted time of release?  We can simply track this numerically: 2 points for releasing something exactly on time, 1 point if it is delayed over a weekend, and 0 points thereafter.
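The scoring rule is simple enough to write down directly.  The sketch below makes two assumptions of its own: an early release scores the same as an on-time one, and “delayed over a weekend” means the release landed no later than the first Monday after the target date.  The function name is hypothetical.

```python
from datetime import date, timedelta


def timeliness_points(target: date, released: date) -> int:
    """Score a release: 2 if on time, 1 if it slipped over a weekend, else 0."""
    if released <= target:
        # Assumption: releasing early counts the same as releasing on time.
        return 2
    # Assumption: "delayed over a weekend" means released by the first
    # Monday strictly after the target date (Monday is weekday 0).
    days_to_monday = (7 - target.weekday()) % 7 or 7
    next_monday = target + timedelta(days=days_to_monday)
    return 1 if released <= next_monday else 0


# 1 March 2024 was a Friday.
target = date(2024, 3, 1)
print(timeliness_points(target, date(2024, 3, 1)))  # 2: on time
print(timeliness_points(target, date(2024, 3, 4)))  # 1: the following Monday
print(timeliness_points(target, date(2024, 3, 5)))  # 0: later than that
```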


Efficiency

Perhaps the most difficult characteristic of a software project to capture is efficiency.  For the business owner, this means answering the question of whether the development budget is being well spent and whether the ultimate project ROI will make sense.

The history of metrics in software development mostly reflects a lack of trust of one part of the organization for another.  The transparency of deliverable software at the end of iterations helps build trust without using negative or micro-managing metrics.  Regular delivery demonstrates that the development team can meet its commitments, encourages developers to include project owners in the tradeoffs they make, and facilitates regular review of progress, goals, and incorporation of learning along the way.

However, this doesn’t answer the question of whether the team itself is setting appropriately ambitious goals.  One challenge is that the reward a developer receives for high efficiency is simply increased pressure to produce more software in less time.  Not terribly motivating.  The tradeoff is that good engineers have pride in their work product, and want to delight their customers whether they are contractors or building a product for a startup.  Long term, the highest productivity teams are driven by internal motivation, not external or monetary incentives.

We can engineer these motivations by connecting a development team more intimately to their customers.  Obviously a business owner, principal investigator, or project manager is one such class of customer.  It is even better if the organization can identify a few representative volunteer users who will test drive the results of an iteration and provide feedback.  The team will be motivated to accomplish enough during an iteration to satisfy those users.

You can also engineer indirect incentives towards high productivity by increasing the transparency of internal processes via open source distribution of non-proprietary components of the work. If I have to document and sign my name to code I’ve produced for a wider audience and/or write a blog post about a solution as part of my day job, you can bet that the quality of that work will be higher than work only a manager might see (and be willing to overlook this time to make sure that story is marked completed for the current iteration!).

Finally, the ultimate test of whether a given iteration is efficient is the evidence it provides about the projected total time and cost of the project.  If a contractor or internal team performs too slowly, then the project may get cancelled or the team replaced with another that has a more proven track record.  At the end of each iteration or two, the team should reconsider the number of remaining iterations needed to meet the overall project’s goals and ensure that the scope of the project and total budget make sense in light of the progress to date.
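One simple way to do that reconsideration is to project forward from the points completed so far.  The sketch below assumes the remaining scope has been estimated in the same points, that average past output is a fair guide to future output, and that each iteration costs a fixed amount; the function names and the sample numbers are hypothetical.

```python
import math


def remaining_iterations(completed_points, remaining_points):
    """Estimate iterations left, using average points delivered per iteration."""
    if not completed_points:
        raise ValueError("need at least one completed iteration")
    average = sum(completed_points) / len(completed_points)
    if average <= 0:
        raise ValueError("average output must be positive to forecast")
    return math.ceil(remaining_points / average)


def projected_total_cost(completed_points, remaining_points, cost_per_iteration):
    """Project total spend, assuming a fixed cost per iteration."""
    total = len(completed_points) + remaining_iterations(
        completed_points, remaining_points
    )
    return total * cost_per_iteration


# Illustrative numbers only: three iterations done, 90 points of scope left.
done = [20, 25, 15]
print(remaining_iterations(done, 90))          # 5
print(projected_total_cost(done, 90, 10_000))  # 80000
```

If the projected total drifts well past the budget, that is the cue to revisit scope with the project owner rather than to push the team harder.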

For the moment, I’ve come down against recommending any explicit, empirical measurements of efficiency outside of the team itself.  I believe it only serves to breed distrust, which in turn tends to reduce efficiency over the long term.  Instead, the team should be encouraged to engage in an internal process of improvement and gain the benefit of that improved efficiency by investing excess time in fun but useful activities like maintaining open source projects, working on a skunkworks project, or attending conferences and other professional development.

Adaptivity is the measure of how well an organization adapts to the inevitable evolution of a specification, customer feedback, and internal learning that happens as a project proceeds.  Adaptivity is closely related to efficiency because the best time to make changes is while a feature is actively being worked on.  The longer we wait to make changes, the higher the cost of those changes.

These metrics include:

  • How many change requests did we triage?
  • Proportion absorbed into the current iteration, backlogged, iceboxed, or rejected outright?
  • What was the “point volume” (cost) or “value volume” (benefit) of the requested changes?
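These three questions reduce to a small tally over the change requests triaged in an iteration.  The sketch below assumes each request records its disposition, a cost estimate in points, and a 1-5 value rating; the `ChangeRequest` type, the field names, and the sample requests are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

DISPOSITIONS = ("absorbed", "backlogged", "iceboxed", "rejected")


@dataclass
class ChangeRequest:
    disposition: str  # one of DISPOSITIONS
    points: float     # estimated cost of the change
    value: int        # estimated benefit, e.g. a 1-5 rating


def adaptivity_summary(requests):
    """Summarize triaged change requests for one iteration."""
    counts = Counter(r.disposition for r in requests)
    total = len(requests)
    return {
        "triaged": total,
        "proportions": {
            d: (counts[d] / total if total else 0.0) for d in DISPOSITIONS
        },
        "point_volume": sum(r.points for r in requests),
        "value_volume": sum(r.value for r in requests),
    }


# Illustrative data only.
requests = [
    ChangeRequest("absorbed", points=2, value=4),
    ChangeRequest("absorbed", points=1, value=3),
    ChangeRequest("backlogged", points=5, value=2),
    ChangeRequest("rejected", points=8, value=1),
]

summary = adaptivity_summary(requests)
print(summary["triaged"])                  # 4
print(summary["proportions"]["absorbed"])  # 0.5
print(summary["point_volume"])             # 16
print(summary["value_volume"])             # 10
```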

A healthy process with good dialog between business owner and developers should identify local opportunities for improvement and translate a good fraction of the change requests invisibly into active iterations or into the near-term backlog.  A very good team will independently identify opportunities for better value during development and can get a quick sign-off from the project owner to add something that benefits end customers while it is most efficient to do so.  The team should build a culture of finishing iteration demos with some small, “just one more thing” moments.  As Steve Jobs showed us, this delights both customers and developers.  The 80/20 rule for overall point volume balances delivery obligations with delight.

The ultimate in cost effectiveness is not features produced; it’s the maximization of business value and the reduction in long-term costs as measured by a low defect rate.  If a team is meeting most of its iteration obligations, adapting effectively to incoming change requests, and everyone is satisfied with the progress as the iterations unfold, then it is highly likely that money is being well spent.  By focusing on value and quality, tying a team’s pride to customer satisfaction, and creating indirect incentives such as public exposure and recognition for their work, a business owner should have no need to maintain detailed, performance-reducing metrics of day-to-day behavior.

What’s Next?

There are at least two clear follow-up posts (backlog!) from these initial thoughts:

  1. Up-front design.  The world is very excited about the notion of Design (with a big ‘D’).  Design often involves a waterfall-like process of interviews, iterations of wireframes, and other aspects of workflow, resulting in a detailed specification.  The same is true of architecture: how do we trade off the long-term impacts of architectural choices against a desire to learn as we go and adapt the architecture, and design, over time?
  2. Internal team metrics and tracking.  Internal metrics help answer the “how”: how a team maintains awareness of what is going on within and among iterations so there are no late-stage surprises that compromise quality or timeliness.  How does the team accommodate feedback from earlier iterations to inform future iterations and long-term forecasting?  What is the role of velocity and how should it be tracked, especially given that it typically declines with time?


    • ianeslick

      Perhaps you could expand your comment, such as too long for what purpose? Perhaps after I’ve thought out loud over a couple of posts I can summarize my findings in a succinct post for those who aren’t interested in understanding all the thinking behind the specific ways of tracking progress?
