Team Metrics: A Human-Centric Approach
Take a human-centered approach to evaluating your team's metrics and determining how to use them to drive growth.
Last week I reread John Cutler's excellent article, "What Are Vanity Metrics and How to Stop Using Them", where he reflects on product vanity metrics and how to replace them with better KPIs. After reading it, I couldn't help but think about the recent boom in engineering metrics.
"Vanity metrics are metrics that make us feel good, but don’t help us do better work or make better decisions. No one is immune to using vanity metrics! The key is ensuring you provide context, state the intent of the metrics you use, and clarify the actions and decisions that the metric (or metrics) will drive." — John Cutler, What Are Vanity Metrics and How to Stop Using Them
It is quite simple to set up a team dashboard with KPIs today. There are a lot of intuitive tools and solutions for gathering and processing your team's signals (e.g., DX, Athenian, Waydev). Consequently, engineering metrics have gained popularity, and more teams and leaders are joining the discussion. People have started arguing about whether DORA metrics are useful or useless, and whether outputs or outcomes are the best way to measure your team.
I am glad to see this friction in our industry and how it pushes us to measure and keep improving. However, I think some people miss a fundamental point. Metrics for your team shouldn't be viewed as goals; rather, they should be viewed as health measures that help identify issues.
System of people, not machines
Almost all engineering leaders come from engineering backgrounds where they have built and maintained systems of machines (e.g., servers, modules, functions). These systems are constantly measured and validated for performance, stability, security, and many other metrics. It's a winning approach that allows teams to improve customer satisfaction while reducing operational costs.
Assume that customers start to complain about your product's performance. You decide to improve it, and to achieve this, you set a requirement for all page loads to complete within 300 ms. Once you find out which queries are problematic, you can start figuring out why they are problematic. There is a solution out there, and it is only a matter of time.
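A machine system like this really is debuggable with a few lines of code. As a minimal sketch, here is how you might surface the endpoints that blow a 300 ms budget; the endpoint names and latency samples are invented for illustration:

```python
# Hypothetical sketch: given per-endpoint latency samples, flag endpoints
# whose p95 latency exceeds a 300 ms budget. All names and numbers are invented.

from statistics import quantiles

BUDGET_MS = 300

def p95(values):
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile
    return quantiles(values, n=20, method="inclusive")[-1]

def over_budget(samples, budget_ms=BUDGET_MS):
    # return the endpoints whose tail latency violates the budget
    return [ep for ep, vals in samples.items() if p95(vals) > budget_ms]

samples = {
    "/search": [120, 180, 950, 210, 1100, 160, 870],  # a few slow outliers
    "/profile": [40, 55, 60, 48, 52, 45, 70],
}

print(over_budget(samples))  # → ['/search']
```

The point is not the snippet itself but what it represents: a deterministic system where a threshold, a measurement, and a fix form a closed loop.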
It would be ideal if our team dashboard were as legible as a car's. You can easily check the oil level and refill it when it's low, and if the tank empties faster than expected, there is a rational explanation that we can diagnose and fix.
The thing is, humans are not processing units that you can debug. We are not pure input-output machines; we are full of side effects. That's why we are capable of making beautiful and unexpected things that are also impossible to reason about. Therefore, the way we reason and, more importantly, the way we act must be different.
Ultimately, engineering metrics can't tell the whole story. They provide only a glimpse of what is actually going on. The good news is that they are still incredibly useful. By becoming "data-informed" rather than "data-driven," you can use them to create experiments and define hypotheses instead. Focus on behaviors rather than shallow numbers.
Incentives shape behavior
It is also common for managers to set goals based on metrics' values, such as, "In Q4, we aim to reduce our lead time by 30%." Don't get me wrong, the reasoning behind it is solid. A shorter idea-to-production loop will enable the team to learn faster. However, do you believe reducing lead time by 30% will result in more learning? Lead time here is just a proxy metric, set because there is no feasible way to measure your learning rate directly. Reaching a four-hour lead time means exactly that: you reached a four-hour lead time. If the team is motivated by this goal, they are likely to reach it, but probably by doing things unrelated to learning. The team might be shipping faster, yet never looking back and learning.
“When a measure becomes a target, it ceases to be a good measure.” — Goodhart’s Law.
Imagine we want our city to be safer, so we set a target for our police department's case closure rate. This might result in the opposite. The department might choose to focus on easy-to-solve cases and avoid difficult ones that would significantly impact our community.
Lead time is a measure that can be used to assess the health of the team. Establish patterns for your team and start tracking anomalies. Has the lead time increased from 4 to 30 hours? Investigate, write your assumptions, and build hypotheses. Then validate them with qualitative research and keep measuring the impact of your changes.
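The pattern-and-anomaly approach above can be sketched in a few lines. This is an illustrative example, not a tool recommendation: the weekly lead-time figures and the 5×MAD threshold are assumptions, and a robust median-based rule is used so a single spike doesn't distort its own baseline.

```python
# Hypothetical sketch: treat lead time as a health signal by flagging values
# that deviate far from the team's own baseline. Weekly numbers are invented.

from statistics import median

def anomalies(history, k=5.0):
    """Return (index, value) pairs far from the median, measured in multiples
    of the median absolute deviation (robust to the outliers we want to find)."""
    med = median(history)
    mad = median(abs(v - med) for v in history)
    mad = mad or 1.0  # avoid a zero scale when most values are identical
    return [(i, v) for i, v in enumerate(history) if abs(v - med) > k * mad]

# median lead time per week, in hours
lead_times = [4, 5, 4, 6, 5, 4, 5, 30]
print(anomalies(lead_times))  # → [(7, 30)] — week 8 warrants investigation
```

An anomaly here is a prompt for qualitative research, not a verdict: the flagged week tells you where to start asking questions.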
Accept that your data is incomplete
Although GitHub, Linear, or Jira can provide quite a lot of information about your team, these metrics are only a shadow of what actually happens. The law of large numbers can help in large organizations (100 or more engineers), but those are also more vulnerable to inconsistencies between departments and teams. Accepting this truth will help you ensure you are leveraging the full potential of your tools and analytics.
Has the number of bugs increased? The problem could be caused by a buggy feature or by a new manager with a higher quality standard.
Has the lead time decreased? There could be several reasons for this: either your team is shipping smaller chunks, or they started doing their design work in a new tool that isn't being tracked.
Measurement systems should let you narrow your search space and focus on a smaller surface. Health metrics can indicate where you should start looking and will help you validate the effects of your decisions.
But elite teams reach XYZ
True, but lacking context. A⇒B doesn't imply B⇒A. In other words, many outstanding teams indeed share similar metrics and patterns, but that doesn't mean that reaching those numbers will make your team outstanding.
Even if an article presents a team that merges 10 PRs a week as the new standard, take a moment before adopting it. Study the ins and outs of that difference and why your team does things differently. It might turn out that shipping 10 PRs a week comes with a weak code review process and a lot of follow-up work. Are you willing to make that tradeoff?
Instead of focusing on metrics, identify your team's strengths and weaknesses by talking to members and stakeholders (product, design, and customers). Develop theories by being curious and inspired. Don't adopt any system before testing if it fits your specific context.
The availability of data gives us managers a lot of new opportunities. If we resist the urge to rely solely on quantitative data and mix in some qualitative approaches, we can unlock impressive growth for our teams. Make sure you don't skimp on assumptions and hypotheses. Research and experiment properly.