The Tyranny of Technical Debt – an Analogy

In conversations regarding technical debt (and technical improvement), I often hear that there is an understanding gap between technical people and business people. As an experiment, I thought it would be interesting to attempt to bridge that gap by illustrating what technical debt looks like, with an analogy.

The Conversation

Setting: “Widgets Inc.”, a manufacturing company that makes widgets

Characters:

  • Max – VP Sales – responsible for making sales commitments to customers
  • John – Plant Manager – responsible for manufacturing the widgets

January 2nd:

Max: Hi John, tell me where we are from a production perspective. I have two big customers breathing down my neck.

John: Our current capacity is 1000 widgets a month.

Max: John, I need you to do better than that. We’ve just committed to deliver 1500 units each to two customers by the end of February.

John: Well… our capacity of 1000 per month includes time for scheduled maintenance as well as time for improvement projects that will increase our capacity over time. We expect to be able to deliver 1500 widgets per month by July. If we defer these items, we’ll have to operate at reduced capacity until we have some downtime to catch up. Also, the delayed maintenance may give us some quality issues.

Max: Do what you need to do. We don’t have any choice here.

Feb 28th:

# widgets delivered: 1400 to each customer (2800 total)

# widgets promised but not delivered: 100 to each customer (200 total)

Anticipated capacity for March: 800 widgets/month

# days downtime needed to restore capacity to 1000 / month: 5

March 1st:

Max: Hi John, we have another hot customer order. We can’t say no. We need to deliver 3000 units by the end of April. Plus the 200 units we are still short from the January order.

John: Well, with the deferred maintenance and deferred improvement projects, we’re down to 800 a month. We might be able to push it to 1000, but we expect to start seeing more quality issues and more equipment outages.

Max: We don’t have a choice. You need to deliver 3000 units by the end of April.

April 30th:

# widgets delivered: 1200 to each customer (2400 total)

# widgets promised but not delivered: 400 to each customer (800 total)

Anticipated capacity for May: 600 widgets/month

# days downtime needed to restore capacity to 1000 / month: 15

May 1st:

Max: John, what is going on here? Customers are screaming at me because we are not meeting our delivery commitments! And they are complaining because our quality sucks! We’re getting a 10% reject rate!

How does this relate to software development?

OK, my little story is so extreme that it is unbelievable that anyone would act that way in the situation described. My analogy is both biased and exaggerated – or is it?

The trouble with software development and technical debt is that:

  • The work is much more complex. It is harder to measure both capacity and throughput. We can’t simply count widgets produced per month.
  • Technical debt is invisible to the non-technical. We can’t walk around the factory floor and see the smoke coming out of machines that have not been maintained.

There are a couple of key points I’d like to make from my analogy…

Technical debt: speed up in the short term to go slow in the long term

Consider the comparison of widgets produced over time between the expedited and sustainable capacity scenarios:

         Expedited                “Sustainable Capacity”
Month    Widgets/Month   Total    Widgets/Month   Total
Jan      1400            1400     1000            1000
Feb      1400            2800     1050            2050
Mar      1200            4000     1100            3150
Apr      1200            5200     1200            4350
May      600*            5800     1300            5650
Jun      600             6400     1400            7050
Jul      600             7000     1500            8550
Aug      600             7600     1600            10150

* Assumes: time is not granted to recover from the expedited delivery schedule.

Because of the capacity cost of the expedited work, the total output in the “sustainable capacity” case (“includes time for scheduled maintenance as well as time for improvement projects that will increase our capacity over time”) eventually exceeds the total in the expedited case. In the table above, the sustainable total overtakes the expedited total in June (7050 versus 6400), and the gap keeps widening after that.
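The crossover can be checked by re-adding the monthly figures. The following is a minimal Python sketch that uses only the numbers from the table above; the names (MONTHS, EXPEDITED, SUSTAINABLE, crossover_month) are illustrative and not part of the original analysis.

```python
from itertools import accumulate

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]

# Monthly output copied from the table above (widgets per month).
EXPEDITED = [1400, 1400, 1200, 1200, 600, 600, 600, 600]
SUSTAINABLE = [1000, 1050, 1100, 1200, 1300, 1400, 1500, 1600]


def crossover_month(months, expedited, sustainable):
    """Return the first month in which the sustainable running total
    exceeds the expedited running total, or None if it never does."""
    expedited_totals = accumulate(expedited)
    sustainable_totals = accumulate(sustainable)
    for month, exp_total, sus_total in zip(
        months, expedited_totals, sustainable_totals
    ):
        if sus_total > exp_total:
            return month
    return None


if __name__ == "__main__":
    # With the table's figures this prints "Jun".
    print(crossover_month(MONTHS, EXPEDITED, SUSTAINABLE))
```

Editing either list is a quick way to see how the crossover month moves under different assumptions about how much capacity is lost or gained.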

Organizations that truly prioritize and strive for continuous improvement not only take care of the day-to-day cleanup of daily work (analogous to scheduled maintenance), but also continuously find ways to make that work easier and faster. As noted in the DevOps study Accelerate, top-performing organizations deploy to production more than once per day per developer, and the number of deploys per day goes up exponentially as the number of developers increases.

Whose fault is it, anyway?

In my story, I’ve made Max appear more of the “bad guy” than John. What I’d like to point out is that this is a systemic problem: everyone is trying to do what they think is best for the organization and for the customer from their point of view.

It is only when we step back and look at the big picture – the long-term cost of ignoring the maintenance and improvement work – that we see that the system is broken and that we are creating long-term results that no one wants. It is the responsibility of leaders, especially senior leaders, to create an environment where short-term and long-term considerations are appropriately balanced, so that maintenance and improvement do occur even when there is always business pressure to deliver the next thing ASAP.

How would we know?

In my mind, I can hear arguments between business and technical folks as to “whether this is happening in our organization”. I’d challenge these folks to consider two hypotheses (which perhaps represent the ends of a spectrum):

  1. “We are appropriately balancing maintenance and improvement work, and are not building up technical debt – in fact we are continuously getting faster and improving quality at the same time”
  2. “We are building up technical debt and it is hurting our ability to continue to deliver value and quality”

Challenge question: What evidence would you look for to tell you which hypothesis is “more true”?

The Bottom Line

While technical debt is hard to see, its impact is real: declining capacity and declining quality, along with staff burnout and attrition. Technical debt is what accumulates when a technical team is asked or forced to work faster than its sustainable pace: the pace at which both maintenance work (technical debt cleanup) and improvement work (improvement of processes, tools, knowledge and skills) occur as part of daily work, so that capacity continuously improves rather than declining.


I’m curious: are you familiar with technical debt? How does my analogy resonate for you? 

Originally published February 13th, 2020 on the Innovative Software Engineering blog. Republished with permission.