The work-in-progress fallacy and the power of WIP limits

Jorge Alonso
Published in Spotahome Product
Feb 18, 2022 · 7 min read


More work in progress doesn’t mean more value provided

What if I told you that your team could deliver the same value by working fewer hours?

A common fallacy is to think that the more work we put in, i.e., the more tasks each software team has ‘in progress’, the more value the team will deliver. Little’s law tells us this is wrong: in a stable system, average WIP equals average throughput multiplied by average cycle time. And any system that can be measured by its throughput will only be as fast as the slowest of its stages. This video illustrates the idea very well: The throughput of a linear system doesn’t increase by having more work in progress. It has an upper limit: the throughput of the slowest of its phases. So, once that limit is reached, stacking more WIP doesn’t increase throughput (speed of delivery), but rather increases cycle time (the time a given work element, such as a car in the video, spends within the system).
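As a rough illustration of the relationship above, here is a minimal Python sketch. The stage rates are made-up numbers, the function names are hypothetical, and the calculation assumes the pipeline is already saturated at its slowest stage, so it is an approximation rather than a model of any real team.

```python
def pipeline_throughput(stage_rates):
    """Upper bound on the throughput of a linear pipeline: its slowest stage."""
    return min(stage_rates)


def average_cycle_time(wip, throughput):
    """Little's law rearranged: cycle time = WIP / throughput."""
    return wip / throughput


# Hypothetical stage rates in items per day (illustrative numbers only).
stage_rates = [5.0, 2.0, 4.0]
throughput = pipeline_throughput(stage_rates)

# Assumes the pipeline is saturated, so throughput stays at the bottleneck rate.
for wip in (4, 8, 16):
    days = average_cycle_time(wip, throughput)
    print(f"WIP={wip}  throughput={throughput} items/day  cycle time={days} days")
```

Doubling the WIP leaves the throughput at the bottleneck rate and doubles the time each item spends in the system.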

Why is it bad for the cycle time to increase? In traditional industrial processes, a longer cycle time for any manufactured item meant more storage needed between phases of the process, a higher likelihood of errors, and a higher likelihood of waste.

A software team is not a ‘linear system’, you’d think. You are right, and that makes it even more complex and even more vulnerable to waste, as we’ll see.

Waste in a software development team

Let’s assume a simple software development process: We move tasks from a backlog into the implementation phase. After the implementation phase, there is a code review phase where other teammates review our code. After the review passes, our code is released.

In order to illustrate the example, let’s monitor work on a team of 2 people, John and Jane, with an ‘infinite’ backlog (in other words, with the capacity to continuously pick items from a backlog and start implementing them). We’ll start with a full backlog and 0 tasks in progress:

  • Touchpoint 1: John and Jane each pick a task from the backlog and start implementing it. IMPLEMENTATION: 2, REVIEW: 0, RELEASED: 0. WIP: 2
  • Touchpoint 2: Jane finishes her implementation and requests John’s review. But John is busy with his implementation and decides to wait until he finishes before reviewing. Jane, as she doesn’t know about the WIP fallacy, picks up a task from the backlog and starts implementing it. IMPLEMENTATION: 2, REVIEW: 1, RELEASED: 0. WIP: 3
  • Touchpoint 3: John finishes his implementation and requests a review from Jane. Then he reviews Jane’s pending code and adds his comments. Since there’s not much that he can do before Jane either responds to his comments or reviews his work (and oblivious to the WIP fallacy), he again picks up a task from the backlog. IMPLEMENTATION: 2, REVIEW: 2, RELEASED: 0. WIP: 4
  • Touchpoint 4: Jane is in a pickle now! She has 3 responsibilities on her board: her in-progress implementation, her previous code that John has already reviewed, and a piece of code from John that she needs to review. She has two options. Option 1: continue with her implementation to avoid context switching. Would that make sense? At the end of the day, the top of the backlog is what matters most to their customers, and the two tasks that were on top are now in review. Option 2: switch context and deal with the two tasks in review (responding to John’s comments and reviewing his code) before going back to her implementation. She has to change context at least twice.

As you can see, there is no efficient choice for Jane in touchpoint 4. If she chooses not to change context (the more efficient way for her to continue working), then work will continue to pile up. And even more worrisome: she’d be spending time on something that is not as important to her customers as the tasks in review. If, on the other hand, she chooses to err on the side of delivering the most important work for her customers, then she’ll have to change context twice, which makes her less productive.
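The first three touchpoints can be replayed with a few lines of Python to check the counts. This is only a bookkeeping sketch: the task labels (jane-1, john-2, and so on) and the `replay` helper are hypothetical names introduced for illustration.

```python
from collections import Counter

# Each event is (touchpoint, task, new_state), following the story above.
# Task labels are hypothetical; the story itself does not name the tasks.
events = [
    (1, "john-1", "IMPLEMENTATION"),
    (1, "jane-1", "IMPLEMENTATION"),
    (2, "jane-1", "REVIEW"),
    (2, "jane-2", "IMPLEMENTATION"),
    (3, "john-1", "REVIEW"),
    (3, "john-2", "IMPLEMENTATION"),
]


def replay(events):
    """Return {touchpoint: (per-state counts, WIP)} as of the end of each touchpoint."""
    board = {}
    snapshots = {}
    for touchpoint, task, state in events:
        board[task] = state
        counts = Counter(board.values())
        # Everything that is not released yet counts as work in progress.
        wip = sum(n for s, n in counts.items() if s != "RELEASED")
        snapshots[touchpoint] = (dict(counts), wip)
    return snapshots


snapshots = replay(events)
for touchpoint, (counts, wip) in snapshots.items():
    print(f"Touchpoint {touchpoint}: {counts} WIP: {wip}")
```

Nothing is released in three touchpoints, while the WIP grows from 2 to 4.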

In a software development process, there is an interesting nuance: it stops being a linear process the moment several teammates need to participate in the same phase (the review phase), and that participation is usually asynchronous. Thus, having unlimited WIP will not only increase the team’s cycle time but can also reduce the team’s throughput (unlike linear systems, in which the ‘producing’ factors for each stage are independent).

So, not limiting WIP in a software team very likely creates waste, either in the form of context switches or in the form of longer cycle times and reduced throughput.

The power of WIP Limits

We’ve seen before that by limiting WIP on a linear system, we can keep the throughput constant and decrease the cycle time. In plain words, for a software team, this means that we could deliver value to our customers at the same rate while working less time. That might seem unreal. How could this be possible? It is possible because we are reducing our waste to a minimum (context switches and too much burden on any teammate’s plate).

What is a good WIP limit? A good WIP limit is one that doesn’t limit the throughput but prevents waste as much as possible. A WIP limit that is too restrictive will limit throughput. A WIP limit that is too ‘loose’ will not constrain the process enough, so waste will appear. We have created a software program that simulates how a software team moves tasks through a set of states (we will open source it eventually), and it has more or less confirmed our belief: the limiting factor is usually the code review phase, because more than one teammate participates in it. Optimal WIP limits depend on each team and the nature of its work, but a good rule of thumb is to limit the WIP of the in-review stage to a max of (number_of_team_members/number_of_reviewers_needed_including_the_original_coder). Additionally, you might want to limit your IN PROGRESS WIP to a number below the total number of teammates, so you build in wiggle room for teammates to be available to participate in the bottleneck phase: the review phase.
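The rule of thumb can be written down as a small Python sketch. The function names, the choice to round down, the floor of 1, and the default of one seat of wiggle room are all assumptions made for illustration; the rule itself only says to divide team size by the people involved in a review and to keep IN PROGRESS below the team size.

```python
def review_wip_limit(team_size, people_per_review):
    """Rule of thumb: team members divided by people involved in one review.

    people_per_review includes the original coder. Rounding down and never
    going below 1 are assumptions of this sketch.
    """
    if people_per_review < 1:
        raise ValueError("people_per_review must be at least 1")
    return max(1, team_size // people_per_review)


def in_progress_wip_limit(team_size, wiggle_room=1):
    """Keep IN PROGRESS below the team size; wiggle_room=1 is an assumed default."""
    return max(1, team_size - wiggle_room)


# Hypothetical team: 6 people, each review involves the coder plus one reviewer.
team_size = 6
print("review limit:", review_wip_limit(team_size, people_per_review=2))
print("in-progress limit:", in_progress_wip_limit(team_size))
```

For John and Jane (2 people, 2 involved in each review), this sketch gives a review limit of 1 and an in-progress limit of 1.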

Limiting WIP will also ensure the team focuses on the most important tasks. Under the assumption that each team would always have a clearly prioritized backlog where the most important tasks are on the top and are the ones that are kicked off sooner, having WIP limits will force the team to focus on those. Why? Simple. If I have an implementation task in progress that I finish and I want to move to review, but the WIP limit is consumed, I cannot! What should I do? I should help clear the review column so that I can move my task to review. Supposedly, since those tasks were in review before mine, those are more important to my customers, so I’m spending my time on the most important tasks for my customers, yay!
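The pull behaviour described above can be sketched as a tiny board that refuses a move when the target column is full. The `Board` class, the `WipLimitExceeded` exception, the limits, and the task labels are all hypothetical; real tools enforce (or merely display) WIP limits in their own ways.

```python
class WipLimitExceeded(Exception):
    """Raised when a move would exceed a column's WIP limit."""


class Board:
    def __init__(self, limits):
        self.limits = limits  # column name -> max tasks; missing means unlimited
        self.tasks = {}       # task name -> current column

    def count(self, column):
        return sum(1 for current in self.tasks.values() if current == column)

    def move(self, task, column):
        limit = self.limits.get(column)
        already_there = self.tasks.get(task) == column
        if limit is not None and not already_there and self.count(column) >= limit:
            raise WipLimitExceeded(f"{column} is full ({limit}); help clear it first")
        self.tasks[task] = column


board = Board({"IMPLEMENTATION": 2, "REVIEW": 1})
board.move("jane-1", "IMPLEMENTATION")
board.move("john-1", "IMPLEMENTATION")
board.move("jane-1", "REVIEW")

try:
    # John has finished, but the review column is already at its limit.
    board.move("john-1", "REVIEW")
except WipLimitExceeded as error:
    print(error)

# John reviews Jane's task first; once it is released, his own can move on.
board.move("jane-1", "RELEASED")
board.move("john-1", "REVIEW")
print(board.tasks)
```

The only way for John to make progress on his own task is to finish the older one first, which is exactly the prioritization effect the limit is meant to produce.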

But wait a second! If we limit the work in progress, there might be situations where people would sit idle, right? What if all WIP is consumed, and there’s nothing I can do to help any of the tasks in progress, no pair programming, no reviewing…? That can happen, yes, and that’s not bad. In fact, that is great! It means the team is already achieving its highest throughput, so you have ‘slack time’.

Bonus — The power of slack time

Slack time is the time during which a team member can neither contribute to pushing forward any task in progress nor kick off any new one, because the WIP limits are consumed. It is what we “get back” from the waste we save by limiting our WIP. And it is a blessing in disguise, because this time can be spent in several valuable ways where context switching will not be so bad for the team’s progress:

  1. Think: Sometimes we are so busy delivering that we forget to think. Thinking will make us more effective and our team’s outcomes more valuable. Take a look at your latest experiments and try to make sense of them. Think about the next steps that might have been overlooked by the team. Look at customer data: how are we performing against our OKRs? Thinking (slack) time is the key to innovation.
  2. Learn: Train yourself. Read. Do courses. Read other teams’ design documents or code.
  3. Pair: Sit down (virtually) with a teammate that is working on something on the board and shadow her. That will also speed up the review of that code!
  4. Groom: Take a look at the top of your backlog. That will need to be groomed eventually. Maybe a good moment to start?
  5. Help: There usually are several people in need of help, especially if your team has a big set of stakeholders. Review the public support chats, the bug tickets open to your team… Try to help where help is needed.

P.S: If Reviews are the bottleneck, why not just remove them?

There’s an interesting discussion, which might be worth a separate post, on how code reviews (like grooming sessions) are not just a quality assurance mechanism, but also an efficiency mechanism for the team. While reviewing code that’s not ours, we not only ensure the code meets our standards and acceptance criteria, we also gain context about that particular piece of code, which will make us more effective when we have to work on top of that code again. PRs are a great mechanism for knowledge sharing, making our team more diverse in knowledge and thus more flexible, in turn increasing our throughput. PRs also contribute to what Anders Ericsson, in his famous book ‘Peak’, called ‘deliberate practice’ (on which we’ll write another blog post). It is an interesting (and debatable) dissonance that the very thing that limits the team’s throughput is also helping increase it in the long run. Definitely a great debate to have over a coffee (or Colacao in my case).
