I’ve had some recent questions on product and program management at scale, so to address some of them, I’ve decided to push this early draft chapter out for review and comments. It’s available at the book resource tab, too.
I just noticed that a recorded copy of my live Agile 2009 presentation is posted online at http://www.infoq.com/presentations/scaling-best-practices. It’s my basic scaling software agility overview (not that scaling itself is basic, however) and it may add some depth and additional commentary to some of the content on this blog.
The Queue at Starbucks
I was sitting at a Starbucks in Munich – a rainy, snowy, Saturday morning – watching the queue of people looking for their caffeine kick-start. (Same reason I was there). For some reason, the queue of people ordering coffee reminded me of the question teams ask about “how big” and “how well formed” (or how well-elaborated) their product backlog needs to be. Of course, I always have the consultant’s-standard-ready-answer of “it depends”, hopefully followed by a more meaningful discussion.
I’m writing this post to force me to think through and document a better answer; one that includes the “what it depends on” part.
The size of the team’s backlog, the rate of backlog processing, and the ultimate rate of value delivery are really a problem of queuing theory, just like at Starbucks. At Starbucks, I was trying to do Little’s Law in my head, but to not very good effect, so I had to write it out when I came back to my hotel. Little’s Law, the general-purpose theory for queuing and processing problems, is one of the fundamental laws of lean. It states:

Wq = Lq / λ

where:
- Wq is the average time in the queue for a standard job
- Lq is the average number of things in the queue to be processed
- λ (Lambda), the denominator, is the average processing rate for jobs in the queue
While I was drinking my latte, I noticed that the queue at the ordering counter varied from zero to as many as 12 people in line. With my iPhone timer, I was methodically trying to time the average time that it took the single barista to serve each customer. However, when the queue got long, as if by magic, another barista appeared from the back somewhere (Starbucks likely understands queuing theory). That confused my timing and I lost track.
However, let’s assume it takes about 45 seconds to serve a customer, or a service rate of 1.33 customers per minute. We can use Little’s Law to calculate the average wait of someone in the queue on this particular Saturday morning as follows:
Wq = 6 (Lq, the average number of customers in line over the period) / 1.33 (Lambda, the number of customers we could serve per minute)
So Wq is 4.5 minutes in this case (not too bad, even if you need that quick fix).
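The arithmetic checks out in a couple of lines of Python (a quick sketch using the numbers above):

```python
# Little's Law: average wait in the queue, Wq = Lq / lambda
lq = 6.0    # Lq: average number of customers in line over the period
lam = 1.33  # lambda: service rate, customers per minute (~45 s each)

wq = lq / lam        # average wait in minutes
print(round(wq, 1))  # → 4.5
```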
It’s important to note that this is the average case, and your wait could be shorter (I was the third person when I went in, and I only had to wait a minute or two to order) or longer (the person at the back of the 12 person queue had to wait about 9 minutes; yikes, he could easily go somewhere else).
Little’s Law and an Agile Team’s Backlog
Ok, fair enough, but what does the line at Starbucks have to do with agile development and the team’s requirements backlog? It is the same problem. Let’s assume:
- a single agile/scrum team, working in two week iterations
- The team averages about 25-30 story points per iteration, or a story completion rate of about 8 stories per iteration
- The team is justifiably proud of how well they are maintaining their backlog, and it contains about 100 stories.
The question is, how agile is this team? In other words, how long does it take, on average, for a new requirement (story) to get to the end of an iteration, where it can start to deliver value (assuming no delay in release, a topic for later)? First, to calculate the processing rate per story, we convert 8 stories per iteration into 0.125 iterations per story, and plug this data into Little’s Law:

Wq = 100 stories / 8 stories per iteration = 12.5 iterations
The answer is 12.5 iterations to get into the sprint, plus two weeks to get out, or 27 weeks on the average. More than half of a year! And, if the backlog varies in size, it could take even longer (remember the guy who had to wait nine minutes).
Wow! If it takes a single team a half a year, on average, to deliver a new requirement to the customer, that sure doesn’t seem very agile to me!
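Running the team’s numbers through the same formula in Python (the 27-week figure includes the final two-week iteration itself):

```python
# Little's Law applied to the team's backlog
backlog = 100        # Lq: stories in the backlog
rate = 8             # lambda: stories completed per iteration
iteration_weeks = 2  # iteration length in weeks

wait_iterations = backlog / rate  # 12.5 iterations waiting in the queue
# ...plus one more iteration for the story to get through the sprint
weeks = wait_iterations * iteration_weeks + iteration_weeks
print(weeks)  # → 27.0
```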
Plus, in the enterprise, there are multiple teams with interdependencies, and the individual results of the teams have to be aggregated, packaged, and validated in some kind of release envelope before distribution, so it can take longer still. Therefore, it’s understandable when we see an enterprise with 20, 50, or even 100 reasonably agile teams that still takes 300-500 days to move a new requirement from customer request to delivery.
So yes, it may be understandable, but it’s not acceptable. Let’s see what we can do about it.
Applying Little’s Law to Increase Agility and Decrease Time-to-Market
The formula is not complicated. If we are going to improve (decrease) time to market, we have to either increase the denominator or decrease the numerator.
And of course, if we can do both we will achieve even better results. Let’s look at each opportunity.
Increasing Lambda, the Rate of Story Completion
Increasing the rate of story completion, and thereby the overall rate of value delivery, is the legitimate goal of every agile team. Of course, if we could simply add resources, we could probably increase Lambda, but for the purpose of this discussion, let’s assume that is impractical. Besides, while it’s the easy way out of the argument, it increases the cost of the value created, and decreases ROI. So, while you might get there faster, you may not make any money when you do. Let’s work within the constraints of our archetypical team and see what we can do.
The primary mechanism for increasing the rate of story completion is the team’s inspect and adapt process, whereby the teams review the results of each iteration and pick one or two things they can do to improve the velocity of the next. This is the long-term mission; a journey measured in small steps, and there is no easy mathematical substitute for such improvements. These improvements include better coding practices, unit testing and unit testing coverage, functional test automation, continuous integration, and other enhanced agile project management and software engineering practices.
In my experience, however, two primary areas stand out as the place where teams can get the fastest increase in Lambda. First, gaining a better understanding of the story itself before coding begins, and second, decreasing the size of the user stories contained in the backlog.
Gaining a Better Understanding of the Story – Acceptance Test-Driven Development
The fact is that the overall velocity of the team is not typically limited by the team’s ability to write, or even integrate, code. Instead, it is gated by the team’s ability to understand what specific code they need to write, as well as to avoid code they do not need to write. Doing so involves having a better understanding of the requirements of the story, before coding it.
However, this must be done on a just-in-time basis, just prior to the iteration boundary, or else the team’s backlog will get wider and the team will have too much requirements inventory. Some of it will likely decay before they get to it. However, a wider backlog is not nearly as bad as a longer backlog. The worst case is that a few team members have gone too far, too early, in elaborating a few backlog items, but it won’t slow value delivery nearly so badly as would a longer backlog. It’s a bit of waste, but it doesn’t really drive Little’s Law.
Therefore, once a story has reached a priority whereby it will be implemented in the next iteration or two, time spent in elaborating the story will pay dividends. Often, this is described as Acceptance Test-Driven Development, (ATDD) and, fortunately, it’s a little easier for teams to intellectualize and adopt than code-level TDD. ATDD involves two things: 1) writing better stories and 2) establishing the acceptance tests for the story before coding begins.
Fortunately, we don’t have to cover all that here, as I described writing better stories in the User Story whitepaper and how to develop Acceptance Criteria in the draft book chapter.
Increasing Lambda with Smaller Stories
If all the people ordering at Starbucks had ordered a tall, black coffee, rather than a Venti, non-fat, double shot, half-caff, no foam, vanilla latte, along with a heated bagel on the side, the length of the queue and the wait from the back of the line would have been much shorter. Small jobs just go through a system faster than large ones.
In the User Story whitepaper, I described the benefits of smaller user stories at length, and I won’t repeat them here. However it’s worth pointing out that decreasing the size of user stories has both a linear and exponential effect on Lambda, both of which are positive:
Linear effect. Smaller user stories are just that, smaller. They go through the iteration faster, so teams can implement and test more small user stories in an iteration than large ones. And, while the total value of a small story can’t be as big as that of a large story, the incremental delivery hastens the feedback loop, improving quality and fitness for use.
Exponential Effect. Because they are smaller and less complex, small user stories decrease the coding and testing implementation effort exponentially. The coded functions are smaller and less complex, and the number of new paths that must be tested also decreases exponentially with story size.
So, even if the length of the backlog remains the same, a combination of better defined and smaller user stories has a positive effect on value delivery time.
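To make the exponential claim concrete, here is a toy model (my illustrative numbers, not the author’s data): if test effort grows with the number of independent paths through a story, roughly 2 to the power of its decision points, then splitting one large story into two smaller ones sharply reduces the total paths to test.

```python
# Toy model: test effort grows with independent paths, 2**branches
def paths(branches: int) -> int:
    return 2 ** branches

one_big_story = paths(6)          # a story with 6 decision points: 64 paths
two_small_stories = 2 * paths(3)  # split into two stories of 3 each: 16 paths
print(one_big_story, two_small_stories)  # → 64 16
```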
Now that we’ve seen two ways to decrease time to market (Wq) by increasing the denominator (Lambda) of our equation, let’s look at the other half of our equation and see what opportunities we find there.
Decreasing Lq, the Length of the Queue
Clearly, increasing Lambda, the denominator of Little’s Law, is a prime opportunity for the team to increase agility and time to market. Every truly agile team continuously commits to doing so.
However, while the team works on continuously improving its development practices, there is an even faster way to decrease time-to-market: forcing a limit on the length of the queue.
We can see from our formula that decreasing queue size produces a directly proportional decrease in the wait time, Wq. Therefore, if we cut our queue size in half, we halve our time to market without taking any further action. In the ultimate case, if we reduce the queue size to below eight or so (one iteration’s worth of stories for our example team), we could deliver every story in the next iteration.
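In Python, the proportional effect of queue length on wait time is plain (using our example team’s rate of 8 stories per iteration):

```python
# Effect of limiting queue length, holding the completion rate fixed
rate = 8  # stories completed per two-week iteration

for backlog in (100, 50, 8):
    wait = backlog / rate  # iterations spent waiting in the queue
    print(backlog, wait)
# → 100 12.5
# → 50 6.25
# → 8 1.0
```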
As if this weren’t enough motivation, in his new book, The Principles of Product Development Flow, Don Reinertsen points out that there are a number of additional reasons why long queues are fundamentally bad in the product development process:
- Increased Risk. While a story is in the queue, there is some probability that the market or customer has changed their mind and the story is no longer valuable. The longer the queue, the higher the probability. When we develop an unneeded story, we waste valuable resources. Worse, the unneeded story has displaced some other story that would have had economic benefit.
- Increased variability. With a long queue, there is always way more than enough work to do so the team takes on everything they possibly can. Management supports this by driving teams to high utilizations (95% or better). In turn, high utilization drives high variability, as Figure 1 illustrates. High variability decreases reliability, causes stress in the organization, and perversely, drives even higher utilization due to fire fighting.
- Increased costs. Every story in the team’s queue was put in there somehow by someone. That takes labor. Once it’s in there, the team has to continue to account for it, prioritize it, and rearrange it as higher priority items come into the queue.
- Reduced Quality. The longer the queue, the longer it is before we get feedback from the customer (or product owner proxy). The longer the feedback loop, the more other developers may have invested in the nonconforming story, and the more expensive it is to rework.
- Reduced motivation and initiative. If it’s going to be a long time before a customer sees a story in the middle of the queue, there is little sense of urgency for the product owner or team. If they are going to see it soon, we better worry about getting it right, right now.
For these reasons, many teams have simply decided to place arbitrary limits on the size of the backlog, sized as necessary to create the desired time-to-market or internal feedback loop. Once the queue is full, they quit even thinking about new stories until there is room for more stories in the queue.
Software Kanban Systems
Indeed, the devastating effect of long software queue sizes is the primary economic and philosophical principle that drives the current lean software kanban movement.
As the (awfully cutely named) Limited WIP Society describes:
- Kanban manages the flow of units of value through the use of work in process (WIP) limits.
- Kanban manages these units of value through the whole system, from when they enter, until they leave.
- By limiting WIP, Kanban creates a sustainable pipeline of value flow.
- Further, limiting WIP provides a mechanism to demonstrate when there is capacity for new work to be added, thereby creating a pull system.
- Finally, the WIP Limits can be adjusted and their effect measured as the Kanban System is continuously improved.
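A minimal sketch of such a WIP-limited pull system in Python (an illustrative model, not any particular kanban tool’s API):

```python
from collections import deque

class KanbanBoard:
    """Minimal WIP-limited pull system: work is admitted only when
    there is capacity, and finishing work creates the pull signal."""
    def __init__(self, wip_limit: int):
        self.wip_limit = wip_limit
        self.in_process = deque()

    def has_capacity(self) -> bool:
        return len(self.in_process) < self.wip_limit

    def pull(self, story: str) -> bool:
        if not self.has_capacity():
            return False  # no pull signal: the story waits upstream
        self.in_process.append(story)
        return True

    def finish(self) -> str:
        return self.in_process.popleft()  # completing work frees capacity

board = KanbanBoard(wip_limit=2)
print(board.pull("story A"))  # → True
print(board.pull("story B"))  # → True
print(board.pull("story C"))  # → False (WIP limit reached)
board.finish()                # story A done, capacity freed
print(board.pull("story C"))  # → True (the pull signal)
```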
At a further level of sophistication, David Anderson describes utilizing various classes of service as follows:
- Expedite: unacceptable cost of delay
- Fixed delivery date: step-function cost of delay
- Standard class: linear cost of delay
- Intangible: intangible cost of delay
Each class of service has its own work in process limits, which can be adjusted based on current context, and associated with different management policies. With kanban, teams have a structurally sound basis for decreasing Lq, the length of the backlog, and can thereby reap the benefits accordingly.
Prior to reading this post, I suspect that many readers thought that a moderately lengthy, well-articulated, agile product or project backlog was an asset to the agile team. Common sense and intuition may have led us to believe that was the case. Many believed that such a backlog increased, rather than decreased, the team’s agility and the rate of value delivery to the customer. Indeed, I suspect there was a time when my intuition (and perhaps the baggage of the waterfall model) told me that too.
But the economics and math behind queuing theory, lean manufacturing, and lean product development teach us otherwise. Instead, we’ve learned that agile teams need short backlogs of small items, a number of which are quite well articulated and socialized, but only just prior to the iteration boundary in which they will be implemented. Acceptance Test-Driven Development is one method that helps us accomplish that.
Forcibly limiting the length of the backlog helps immensely also, as smaller backlogs have smaller average wait times, and therefore our customers don’t have to wait as long for value delivery. In addition, since we know that the incremental value of a feature decreases with time, small backlogs are a primary mechanism with which to achieve maximum possible ROI. Plus, it’s more fun for the team to deliver fast, too.
Reinertsen, Donald. The Principles of Product Development Flow. Redondo Beach, CA: Celeritas Publishing, 2009.
Note: there are a number of interesting comments below. If you find the thread interesting, follow it to this follow-on post : https://scalingsoftwareagility.wordpress.com/2010/01/10/more-on-lean-backlog-and-littles-law/