I’ve been wanting to write a paper about how to use dynamical systems to analyze failure modes for transportation networks. So far I haven’t been able to analyze this more carefully, but there’s one relatively simple example, namely bunching along a single bus line. This intersects to some extent with what I did in math academia, although the mathematical tools I’m using are fairly primitive within dynamics, going back to the early 20th century and not to the advanced machinery that dynamicists have developed in the last forty years (like the Mandelbrot set). As a caution, despite the math jargon and the math paper structure, it’s a blog post, and not something I’d even be comfortable uploading to the arXiv.
The upshot of the mathematical model in this post is that several already-understood reforms can seriously reduce bus bunching: speeding up boarding through prepayment and all-door boarding, using bigger buses with many doors on the busiest routes, implementing signal priority and enforcing bus lanes better, and improving dispatching to tell bus drivers to maintain even headways leaving each terminus. Section 1 provides mathematical background, and people who know some dynamics can skip it; it’s meant to be accessible to a general audience (if you’ve heard of derivatives, you should be fine). Section 2 constructs the model for bus schedule variations, section 3 explains how the model predicts bunching, and section 4 goes into how the above interventions can improve the situation. The mathematics I’m using is not terribly advanced, but it may benefit from careful reading, especially around the formulas.
1. Background on dynamics and chaos
Before I left academia, when people asked me to explain my research, I’d use the following example. In dynamics, we study what happens when we take a function and iterate it many times. We are specifically interested in chaotic behavior, which arises when two very close numbers can end up widely separated after sufficient iteration. There is no chaos if we only look at linear functions, so the simplest example is quadratic:
$$f(x) = x^2$$
The simplest behavior of any number when we apply a function many times is if nothing changes. A point where this happens is called a fixed point. Two numbers are fixed points for the function x^2: 0 and 1. But in practice, it’s useful to view infinity as a number, so that instead of being far away from each other, the numbers 1,000,000, 1,000,000,000, and -1,000,000 should be viewed as all very close to infinity. Under the squaring function, infinity is a fixed point as well.
The key to understanding the dynamics of a function is to look at the behavior of the function near a fixed point. Near the point 0, if we take the square of a number, it gets much smaller. For example, 0.1^2 = 0.01. This means that if a number x is close to zero, then as we iterate the function x^2 we will get closer and closer to 0, very quickly. This behavior is called attracting. Near 0, small changes in initial conditions don’t matter much: the numbers 0.1 and 0.11 are close, and if we keep squaring them, we will approach 0 either way. Infinity is attracting as well once you get used to thinking of very large positive or negative numbers as close to infinity: 1,000 is pretty close to infinity, and 1,000^2 = 1,000,000 is even larger, i.e. closer.
However, near the point 1, we get the opposite behavior: 1.1^2 = 1.21, which is about twice as far away from 1 as 1.1, and similarly 0.9^2 = 0.81, again about twice as far away from 1 as 0.9. We then say that 1 is a repelling point. Near a repelling point, we have chaotic behavior, because two points that start out close, like 0.9 and 1.1 or even 0.99 and 1.01, end up widely separated after iteration (0.99 eventually approaches 0, 1.01 eventually approaches infinity).
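To see the repelling behavior at 1 numerically, here is a minimal Python sketch that squares 0.99 and 1.01 over and over. The helper name `orbit` and the choice of ten iterations are illustrative assumptions, not something from the post.

```python
def square(x):
    return x * x

def orbit(f, x, n):
    """Return [x, f(x), f(f(x)), ...] with n applications of f."""
    points = [x]
    for _ in range(n):
        x = f(x)
        points.append(x)
    return points

# Two starting points that straddle the repelling fixed point 1.
below = orbit(square, 0.99, 10)
above = orbit(square, 1.01, 10)
for step, (a, b) in enumerate(zip(below, above)):
    print(f"{step:2d}  {a:12.6g}  {b:12.6g}")
```

After ten squarings (i.e. raising to the 1,024th power), the first sequence has fallen below 0.0001 and the second has passed 10,000, even though the two started only 0.02 apart.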
If you’ve learned calculus, your reaction to the line about how 1.21 is about twice as far away from 1 is “the derivative is 2!”. In general, the way to figure out whether a fixed point is attracting or repelling is to take the derivative f′(x) at the point (technically it’s called the multiplier). If the absolute value of the derivative is less than 1, the point is attracting; if it’s more than 1, the point is repelling; if it’s exactly equal to 1, we say the point is indifferent, and then the behavior near the fixed point depends on further details that I don’t want to get into. As a note of caution, taking the derivative anywhere except at a fixed point won’t tell you anything of this sort. For example, the derivative of x^2 (which is 2x) is 4 when x = 2, but 2 is not repelling: it is not a fixed point at all, just a point that ends up going off to infinity.
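The classification rule can be written as a small function. This is a sketch under two assumptions: the derivative is estimated numerically with a central difference, and "exactly 1" is tested with a small tolerance. Both are choices made here for illustration, not part of the post. The function refuses non-fixed points such as x = 2, matching the caution above.

```python
def derivative(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def classify_fixed_point(f, p, tol=1e-9):
    """Classify a fixed point p of f by the absolute value of its multiplier f'(p)."""
    if abs(f(p) - p) > tol:
        raise ValueError(f"{p} is not a fixed point, so its derivative says nothing here")
    m = abs(derivative(f, p))
    if abs(m - 1) < 1e-6:
        return "indifferent"  # borderline case: needs further analysis
    return "attracting" if m < 1 else "repelling"

def square(x):
    return x * x

print(classify_fixed_point(square, 0.0))  # attracting (multiplier 0)
print(classify_fixed_point(square, 1.0))  # repelling (multiplier 2)
```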
(You may wonder what it exactly means to take the derivative at infinity. The answer is that if f is a polynomial, then the multiplier at infinity is always 0. If f is not a polynomial, there is a definition on Wikipedia.)
Attracting points are in every sense nicer to deal with than repelling points. Unfortunately, chaos is everywhere: most points of every nonlinear function are repelling. More precisely, in addition to fixed points, there are periodic points (i.e. fixed points of iterates). The periodic points of x^2 are a little hard to unpack if you’re not used to complex numbers: they’re solutions to equations like x^4 = x, x^8 = x, etc., and apart from 0, these are all complex numbers on the unit circle. We can compute multipliers there too (take the derivative of the iterate for which they’re fixed) and classify them as attracting or repelling. One of the foundational theorems of dynamics is that all but finitely many periodic points are repelling. In fact, the number of non-repelling cycles (a cycle being a periodic point together with the other points of its orbit) is at most 2d-2, where d is the degree of the function, and if the function is a polynomial of degree d, then there are at most d-1, not counting infinity. My main contribution to math is to extend this result to a certain number-theoretic application.
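For x^2 this can be checked directly. Dividing x^(2^n) = x by x leaves x^(2^n − 1) = 1, so the nonzero solutions are roots of unity. The n-th iterate x^(2^n) has derivative 2^n · x^(2^n − 1), whose absolute value on the unit circle is 2^n > 1. The sketch below (helper names are hypothetical) confirms numerically that every such point is repelling. The only non-repelling cycles of x^2 are then 0 and infinity, exactly the 2d − 2 = 2 allowed for degree 2. Note that the list for a given n also contains points of smaller period, such as the fixed point 1.

```python
import cmath

def periodic_candidates(n):
    """Nonzero solutions of x**(2**n) == x for f(x) = x**2.

    Dividing by x gives x**(2**n - 1) == 1, so these are the (2**n - 1)-th roots
    of unity; points of smaller period (such as the fixed point 1) are included.
    """
    k = 2 ** n - 1
    return [cmath.exp(2j * cmath.pi * m / k) for m in range(k)]

def iterate_multiplier(x, n):
    """Derivative of the n-th iterate x -> x**(2**n), evaluated at x."""
    d = 2 ** n
    return d * x ** (d - 1)

for n in (1, 2, 3, 4):
    points = periodic_candidates(n)
    sizes = [abs(iterate_multiplier(x, n)) for x in points]
    print(f"n={n}: {len(points)} points, |multiplier| between {min(sizes):.6f} and {max(sizes):.6f}")
```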
2. Modeling buses using dynamics
The key insight into why buses bunch is that maintaining the exact schedule is a repelling fixed point, so small variations from the schedule (due to traffic, slow passengers, or random noise in passenger numbers) will compound over time, just as the variation of 0.9 or 1.1 from 1 compounds over time as you apply the squaring function.
More precisely, let’s say buses on a certain street run every 10 minutes. Eventually we will call the scheduled headway h, but to make this concrete, let’s say h = 10 minutes. Every few hundred meters, the buses stop to pick up and drop off passengers. Before San Francisco instituted prepayment, each additional passenger took on average 3.9 seconds to board and another 3.9 to disembark (link, PDF-p. 14); the TRB claims the average is 3 seconds to board (link, PDF-p. 20). We will call the extra boarding time per passenger b, and right now set b = 3 seconds = 0.05 minutes.
To understand why bunching occurs, let’s say that our bus falls behind schedule by a minute. It’s now 11 minutes behind the bus ahead (which we’ll assume is on schedule), not 10 minutes. On average, there will be 10% more passengers to pick up at each stop (passengers arrive at stops at a uniform rate). Let’s say the bus gets 60 boardings per hour (which is the Brooklyn-wide average). Typically we expect the bus to get 10 boardings in the next 10 minutes, but because there are 10% more passengers per stop, there will instead be 11 boardings. The one extra boarding will slow the bus down by 3 more seconds. The bus will then be 1:03 minutes behind. It’s a small difference, but over time it compounds.
There will also be more alightings as the bus gets more crowded, with a lag equal to the average passenger trip length. But in practice, to avoid introducing exponential factors that complicate the analysis, it’s best to just think of boardings plus alightings as a single metric, which, if there are 60 boardings per hour, equals 120 per hour or 2 per minute, and to take note that a 1-minute delay only starts accumulating half a lag time in the future (e.g. 10 minutes if the average unlinked passenger trip is 20 minutes, as in New York). We call the number of boardings plus alightings per minute r, and in our example case r = 2.
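The arithmetic of the last two paragraphs fits in a few lines of Python. This is only a restatement of the worked example: the variable names are mine, and the 10-minute look-ahead window matches the text's "next 10 minutes."

```python
# Worked example from section 2: 60 boardings/hour, 3 s per boarding,
# scheduled headway 10 min, our bus 1 minute late (11 min behind the bus ahead).
boardings_per_hour = 60          # Brooklyn-wide average quoted in the text
seconds_per_boarding = 3.0       # b, in seconds
h = 10.0                         # scheduled headway, minutes
gap = 11.0                       # actual gap to the bus ahead, minutes
window = 10.0                    # look at the next 10 minutes of running

expected = boardings_per_hour / 60 * window      # 10 boardings if on schedule
actual = expected * gap / h                      # 10% more passengers waiting: 11 boardings
extra = (actual - expected) * seconds_per_boarding
late_seconds = 60 + extra                        # we started 60 s late
print(f"{expected:.0f} -> {actual:.0f} boardings, now {int(late_seconds // 60)}:{int(late_seconds % 60):02d} late")

# Folding alightings in as a single rate r (per minute), as in the text:
r = 2 * boardings_per_hour / 60
extra_with_alightings = r * window * (gap - h) / h * seconds_per_boarding
print(r, extra_with_alightings)                  # r = 2 per minute; 6 s over 10 minutes
```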
If we choose our unit of time to be the minute, then the average additional delay our bus accumulates over the next minute, when it is x minutes behind the bus ahead, is
$$\frac{rb(x-h)}{h}$$
In the example we worked out above, it took 10 minutes to accumulate an additional 6-second delay (3 from boarding, 3 from alighting). Using the numbers h = 10, x = 11, b = 0.05, r = 2, verify that the formula spits out 0.01, or in other words an extra 0.6-second delay per minute. If x = h, that is, if the headway between our bus and the bus ahead is exactly as timetabled, then there is no additional delay. This makes the correct headway a fixed point of the one-minute map x ↦ x + rb(x − h)/h, but a repelling one. The multiplier, i.e. the derivative of this map at x = h, is equal to,
$$1 + \frac{rb}{h}$$
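As a sanity check, here is a minimal sketch of the one-minute model, assuming the extra delay per minute is rb(x − h)/h. That assumption reproduces the 0.01 quoted in the text. The function names are hypothetical.

```python
def extra_delay_per_minute(x, h, r, b):
    """Average extra delay (min) accumulated in one minute when the bus is x minutes
    behind the bus ahead; h = scheduled headway (min), r = boardings + alightings
    per minute, b = minutes per boarding or alighting."""
    return r * b * (x - h) / h

def step(x, h, r, b):
    """One-minute update of the gap to the bus ahead: x + rb(x - h)/h."""
    return x + extra_delay_per_minute(x, h, r, b)

def multiplier(h, r, b):
    """Derivative of step at the fixed point x = h: 1 + rb/h."""
    return 1 + r * b / h

print(extra_delay_per_minute(11, 10, 2, 0.05))  # ≈ 0.01 min, i.e. 0.6 s
print(step(10, 10, 2, 0.05))                    # 10.0: x = h is a fixed point
print(multiplier(10, 2, 0.05))                  # ≈ 1.01, so repelling
```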
Note that choosing units is important. The reason is that the mathematics I’m using assumes there are discrete steps: you apply the squaring function (or any other nonlinear function) once at a time. In reality, time is continuous. So to model it using discrete dynamics, it matters which unit of time we pick; this is the equivalent of choosing between x^2, x^4, x^16, or any other iterate. Fixed points will stay attracting or repelling (or indifferent) no matter what, but the exact value of the multiplier will change.
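The effect of the time quantum can be seen by comparing one step of the one-minute map with ten steps of it, which is the map for a 10-minute quantum. This sketch assumes the same one-minute update x + rb(x − h)/h with the example values h = 10, r = 2, b = 0.05. Because that map is linear in the deviation x − h, feeding in a 1-minute deviation measures the multiplier directly.

```python
def step(x, h=10.0, r=2.0, b=0.05):
    """Gap to the bus ahead after one minute: x + rb(x - h)/h."""
    return x + r * b * (x - h) / h

def iterate(f, x, n):
    """Apply f to x n times."""
    for _ in range(n):
        x = f(x)
    return x

h = 10.0
per_minute = step(h + 1.0) - h                    # quantum of time = 1 minute
per_ten_minutes = iterate(step, h + 1.0, 10) - h  # quantum = 10 minutes (10th iterate)
print(per_minute, per_ten_minutes)                # ≈ 1.01 and ≈ 1.1046: both above 1
```

The two multipliers differ, but both exceed 1, so the fixed point is repelling whichever quantum we pick.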
With this in mind, when our quantum of time is a minute, the multiplier with our usual values of h, b, and r is equal to 1.01. Every minute, a delay multiplies by a factor of 1.01. Within an hour, this factor grows to 1.82. This doesn’t seem too bad – it means a 1-minute delay turns into a 1:49-minute delay within an hour.
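The hourly figure is just the per-minute multiplier compounded 60 times. Here is a two-line check, with the conversion to minutes and seconds added for readability.

```python
m = 1.01                    # per-minute multiplier for h = 10, r = 2, b = 0.05
growth = m ** 60            # compounding over one hour of one-minute steps
delay = 1.0 * growth        # a 1-minute initial delay, in minutes
minutes, seconds = divmod(round(delay * 60), 60)
print(f"{growth:.2f}", f"{minutes}:{seconds:02d}")  # 1.82 1:49
```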
3. How bunching occurs
In section 2 we showed that if h is the scheduled headway, b is the average boarding time per passenger, r is the average number of boardings and alightings per unit of scheduled service time, and x is the current distance (in units of time) between our bus and the bus ahead of us, then within a minute we expect the distance to grow to
$$x + \frac{rb(x-h)}{h}$$
The multiplier is 1 + rb/h, which doesn’t seem too bad. However, there are complications. For one, the initial delay may be longer than 1 minute. In Eric Goldwyn’s interviews with drivers, they cited traffic as the top reason why they believed bunching occurs, and barely mentioned passenger boardings. In math we cannot conflate popular perception with reality, but the fact that drivers complain about traffic suggests that initial delays vary widely in size, coming from missing a light, drivers blocking the bus lane, etc. If the initial delay is 2 minutes, then all the delay numbers are double those for a 1-minute delay, since the model is linear in the delay.
But more importantly, our delayed bus will never bunch with the bus ahead. It will bunch with the bus behind, and the cascading delay’s effect on the gap to the bus behind us is exactly double what I described above. The reason is that if our bus is a minute behind (for example, 11 minutes behind the bus ahead when it should be 10 minutes behind), then the bus behind us, if it starts out on schedule, is now a minute closer to us than scheduled, only 9 minutes behind us when it should be 10 minutes. This means that within 10 minutes, we fall 6 seconds behind (and are thus 11:06 behind the bus ahead of us), but by the same token the bus behind us advances 6 seconds ahead (and is thus 8:48 behind us). In practice, the quantity relevant to bunching is the distance between two successive buses, and behind us, the multiplier is not 1.01 but 1.02. Within an hour, a 1-minute delay reduces the gap between our bus and the bus behind us by 3:17, and a 2-minute delay reduces it by 6:34, almost two thirds of the way to catching up with us. If the initial delay is 1 minute, the bus behind us will actually catch up with us within
$$\frac{\log 10}{\log 1.02} \approx 116$$
minutes. If the initial delay is 2 minutes, it will catch up within
$$\frac{\log 5}{\log 1.02} \approx 81$$
minutes.
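The numbers in this paragraph follow from compounding the deviation by 1.02 per minute. The gap to the bus behind has shrunk by d · 1.02^t after t minutes, and it closes when that reaches h. Below is a minimal sketch under exactly that assumption; the function names are hypothetical.

```python
import math

h, r, b = 10.0, 2.0, 0.05        # headway (min), boardings + alightings per min, min per passenger
m_behind = 1 + 2 * r * b / h      # 1.02: our bus drifts back while the bus behind drifts forward

def gap_reduction_after(d, minutes):
    """How much the gap to the bus behind has shrunk after `minutes`, given an
    initial delay of d minutes (the deviation compounds by m_behind per minute)."""
    return d * m_behind ** minutes

def catch_up_minutes(d):
    """Minutes until d * m_behind**t reaches h, i.e. the gap closes to zero."""
    return math.log(h / d) / math.log(m_behind)

for d in (1.0, 2.0):
    shrink = round(gap_reduction_after(d, 60) * 60)   # in seconds
    print(f"delay {d:.0f} min: gap shrinks by {shrink // 60}:{shrink % 60:02d} in an hour, "
          f"catch-up after about {catch_up_minutes(d):.0f} min")
```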
And third, the multiplier grows as r grows and h falls – that is, it’s higher when the frequency is higher and when there are more riders per service hour. Keeping r at 2 (again, this is 60 boardings and 60 alightings per hour) but lowering h to 5 raises the multiplier to 1.02 ahead of us and 1.04 behind us. A multiplier of 1.04 with a headway of 5 minutes means the bus behind us will catch us within
$$\frac{\log 5}{\log 1.04} \approx 41$$
minutes with just a 1-minute delay.
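Halving the headway while keeping r at 2 doubles rb/h, and therefore doubles both multipliers' excess over 1. This sketch (hypothetical helper name, same model as above) reproduces the h = 5 figures.

```python
import math

def multipliers(h, r=2.0, b=0.05):
    """Per-minute multipliers ahead of us (1 + rb/h) and behind us (1 + 2rb/h)."""
    return 1 + r * b / h, 1 + 2 * r * b / h

ahead, behind = multipliers(5.0)
catch_up = math.log(5.0 / 1.0) / math.log(behind)   # h = 5, initial delay d = 1
print(ahead, behind, round(catch_up))
```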
The real limiting factor to the capacity of city buses is not minimum stopping distance, unlike with trains. It’s that as the headway h decreases, bunching becomes so routine that adding more buses does not actually add capacity. If a bus runs every 2.5 minutes, keeping b = 0.05 and r = 2 gives us a multiplier of 1 + 0.05*2/2.5 = 1.04 ahead of us and then 1.08 behind us; the bus behind us will catch our bus within 12 minutes.
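One way to make the capacity point concrete is to ask for the shortest headway at which a 1-minute delay still takes at least some tolerable time to turn into bunching. The sketch below uses an hour as that threshold. The one-hour threshold and the helper `min_headway` are hypothetical choices for illustration. The search is a plain bisection, which works because the catch-up time log(h/d)/log(1 + 2rb/h) increases with h for h > d.

```python
import math

def catch_up_minutes(h, d=1.0, r=2.0, b=0.05):
    """Minutes until the bus behind catches a bus delayed by d minutes
    (section 3 model, per-minute multiplier 1 + 2rb/h). Assumes 0 < d < h."""
    return math.log(h / d) / math.log(1 + 2 * r * b / h)

def min_headway(target, d=1.0, r=2.0, b=0.05, lo=1.0, hi=60.0, iters=60):
    """Bisection for the shortest headway whose catch-up time is at least `target`
    minutes. Relies on catch_up_minutes increasing in h on [lo, hi] with lo >= d."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if catch_up_minutes(mid, d, r, b) >= target:
            hi = mid
        else:
            lo = mid
    return hi

print(f"{catch_up_minutes(2.5):.0f}")   # the 2.5-minute example from the text
print(f"{min_headway(60.0):.2f}")       # shortest headway that survives an hour
```

With these New Yorkish numbers, the answer lands between 6 and 7 minutes. Below that, a routine 1-minute delay becomes a bunch within the hour.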
4. How to reduce bunching
The formula for the multiplier of the dynamical system formed by bus performance is 1 + rb/h, where r is the rate at which passengers board and alight, b is boarding time per passenger, and h is scheduled headway. However, as our bus falls further and further behind, the bus behind us gets further ahead relative to schedule and certainly relative to us, so the multiplier relevant to bunching is 1 + 2rb/h. The bus behind us will catch ours in
$$\frac{\log(h/d)}{\log(1 + 2rb/h)}$$
minutes, where d is the initial delay (so we start the calculation from x = h + d). On very frequent buses, this will happen very quickly: 2.5-minute headways with New Yorkish assumptions on passenger traffic density and conservative assumptions on boarding speed yield a catchup time of just 12 minutes. So how do we prevent this?
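The closed-form catch-up time can be cross-checked against a brute-force run of the same model, stepping one minute at a time until the gap to the bus behind is gone. The stepping count should equal the closed form rounded up. This is a minimal sketch; the function names are hypothetical, and the guard against a multiplier of 1 or less prevents an infinite loop.

```python
import math

def catch_up_closed_form(d, h, r, b):
    """log(h/d) / log(1 + 2rb/h): minutes until the bus behind catches ours."""
    if not 0 < d < h:
        raise ValueError("need 0 < d < h")
    return math.log(h / d) / math.log(1 + 2 * r * b / h)

def catch_up_by_iteration(d, h, r, b):
    """Step the model one minute at a time until the gap to the bus behind is gone."""
    m = 1 + 2 * r * b / h
    if not (0 < d < h and m > 1):
        raise ValueError("need 0 < d < h and a multiplier above 1")
    gap, minutes = h - d, 0
    while gap > 0:
        gap = h - (h - gap) * m     # the deviation h - gap grows by the factor m each minute
        minutes += 1
    return minutes

for h in (10.0, 5.0, 2.5):
    print(h, catch_up_closed_form(1.0, h, 2.0, 0.05), catch_up_by_iteration(1.0, h, 2.0, 0.05))
```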
4.1. Reduce boarding time per passenger
Off-board fare collection allows passengers to board the bus more quickly, without paying the driver. This has the effect of greatly reducing b, from 3 seconds to about 1.2 per the TRB. Prepayment also allows all-door boarding, effectively halving the average boarding time per passenger at stops without large volumes of disembarking passengers.
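To see what prepayment buys within the model, we can plug the two boarding times into the catch-up formula log(h/d)/log(1 + 2rb/h). This sketch assumes, as the model does, that the same b applies to boardings and alightings, and it holds r at 2 and the initial delay at 1 minute. The 1.2-second value is the TRB figure cited above.

```python
import math

def catch_up_minutes(d, h, r, b_seconds):
    """Catch-up time log(h/d) / log(1 + 2rb/h), with b given in seconds."""
    b = b_seconds / 60                      # convert to minutes per passenger
    return math.log(h / d) / math.log(1 + 2 * r * b / h)

# Assumption: the same b applies to boardings and alightings, and r stays at 2.
for h in (10.0, 2.5):
    pay_driver = catch_up_minutes(1.0, h, 2.0, 3.0)    # ~3 s per passenger
    prepaid = catch_up_minutes(1.0, h, 2.0, 1.2)       # ~1.2 s per passenger
    print(f"h = {h} min: bunching after {pay_driver:.0f} min -> {prepaid:.0f} min")
```

In both cases, the time to bunching grows by a factor of roughly 2.4 to 2.5.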
But in addition to prepayment, there are other ways of reducing b. Low-floor buses allow passengers to get on and off more easily; the reason San Francisco’s numbers are higher than the TRB’s is that San Francisco assumes a mostly high-floor bus fleet, whereas on the low-floor fleets boarding is much faster (in fact, faster on low-floor buses without prepayment than on high-floor buses with).
Adding more doors is desirable as well. The typical 12-meter bus has two doors, but some cities, such as Nice and Florence, have purchased three-door buses. The typical 18-meter accordion bus has three doors, but in Florence I have seen four-door accordion buses; in contrast, the older accordion buses in New York only had two doors, slowing down boarding and alighting at busy stations. Per TRB data, three-door buses reduce b from 3 seconds to 0.9, or 0.015 minutes. Cutting b to 30% of its former value cuts the excess of the multiplier over 1 by the same factor, which roughly triples the time it takes to bunch.
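The "roughly triples" claim comes from log(1 + e) ≈ e for small e. The excess of the multiplier over 1 shrinks in proportion to b, so the catch-up time grows by nearly the inverse factor. Here is a sketch with the section 2 values (h = 10, r = 2, a 1-minute initial delay); the helper names are hypothetical.

```python
import math

def excess(b_seconds, h=10.0, r=2.0):
    """2rb/h: how far the bunching multiplier 1 + 2rb/h sits above 1."""
    return 2 * r * (b_seconds / 60) / h

def catch_up(b_seconds, h=10.0, r=2.0, d=1.0):
    """Minutes until the bus behind catches a bus that starts d minutes late."""
    return math.log(h / d) / math.log(1 + excess(b_seconds, h, r))

two_door, three_door = catch_up(3.0), catch_up(0.9)
print(excess(0.9) / excess(3.0))    # about 0.3: the excess shrinks in proportion to b
print(three_door / two_door)        # about 3.3, close to 1 / 0.3 since log(1 + e) ≈ e
```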
4.2. Use bigger buses
The multiplier depends only on the rate at which passengers wish to board our route. Adding more bus service will reduce r (by spreading boardings across more service-hours) and h (by adding more frequency) at the same rate. That leaves the multiplier unchanged, but since h is smaller, a given initial delay is a larger fraction of the headway, so bunching takes less time to occur. Conversely, running less service means passengers take longer to get on each bus, but it also means that the passenger load per stop is less sensitive to fixed delays occurring upstream.
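This sketch illustrates the point. It assumes demand is fixed, so r scales with h, and holds r/h at the value from the running example (r = 2 at h = 10). It then shows that the multiplier stays put while the time to bunching still falls as frequency rises. The fixed ratio is an assumption made for illustration.

```python
import math

def catch_up(d, h, r, b=0.05):
    """Minutes until the bus behind catches a bus that starts d minutes late."""
    return math.log(h / d) / math.log(1 + 2 * r * b / h)

ratio = 0.2   # assumed fixed demand: r / h, taken from the r = 2, h = 10 example
results = []
for h in (10.0, 5.0, 2.5):
    r = ratio * h                          # more frequent buses each pick up fewer riders
    m = 1 + 2 * r * 0.05 / h               # r / h is unchanged, so the multiplier is too
    t = catch_up(1.0, h, r)
    results.append((h, r, m, t))
    print(f"h = {h:4} min, r = {r:.2f}/min, multiplier {m:.4f}, bunching after {t:.0f} min")
```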
Of course, running less service is cruel to passengers and can discourage ridership due to a negative frequency-ridership spiral or (on the busiest routes) inadequate capacity. But running bigger buses to compensate can provide the necessary capacity while also helping reduce b through faster access and egress. As noted in section 4.1, accordion buses should have four doors, to minimize loading time.
4.3. Reduce random variability
None of this discussion would matter if we were guaranteed that buses would run exactly on schedule. Of course, they don’t, and we cannot get such a guarantee. However, we can look for treatments that would make initial delays less common. All-door boarding is one such treatment, in addition to its effect on average boarding time per passenger, because one of the factors that can cause delays is an unexpected wave of passengers all getting on and having to queue one at a time, for example if they come out of class or transfer from a full train. Schools with synchronized class times can overload transit networks for the first few minutes after classes end. And in Shanghai, I had to wait 20 minutes to buy a metro ticket coming off a full intercity train, since two of the three ticket machines were broken and the train was full of visitors who didn’t already have the Shanghai Public Transportation Card.
But as Eric’s interviews with drivers suggest, the biggest single source of variability is traffic and not unusual passenger loads. Bus lanes reduce the impact of traffic, but may not reduce variability, since cars may block them unexpectedly. This suggests that better enforcement of bus lanes could improve schedule regularity and reduce bunching further downstream.
Another source of variability is traffic lights. Traffic lights are discrete: they’re red or green, and a bus that misses the light will be delayed by a full red phase, which in New York means about 45 seconds (and in Bangkok means 3 or even 5 minutes). Signal priority, even in its weakest form, entails lengthening the green phase in the bus’s direction of travel by a few seconds to let a bus through, so that it does not have to wait a full red phase. This would keep a lid on the maximum variability that a single intersection can produce. Note also that it’s very easy for a bus to be delayed at two successive intersections, for example if traffic is such that it’s a hair slower than usual, forcing it to miss two lights in rapid succession. In that case, 1.5-minute initial delays (two 45-second red phases) are routine, setting up bunching later.
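The catch-up formula log(h/d)/log(1 + 2rb/h) shows how much a second missed light costs. Doubling the initial delay d always removes log 2/log(1 + 2rb/h) minutes before bunching, whatever d is. This sketch uses the section 2 values (h = 10, r = 2, b = 0.05) and the New York red phase of about 45 seconds; all other names are hypothetical.

```python
import math

def catch_up(d, h=10.0, r=2.0, b=0.05):
    """Minutes until the bus behind catches a bus that starts d minutes late."""
    return math.log(h / d) / math.log(1 + 2 * r * b / h)

red_phase = 0.75          # a 45-second red phase, in minutes
one_light = catch_up(red_phase)
two_lights = catch_up(2 * red_phase)
# Doubling the initial delay costs log(2) / log(1 + 2rb/h) minutes, whatever d is.
print(f"{one_light:.0f} min vs {two_lights:.0f} min; difference {one_light - two_lights:.1f} min")
```

With a 10-minute headway, the second missed light brings bunching about 35 minutes sooner.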
4.4. Dispatch buses to maintain even headways at terminals
The most brutal way to eliminate bunching is to have dispatchers tell bus drivers to sit still for a few minutes if the bus either behind or ahead of them is too far behind. The subway in New York would do that to the trains to maintain something that looked like even headways to a manager at the control center (“wait assessment”), which multiple independent sources have told me is responsible for falling subway speeds and increased delays. This brutal approach is unlikely to provide better service to riders.
However, telling bus drivers to sit still to maintain even headways has no such downside when it is done at the terminus of the bus route. At most, agencies would have to pay bus drivers some overtime, which is probably swamped by the positive effect reducing bunching has on ridership (or for that matter the fact that reducing bunching permits the transit agency to provide the same effective frequency with fewer service-hours).
4.5. More empirical research is needed
This section is a lot less quantitative than sections 1, 2, and 3, owing to the fact that we are stepping away from strict modeling. While quantifying the effect of low floors, more doors, bigger buses, and prepayment is easy within the confines of the model, quantifying the initial shocks to the schedule discussed in subsection 4.3 is more difficult. There is a range of plausible shocks, and the serious questions to ask are along the lines of “what is the 90% confidence interval of the travel time on each segment?”
The literature review I’ve done for signal priority in particular is not comprehensive, but it suggests that there is no research yet in that direction. Figuring out exactly how common initial delays are, and which treatments can reduce them by how much, must be the next step.