# Optimization

This post may be of general interest to people looking at optimization as a concept; it’s something I wish I’d understood when I taught calculus for economics. The transportation context is network optimization – there is a contrast between the sort of continuous optimization of stop spacing and the discrete optimization of integrated timed transfers.

**Minimum and maximum problems: short background**

One of the most fundamental results students learn in first-semester calculus is that minimum and maximum points for a function occur when the derivative is zero – that is, when the graph of the function is flat. In the graph below, compare the three horizontal tangent lines in red with the two non-horizontal ones:

A nonzero derivative – that is, a tangent line slanting up or down – implies that the point is neither a local minimum nor a local maximum, because on one side of the point the value of the function is higher and on the other it is lower. Only when the derivative is zero and the tangent line is flat can we get a local extreme point.

Of course, a local extreme point does not have to be a global one. In the graph above, there are three local extreme points, two local maxima and one local minimum, but only the local maximum on the left is also a global maximum since it is higher than the local maximum on the right, and the local minimum is not a global minimum because the very left edge of the graph dips lower. In real-world optimization problems, the global optimum is one of the local ones, rather than an edge case like the global minimum of the above graph.

First-semester calculus classes love giving simplified min/max problems. This class of problems is really one of two or three serious calc 1 exercises; the other class is graphing a function, and the potential third is some integrals, at universities that teach the basics of integration in calc 1 (like Columbia and unlike UBC, which does so in calc 2). There’s a wealth of functions that are both interesting from a real-world perspective and doable by a first-semester calc student, for example maximizing the volume of some shape with prescribed surface area.

My formulas for stop spacing come from one of these functions. The overall travel time is a function of walking time, which increases as stops get farther apart, and in-vehicle time, which decreases as stops get farther apart. A certain stop spacing produces the minimum overall trip time; this is precisely the global minimum of the travel time function, which is ultimately of the form where *a* and *b* are empirical parameters depending on walking speed and other relevant variables.

**Continuous optimization**

The fundamental fact of continuous optimization, one I wish I’d learned in time to teach it to students, is that **at the optimum the derivative is zero, and therefore making a small mistake in the value of the optimum is not a big problem**.

What does “mistake” mean in this context? It does not mean literally getting the computation wrong. There is no excuse for that. Rather, it means choosing a value that’s slightly suboptimal for ancillary reasons – perhaps small discontinuities in the shape of the network, perhaps political considerations.

Paul Krugman brings this concept up in the context of wages. The theory of efficiency wages asserts that firms often pay workers above the bare minimum required to get any workers at all, in order to get higher-quality workers and incentivize them to work harder. In this theory, the wage level is set to maximize employer productivity net of wages. At the employer’s optimum the derivative of profit is by definition zero, so a small change in wages has little impact to the employer. However, to the workers, any wage increase is good, as their objective function is literally their wage rather than profits. They may engage in industrial action to raise wages, or push for favorable regulations like a high minimum wage, and these will have a limited effect on profits.

In the context of transit, this has the obvious implication to wages – it’s fine to set them somewhat above market rate since the agency will get better workers this way. But there are additional implications to other continuous variables.

With stop spacing specifically, the street network isn’t perfectly continuous. There are more important and less important streets. Getting transit stops to align with major streets is important, even if it forces the stop spacing to be somewhat different from the optimum. The same is true of ensuring that whenever two transit lines intersect, there is a transfer between them. This is the reason my bus redesign for Brooklyn together with Eric Goldwyn involved drawing the map before optimizing route spacing – the difference between 400 and 600 meters between bus stops is not that important. For the same reason, my prescription for Chicago, and generally other American cities with half-mile grids of arterial roads, is a bus stop every 400 meters, to align with the grid distance while still hewing close to the optimum, which is about 500.

When I talked about stop consolidation with a planner at New York City Transit who worked on the Staten Island express bus redesign, the planner explained the philosophy to me: “get rid of every other stop.” In the context of redesigning a single route, this is an excellent idea as well: the process of adding and removing bus stops in New York is not easy, so minimizing the net change by deleting stops at regular intervals so as to space the remaining stops close to the optimum is a good idea.

The world of public transit is full of these tradeoffs with continuous variables. It’s not just wages and interstations. Fares are another continuous variable, involving particular tensions as different political factions have different objective functions, such as revenue, social rate of return, and social rate of return for the working class alone. Frequency is a continuous variable too in isolation. Top speed for a regional train is in effect a continuous variable. All of these have different optimization processes, and in all cases, it’s fine to slightly deviate from the strict optimum to fulfill a different goal.

**Discrete optimization**

Whereas continuous optimization deals with flat tangent lines, discrete optimization may deal with delicate situations in which small changes have catastrophic consequences. These include connections between different lines, clockface scheduling, and issues of integration between different services in general.

An example that I discussed in the early days of this blog, and again in a position paper I just wrote to some New Hampshire politicians, is the Lowell Line, connecting Boston with Lowell, a distance of 41 km. The line is quite straight, and were it electrified and maintained better, trains could run at 160 km/h between stops with few slowdowns. The current stop spacing is such that the one-way trip time would be just less than half an hour. The issue is that it matters a great deal whether the trip time is 25 or 27 minutes. A 25-minute trip allows a 5-minute turnaround, so that half-hourly service requires just two trainsets. A 27-minute trip with half-hourly service requires three trainsets, each spending 27 minutes carrying passengers and 18 minutes depreciating at the terminal.

A small deterioration in trip time can literally raise costs by 50%. It gets to the point that extending the line another 50 kilometers north to Manchester, New Hampshire improves operations, because the Lowell-Manchester trip time is around 27-28 minutes, so the extension can turn a low-efficiency 27-minute trip into a high-efficiency 55-minute trip, providing half-hourly service with four trainsets.

In theory, frequency is a continuous variable. However, in the range relevant to regional rail, it is discrete, in fractions of an hour. Passengers can memorize a half-hourly schedule: “the inbound train leaves my stop at :10 and :40.” They cannot and will not memorize a schedule with 32-minute frequency, and needing to constantly consult a trip planner will degrade their travel experience significantly. Not even smartphone apps can square this circle. It’s telling that the smartphone revolution of the last decade has not been accompanied with rapid increase in ridership on transit lines without clockface schedules, such as those of the United States – if anything, ridership has grown faster in the clockface world, such as Germany and Switzerland.

Transit networks involving timed connections are another case of discrete optimization in which all parts of the network must work together, and small changes can make the network fall apart. If a train is late by a few minutes and its passengers miss their connection, the short delay turns into a long one for them. As a result, conscientious schedule planners make sure to write timetables with some contingency time to recover from delays; in Switzerland this is 7%, so in practice, out of every 15 minutes, one minute is contingency, typically spent waiting at a major station.

But this gets even more delicate, because different aspects of the transit network impact how reliable the schedule is. If it’s a bus, it matters how much traffic there is on the line. Buses in traffic not reliable enough for tight connections, so optimizing the network means giving buses dedicated lanes wherever there may be traffic congestion. Even though it’s a form of optimization, and even though there’s a measure of difficulty coming from political opposition by drivers, it is necessary to overrule the opposition, unlike in continuous cases such as wages and fares.

Infrastructure planning for rail has the same issues of discrete optimization. It is necessary to design complex junctions to minimize the ability of one late train to delay other trains. This can take the form of flying junctions or reducing interlining; in Switzerland there are also examples of pocket tracks at flat junctions where trains can wait without delaying other trains behind them. Then, the decision of how much to upgrade track speed, and even how many intermediate stations to allow on a line, has to come from the schedule, in similar vein to the Lowell Line’s borderline trip time.

**Continuous and discrete optimization**

Many variables relevant to transit are in theory continuous, such as trip time, frequency, stop spacing, wages, and fares. However, some of these have discontinuities in practice. Stop spacing on a real-world city street network must respect the hierarchy of more and less important destinations. Frequency and trip times are discrete variables except at the highest intensity of service, perhaps every 7.5 minutes or better; 11-minute frequency is worse to the passenger who has to memorize a difficult schedule than either 10- or 12-minute frequency.

New York supplies a great example showcasing how bad it can be to slavishly hew to some optimal interstation and not consider the street network. The Lexington Avenue Line has a stop every 9 blocks from 33rd Street to 96th, offset with just 8 blocks between 51st and 59th and 10 between 86th and 96th. In particular, on the Upper East Side it skips the 72nd and 79th Street arterials and serves the less important 68th and 77th Streets instead. As a result, east-west buses on the two arterials cross Lexington without a transfer.

Just east of Lex, there is also a great example of optimization on Second Avenue Subway. The stops on Second Avenue are at 72nd, 86th, and 96th, skipping 79th. It turns out that skipping 79th is correct – the optimum for the subway is to the meter the planned stop spacing for the line between 125th and Houston Streets, so it’s okay to have slightly non-uniform stop spacing to make sure to hit the important east-west streets.

Frequency and trip times are subject to the Swiss maxim, **run trains as fast as necessary, not as fast as possible**. Hitting trip times equal to an integer or half-integer number of hours minus a turnaround time has great value, but small further speedups do not. Passengers still benefit from the speedup, but the other benefits of higher speed to the network, such as better connections and lower crew costs, are no longer present.

The most general rule here is really that continuous optimization tolerates small errors, whereas discrete optimization does not. Therefore, it’s useful to do both kinds of optimization in isolation, and then modify the continuous variable somewhat based on the needs of the discrete one. If you calculate and find that the optimal frequency for your bus or train is once every 16 minutes, you should round it to 15, based on the discrete optimization rule that the frequency should be a divisor of the hour to allow for clockface timetable. If you calculate and find that the optimal bus stop spacing is 45% of the distance between two successive arterial streets, you should round it to 50% so that every arterial gets a bus stop.

Getting continuous optimization right remains important. If the optimal stop spacing is 500 meters and the current one is 200 meters, the network is so far from the local maximum of passenger utility that the derivative is large and stop consolidation has strong enough positive effects to justify overruling any political opposition. However, it is subsequently fine to veer from the optimum based on discrete considerations, including political ones if removing every 1.7th bus stop is harder than removing every other stop. Close to the local maximum or minimum, small changes really are not that important.

Have you had a chance yet to look at Germany’s wishlist for nationwide integrated clockface schedules?

They call it Deutschlandtakt

For an overview about nationwide integrated clockface schedules in Switzerland and Germany you can take a look on my old article in German: https://www.zukunft-mobilitaet.net/42868/analyse/integraler-taktfahrplan-itf-schweiz-deutschland-deutschlandtakt-umsetzbarkeit-konzept. However, recent developments are not included yet. Just a few days ago, the German Federal Ministry of Transport has uploaded new slides and timetables for the Deutschland-Takt: https://www.bmvi.de//SharedDocs/DE/Artikel/E/zukunftsbuendnis-schiene.html

Note: Although integrated clockface schedules have some mathematical structure, in practice they are often not optimized algorithmically, but constructed manually and sometimes even discussed politically, like in the case of the Deutschland-Takt. However, there is research on mathematical optimization algorithms for timetabling and other planning problems in public transport. Here in Berlin, at the institutes where I’m studying math, there have been projects about the Berlin metro timetable, the Potsdam tram and bus line plan and the Karlsruhe tram line plan.

Max-plus.

So tropical!

Max-plus algebra is another mathematical concept connected with timetabling, but not for optimization but for stability analysis.

…wait, really? Do you have a reference to a paper on this? Because I’ve never thought of tropical geometry as an applied field to stability… (the paper I was in the middle of working on when I left academia made heavy use of the theory.)

I’m not so aware of tropical geometry in general, but during my exchange at TU Delft I had a lecture by Rob Goverde about the max-plus algebra approach for railway timetable analysis. Some papers:

https://www.sciencedirect.com/science/article/pii/S0191261506000208?via%3Dihub

Ooh, I emailed him asking for the full text of the second paper (whyyyyyyyy are things not on the arXiv?).

https://www.birmingham.ac.uk/schools/mathematics/news-and-events/events/2018/tropical-mathematics-optimisation-railways.aspx

Related, linear constraint solving is a lot easier on continuous real numbers than discrete integers. The former can be done in polynomial time by venerable and well studied algorithms, but to solve the latter optimally would theoretically require a computer with the ability to consider an unlimited number of possibilities simultaneously. Although in practice there are quite a few ‘good enough’ approximation methods for solving discrete linear constraint problems, these days, so it is a useful technique.

I was wondering whether to mention a randomly selected continuous optimization problem that I learned of because I went to college with someone who wrote a senior thesis proposing an approximate solution. It’s the smallest enclosing ball problem, and for fixed maximum error the approximate solution is O(nd) where d is the dimension and n is the number of ball; Wikipedia mentions a slightly earlier approximate algorithm in . So you can get pretty close to the optimum with approximation in this continuous case… which you sometimes can’t in discrete cases.

Now that you have a German takt, that pretty much dictates the takts of the nations surrounding it which have clock face Inter Cities with Germany. Something something degrees of freedom something.

It’s okay, DB has so many delays that surrounding countries can keep planning their operations independently ;). The fact that Germany is run by mildly competent individuals rather than the hacks who run transportation in America should not distract you from the fact that NS and SBB are both way more punctual than DB.

NS definitely works hard to improve punctuality but as someone who has lived in the Netherlands for one year I wouldn’t agree with a comparison that they are more punctual than DB … keep in mind that German regional trains are quite punctual, while the long-distance trains are quite unpunctual, but in the Netherlands long distances don’t play such a big role.

The Germans told a lot of lies about how reliable their trains were when I visited. They were all much less good than the official statistics.

The Netherlands and Switzerland are much smaller.

Mute opportunities for delay exist on 800 km than on 300.

Locally in Philadelphia, we have SEPTA, which runs trains on its own tracks except on one side of the S-Bahn, where it shares tracks with Amtrak (the former Pennsylvania RR side, but not the former Reading RR side). Effectively, these are three lines (Paoli Mainline and Northeast Corridor north and south to Trenton and Wilmington, respectively), of which Paoli and Trenton are four track lines, so SEPTA mostly keeps to itself on the outer tracks but Wilmington branch has only three, so more potential for conflict. This could be a problem if SEPTA eventually moves to a higher frequency service, coincidental with Amtrak doing the same.

For this reason, Alon, you’ll recall that in the past I have argued that SEPTA try to disentangle at least one of these, the Trenton line, from Amtrak to give it greater freedom in designing its takt around Amtrak’s. At the very least, the Chestnut Hill West line, should (and easily could) be shifted over to the Reading side. If Amtrak could be convinced to send its trains into a tunnel to avoid the slowdown of the curve at Frankford, this would give SEPTA even more freedom there. My guess is that southbound NEC Amtrak trains out of NY Penn Station are the most likely to be late…

Wait, why is the Trenton Line even entangled in the first place? It has two tracks for Amtrak and two for SEPTA; there is no conflict. And the slowdown at Frankford is imposed mostly by steam-era standards – the amount of eminent domain required to stretch the curve to 200 km/h is moderate, and at 180 I’m not even sure any takings are needed.

Platforms for local trains are on the outside, so trains branching on the flat junctions block the main (esp. at TTC, just south of North Philadelphia station, and at Shore Interlocking). It’s cheap to rebuild the platforms and rearrange Zoo, but it’s not on any agenda I’ve seen.

Indeed, my first reaction when I read the announcement was to wonder whether somebody had told Deutsche Bahn that they will have to run the trains on time to make this work. Or to use Goethe’s words “Die Botschaft hör’ ich wohl, allein mir fehlt der Glaube” (I hear the message well, but I lack faith). Especially since neither the federal government nor Deutsche Bahn have shown equal enthusiasm in coming up with the necessary funds to make Germany’s rail infrastructure fit for the Deutschlandtakt.

And the cynic in me wonders whether the whole thing is not motivated – at least in part – by a desire to raise market entrance barriers for competing train service operators.

There are a lot of question marks. Like the Erfurt-Nuremberg-Munich travel times that allow one of the three to be a hub. Or they’d have to reduce travel times on existing HSLs (maybe upgrade Ingolstadt-Munich?)

Punctuality, infrastructure funding and the right organization, in particular the conflict between an integrated timetable and open access, are certainly problems for which a solution still has to be found. Some of them might be addressed in the final “Schienenpakt” (Rail Pact) which is planned for 2020, so it’s perhaps too early to be pessimistic, but okay to be critical …

Deutschland-Takt is a good strategy for the countryside but not necessarily for the big agglomerations. What is still missing is a strategy to separate the infrastructure of the different train types – in particular ICE and S-Bahn, which will become more important as the population in the agglomerations and thus the demand within and between these agglomerations will grow – as much as possible in order to improve punctuality. Many current projects (Hamburg S4, Stuttgart 21, München 2. S-Bahn-Stammstrecke) even go in the wrong direction as they increase mixed traffic between suburban, regional and long-distance rail.

Formal entrance barriers for competitors have already been lifted for decades, but there a still only a few open access passengers trains, and there are good reasons for that: Considering the polycentric structure and the frequented railway network, they can complement the timetable on selected corridors, but not offer a second network with short transfers at hubs.

How does the second S-Bahn tunnel in Munich increase mixed traffic?

Not directly, but the express S-Bahn lines through the tunnel will mix both with classical S-Bahn lines as well as with regional lines when going further to the region. The second trunk route will surely relieve the first trunk route and third and fourth tracks will come on some other lines as well but on the other hand the line system will become more difficult and the problem of different platform heights in the system won’t be solved (unless they have some ideas to solve this issues which I don’t know).

Mixed traffic also increases at the S-Bahn station at the airport which is already used by regional trains from Regensburg which will be extended towards Erding and the southeast in the future.

Considering long-distance trains, there are still some high-speed line projects (e.g. between Frankfurt and Mannheim as well as between Hannover and Bielefeld), while others have been cancelled (e.g. between Hannover and Hamburg/Bremen as well as between Frankfurt and Würzburg).

Is there any easy disentanglement? In Paris there’s a really bad case of mixed traffic on the Cergy branch, with 6 tph all day on the RER A and 6 peak-only trains on Transilien L; they can disentangle this when the RER E extension to the west opens because then the RER E can take over the Poissy branch and create capacity for 12 peak RER A tph from Cergy, but so far the plan is not to do that but instead do a dopey system with 16 tph from the Gare de l’Est network to La Defense and 6 tph from Mantes-la-Jolie to Rosa Parks.

I haven’t thought about it deeply. Perhaps the mixed traffic in Munich turns out to be acceptable in practice. Perhaps it would be better to operate it as a regional rail tunnel not connected to the existing S-Bahn network.

As an alternative to the current project, a second S-Bahn trunk line using the southern ring railway was proposed. Another option would have been a diagonal line Obermenzing / Moosach – Nymphenburg – Hauptbahnhof -Sendlinger Tor – Giesing which would also have eliminated the turnaround at Ostbahnhof. However, both alternative routes would probably have been used less than the current project via Marienplatz.