New York Publishes a Bad Benchmarking Report
I’ve grown to intensely dislike benchmarking reports. It’s not that the idea of benchmarking is bad. It’s that they omit crucial information – namely, the names of the systems that one is compared with. The indicators always have a wide variety of values, and not being able to match them with systems makes it impossible to do sanity-checks, such as noticing if systems with high costs per car-km are consistently ones that run shorter trains. This way, those anonymized reports turn into tools of obfuscation and excusemongering.
The MTA in New York recently published such a report, including both US-wide and international benchmarking for the subway as well as commuter rail. The US benchmarking is with comparable American systems – exactly the ones I’d compare, with the systems listed by name as NTD data is wisely not anonymized. The international benchmarking for the subway is with CoMET, which includes most of the larger global systems as well as a handful of smaller ones, like Vancouver; for commuter rail, it’s with ISBeRG, which has an odd list of systems, omitting the RER (which is counted in CoMET), all of Japan except JR East, and any S-Bahn, skipping down to Australian systems, Cape Town, and Barcelona.
That, by itself, makes much of the international benchmarking worthless. The standard metric for operating costs is per car-km. This is covered in pp. 8-9, showing that New York has fairly average costs excluding maintenance, but the second highest maintenance costs. But here’s the problem: I’m seeing a comparison to an undifferentiated mass of other systems. One of them is an outlier in maintenance costs, even ahead of New York, but I do not know which it is, which means that I cannot look at it and see what it does wrong – perhaps it has an unusually old fleet, perhaps it is small and lacks scale, perhaps it is domestically viewed as scandal-ridden.
Far more useful is to look at complete data by name. For example, JICA has complete operating cost data for Japanese metro systems. Its tables are complete enough that we can see, for example, that overall operating costs are around $5/car-km for all systems, regardless of scale; so scale should not be too important, or perhaps Tokyo’s wealth exactly cancels out the scale effect. There are, on table 2.37 on PDF-p. 117, headcounts for most systems from which we can impute labor efficiency directly, using train-km data on PDF-p. 254; Yokohama gets 1,072 train-hours a year per driver at 35 km/h (the rough average speed I get from Hyperdia).
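The imputation above is simple arithmetic, and a minimal sketch of it may help readers repeat it for other systems. The function names below are hypothetical, and the only numbers used are the two quoted in the text: 1,072 train-hours per driver per year and an assumed average speed of 35 km/h. The underlying JICA train-km and driver counts are not reproduced here.

```python
def train_hours_per_driver(train_km_per_year, avg_speed_kmh, drivers):
    """Impute annual train-hours per driver from train-km, speed, and headcount.

    train-hours = train-km / average speed; dividing by the number of
    drivers gives a rough labor-efficiency indicator.
    """
    if avg_speed_kmh <= 0 or drivers <= 0:
        raise ValueError("speed and driver count must be positive")
    return train_km_per_year / avg_speed_kmh / drivers


def implied_train_km_per_driver(train_hours, avg_speed_kmh):
    """Invert the calculation: train-km per driver implied by train-hours."""
    return train_hours * avg_speed_kmh


# Figures quoted in the text for Yokohama; the speed is a rough estimate.
YOKOHAMA_TRAIN_HOURS_PER_DRIVER = 1072
ASSUMED_AVG_SPEED_KMH = 35

km_per_driver = implied_train_km_per_driver(
    YOKOHAMA_TRAIN_HOURS_PER_DRIVER, ASSUMED_AVG_SPEED_KMH
)
print(f"Implied train-km per driver per year: {km_per_driver:,}")
```

Because the result scales inversely with the assumed speed, an error in the 35 km/h estimate carries straight through to the train-hours figure. That is why being able to look up the named system matters.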
And here’s the thing: without the ability to fill in missing data like average speed, or to look at things the report didn’t emphasize, the report is not useful to me, or to other independent researchers. It’s a statement of excuses for New York’s elevated operating and maintenance cost, with officious proclamations and intimidating numbers.
For example, here’s the excuse for high maintenance costs:
High maintenance costs for NYCT are largely attributable to 24-hour service. Most COMET peer agencies shut down every night, allowing for four hours of continuous daily maintenance. In comparison, NYCT subway’s 24-hour service requires maintenance to occur within 20-minute windows between late night trains, reducing work efficiencies. Additionally, maintenance costs for NYCT have risen recently to support the improvements as part of the Subway Action Plan, which have led to a significant improvement to on-time performance year over year since inception.
Okay, so here we’re seeing what starts as a reasonable explanation – New York doesn’t have regular nighttime maintenance windows. But the other American systems studied do, and they’d be above the global average too; Boston has regular nighttime work windows but still can’t consign all track maintenance to them, and has almost the same maintenance cost per car-km as New York. Moreover, track maintenance costs per car-km should feature extensive scale effects – only at freight rail loads is the marginal track wear caused by each additional car significant – and New York runs long trains.
Then there is the Subway Action Plan line, which is a pure excuse. Other systems do preventive maintenance too, thank you very much. New York is not unusually reliable by global standards, and the benchmarking report doesn’t investigate questions like mean distance between failures or some measure of the presence of slow restrictions – and because it is anonymized, independent researchers can’t use what it does have and get answers from other sources.
The study has a section on labor costs, showing New York’s are much higher than those of some peer cities. Thankfully, that part is not anonymized, which means I can look at the cities with overall labor costs that are comparable to New York’s, like London, and ignore the rest; New York’s construction labor costs are higher than London’s by a factor of about 2, despite roughly even regionwide average wages. Unfortunately, a key attribute is missing: labor efficiency. The JICA study does better, by listing precise headcounts; but here the information is not given, which means that drawing any conclusion that is not within the purview of MTA’s endless cold war on its unions is not possible. As it happens, I know that New York is overstaffed, but only from other sources, never anonymized.
It’s worse with commuter rail. First of all, at the level of benchmarking, the study’s list of comparisons is so incomplete and so skewed (three Australian systems, again) that nothing it shows can be relevant. And second, commuter rail in North America comes with its own internal backward-looking culture of insularity and incompetence.
The report even kneecaps itself by saying,
While it is true that benchmarking provides useful insights, it is also important to acknowledge that significant differences exist among the railroads that pose challenges for drawing apples-to-apples conclusions, particularly when it comes to comparisons with international peers. Differing local economies, prevailing wages and collective bargaining agreement provisions can have dramatic impacts on respective labor costs. Government mandates, including safety regulations, vary widely, and each railroad exists in a unique operating environment, often with different service schedules, geographic layouts and protocols. Together these factors have also have a significant impact on relative cost structures.
To translate from bureaucratic to plain English, what they’re saying is that American (and Canadian) practices for commuter rail are uniquely bad, but controlling for them, everything is fine. The report then lists the following excuses, all of which are wrong:
• Hours of Operation: LIRR provides 24 hours of service 7 days per week, and MNR provides 20-22 hours of service 7 days a week
• Ungated System: Neither LIRR nor MNR operate gated systems, therefore they require onboard fare validation/collection
• Branch Service: Both LIRR and MNR run service to and from a central business district (New York City) and do not have ability to offer through-running service
• Electrification: Both LIRR and MNR operate over both electrified and non-electrified territory, thereby requiring both electric and diesel fleets
It’s impressive how much fraud – or, more likely, wanton indifference and incuriosity – can fit into just four bullet points. Metro-North’s hours of service are long, but so are those of the JR East commuter lines; the Yamanote Line runs 20 hours a day, which means the nighttime maintenance window is shorter. Ungated systems use proof-of-payment ticketing throughout Europe – I don’t know if Rodalies de Catalunya runs driver-only trains, but the partly-gated RER and the ungated S-Bahns in the German-speaking world do. Through-running is a nice efficiency but not all systems have it, and in particular Melbourne has a one-way loop system akin to that of the Chicago L instead of through-running. Finally, electrification on the LIRR and Metro-North is extensive and while their diesel tails are very expensive, they also sometimes exist in Europe, including in London on a line that’s partly shared with the Underground, though I don’t know if they do in the report’s comparison cases.
The report does not question any of the usual assumptions of American mainline rail: that it must run unusually heavy vehicles, that it must run with ticket-punching conductors, etc.
For a much more useful benchmarking, without anonymization, let’s look at German S-Bahns briefly. There is a list of the five largest systems – Berlin, Munich, Hamburg, Frankfurt, Stuttgart – with ridership and headcounts; some more detail about Berlin can be found here. Those five systems total 6,200 employees; the LIRR has 7,671 and Metro-North 6,773. With 2,875 employees, the Berlin S-Bahn has more train-hours than the LIRR, Metro-North, and New Jersey Transit combined; about as many car-km pro-rated to car length as the LIRR times 1.5; and more ridership than all American commuter rail systems combined. The LIRR, in other words, has more workers than the largest five German S-Bahns combined, while the Berlin S-Bahn has more riders than all American commuter rail systems combined.
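To make the headcount comparison concrete, here is a small sketch that computes the ratios from the employee figures quoted above. The dictionary keys and the helper name are illustrative only. No ridership, train-hour, or car-km values are included, since the text does not state them numerically.

```python
# Employee counts as quoted in the text.
HEADCOUNT = {
    "five largest German S-Bahns": 6200,
    "Berlin S-Bahn": 2875,
    "LIRR": 7671,
    "Metro-North": 6773,
}


def headcount_ratio(system, baseline):
    """Return how many times larger one system's workforce is than another's."""
    return HEADCOUNT[system] / HEADCOUNT[baseline]


lirr_vs_five = headcount_ratio("LIRR", "five largest German S-Bahns")
lirr_vs_berlin = headcount_ratio("LIRR", "Berlin S-Bahn")

# LIRR and Metro-North together against the five German systems together.
ny_combined = HEADCOUNT["LIRR"] + HEADCOUNT["Metro-North"]
ny_vs_five = ny_combined / HEADCOUNT["five largest German S-Bahns"]

print(f"LIRR vs. five largest S-Bahns: {lirr_vs_five:.2f}x")
print(f"LIRR vs. Berlin S-Bahn:        {lirr_vs_berlin:.2f}x")
print(f"LIRR + Metro-North vs. five:   {ny_vs_five:.2f}x")
```

These are raw headcount ratios, not normalized by service. Normalizing by train-hours or car-km would widen the gap, given that Berlin alone runs more train-hours than the three New York-area railroads combined.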
The excuses in the report highlight some of the reasons why – the US sticks to ticket-punching and buys high-maintenance trains compliant with obsolete regulations – but omit many more, including poor maintenance practices and inefficient scheduling of both trains and crew. But those are not justifications; they are a list of core practices of North American commuter rail that need to be eliminated, and if the workers and managers cannot part with them, then they should be laid off immediately.

