New York Publishes a Bad Benchmarking Report

I’ve grown to intensely dislike benchmarking reports. It’s not that the idea of benchmarking is bad. It’s that they omit crucial information – namely, the name of the system that one is compared with. The indicators always have a wide variety of values, and not being able to match them with systems makes it impossible to do sanity-checks, such as noticing if systems with high costs per car-km are consistently ones that run shorter trains. This way, those anonymized reports turn into tools of obfuscation and excusemongering.

The MTA in New York recently published such a report, including both US-wide and international benchmarking for the subway as well as commuter rail. The US benchmarking is with comparable American systems – exactly the ones I’d compare, with the systems listed by name as NTD data is wisely not anonymized. The international benchmarking for the subway is with CoMET, which includes most of the larger global systems as well as a handful of smaller ones, like Vancouver; for commuter rail, it’s with ISBeRG, which has an odd list of systems, omitting the RER (which is counted in CoMET), all of Japan except JR East, and any S-Bahn, skipping down to Australian systems, Cape Town, and Barcelona.

That, by itself, makes much of the international benchmarking worthless. The standard metric for operating costs is per car-km. This is covered in pp. 8-9, showing that New York has fairly average costs excluding maintenance, but the second highest maintenance costs. But here’s the problem: I’m seeing a comparison to an undifferentiated mass of other systems. One of them is an outlier in maintenance costs, even ahead of New York, but I do not know which it is, which means that I cannot look at it and see what it does wrong – perhaps it has an unusually old fleet, perhaps it is small and lacks scale, perhaps it is domestically viewed as scandal-ridden.

Far more useful is to look at complete data by name. For example, JICA has complete operating cost data for Japanese metro systems. Its tables are complete enough that we can see, for example, that overall operating costs are around $5/car-km for all systems, regardless of scale; so scale should not be too important, or perhaps Tokyo’s wealth exactly cancels out the scale effect. Table 2.37 on PDF-p. 117 gives headcounts for most systems, from which we can impute labor efficiency directly, using train-km data on PDF-p. 254; Yokohama gets 1,072 train-hours a year per driver at 35 km/h (the rough average speed I get from Hyperdia).
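As a rough sketch of that imputation: annual train-hours per driver are train-km divided by average speed, divided again by the number of drivers. The function name and the placeholder inputs below are assumptions for illustration only; the only figures taken from the text are the 1,072 train-hours per driver and the 35 km/h average speed.

```python
def train_hours_per_driver(train_km: float, avg_speed_kmh: float, drivers: int) -> float:
    """Impute annual train-hours per driver from train-km, average speed and headcount."""
    if avg_speed_kmh <= 0 or drivers <= 0:
        raise ValueError("average speed and driver count must be positive")
    train_hours = train_km / avg_speed_kmh
    return train_hours / drivers


# Figures quoted in the post for Yokohama.
YOKOHAMA_TRAIN_HOURS_PER_DRIVER = 1072
ASSUMED_AVG_SPEED_KMH = 35  # rough average speed, as stated in the post

# Working backwards: train-km per driver per year implied by those two figures.
implied_train_km_per_driver = YOKOHAMA_TRAIN_HOURS_PER_DRIVER * ASSUMED_AVG_SPEED_KMH

# Hypothetical placeholder inputs (NOT from the JICA tables), chosen only so that
# the forward calculation reproduces the quoted figure.
placeholder_drivers = 100
placeholder_train_km = implied_train_km_per_driver * placeholder_drivers

if __name__ == "__main__":
    print(f"Implied train-km per driver per year: {implied_train_km_per_driver:,}")
    hours = train_hours_per_driver(
        placeholder_train_km, ASSUMED_AVG_SPEED_KMH, placeholder_drivers
    )
    print(f"Train-hours per driver (placeholder inputs): {hours:,.0f}")
```

The point is that this calculation needs an average speed that the source tables do not give, which is exactly the kind of gap that can only be filled when the system is named.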

And here’s the thing: without the ability to fill in missing data like average speed, or to look at things the report didn’t emphasize, the report is not useful to me, or to other independent researchers. It’s a statement of excuses for New York’s elevated operating and maintenance cost, with officious proclamations and intimidating numbers.

For example, here’s the excuse for high maintenance costs:

High maintenance costs for NYCT are largely attributable to 24-hour service. Most COMET peer agencies shut down every night, allowing for four hours of continuous daily maintenance. In comparison, NYCT subway’s 24-hour service requires maintenance to occur within 20-minute windows between late night trains, reducing work efficiencies. Additionally, maintenance costs for NYCT have risen recently to support the improvements as part of the Subway Action Plan, which have led to a significant improvement to on-time performance year over year since inception.

Okay, so here we’re seeing what starts like a reasonable explanation – New York doesn’t have regular nighttime maintenance windows. But the other American systems studied do and they’d be above global average too; Boston has regular nighttime work windows but still can’t consign all track maintenance to them, and has almost the same maintenance cost per car-km as New York. Moreover, track maintenance costs per car-km should feature extensive scale effects – only at freight rail loads is the marginal track wear caused by each additional car significant – and New York runs long trains.

Then there is the Subway Action Plan line, which is a pure excuse. Other systems do preventive maintenance too, thank you very much. New York is not unusually reliable by global standards, and the benchmarking report doesn’t investigate questions like mean distance between failures or some measure of the presence of slow restrictions – and because it is anonymized, independent researchers can’t use what it does have and get answers from other sources.

The study has a section on labor costs, showing New York’s are much higher than those of some peer cities. Thankfully, that part is not anonymized, which means I can look at the cities with overall labor costs that are comparable to New York’s, like London, and ignore the rest; New York’s construction labor costs are higher than London’s by a factor of about 2, despite roughly even regionwide average wages. Unfortunately, a key attribute is missing: labor efficiency. The JICA study does better, by listing precise headcounts; but here the information is not given, which means that drawing any conclusion that is not within the purview of MTA’s endless cold war on its unions is not possible. As it happens, I know that New York is overstaffed, but only from other sources, never anonymized.

It’s worse with commuter rail. First of all, at the level of benchmarking, the study’s list of comparisons is so incomplete and so skewed (three Australian systems, again) that nothing it shows can be relevant. And second, commuter rail in North America comes with its own internal backward-looking culture of insularity and incompetence.

The report even kneecaps itself by saying,

While it is true that benchmarking provides useful insights, it is also important to acknowledge that significant differences exist among the railroads that pose challenges for drawing apples-to-apples conclusions, particularly when it comes to comparisons with international peers. Differing local economies, prevailing wages and collective bargaining agreement provisions can have dramatic impacts on respective labor costs. Government mandates, including safety regulations, vary widely, and each railroad exists in a unique operating environment, often with different service schedules, geographic layouts and protocols. Together these factors have also have a significant impact on relative cost structures.

To translate from bureaucratic to plain English, what they’re saying is that American (and Canadian) practices for commuter rail are uniquely bad, but controlling for them, everything is fine. The report then lists the following excuses, all of which are wrong:

• Hours of Operation: LIRR provides 24 hours of service 7 days per week, and MNR provides 20-22 hours of service 7 days a week

• Ungated System: Neither LIRR nor MNR operate gated systems, therefore they require onboard fare validation/collection

• Branch Service: Both LIRR and MNR run service to and from a central business district (New York City) and do not have ability to offer through-running service

• Electrification: Both LIRR and MNR operate over both electrified and non-electrified territory, thereby requiring both electric and diesel fleets

It’s impressive how much fraud – or, more likely, wanton indifference and incuriosity – can fit into just four bullet points. Metro-North’s hours of service are long, but so are those of the JR East commuter lines; the Yamanote Line runs 20 hours a day, which means the nighttime maintenance window is shorter. Ungated systems use proof-of-payment ticketing throughout Europe – I don’t know if Rodalies de Catalunya runs driver-only trains, but the partly-gated RER and the ungated S-Bahns in the German-speaking world do. Through-running is a nice efficiency but not all systems have it, and in particular Melbourne has a one-way loop system akin to that of the Chicago L instead of through-running. Finally, electrification on the LIRR and Metro-North is extensive and while their diesel tails are very expensive, they also sometimes exist in Europe, including in London on a line that’s partly shared with the Underground, though I don’t know if they do in the report’s comparison cases.

The report does not question any of the usual assumptions of American mainline rail: that it must run unusually heavy vehicles, that it must run with ticket-punching conductors, etc.

For a much more useful benchmarking, without anonymization, let’s look at German S-Bahns briefly. There is a list of the five largest systems – Berlin, Munich, Hamburg, Frankfurt, Stuttgart – with ridership and headcounts; some more detail about Berlin can be found here. Those five systems total 6,200 employees; the LIRR has 7,671 and Metro-North 6,773. With 2,875 employees, the Berlin S-Bahn has more train-hours than the LIRR, Metro-North, and New Jersey Transit combined; about as many car-km pro-rated to car length as the LIRR times 1.5; and more ridership than all American commuter rail systems combined. The LIRR in other words has more workers than the largest five German S-Bahns combined while the Berlin S-Bahn has more riders than all American commuter rail systems combined.
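The headcount comparison can be checked with a few lines of arithmetic. This is a minimal sketch using only the employee counts quoted above; the dictionary keys and the helper name are made up for illustration, and it does not attempt the train-hour or ridership comparisons, for which no figures are given here.

```python
# Employee counts as quoted in the paragraph above.
employees = {
    "five_largest_s_bahns": 6200,  # Berlin, Munich, Hamburg, Frankfurt, Stuttgart combined
    "berlin_s_bahn": 2875,
    "lirr": 7671,
    "metro_north": 6773,
}


def headcount_ratio(numerator: str, denominator: str) -> float:
    """Ratio of one operator's headcount to another's."""
    return employees[numerator] / employees[denominator]


# LIRR alone versus the five largest German S-Bahns combined.
lirr_vs_five = headcount_ratio("lirr", "five_largest_s_bahns")

# LIRR plus Metro-North versus the Berlin S-Bahn alone.
ny_combined = employees["lirr"] + employees["metro_north"]
ny_vs_berlin = ny_combined / employees["berlin_s_bahn"]

if __name__ == "__main__":
    print(f"LIRR / five largest S-Bahns: {lirr_vs_five:.2f}")
    print(f"LIRR + Metro-North headcount: {ny_combined:,}")
    print(f"(LIRR + Metro-North) / Berlin S-Bahn: {ny_vs_berlin:.2f}")
```

On these numbers the LIRR alone has roughly a quarter more employees than the five S-Bahns together, and the two New York railroads combined have about five times the Berlin S-Bahn's headcount.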

The excuses in the report highlight some of the reasons why – the US sticks to ticket-punching and buys high-maintenance trains compliant with obsolete regulations – but omit many more, including poor maintenance practices and inefficient scheduling of both trains and crew. But those are not justifications; they are a list of core practices of North American commuter rail that need to be eliminated, and if the workers and managers cannot part with them, then they should be laid off immediately.


  1. William

    Surprised you didn’t mention the part where they claimed one-person operation is infeasible in the subway because of train length. Off the top of my head, there are at least 7 companies (Deutsche Bahn, SBB, BART, Thameslink, RATP and SNCF) that operate trains longer than MTA’s longest subway trains with one person.

    The lack of knowledge about proof of payment fare collection, which one can find on Buffalo’s light rail, or commuter rail in Toronto, is astounding.

      • Nilo

        They actually mention proof of payment as another possibility somewhere in this report. They just keep it to a portion of a sentence, because otherwise it would be very embarrassing.

    • Henry

      I don’t think anyone at MTA doesn’t know the implications of OPTO; it’s been attempted in several contract negotiations in the past.

      Keep in mind that Hochul became governor by default, and so I imagine is not trying to rock too many boats before an actual election this November.

    • Henry

      I imagine Hochul is not trying to poke any bears before the election in November. She doesn’t have an electoral mandate, and the field is essentially wide open.

      The MTA management doesn’t have OPTO, but not for lack of trying to push it through contract negotiations.

    • Sascha Claus

      If there is any non-IC(E) train* in Germany with a ‘conductor’, I’d be really surprised if they are doing anything mandatory like opening/closing doors. They are just checking tickets, maybe selling them, and could be dispensed with safely; so even the trains with ‘conductors’ are operationally OPTO. This allows the operator to give them a fancy job name (Kundenbetreuer im Nahverkehr, KiN) and less money per shift.
      *—maybe excepting the Bavarian “RE200” with its IC carriages

      • Alon Levy

        I think I saw a conductor on some regional train in BW (Karlsruhe -> Neustadt, connecting to the Rhine-Neckar S-Bahn to Kaiserslautern, because my direct TGV from Paris had been canceled and the next one was four hours later).

      • Oreg

        I think German REs (and maybe RBs?) still have conductors, probably one per train. They check tickets and announce delays.

        • Sascha Claus

          Due to the fare/ticketing organisation, a conductor (which is not legally a conductor (Zugbegleiter) who opens and closes the doors) is required for trains which are crossing the boundaries between traffic associations (Verkehrsverbünde) or are venturing into the uncharted territory outside of them, because validators are only for traffic association tickets; therefore you need a human to punch non-Verbundtickets.

          • Sascha Claus

            Good to see that you understood it. 🙂 Seeing my convoluted language now from 20 days away … but for a convoluted topic!
            (Validators means the brightly-colored machines where you insert your Verbundticket to have it stamped with date, time and tariff zone/location.)
            There are also cases where conductors for lines solely within one Verkehrsverbund are required by (and paid for) the public body that is ordering and paying the train service (like Leipzig – Geithain, where they also specified the livery for the railcars).

  2. John

    It’s fraud more than incuriosity, because they know proof of payment fare collection is used on heavy rail elsewhere, and that it is an alternative to gates or on-board fare collection. That sentence you quoted is a deliberate attempt (on MTA’s part) to mislead the reader about their ability to control costs. The sentence, in context, is:

    “From the operational perspective, MNR and LIRR operate in an ungated environment, which requires additional onboard train crew staffing to validate and collect tickets. This contrasts to most of the ISBeRG peer agencies, which have gated or proof-of-payment systems that do not require this level of staffing. The two railroads fall more in line with peer agencies when factoring this out of benchmarked agency operating costs.”

    Whoever wrote this paragraph deserves to be fired, as it is clear they don’t care about cost control. They know they can reduce costs with proof-of-payment fare collection, but pretend it’s not an option.

  3. Borners

    Does anyone know of any benchmark studies for non-TFL rail services in the UK? I’ve seen ones where the Tube/Overground/DLR are compared to other systems, but not the mainline franchises.

      • Borners

        Well Southeastern Railways has around 4000 employees for double LIRR’s daily ridership (around 600,000). That looks better, but not great.

        • Sascha Claus

          AFAIK, in the UK every train needs a guard (aka conductor), by law. That alone requires twice the amount of employees per train than, for example, the German conductorless S-Bahn.

          • Borners

            Well not on the London Overground that’s for sure. And a bunch of other operators like c2c are DOO too. But yes the outer Britain is backward and stupid when it comes to productivity. It was weird coming to the North of England from Japan seeing conductors on everyday trains.

          • William

            All metro London train services (except southwest) are DOO. Thameslink and LU are too. DOO is permitted in the UK.

  4. AJ

    The one major outlier might be the Hudson–Bergen Light Rail? In the NTD data, the O&M costs include the full DBOM payments to the 21st Century Rail Corporation (the private operator) because that’s the way the contract is structured; in other words, the NTD’s O&M figures include all of the capital and finance costs to build the HBLR, spread out across the life of the contract. When I did O&M benchmarking at Sound Transit with NTD data, I would always exclude the HBLR data because it was meaningless without the capital & finance costs stripped out.

  5. adirondacker12800

    then they should be laid off immediately.
    And replaced with who? Substitute buses they don’t have driven by drivers they don’t have for a year or so while new employees get trained? On roads they don’t have to a bus terminal they don’t have?
    Losing the capacity of the World Trade Center for a few years was uncomfortable. While a lot of the jobs downtown were relocated someplace else. The LIRR carried roughly five times as many commuters. So did MetroNorth. Who knows how much of previous ridership will return.

    • Alon Levy

      Over a few months, with newly-trained workers, to be trained by people who don’t even know what AREMA is or that America once had non-takt timetables. In the meantime, ridership is still down because of corona; ideally this should have been done in 2020, but 2022 is a pretty good time for this too.

  6. df1982

    Out of interest, why are Australian suburban rail networks a bad benchmark for NY commuter rail? Pre-Covid, Sydney had similar ridership to the three commuter rail networks combined, and it doesn’t have ticket-punchers (it still has two-person operation though, and a combination of staffed gates and POP for ticket control). Melbourne has driver-only operation and very lean levels of station staffing (probably too lean if you ask me). Both systems are fully electrified with standardised high-platforms, they run all day to mostly clockface schedules with very frequent service in S-Bahn style cores (although Melbourne’s city loop is bizarrely operated, as you note).

    I wouldn’t call either world’s best practice and both networks should be heading more in the direction of metro-style operations, but US operators could learn a thing or two even from them.

    • Alon Levy

      I don’t think they’re a bad benchmark at all. They’re fine, it’s just, they shouldn’t be 3 out of 6 international benchmarking examples; 3 out of 20, sure. When they’re 3 out of 6, it’s a case study masquerading as a benchmark.

  7. Matthew Hutton

    I’m not sure how much mixing there is with diesel and electric trains in the UK. Either a route is electric or they use diesel trains the whole way – the only exception is the very new hybrid trains they have.

    • William

      Southern operates diesel into London Bridge from Uckfield (via East Croydon). Chiltern diesels also run on LU territory, as Alon mentioned. SWT appears to run some diesel from Waterloo station to Exeter, though I don’t think it’s that frequent.

    • fjod

      There’s loads! William has mentioned some in the south (incl. 2tph diesel from Waterloo to Basingstoke on to Exeter, so not infrequent at all) but pretty much all electrified lines in the north and midlands of England, including the WCML and ECML, operate diesels alongside electric trains.

    • Alon Levy

      The LIRR and Metro-North do the same, more or less. The dual-modes they use are diesel locomotives that run under third rail for a bit – and if I remember correctly they use diesel even on powered sections outside the Manhattan tunnels.

  8. Herbert

    What do you think of the somewhat common German practice of changing nothing but the name of a service to “expand an S-Bahn” like happened with the “S6” and “S5” of Nuremberg S-Bahn in the last two schedule changes…

    • Alon Levy

      Oh, God, the Rhine-Neckar. They can call that an S-Bahn, but it’s in no way an S-Bahn; it’s a RegionalBahn that pretends to be something else.

      • Sascha Claus

        AFAIK, the Rhein-Neckar-S-Bahn at least shuffled around the timetables to make an S-Bahn every 20min out of three hourly Regionalbahn routes with irregular spacing. That’s nothing compared to a certain two-hourly S4 at the other end of Germany.

  10. Nathanael

    My reading of the politics is that change at NYC Subway may be possible if the antagonistic situation with the union can be calmed down and the union can be brought in on a politics of expansion (London eliminated excess workers by promising them new jobs arising during expansion). I also believe the Metro-North unions may be open to this.

    Some LIRR unions have been documented to be engaged in systematic timecard fraud in the past, so a more hostile approach may be necessary there, but LIRR’s the one where management has historically been in bed with the union leaders (literally at least once).

    • Alon Levy

      Union politics of driver-only trains in London are pretty contentious too. One of the franchises, I think Southern, had a protracted strike about it and could only go OPTO because the Tory cabinet had given management assurances that it would back it against labor.

      Metro-North is only a hair better than the LIRR – compared with anything else it’s the same low productivity. And it was a Metro-North VP of engineering who, a year and a half after it had happened, still had no idea that FRA regulations had been reformed.

      • Nathanael

        I think the difference between Metro-North and LIRR is this:
        — Metro-North has ignorance and habit.
        — but LIRR is full of people who absolutely do know better, and are refusing to improve things because they are literally stealing money using the current system.

        I fully expect LIRR to be much, much harder to fix. I can imagine hiring people who are aware of best practices in Metro-North, or even getting union leaders who are, and I can imagine win-win-agreements.

        At LIRR, they know better, they just don’t wanna because it would obstruct the grift.

  11. Harold

    Hi Alon, interesting analysis. Important to point out that S-Bahn are not responsible for infrastructure or station maintenance (all done by DB Netz) which is a large component of cost and labour for a railway. Hence it’s not exactly a like for like comparison there when you are comparing staff numbers, although S-Bahn is likely far more efficient for operations than LIRR or MNR

