In previous posts about modeling high-speed rail ridership, I used a gravity model for the estimation. While poking around with spreadsheets, I figured out that a good way to sanity-check the model is to run it on existing high-speed rail systems with known ridership. It turns out that the model fits the data decently but not amazingly, and tends to overestimate ridership at long distances (800 km+) and underestimate it at short ones.
The populations of metro areas A and B are in millions, distance is in km, and ridership is in millions per year in both directions combined.
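As a minimal sketch of the model with the units above, the function below uses the constants mentioned later in this post: 75,000 in the numerator, a population exponent of 0.8, and a 500 km minimum on distance in the denominator. The distance exponent of 2 is an assumption for illustration, since it is not stated in this post, and the function and parameter names are hypothetical.

```python
def gravity_ridership(pop_a, pop_b, distance_km,
                      k=75_000, pop_exp=0.8, floor_km=500.0, dist_exp=2.0):
    """Annual ridership in millions, both directions combined.

    pop_a, pop_b: metro area populations in millions.
    distance_km: distance between the two stations in km.
    dist_exp=2 is an assumption; the post itself only states k,
    the 0.8 population exponent, and the 500 km floor.
    """
    # Distances shorter than the floor are treated as equal to the floor.
    effective_km = max(distance_km, floor_km)
    return k * (pop_a ** pop_exp) * (pop_b ** pop_exp) / effective_km ** dist_exp


# Hypothetical city pair: metro areas of 10 and 5 million, 300 km apart.
print(round(gravity_ridership(10, 5, 300), 2))
```

Under these assumptions any pair closer than 500 km gets the same prediction as a pair exactly 500 km apart, which is why the whole of Taiwan sits on the flat part of the curve.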
I’ve tested the model on two datasets: the Shinkansen and Taiwan HSR. These are island systems with a finite, controllable number of stations; Taiwan, a single-line system, is especially easy to model. The km-points are taken from line lengths, except that mini-Shinkansen lines have their lengths artificially inflated by a factor of about 2.7 to account for the greater travel time, so that they are compatible with an average express train speed of about 220 km/h. This means the model will overrate their passenger-km, but it’s not a significant source of error as they serve fairly small cities – were they bigger they’d get full Shinkansen.
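The length inflation can be sketched as follows. The 2.7 factor and the 220 km/h speed come from the paragraph above; the function name and the example lengths are hypothetical.

```python
EXPRESS_SPEED_KMH = 220.0   # average express speed quoted above
MINI_INFLATION = 2.7        # inflation factor quoted above


def model_km(full_shinkansen_km, mini_shinkansen_km, inflation=MINI_INFLATION):
    """Km-point used by the model: real km on full Shinkansen track plus
    inflated km on mini-Shinkansen track, so that distance stands in for time."""
    return full_shinkansen_km + inflation * mini_shinkansen_km


# Average mini-Shinkansen speed implied by the two figures above.
implied_mini_speed_kmh = EXPRESS_SPEED_KMH / MINI_INFLATION

# Hypothetical trip: 250 km on a full Shinkansen line, then 100 km on a mini-Shinkansen line.
print(model_km(250, 100), round(implied_mini_speed_kmh, 1))
```

Since the inflated km are a proxy for time and not for distance actually traveled, multiplying ridership by them overstates passenger-km for these branches.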
Metro areas are combined, and when a metro area has several stations, they are merged and only the most prominent is depicted, such as Tokyo, Shin-Osaka, and Taipei. In Japan I use the broader category of major metropolitan area (MMA) wherever possible, with the exception of Shizuoka and Hamamatsu, which are not merged as they were distinct until recently and remain two separate city cores that only share suburbs on the margins. Otherwise I use the smaller metropolitan employment area, as the MMA is only defined for the largest cities, and not for (say) Aomori or Kanazawa.
In Taiwan there’s no real definition of metro area. The secondary cities are single-tier municipalities encompassing the metro area plus some rural areas; I take what Wikipedia calls the urban part, which is nearly the same as the municipality. Taipei and New Taipei are merged – there’s a stop in New Taipei, but New Taipei is really a suburb of Taipei spreading in all directions. Taoyuan is kept separate, as it tries to develop its own core and lies only in one direction from Taipei, to its west. Outside the cities I use county populations where the stop seems to serve the center of the county. There are two exceptions: Chiayi County is expansive, so I focus on the independent Chiayi City plus the suburb the station is in; and Changhua’s station is very peripheral to the county, most of which is closer to Taichung.
Both countries charge similar fares – Wikipedia has Taiwan charging, in PPP terms, $0.25/p-km, which is close to the Shinkansen average, and compares with about $0.15/p-km in Continental Europe. In addition, both have linear population distribution, Japan along the Taiheiyo Belt and Taiwan along the west coast.
The model massively underrates the ridership of THSR. It believes ridership is 26 million a year, with a total of 4.465 billion p-km; the actual numbers are 67 million and 12 billion respectively as of 2019, per Wikipedia. I have not seen ridership by city pairs, only boardings per station. The numbers do not make it obvious if there is more very short-distance ridership than I expect. The average trip length I predict is 172 km; the actual average is 178. Taichung has slightly more ridership than Zuoying, even though Taichung and Kaohsiung have about the same population; but Zuoying is not quite at city center, while Taichung also draws from Changhua County, whose own station has very low ridership. Overall, to the extent the shape of the model is correct, the 500 km minimum on distance in the denominator cannot be too wrong – or, if it is, the minimum must be more than the Taipei-Kaohsiung distance of 339 km or not much less than it.
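The comparison above reduces to dividing passenger-km by ridership. A short check using only the totals quoted in this paragraph (variable and function names are hypothetical):

```python
# THSR totals quoted above.
model_riders_m = 26.0     # million riders per year, model
model_pkm_bn = 4.465      # billion passenger-km per year, model
actual_riders_m = 67.0    # million riders, 2019
actual_pkm_bn = 12.0      # billion passenger-km, 2019


def avg_trip_km(pkm_bn, riders_m):
    """Average trip length in km from billions of p-km and millions of riders."""
    return pkm_bn * 1000.0 / riders_m


model_avg_km = avg_trip_km(model_pkm_bn, model_riders_m)
actual_avg_km = avg_trip_km(actual_pkm_bn, actual_riders_m)
ridership_ratio = actual_riders_m / model_riders_m
pkm_ratio = actual_pkm_bn / model_pkm_bn

print(round(model_avg_km), round(actual_avg_km), round(ridership_ratio, 2), round(pkm_ratio, 2))
```

The rounded actual totals give an average of about 179 km, in line with the 178 km figure. Ridership and p-km are both underpredicted by a similar factor of about 2.6 to 2.7, which is what it means for the shape of the model to be roughly right while its level is far too low.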
In Japan, the situation is less clear. Total Shinkansen ridership is 438 million as of fiscal year 2018-19, per Wikipedia; this is the last year before corona, as Japanese fiscal years end on March 31 and in March of 2020 Japanese ridership was already suppressed due to social distancing. Passenger-km on JR East, JR Central, and JR West totaled around 100 billion, with Hokkaido and Kyushu adding scant numbers, but these are railroad-km: the Shinkansen charges based on the distance along the legacy line and not along the Shinkansen, inflating p-km by somewhat less than 10%.
In contrast, my model thinks total Shinkansen ridership is 389 million and p-km sum to 170.815 billion. The 389 vs. 438 million discrepancy is easy to explain – my model ignores intra-metropolitan trips, and we know that they exist because there are some Shinkansen commuters in towns like Mishima. However, the 100 vs. 171 billion p-km discrepancy is harder to explain. For this, there are several explanations, all plausible, and yet none completely satisfactory:
- About 40 billion of the p-km involve riding through Tokyo, of which 21 billion are from the Tohoku Shinkansen and 19 from the Joetsu and Hokuriku Shinkansen. There are no through-trains, and the through-trips via Joetsu and especially Hokuriku are circuitous.
- Yamagata and Akita between them generate around 6 billion p-km per the model; this is an overestimate, as the spreadsheet does not distinguish km that are really stand-ins for trip time from km that are actually traveled.
- A total of 6.5 billion p-km per the model are diagonal between the Tohoku, Joetsu, and Hokuriku Shinkansen; in reality, connecting at Omiya or Takasaki is so circuitous that I expect nearly everyone drives.
- Inter-island trips are especially likely to be done by air. Tokyo-Fukuoka has a rail-air modal split of 7.4-92.6, with a rail trip time of 5 hours, and Nagoya-Fukuoka is only 51-49, with a trip time of 3:20. This is bad for rail by European standards, where rail typically gets 20-30% at 5 hours and a clear majority at 3:20, and even by intra-Honshu Japanese standards, where Tokyo-Hiroshima at 3:55 is 68-32 and Tokyo-Okayama at 3:15 is 70-30.
All trip categories above are disproportionately long, helping explain why the model underpredicts ridership while overpredicting p-km. Subtracting all of the above one gets to not much more than 100 billion.
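The subtraction can be sketched with the figures from the list above. This is a rough upper bound on the adjustment under two assumptions that are mine: each category is removed in full, and the categories do not overlap. The inter-island category is left out because no p-km figure is given for it.

```python
model_pkm_bn = 170.815  # modeled Shinkansen p-km, in billions

# Modeled p-km in the categories listed above, in billions.
adjustments_bn = {
    "through Tokyo": 40.0,
    "Yamagata and Akita": 6.0,
    "diagonal Tohoku/Joetsu/Hokuriku": 6.5,
}

residual_bn = model_pkm_bn - sum(adjustments_bn.values())
print(round(residual_bn, 3))
```

What is left after these three categories is still above 100 billion; the remaining gap has to come from the inter-island trips, which are not quantified here.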
The model does nail certain aspects of Shinkansen ridership. Tokyo-Sendai, Tokyo-Hiroshima, and Tokyo-Okayama are easy – the model was trained in part on those specific city pairs. But in addition, overall ridership out of Tokyo and Osaka is very close to total JR Central ridership in these two regions. The model slightly overpredicts Osaka, but that is expected since it lumps the Keihanshin region together whereas JR Central would not count Kobe.
Nagoya is more overpredicted, and it is possible that it is uniquely auto-oriented and that this reduces rail ridership, by maybe 25% below the modeled prediction. If that is what is happening, then the constant 500 in the denominator of the model, as well as the 75,000 in the numerator, should be adjusted. The reason for the choice of 500 is that Tokyo-Nagoya and Tokyo-Osaka ridership levels both follow the same model if the exponent is 0.8 and distance is ignored; if in fact Nagoya has a 25% malus, then to compensate for it the 500 constant inside the maximum should be lowered slightly, to 430 or a little less.
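One way to arrive at a figure near 430 is sketched below. It assumes a distance exponent of 2 (not stated in this post), that the Nagoya pair stays below the new minimum, and that pairs at 500 km or more are unaffected. Lowering the minimum then has to raise the below-minimum prediction by 1/(1 - 0.25), so that applying the 25% malus brings it back to the current level. The function name is hypothetical.

```python
def adjusted_floor_km(floor_km=500.0, malus=0.25, dist_exp=2.0):
    """New distance minimum such that predictions below it rise by 1 / (1 - malus).

    Solves (floor_km / new_floor) ** dist_exp == 1 / (1 - malus).
    dist_exp=2 is an assumption, not a value given in the post.
    """
    return floor_km * (1.0 - malus) ** (1.0 / dist_exp)


new_floor = adjusted_floor_km()
print(round(new_floor))

# Check: the uplift from the lower minimum exactly cancels the malus.
uplift = (500.0 / new_floor) ** 2.0
print(round(uplift * (1.0 - 0.25), 6))
```

Under these assumptions the result is about 433 km, in line with the figure of 430 or a little less.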
It’s tempting to rewrite the model in terms of travel time and then set the constant at 2 hours (and not 2.5 hours as I did when trying to model Germany). But note that it’s far from enough to explain the model’s gross underprediction of Taiwanese HSR ridership, an underprediction that exists across all distances in Taiwan. Nor is it possible to lower the 75,000 constant in the numerator and address any of the underprediction of Taiwan.
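For a sense of scale, the time-based minimum can be converted back to km. The conversion speed is an assumption: it reuses the average express speed of about 220 km/h from the mini-Shinkansen discussion, and the function names are hypothetical.

```python
EXPRESS_SPEED_KMH = 220.0  # assumed conversion speed, taken from the figure quoted earlier


def hours_to_km(hours, speed_kmh=EXPRESS_SPEED_KMH):
    """Distance equivalent of a travel-time minimum at the assumed average speed."""
    return hours * speed_kmh


def km_to_hours(km, speed_kmh=EXPRESS_SPEED_KMH):
    """Travel-time equivalent of a distance minimum at the assumed average speed."""
    return km / speed_kmh


print(hours_to_km(2.0), hours_to_km(2.5), round(km_to_hours(500.0), 2))
```

At that speed, 2 hours corresponds to 440 km, close to the lowered minimum of about 430 km, while 2.5 hours corresponds to 550 km and the current 500 km to a bit under 2:20.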