tl;dr: Play with the app here
When you live in Paris, chances are you are (home or work) very close to a metro station, so when you want to meet with some friends, you usually end up picking another metro station as a meeting point. Yet, finding the optimal place to meet can easily become a complex problem considering the dense network we have. Now that the RATP (the public transport operator in Paris) had made some of their datasets available, this sounds like a good job to be solved with R and Shiny.
In the spirit of the current open data movement, the RATP has made available a number of datasets under the Etalab license, and among them, two are of a particular interest for us:
arret_ligne <- read.csv("ratp_arret_ligne.csv", header=F, sep="#", col.names=c("ID","Ligne","Type"), stringsAsFactors=F) arret_positions <- read.csv("ratp_arret_graphique.csv", header=F, sep="#", col.names=c("ID","X","Y","Nom","Ville","Type"), stringsAsFactors=F)
To state our problem more clearly, we are given initially a set of n metro stops among all N possible, and we want to find S the optimal stop where to meet. A first step will involve computing the distances among all metro stops (shortest path, preferably on a time scale rather than a space scale!), and the second step is to find some kind of “barycenter” of these n stops. For these purposes, we model our metro network as a graph. The shortest path among two stops can be found using the very common Dijkstra algorithm, while defining the “barycenter” can be a bit cumbersome. Using a geographic barycenter doesn’t make any sense (we might end up in a place with no stop, or even with the closest stop being physically far away from a duration perspective). The next thing could be to think of this problem as finding the centroid of the cluster formed by our n stops, using something in the spirit of k-means (which doesn’t need actual points in space but only distances), and mapping this centroid to our larger network, but empirically the results didn’t look sound. No point in complicating things, another way to think of this is merely as a minimax problem: finding the stop which minimizes the maximum distance from each of the n initial stops to S. And this is actually very easy to implement in R!
There are two technical problems worth highlighting:
So here we are, equipped with our distance matrix D between any two stops of the RATP metro network, ready to identify the optimal stop to meet when people come from, say, Barbès-Rochechouart, Bastille, Dupleix, and Pernety.
sources <- getIDFromStation(c("Barbès-Rochechouart", "Bastille", "Dupleix", "Pernéty")) target <- names(which.min(apply(D[,format(sources)], 1, max))) getStationFromID(target) # "Odéon"
That’s actually great because Odéon is a cool place to get together and have a drink. Thinking about it, it would actually be useful to integrate a penalty for metro stops without any nice bar around…
In case you have a similar problem one of these days, consider using that page, where a Shiny version of the app is available, along with a ggplot2 chart. I know this should be done in D3 for more interactivity, but that’s on my todo for sure!