The OSRM route server is an extremely useful tool for getting the driving distance/time through multiple locations. The route server requires data that has to be downloaded and processed before it can be used to serve routes.
Processing OSRM data for a large region like North America can be a real challenge due to the memory and disk size requirements. It's also really time consuming. If you cut and try from scratch, you will repeatedly run into one constraint or another and fail after hours of running.
The following are summary notes from trying this with eventual success.
Since most people don't have a machine with a huge amount of memory sitting around, doing this on AWS EC2 is a natural choice. Using a Docker image as shown below is the easiest way, but on AWS you need an instance with at least 64GB of memory (I used m4.4xlarge). Even with 64GB of memory, things get tight, so you should have some swap space as well.
Alternatively, you can build osrm-backend and run it natively (i.e. not in Docker). Unfortunately, this is even more of a hassle on EC2 because OSRM requires many tools and libraries to be installed, including a newer version of GCC than the default, etc.
In comparison, building osrm-backend and processing the OSRM route data was very straightforward on a Mac Pro with 6 cores + 32GB of memory. You get the latest toolchain, and macOS manages memory demand very well. But then again, you can't run a Mac Pro in an AWS VPC.
If you get a bad_alloc error, the Docker container blows up, or your machine starts to swap heavily and becomes non-responsive... you just don't have enough memory. You are better off overallocating from the start, because peak memory demand happens way into the process.
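If you want to keep an eye on memory while the processing steps run, something like this in a second terminal works; nothing here is specific to OSRM, it just watches overall and per-container usage.
# Watch overall memory and swap usage, refreshed every 30 seconds
watch -n 30 free -h
# One-shot snapshot of per-container memory usage
docker stats --no-stream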
The North America osm.pbf is ~9GB. The processed output is another ~47GB or so, for a total of ~56GB just to house the data. Then you need ~10GB for the OS, and another few GB for the swapfile. Overallocating here is also a good idea, because you run out of disk space just when you think you are about done.
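A quick check before kicking things off saves a failed run later; the mount point below assumes the data lives on the root volume, so adjust if yours is elsewhere.
# Confirm there is enough free disk space on the volume holding the data
df -h /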
Set up Docker
# (stand up m4.4xlarge instance first)
sudo yum update -y
sudo yum install docker -y
sudo service docker start
sudo usermod -a -G docker ec2-user
Log out and log back in so the docker group membership takes effect.
Make sure Docker is up and see how much memory is available:
docker info
Add swap space
# Add 10GB swap space
sudo /bin/dd if=/dev/zero of=/var/swapfile bs=1M count=10240
sudo /sbin/mkswap /var/swapfile
sudo chmod 600 /var/swapfile
sudo /sbin/swapon /var/swapfile
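Optionally, verify the swap is active, and if you expect to reboot the instance partway through, make it persistent. The fstab entry below is the usual approach, not something OSRM requires.
# Verify the swap space is active
sudo /sbin/swapon -s
free -h
# Optional: keep the swapfile across reboots
echo '/var/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab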
Fetch and Process North America OSM data
# Fetch data. This is about 8GB. Appreciate Geofabrik for making it this easy.
wget http://download.geofabrik.de/north-america-latest.osm.pbf
# Process with Docker.
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/car.lua /data/north-america-latest.osm.pbf
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-partition /data/north-america-latest.osrm
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-customize /data/north-america-latest.osrm
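At this point the processed files should be sitting next to the original .pbf. A quick listing is a cheap way to confirm each step produced output before moving on (file names and sizes vary with the OSRM version):
# Confirm the processed OSRM files were produced
ls -lh north-america-latest.osrm*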
The OSRM routing server uses about 24GB of memory when serving North American routes.
docker run -t -i -p 5000:5000 -v $(pwd):/data osrm/osrm-backend osrm-routed --algorithm mld /data/north-america-latest.osrm
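Once osrm-routed is listening on port 5000, a quick request confirms it is actually serving routes. The coordinates below (Chicago to Milwaukee) are just an example; OSRM expects longitude,latitude order.
# Sample route request: Chicago -> Milwaukee (lon,lat;lon,lat)
curl "http://localhost:5000/route/v1/driving/-87.6298,41.8781;-87.9065,43.0389?overview=false"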
A simple benchmark of getting 100 or so trip routes:
- EC2 m4.4xlarge instance 16vCPU/64GB - 11m25.716s
- MacPro 6core/32GB - 4m1.637s
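Since the whole point was driving distance/time through multiple locations, the table service is worth knowing about too: it returns a full duration (and, on recent OSRM versions, distance) matrix in one request instead of many individual route calls. The coordinates here are again just illustrative.
# Duration/distance matrix between three points in a single request
curl "http://localhost:5000/table/v1/driving/-87.6298,41.8781;-87.9065,43.0389;-86.1581,39.7684?annotations=duration,distance"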
Thank you for this, it helped me with my Google Analytics case study.
I only used the processing and running parts of this HOW TO but would have been lost without it.
I could process and run the server on an old laptop since the only city I needed was Chicago.
So if there is anybody who wants to get distances for coordinates for the Google Data Analytics Case Study 1 (Cyclistic Bike-Share), here are some lines that can help you: fetch a smaller extract and process it with Docker, same steps as above.
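A sketch for Chicago, assuming a state-level Geofabrik extract (Illinois) is enough for your coordinates; the commands are just the ones from the post with a smaller file:
# Fetch a small extract (Illinois covers Chicago)
wget http://download.geofabrik.de/north-america/us/illinois-latest.osm.pbf
# Process and serve with Docker, same steps as above
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/car.lua /data/illinois-latest.osm.pbf
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-partition /data/illinois-latest.osrm
docker run -t -v $(pwd):/data osrm/osrm-backend osrm-customize /data/illinois-latest.osrm
docker run -t -i -p 5000:5000 -v $(pwd):/data osrm/osrm-backend osrm-routed --algorithm mld /data/illinois-latest.osrm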
Hope this helps people save a lot of money!