
@bessangel
Forked from jsoma/.gitignore
Created March 13, 2023 23:08
Tutorial on how to build a most-popular-thing-in-a-grid map in QGIS and pandas

This critique eventually became a tutorial on a very specific kind of map. Sorry!

You can make a design critique request using this Google form

Critique: Convenience Stores of Japan (Map)

Maps are always difficult. Let's take a look at a map of convenience stores in Japan:

Cool, right? There's clearly a geographic pattern between the different kinds of stores, so of course you'll break them into small multiples.

But while that shows the distribution of stores across Japan, it's also a population map, right? More people means more convenience stores? So instead, let's go with a per-capita map.

You might think next to make a per-capita map for each different company, but... honestly, it's going to be pretty boring. Instead, we're going to do something a little different!

The interesting thing about the different chains is that they're in different parts of the country. Some chains are bigger in Tokyo, some up north, some down south, etc. We could do a map of prefectures and point out the most popular chain in each... but let's think bigger!

Doing something like this map by Nathan Yau of FlowingData is our new target.

Here's the plan: draw a grid over Japan. For each segment of our grid, find the most popular convenience store chain in it. Sound ok?

It's not as hard as you'd think, but we need to adjust your data first.

Adding a number column

The original dataset looks like this:

import pandas as pd
df = pd.read_csv("all_conbini_brand.csv")
df.head()
. address lat lon store_sub brand
0 北海道札幌市中央区南十九条西15丁目1-1 43.034146 141.3369963 えび膳 その他
1 北海道札幌市中央区南十条西12丁目1-31 43.0463297 141.3396258 おおい西屯田通 その他
2 北海道札幌市中央区北2条西4札幌三井JPビル1階 43.0638861 141.3503901 JPローソン赤れんがテラス ローソン
3 北海道札幌市中央区南四条東2丁目 43.0571332 141.3597215 スパー南4条店川村 その他
4 北海道札幌市中央区南四条西4丁目 43.0557821 141.3522956 セイコーマートあさの セイコーマート

The last column - brand - is the name of the chain. We're going to ask QGIS to find the most popular brand in each segment of the grid we make, but QGIS doesn't like strings very much. Before we import into QGIS, we need to convert each brand into a number.

We'll do this in pandas with the category datatype, which lets us convert the column into a series of integer codes (we did this in the Buzzfeed homework about planes, I think!).

df['brand_number'] = df.brand.astype('category').cat.codes
df.head()
. address lat lon store_sub brand brand_number
0 北海道札幌市中央区南十九条西15丁目1-1 43.034146 141.3369963 えび膳 その他 0
1 北海道札幌市中央区南十条西12丁目1-31 43.0463297 141.3396258 おおい西屯田通 その他 0
2 北海道札幌市中央区北2条西4札幌三井JPビル1階 43.0638861 141.3503901 JPローソン赤れんがテラス ローソン 7
3 北海道札幌市中央区南四条東2丁目 43.0571332 141.3597215 スパー南4条店川村 その他 0
4 北海道札幌市中央区南四条西4丁目 43.0557821 141.3522956 セイコーマートあさの セイコーマート 1

Converting strings to numbers is kind of possible using QGIS, but way way way way easier in pandas.
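If you're curious what cat.codes is actually doing, here is a tiny sketch. The rows in toy are made up for illustration (only the brand names come from the real data). The category datatype collects the unique values, sorts them, and numbers them starting from 0. One thing to watch for: a missing brand gets the code -1.

```python
import pandas as pd

# Made-up rows standing in for the real CSV (only the brand names are real)
toy = pd.DataFrame(
    {"brand": ["ローソン", "その他", "セイコーマート", "その他", "ローソン"]}
)

# astype("category") finds the unique brands and sorts them;
# .cat.codes then replaces each brand with its position in that sorted list
as_category = toy["brand"].astype("category")
toy["brand_number"] = as_category.cat.codes

print(list(as_category.cat.categories))  # the sorted unique brands
print(toy["brand_number"].tolist())      # one integer code per row
```

Because the codes come from the sorted list of brands, the same brand always gets the same code within one dataset, but the numbers will change if you add or remove brands.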

But later on, how are we going to figure out what codes match what categories? We'll use this code and save a CSV for later.

codes = pd.DataFrame(
    enumerate(df.brand.astype('category').cat.categories),
    columns=['code', 'category']
)
codes.to_csv("code_match.csv", index=False)
codes
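If you want to convince yourself the lookup table really matches the codes, a quick round-trip check might look like the sketch below. It uses made-up rows and an in-memory buffer instead of the real code_match.csv, so those parts are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up rows for illustration (only the brand names are real)
toy = pd.DataFrame({"brand": ["ローソン", "その他", "セイコーマート", "その他"]})
toy["brand_number"] = toy["brand"].astype("category").cat.codes

# Same idea as the lookup table above: one row per (code, category) pair
codes = pd.DataFrame(
    list(enumerate(toy["brand"].astype("category").cat.categories)),
    columns=["code", "category"],
)

# Save and reload through an in-memory buffer (a stand-in for code_match.csv)
buffer = io.StringIO()
codes.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Turn each code back into a name and compare with the original column
lookup = reloaded.set_index("code")["category"]
recovered = toy["brand_number"].map(lookup)
print(recovered.tolist() == toy["brand"].tolist())
```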

Building your grid in QGIS

Start up QGIS, and load in the CSV file you saved.

Make sure it has the brand_number column we added!

Now we'll need to draw a grid across Japan. Use Vector > Research Tools > Create Grid...

Before you do anything, you'll need to change your Grid CRS by clicking the globe in the bottom right-hand corner. By default you're probably looking at WGS84, which is measured in degrees and so doesn't let you pick 'kilometers' as your size. Change your CRS to almost any projected CRS that is measured in meters (I'm using a version of the projection Google Maps uses).

  • Grid type: I'm doing rectangle just to make it kind of match the other viz
  • Grid extent: Click the ... on the far right and pick "Use canvas extent"
  • Horizontal and Vertical spacing: How big do you want your shapes to be? Be sure to pick kilometers!!!!
  • Horizontal and Vertical overlay: How much of an overlap there is between neighboring shapes. I put 0 here, but I think the final colored map looks better if you add a little overlap.

Then click Run, and then Close. Tada! A beautiful, horrible grid.
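Why does the CRS matter so much for the grid size? WGS84 coordinates are degrees, while a projected CRS gives you meters. Here is a rough sketch of the conversion, assuming the "Google Maps" CRS is Web Mercator (EPSG:3857). The function name is made up, and QGIS does all of this for you, so it's only here to show what changes.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857), in meters
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 degrees to Web Mercator meters (hypothetical helper)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# The first store in the dataset, in Sapporo
x, y = lonlat_to_web_mercator(141.3369963, 43.034146)
print(x, y)

# Mercator stretches distances by 1 / cos(latitude), so a grid cell that is
# 50 km wide in this CRS covers roughly 50 * cos(latitude) km on the ground
print(50 * math.cos(math.radians(43.034146)))
```

That stretching means the "kilometers" in your grid are only approximate this far from the equator, which is fine for a map like this one.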

Clearing out the grid

If you tried to find the most popular store in each grid cell now, it would take a long, long time, because there are a lot of grid cells. Since most of the grid doesn't overlap with our points, we can delete the parts that don't!

Follow this tutorial to delete the parts of the grid that do not overlap with our stores.

Find the most popular chain

Open up the Processing Toolbox by clicking Processing > Toolbox.

Search for "join". If you have a newer version of QGIS, you're looking for Join attributes by location (summary). If you have an older version, it's just Join attributes by location. Double-click it.

We want to join the grid layer with our point layer, and only count points contained within each grid shape.

You'll want to summarize the brand_number field, and you'll pick majority as the kind of summary to calculate. This will save the most popular brand_number to each of our new grid elements.
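If it helps to see the idea outside of QGIS, here is a rough pandas sketch of "assign each point to a grid cell, then take the majority code per cell". The coordinates, the cell size, and the brand_number_majority column name are all made up for illustration. I also don't know how QGIS breaks ties, so the tie rule below is an assumption.

```python
import pandas as pd

# Made-up points in projected coordinates (meters), with made-up brand codes
points = pd.DataFrame(
    {
        "x": [100, 900, 400, 1500, 1700, 1200],
        "y": [200, 800, 500, 300, 100, 700],
        "brand_number": [0, 7, 7, 1, 1, 0],
    }
)

CELL_SIZE = 1000  # hypothetical grid spacing, in the same units as x and y

# Integer division tells us which grid column and row each point falls in
points["cell_x"] = points["x"] // CELL_SIZE
points["cell_y"] = points["y"] // CELL_SIZE


def majority(values):
    # mode() returns every tied value in sorted order; on a tie this keeps
    # the smallest code (an assumption, QGIS may break ties differently)
    return values.mode().iloc[0]


most_popular = (
    points.groupby(["cell_x", "cell_y"])["brand_number"]
    .agg(majority)
    .reset_index(name="brand_number_majority")
)
print(most_popular)
```

Notice that only cells containing at least one point show up in the result, which is the same reason clearing out the empty grid cells first saves QGIS so much work.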

Once you hit run, it might take a long time. If it takes a really long time: did you remove the grid elements that don't overlap stores?

Once it's done, you have a new layer that looks just like your old one.

But open up the attribute table, and you'll see: it has a new column! This new column is the code of the most popular store in the area.

You can use this code to set a color scale.

Labels for our new graphic

You can also add the code_match.csv file we created long long ago, and add a join to our grid. That will allow us to have actual names of stores for each code, instead of just the number.

Yes, I'm pointing at code_match, but be sure you're doing right click > Properties for Joined Layer (the grid layer) in order to create the join.
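The QGIS join is doing the same job as a left merge in pandas. Here is a sketch with a made-up grid table: the id column and the brand_number_majority column name are assumptions, so check your own attribute table for the real names.

```python
import pandas as pd

# Hypothetical export of the grid layer's attribute table
grid = pd.DataFrame({"id": [1, 2, 3], "brand_number_majority": [7, 1, 0]})

# A few rows in the shape of code_match.csv
codes = pd.DataFrame(
    {"code": [0, 1, 7], "category": ["その他", "セイコーマート", "ローソン"]}
)

# how="left" keeps every grid cell, even one whose code has no match
labelled = grid.merge(
    codes, left_on="brand_number_majority", right_on="code", how="left"
)
print(labelled[["id", "brand_number_majority", "category"]])
```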

Now you can use those codes to color things however you want!

Mine is pretty ugly, yes. I think I recommend:

  • Converting less common categories to "Other" before you start
  • Adding a little bit of overlap between the shapes
  • After you are done, changing your projection to something better for your area (e.g. USA gets Albers) - if you do this after creating your grid, it twists and distorts the shapes, which I think looks fun.
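For the first recommendation, lumping rare brands into "Other" could look something like this in pandas, done before you create the brand_number column. The rows, the 架空チェーン brand (a fictional chain), and the MIN_STORES threshold are all made up for illustration. The dataset already uses その他 ("other") as a brand, so this sketch reuses it.

```python
import pandas as pd

# Made-up rows: two common brands, one fictional rare brand, one existing "other"
toy = pd.DataFrame(
    {
        "brand": ["ローソン"] * 4
        + ["セイコーマート"] * 3
        + ["架空チェーン"]
        + ["その他"]
    }
)

MIN_STORES = 2  # hypothetical cutoff: brands with fewer stores get lumped

counts = toy["brand"].value_counts()
common = counts[counts >= MIN_STORES].index

# Keep common brands as they are and replace everything else with その他
toy["brand_lumped"] = toy["brand"].where(toy["brand"].isin(common), "その他")

# Then build the numeric codes from the lumped column instead of the original
toy["brand_number"] = toy["brand_lumped"].astype("category").cat.codes
print(toy["brand_lumped"].value_counts())
```

If you go this route, remember to build code_match.csv from the lumped column too, so the codes and the names still line up.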
