@ldodds
Created May 15, 2013 10:22
Summarise dbpedia geographic coverage by reverse geocoding the lat/lng data
#!/bin/bash
#
# Script to download English dbpedia lat/lng data and produce a
# summary of geographic coverage
#
# You'll need the local-geocoder Ruby application installed:
#
# sudo gem install local-geocoder
#
# http://github.com/aishfenton/local-geocoder
#
#
# Script also assumes you have curl and bzip2 installed
# -p lets the script be re-run; stop if the directory cannot be entered
mkdir -p dbpedia-geo-data
cd dbpedia-geo-data || exit 1
echo "Downloading and unpacking dbpedia data"
curl -O http://downloads.dbpedia.org/3.8/en/geo_coordinates_en.nt.bz2
bzip2 -d geo_coordinates_en.nt.bz2
echo "Extracting georss:point predicates"
grep georss geo_coordinates_en.nt >dbpedia-points.nt
echo "Starting Geocoding"
# Reverse the default local_geocode template which assumes lng,lat
# whereas dbpedia has lat,lng
# The decimal point is escaped so that it only matches a literal "."
local_geocode --template '(?<lat>-?\d+\.?\d+)[,\t ]\s?(?<lng>-?\d+\.?\d+)' <dbpedia-points.nt >places
echo "Summarising"
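# Count points per country (first comma-separated field of the geocoder output)
cut -d"," -f1 places | sort | uniq -c >results
# Turn the leading "   count " that uniq -c writes into "count,"
sed -E 's/^[[:space:]]*([0-9]+) /\1,/' results >results.csv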
cat results
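To illustrate what the template has to match, here is a minimal sketch of pulling latitude and longitude out of a single georss:point triple. It assumes the object is a quoted literal with latitude first, then longitude, separated by a space or comma. The function name `extract_point` and the sample line are hypothetical, and the pattern is a bash ERE approximation anchored on the literal's quotes, not the Ruby regex that local_geocode itself receives.

```shell
#!/bin/bash
# Pull lat/lng out of one georss:point triple (latitude comes first in the literal).
extract_point() {
  # Rough ERE equivalent of the template, anchored on the quotes of the literal
  local re='"(-?[0-9]+\.?[0-9]*)[, ]+(-?[0-9]+\.?[0-9]*)"'
  if [[ $1 =~ $re ]]; then
    echo "lat=${BASH_REMATCH[1]} lng=${BASH_REMATCH[2]}"
  else
    return 1   # no coordinate pair found on this line
  fi
}

# Hypothetical sample line, for illustration only
extract_point '<http://dbpedia.org/resource/Example> <http://www.georss.org/georss/point> "42.5 -1.5167"@en .'
# prints: lat=42.5 lng=-1.5167
```

Anchoring on the quotes is a design choice of this sketch: it keeps digits that happen to appear in the subject URI from being picked up as coordinates.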
450 Afghanistan (AFG)
462 Albania (ALB)
1221 Algeria (DZA)
205 Angola (AGO)
7525 Antarctica (ATA)
2278 Argentina (ARG)
1048 Armenia (ARM)
17575 Australia (AUS)
1531 Austria (AUT)
2226 Azerbaijan (AZE)
778 Bangladesh (BGD)
630 Belarus (BLR)
1679 Belgium (BEL)
95 Belize (BLZ)
186 Benin (BEN)
82 Bermuda (BMU)
96 Bhutan (BTN)
716 Bolivia (BOL)
3272 Bosnia and Herzegovina (BIH)
296 Botswana (BWA)
5991 Brazil (BRA)
56 Brunei (BRN)
1881 Bulgaria (BGR)
572 Burkina Faso (BFA)
59 Burundi (BDI)
265 Cambodia (KHM)
274 Cameroon (CMR)
18975 Canada (CAN)
89 Central African Republic (CAF)
117 Chad (TCD)
815 Chile (CHL)
6384 China (CHN)
1377 Colombia (COL)
205 Costa Rica (CRI)
1880 Croatia (HRV)
130 Cuba (CUB)
89 Cyprus (CYP)
3312 Czech Republic (CZE)
304 Democratic Republic of the Congo (COD)
1630 Denmark (DNK)
30 Djibouti (DJI)
147 Dominican Republic (DOM)
67 East Timor (TLS)
264 Ecuador (ECU)
656 Egypt (EGY)
182 El Salvador (SLV)
19 Equatorial Guinea (GNQ)
77 Eritrea (ERI)
2316 Estonia (EST)
1109 Ethiopia (ETH)
117 Falkland Islands (FLK)
138 Fiji (FJI)
1418 Finland (FIN)
7135 France (FRA)
42 French Guiana (GUF)
14 French Southern and Antarctic Lands (ATF)
91 Gabon (GAB)
127 Gambia (GMB)
344 Georgia (GEO)
11057 Germany (DEU)
371 Ghana (GHA)
1395 Greece (GRC)
229 Greenland (GRL)
453 Guatemala (GTM)
55 Guinea Bissau (GNB)
337 Guinea (GIN)
132 Guyana (GUY)
100 Haiti (HTI)
337 Honduras (HND)
1923 Hungary (HUN)
442 Iceland (ISL)
10157 India (IND)
839 Indonesia (IDN)
718 Iran (IRN)
420 Iraq (IRQ)
2165 Ireland (IRL)
1156 Israel (ISR)
4546 Italy (ITA)
511 Ivory Coast (CIV)
288 Jamaica (JAM)
13119 Japan (JPN)
162 Jordan (JOR)
255 Kazakhstan (KAZ)
486 Kenya (KEN)
192 Kosovo (XKX)
67 Kuwait (KWT)
102 Kyrgyzstan (KGZ)
82 Laos (LAO)
709 Latvia (LVA)
211 Lebanon (LBN)
54 Lesotho (LSO)
129 Liberia (LBR)
182 Libya (LBY)
728 Lithuania (LTU)
558 Luxembourg (LUX)
413 Macedonia (MKD)
1239 Madagascar (MDG)
101 Malawi (MWI)
1746 Malaysia (MYS)
312 Mali (MLI)
110 Malta (MLT)
189 Mauritania (MRT)
2757 Mexico (MEX)
755 Moldova (MDA)
462 Mongolia (MNG)
154 Montenegro (MNE)
446 Morocco (MAR)
271 Mozambique (MOZ)
386 Myanmar (MMR)
272 Namibia (NAM)
805 Nepal (NPL)
3499 Netherlands (NLD)
36 New Caledonia (NCL)
4285 New Zealand (NZL)
197 Nicaragua (NIC)
594 Nigeria (NGA)
342 Niger (NER)
26763 nil
283 North Korea (PRK)
4083 Norway (NOR)
116 Oman (OMN)
2127 Pakistan (PAK)
145 Panama (PAN)
225 Papua New Guinea (PNG)
125 Paraguay (PRY)
871 Peru (PER)
1693 Philippines (PHL)
46316 Poland (POL)
1523 Portugal (PRT)
696 Puerto Rico (PRI)
79 Qatar (QAT)
3701 Republic of Serbia (SRB)
68 Republic of the Congo (COG)
2803 Romania (ROU)
4995 Russia (RUS)
76 Rwanda (RWA)
268 Saudi Arabia (SAU)
159 Senegal (SEN)
69 Sierra Leone (SLE)
2510 Slovakia (SVK)
555 Slovenia (SVN)
52 Solomon Islands (SLB)
91 Somalia (SOM)
1233 South Africa (ZAF)
1266 South Korea (KOR)
98 South Sudan (SDS)
8794 Spain (ESP)
1787 Sri Lanka (LKA)
152 Sudan (SDN)
129 Suriname (SUR)
70 Swaziland (SWZ)
2612 Sweden (SWE)
2583 Switzerland (CHE)
330 Syria (SYR)
611 Taiwan (TWN)
125 Tajikistan (TJK)
807 Thailand (THA)
37 The Bahamas (BHS)
67 Togo (TGO)
98 Trinidad and Tobago (TTO)
290 Tunisia (TUN)
3565 Turkey (TUR)
95 Turkmenistan (TKM)
613 Uganda (UGA)
1685 Ukraine (UKR)
361 United Arab Emirates (ARE)
45917 United Kingdom (GBR)
441 United Republic of Tanzania (TZA)
122230 United States of America (USA)
60 Uruguay (URY)
187 Uzbekistan (UZB)
12 Vanuatu (VUT)
361 Venezuela (VEN)
982 Vietnam (VNM)
554 West Bank (PSE)
8 Western Sahara (ESH)
860 Yemen (YEM)
226 Zambia (ZMB)
276 Zimbabwe (ZWE)
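The counts above are easier to compare as shares of the total. The following is a minimal sketch, assuming a `count,label` file shaped like the results.csv the script writes. The function name `coverage_share` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Print each row of a "count,label" CSV with its share of the total, largest first.
coverage_share() {
  awk -F',' '
    { count[NR] = $1; label[NR] = $2; total += $1 }
    END {
      # With an empty file NR is 0, so the loop (and the division) never runs
      for (i = 1; i <= NR; i++)
        printf "%d,%s,%.1f%%\n", count[i], label[i], 100 * count[i] / total
    }' "$1" | sort -t',' -k1,1nr
}
```

For example, `coverage_share results.csv | head -n 5` would list the five largest rows with their percentages. The `nil` row is counted like any other label, so the shares are relative to all points that were passed to the geocoder.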