@akx
Created July 11, 2016 12:41
parse_migr_resvas.py
# download: http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/migr_resvas.tsv.gz
import gzip
from collections import defaultdict

# geo code -> {citizenship code: most recent available count}
geo_to_citizen = defaultdict(dict)
for line in gzip.GzipFile("migr_resvas.tsv.gz"):
    line = line.decode("utf8")
    if not line.startswith("T,TOTAL"):  # Ignore non-total lines
        continue
    bits = line.split("\t")
    sex, age, citizen, unit, geo = bits.pop(0).split(",")  # Pop and split the key
    if citizen == "TOTAL":  # Ignore the citizen-total lines
        continue
    # Ignore "data missing" cells; what remains is in newest-to-oldest order
    valid_numbers = [int(n.strip()) for n in bits if n.strip() != ":"]
    if valid_numbers:  # Skip rows that have no data at all
        geo_to_citizen[geo][citizen] = valid_numbers[0]

# For each geo code, print the citizenship with the largest count
for geo, citizen_data in sorted(geo_to_citizen.items()):
    print(geo, max(citizen_data.items(), key=lambda p: p[1]))
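The script above calls `int()` on every cell that is not exactly `:`. That would raise `ValueError` if a cell carried anything besides digits. The following is a minimal sketch of a more forgiving cell parser. It assumes that cells may have a trailing flag after the number (for example `99312 p`) and that missing values may appear as `: c`. This cell format is an assumption and is not confirmed by the script. The helper names `parse_cell` and `latest_value` are hypothetical, and the sample row is made up for illustration.

```python
def parse_cell(cell):
    """Return the integer in a TSV cell, or None if the value is missing.

    Assumption: a cell is either ':' (optionally followed by a flag)
    or a number optionally followed by a whitespace-separated flag.
    """
    parts = cell.split()
    if not parts or parts[0].startswith(":"):
        return None
    return int(parts[0])


def latest_value(cells):
    """Return the first non-missing value, or None if every cell is missing.

    As in the script above, columns are taken to be newest first.
    """
    for cell in cells:
        value = parse_cell(cell)
        if value is not None:
            return value
    return None


# Made-up sample row in the same key/value layout the script expects.
row = "T,TOTAL,TR,NR,AT\t: \t99312 p\t98000 \n"
key, *cells = row.split("\t")
geo = key.split(",")[-1]
print(geo, latest_value(cells))
```

Returning `None` rather than indexing `valid_numbers[0]` also keeps a row with no data at all from raising `IndexError`.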
akx commented Jul 11, 2016

AT ('TR', 99312)
BE ('AFR', 152933)
BG ('RU', 12106)
CH ('XK', 105143)
CZ ('UA', 110712)
EE ('RU', 94207)
EL ('AL', 352101)
ES ('MA', 774890)
FR ('DZ', 562550)
HR ('BA', 4575)
HU ('CN', 11747)
IE ('BR', 15124)
IS ('PH', 496)
IT ('MA', 527376)
LI ('CH', 3581)
LT ('RU', 14072)
LU ('CV', 2662)
LV ('RNC', 266914)
NO ('SO', 9142)
PL ('UA', 39768)
PT ('BR', 87478)
RO ('MD', 9828)
SE ('SO', 48036)
SI ('BA', 49072)
SK ('UA', 5515)
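Each output line above is a geo code followed by the `repr` of a `(citizen, count)` tuple, so it can be read back with `ast.literal_eval` if the results need further processing. The sketch below ranks a few of the lines by count. `parse_result_line` is a hypothetical helper name, and the three sample lines are copied from the output above.

```python
import ast


def parse_result_line(line):
    """Split a line such as "AT ('TR', 99312)" into (geo, citizen, count)."""
    geo, _, rest = line.partition(" ")
    citizen, count = ast.literal_eval(rest)
    return geo, citizen, count


sample = """AT ('TR', 99312)
ES ('MA', 774890)
IS ('PH', 496)"""

rows = [parse_result_line(line) for line in sample.splitlines()]
rows.sort(key=lambda row: row[2], reverse=True)  # largest count first
for geo, citizen, count in rows:
    print(f"{geo}: {citizen} ({count})")
```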
