@t0mst0ne
Created September 7, 2014 13:56
Parse polling station addresses from the 13th Taiwan presidential election polling station data
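#!/usr/bin/env python
# coding: UTF-8
import pandas as pd

# 2014 polling station list for Hualien County (花蓮縣)
HUA = pd.read_csv('https://raw.githubusercontent.com/g0v/cec/master/2014/booth/HUA.csv')
# Polling station list from the 13th presidential election (tab-separated)
pre = pd.read_csv('https://raw.githubusercontent.com/g0v/cec/master/13th.csv', sep='\t')

# Lookup key: 縣市碼 (county/city code) + 投票所名稱 (station name) -> 地址 (address)
pre['together'] = pre['縣市碼'] + pre['投票所名稱']
mydict = pre.set_index('together')['地址'].to_dict()

# Build the same key for each 2014 station, then swap in the address where the key matches
HUA['Address2'] = '花蓮縣' + HUA['投開票所設置處所']
HUA = HUA.replace({"Address2": mydict})  # replace() returns a new frame, so keep the result
print(HUA)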
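To see how the dictionary lookup behaves without downloading the CSV files, here is a minimal sketch on made-up rows. The toy rows, the station names, and the `key` and `matched` columns are assumptions for illustration; only the other column names come from the script. It also assumes that 縣市碼 holds the county name (e.g. 花蓮縣), which is what the key construction in the script implies. It uses `Series.map` instead of `replace` so that unmatched rows can be told apart from matched ones.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (hypothetical rows, real column names).
pre = pd.DataFrame({
    "縣市碼": ["花蓮縣", "花蓮縣"],
    "投票所名稱": ["A小學", "B活動中心"],
    "地址": ["花蓮縣花蓮市X街1號", "花蓮縣花蓮市Y路2號"],
})
HUA = pd.DataFrame({"投開票所設置處所": ["A小學", "C圖書館"]})

# Lookup table: county + station name -> address.
lookup = dict(zip(pre["縣市碼"] + pre["投票所名稱"], pre["地址"]))

# Same key on the 2014 side; map() yields NaN where the key is missing.
HUA["key"] = "花蓮縣" + HUA["投開票所設置處所"]
found = HUA["key"].map(lookup)
HUA["matched"] = found.notna()

# Fall back to the key itself for unmatched rows, as replace() would.
HUA["Address2"] = found.fillna(HUA["key"])

match_rate = HUA["matched"].mean()
print(HUA[["投開票所設置處所", "Address2", "matched"]])
print("match rate:", match_rate)
```

The `matched` column makes it easy to list only the stations that still need an address, which is harder to recover after a plain `replace`.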
t0mst0ne commented Sep 7, 2014

The output looks like this:

[ Unsuccessful match ]
投開票所編號 投開票所設置處所 所屬村里或鄰別 地址 Address2
0 1投開票所 環保局環境永續教育㆗心集賢館 民德里各鄰 花蓮縣花蓮市中美路68 號 花蓮縣環保局環境永續教育㆗心集賢館

[ Successful match ]
3 4投開票所 東華大學附設小學桌球教室 民勤里10鄰、12-17鄰、20鄰、21、24-26鄰、30鄰、31鄰 花蓮縣花蓮市永安街100號 花蓮縣花蓮市永安街100號

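Problem: some unusual Chinese characters in HUA.csv lower the match rate. For example, the location name in row 0 contains "㆗" (U+3197) where the ordinary character "中" (U+4E2D) would be expected.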
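One possible way to reduce this kind of mismatch is to normalize the station names before building the lookup key. The sketch below is an assumption, not part of the original script: `normalize_name` is a hypothetical helper, and it relies on Unicode NFKC normalization, which folds compatibility characters such as U+3197 into their standard forms. Whether a given row then matches still depends on 13th.csv containing the same station name, which is not verified here.

```python
import unicodedata


def normalize_name(text):
    """Fold compatibility characters to standard forms and trim whitespace."""
    return unicodedata.normalize("NFKC", text).strip()


# "\u3197" is the look-alike character seen in the unmatched row;
# "\u4e2d" is the ordinary character for "middle".
raw = "環保局環境永續教育\u3197心集賢館"
clean = normalize_name(raw)

print("\u3197" in raw, "\u3197" in clean)
print("\u4e2d" in clean)
```

In the script, the same helper could be applied to both sides before concatenation, for example `HUA['投開票所設置處所'].map(normalize_name)` and `pre['投票所名稱'].map(normalize_name)`. Note that NFKC also folds other compatibility forms, such as full-width digits, so it should be applied consistently to both files.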