Created September 7, 2014 13:56
Gist: t0mst0ne/e09f3b6f079ee7cb06c4
Parse polling station addresses from the polling station data of the 13th Taiwan presidential election
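#!/usr/bin/env python
# coding: UTF-8
import pandas as pd

# 2014 polling station list (HUA); it has station names but no addresses
HUA = pd.read_csv('https://raw.githubusercontent.com/g0v/cec/master/2014/booth/HUA.csv')
# Polling stations of the 13th presidential election (tab-separated), with addresses
pre = pd.read_csv('https://raw.githubusercontent.com/g0v/cec/master/13th.csv', sep='\t')

# Lookup key: 縣市碼 + 投票所名稱, mapped to 地址
pre['together'] = pre['縣市碼'] + pre['投票所名稱']
mydict = pre.set_index('together')['地址'].to_dict()

# Build the same key for HUA, then swap in the address wherever the key matches
HUA['Address2'] = '花蓮縣' + HUA['投開票所設置處所']
HUA = HUA.replace({'Address2': mydict})  # replace() returns a new frame, so keep it
print(HUA)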
Output like:
[ Unsuccessful parse / didn't match ]
投開票所編號 投開票所設置處所 所屬村里或鄰別 地址 Address2
0 1投開票所 環保局環境永續教育㆗心集賢館 民德里各鄰 花蓮縣花蓮市中美路68 號 花蓮縣環保局環境永續教育㆗心集賢館
[ Successful match ]
3 4投開票所 東華大學附設小學桌球教室 民勤里10鄰、12-17鄰、20鄰、21、24-26鄰、30鄰、31鄰 花蓮縣花蓮市永安街100號 花蓮縣花蓮市永安街100號
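Problem: some unusual Chinese characters in HUA.csv lower the match rate. For example, row 0 above contains ㆗ (U+3197) where 中 (U+4E2D) would be expected, so its key may never equal one built from 13th.csv.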
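One possible way to reduce these mismatches is to normalize the station names before building the lookup key. The sketch below is only an illustration, not part of the original script: it assumes Python 3, that the odd characters are Unicode compatibility forms that NFKC normalization folds into standard ones, and that 13th.csv spells the same station name with the standard character. `normalize_name` and `lookup` are hypothetical names, and the address value is a placeholder.

```python
import unicodedata


def normalize_name(text):
    """Fold compatibility characters (e.g. U+3197) into standard forms and trim whitespace."""
    return unicodedata.normalize('NFKC', text).strip()


# Hypothetical stand-in for `mydict`; the address value is a placeholder, not real data.
lookup = {'花蓮縣環保局環境永續教育中心集賢館': '(address from 13th.csv)'}

# Key as built from HUA.csv in row 0 of the output, with U+3197 instead of U+4E2D.
raw_key = '花蓮縣環保局環境永續教育\u3197心集賢館'
clean_key = normalize_name(raw_key)

print(raw_key in lookup)    # False: U+3197 and U+4E2D are different code points
print(clean_key in lookup)  # True once NFKC has folded U+3197 into U+4E2D
```

On the real data the same function could be applied before concatenation, e.g. `HUA['投開票所設置處所'].map(normalize_name)`. Names that differ for other reasons (renamed venues, extra room names) would still fail to match.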