@mizchi
Created November 4, 2010 13:55
get_utf8.py
#!/opt/local/bin/python
# -*- encoding:utf-8 -*-
"""
Detect the encoding of a byte string and return it as UTF-8. Nothing more.
Requires pykf.
"""
import pykf
import urllib2
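def get_utf8(text):
    """Return the byte string re-encoded as UTF-8, or unchanged on failure."""
    c = pykf.guess(text)
    if c == pykf.EUC:
        codec = 'euc-jp'
    elif c == pykf.SJIS:
        codec = 'sjis'
    elif c == pykf.JIS:
        # pykf.JIS is ISO-2022-JP, which cannot be decoded as Shift_JIS.
        codec = 'iso-2022-jp'
    else:
        # ASCII, UTF-8, or unknown: pass the bytes through untouched.
        return text
    try:
        return unicode(text, codec).encode('utf8')
    except UnicodeError:
        # The guess was wrong or the data is malformed; keep the original.
        return text
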
# Quick check: fetch a page and print the guessed encoding before and after.
url = "http://b.hatena.ne.jp/"
txt = urllib2.urlopen(url).read()
print pykf.guess(txt)
txtutf = get_utf8(txt)
print pykf.guess(txtutf)
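
For environments without pykf, the same idea can be roughly approximated with the standard library alone. The sketch below is Python 3, not the Python 2 of the script above. `to_utf8` and `CANDIDATES` are hypothetical names, and the trial-decoding order is an assumption of this sketch, not pykf's detection algorithm. ISO-2022-JP is tried first because its 7-bit escape sequences would otherwise pass as valid UTF-8.

```python
# Hypothetical stdlib-only fallback (Python 3); not how pykf.guess works.
# Order matters: ISO-2022-JP is pure 7-bit, so it must be tried before UTF-8.
CANDIDATES = ("iso-2022-jp", "utf-8", "euc-jp", "cp932")


def to_utf8(data, candidates=CANDIDATES):
    """Return data (bytes) re-encoded as UTF-8 using the first codec that decodes it."""
    for name in candidates:
        try:
            return data.decode(name).encode("utf-8")
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: hand back the original bytes, like get_utf8 does.
    return data


if __name__ == "__main__":
    sample = "日本語テキスト".encode("euc-jp")
    print(to_utf8(sample).decode("utf-8"))
```

Trial decoding can misidentify short or ambiguous inputs, since some byte sequences are valid in more than one of these encodings, so a statistical guesser such as pykf is likely to be more reliable on real pages.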