@brendano
Created June 15, 2012 20:23
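#!/usr/bin/env python3
"""
Convert STDIN to UTF-8
based on character encoding detection
"""
import sys, itertools
from chardet.universaldetector import UniversalDetector

# Work on raw bytes: the detector needs undecoded input.
stdin = sys.stdin.buffer
stdout = sys.stdout.buffer

detector = UniversalDetector()
lines = []
# Feed lines until the detector is confident, keeping them for replay.
for line in stdin:
    lines.append(line)
    detector.feed(line)
    if detector.done: break
detector.close()
print(detector.result, file=sys.stderr)
# The detected encoding is None when no guess could be made; fall back to UTF-8.
encoding = detector.result['encoding'] or 'utf-8'
# Replay the buffered lines, then stream whatever is left on STDIN.
for line in itertools.chain(lines, stdin):
    converted = line.decode(encoding, 'replace').encode('utf8')
    stdout.write(converted)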
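
The script buffers the lines it has already fed to the detector and then replays them ahead of the remaining input. A minimal sketch of that buffer-and-replay step on an in-memory stream follows. It uses only the standard library, and the source encoding is fixed to Latin-1 as an assumption rather than detected. The helper names `peek_lines` and `transcode` are hypothetical and not part of the script or of chardet.

```python
import io
import itertools


def peek_lines(src, n):
    """Read up to n lines from src; return them plus an iterator over ALL lines.

    The returned iterator replays the buffered lines first, then continues
    with whatever is still unread in src.
    """
    buffered = list(itertools.islice(src, n))
    return buffered, itertools.chain(buffered, src)


def transcode(lines, dst, encoding, errors="replace"):
    """Decode each bytes line with `encoding` and write it to dst as UTF-8."""
    for line in lines:
        dst.write(line.decode(encoding, errors).encode("utf-8"))


# Latin-1 bytes standing in for STDIN (assumed input, for illustration only).
text = "caf\u00e9\nna\u00efve\n"
src = io.BytesIO(text.encode("latin-1"))

# Peek at the first line (this is where a detector would be fed),
# then convert the whole stream, including the peeked line.
head, replay = peek_lines(src, 1)
out = io.BytesIO()
transcode(replay, out, "latin-1")
result = out.getvalue()
```

Because the peeked lines are chained in front of the unread remainder, nothing consumed during detection is lost from the output.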