Last active December 16, 2015 17:58
# Python 2 only: relies on the `StringIO` module and `unicode`/`str` semantics.
import csv
from StringIO import StringIO


class UnicodeWriter:
    """A CSV writer which writes Unicode rows to the file-like object "f".

    The Python 2 `csv` module cannot handle Unicode input directly, but we
    can work around that. First, we encode each row into plain UTF-8 byte
    strings and write it into an in-memory buffer (`StringIO`). Then we
    decode the resulting CSV data back into Unicode and write it to the
    target stream.

    Note: the `encoding` argument is accepted but not used by this class.
    The target stream receives `unicode` objects, so it has to do its own
    encoding (for example, a stream opened with `codecs.open`).
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8"):
        self.buffer = StringIO()
        self.writer = csv.writer(self.buffer, dialect=dialect)
        self.target_stream = f

    def writerow(self, row):
        # Row elements are expected to be `unicode` strings.
        # Encode them into UTF-8 (unicode string -> plain byte string).
        encoded_row = [s.encode("utf-8") for s in row]

        # Write the encoded row with the standard CSV writer.
        self.writer.writerow(encoded_row)

        # A valid CSV row is now in the memory buffer. Fetch it ...
        data = self.buffer.getvalue()
        # ... and convert it back into Unicode.
        data = data.decode("utf-8")

        # Write the Unicode CSV row to the target stream.
        self.target_stream.write(data)

        # Empty the buffer so the next row starts from scratch.
        self.buffer.truncate(0)
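
The class above is a Python 2 workaround. In Python 3 the `csv` module works on text (`str`) directly, so the encode/decode round trip through a buffer is not needed. A minimal sketch of the equivalent, assuming Python 3; the helper name `write_rows` and the sample rows are illustrative and not part of the original gist:

```python
import csv
import io


def write_rows(rows, stream, dialect=csv.excel):
    """Write rows of text to a text-mode stream as CSV (Python 3)."""
    writer = csv.writer(stream, dialect=dialect)
    for row in rows:
        writer.writerow(row)


# An in-memory text stream stands in for a real file here. For a file on
# disk, open(path, "w", encoding="utf-8", newline="") would play this role.
out = io.StringIO(newline="")
write_rows([["имя", "город"], ["Иван", "Санкт-Петербург"]], out)
csv_text = out.getvalue()
```

Here the encoding is chosen once, when the target file is opened, rather than inside the writer.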