Gist omsobliga/c9266f5e88be5c461c24
Last active June 19, 2021 05:54
When does Python output a Unicode string?
#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" Test when Python outputs a "Unicode string".
First, note that in Python the unicode type and a "Unicode string" are not the same thing.
A Unicode string here is of type str, but its value is written in Unicode escape form (e.g. \u4f60\u597d).
"""
def printt(s):
    """ Print the type of s, followed by its value.
    The parameter is named s rather than str to avoid shadowing the built-in str.
    """
    print type(s), s
# These correspond to the str and unicode types, respectively
print type('你好')
print type(u'你好')
# string_escape and unicode_escape are Python-specific encodings
# References: http://www.qmailer.net/archives/251.html
#   https://docs.python.org/2/library/codecs.html#python-specific-encodings
# > string_escape: Produce a string that is suitable as string literal in Python source code
# > unicode_escape: Produce a string that is suitable as Unicode literal in Python source code
# After encoding, both results are of type str
printt('你好'.encode('string_escape'))
printt(u'你好'.encode('unicode_escape'))
# After decoding, each value is back to its original type
printt('你好'.encode('string_escape').decode('string_escape'))
printt(u'你好'.encode('unicode_escape').decode('unicode_escape'))
# The __repr__ of dict and list calls repr() on each element, and repr() of a unicode
# object escapes non-ASCII characters (e.g. u'\u4f60\u597d'), much like
# `.encode('unicode_escape')`, so the result is of type str.
# Decoding it the same way, with `.decode('unicode_escape')`, turns it back into unicode.
d = {'a': u'你好'}
printt(repr(d))
printt(repr(d).decode('unicode_escape'))
l = [u'你好']
printt(repr(l))
printt(repr(l).decode('unicode_escape'))
# Encoding again with unicode_escape restores the original type (str)
printt(repr(d).decode('unicode_escape').encode('unicode_escape'))
printt(repr(l).decode('unicode_escape').encode('unicode_escape'))
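# Write to a file (the with block closes it when done)
with open('a', 'w') as f:
    f.write('你好'.encode('string_escape'))    # escaped bytes: \xe4\xbd\xa0\xe5\xa5\xbd
    f.write(u'你好'.encode('unicode_escape'))  # escaped text: \u4f60\u597d
    f.write(u'你好'.encode('utf8'))            # raw UTF-8 bytes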
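For comparison, here is a minimal Python 3 sketch of the same experiments; it is not part of the original gist. In Python 3, `str` holds text (like Python 2's `unicode`) and `bytes` holds raw bytes (like Python 2's `str`). Python 3 has no `string_escape` codec, so the sketch assumes a latin-1 plus `unicode_escape` round trip as a stand-in. Because Python 3's `repr()` no longer escapes printable non-ASCII text, it uses the built-in `ascii()` to get the escaped form of a dict or list. It writes to a temporary file instead of `a`. All variable names are illustrative.

```python
# Python 3 sketch (not the gist author's code). Only ASCII is printed, so the
# output does not depend on the console encoding.
import os
import tempfile

text = '你好'

# unicode_escape: text -> ASCII-only bytes with \uXXXX escapes, and back.
escaped = text.encode('unicode_escape')
print(type(escaped), escaped)  # <class 'bytes'> b'\\u4f60\\u597d'
restored = escaped.decode('unicode_escape')
assert type(restored) is str and restored == text

# Assumed stand-in for Python 2's string_escape: map each UTF-8 byte to one
# code point via latin-1, then escape it; bytes >= 0x80 come out as \xhh.
raw = text.encode('utf-8')
byte_escaped = raw.decode('latin-1').encode('unicode_escape')
print(byte_escaped)  # b'\\xe4\\xbd\\xa0\\xe5\\xa5\\xbd'
assert byte_escaped.decode('unicode_escape').encode('latin-1') == raw

# repr() keeps printable non-ASCII text as-is in Python 3;
# ascii() produces the escaped, ASCII-only form instead.
d = {'a': text}
l = [text]
print(ascii(d))  # {'a': '\u4f60\u597d'}
print(ascii(l))  # ['\u4f60\u597d']

# Undo the escaping by going through bytes and decoding with unicode_escape.
assert ascii(d).encode('ascii').decode('unicode_escape') == repr(d)

# Writing to a file: give open() an explicit encoding instead of encoding by
# hand. A temporary directory stands in for the script's file 'a'.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'rb') as f:
        assert f.read() == raw
```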
<type 'str'>
<type 'unicode'>
<type 'str'> \xe4\xbd\xa0\xe5\xa5\xbd
<type 'str'> \u4f60\u597d
<type 'str'> 你好
<type 'unicode'>
Traceback (most recent call last):
  File "c:\Users\chenzhendong01\Desktop\Code\pytest.py", line 30, in <module>
    printt(u'你好'.encode('unicode_escape').decode('unicode_escape'))
  File "c:\Users\chenzhendong01\Desktop\Code\pytest.py", line 12, in printt
    print type(str), str
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
This is the error message. Does this mean something is wrong with my environment?
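The last line of the traceback names the `'ascii'` codec, so `print` apparently encoded the unicode value with ASCII. In Python 2, `print` encodes a unicode object with the output stream's encoding and typically falls back to ASCII when that encoding is unknown (for example when output is piped or captured by an editor). That this is what happened here is an assumption; the comment does not say how the script was run. The Python 3 sketch below reproduces the same message by encoding explicitly with ASCII and shows that UTF-8 succeeds. The helper `encode_or_error` is hypothetical.

```python
# Python 3 sketch: reproduce the traceback's UnicodeEncodeError by encoding
# with ASCII explicitly. Assumption: the original console/pipe used ASCII.

def encode_or_error(text, encoding):
    """Hypothetical helper: return the encoded bytes, or the error message."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII has no code points for these characters, so encoding fails with the
# same message as the last line of the traceback.
print(encode_or_error('你好', 'ascii'))

# UTF-8 can represent them, so encoding succeeds.
print(encode_or_error('你好', 'utf-8'))  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```

If the assumption holds, one common Python 2 workaround is to set the `PYTHONIOENCODING` environment variable (for example to `utf-8`) before running the script, so that `print` uses that encoding instead of ASCII; whether the console then displays the characters correctly still depends on its own code page.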