Last active
October 4, 2019 15:46
# -*- coding: utf-8 -*-
# Copyright (c) 2019 Ansible Project
# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)

# Make coding more python3-ish
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

import ctypes.util
import locale

from six import text_type

libc_path = ctypes.util.find_library('c')
libc = ctypes.cdll.LoadLibrary(libc_path)

libc.wcwidth.argtypes = (ctypes.c_wchar,)
libc.wcwidth.restype = ctypes.c_int
# C prototype: int wcswidth(const wchar_t *s, size_t n), so n is a size_t
libc.wcswidth.argtypes = (ctypes.c_wchar_p, ctypes.c_size_t)
libc.wcswidth.restype = ctypes.c_int

# Adopt the locale from the environment; wcwidth results depend on LC_CTYPE
locale.setlocale(locale.LC_ALL, '')


def width(u_text):
    """This function is slower than just using libc directly.

    I recommend not using this and just using ``libc.wcswidth``
    on a full string.

    A helper may still be useful, to do the isinstance check.
    """
    if not isinstance(u_text, text_type):
        raise ValueError('Value must be text type')

    length = 0
    for c in u_text:
        # wcwidth returns -1 for a non-printable wide character
        char_width = libc.wcwidth(c)
        if char_width < 0:
            raise ValueError('Non-printable character: %r' % c)
        length += char_width

    return length


if __name__ == '__main__':
    print(libc.wcswidth(u'コンニチハ', 1024))
    length = width(u'コンニチハ')
    print(u'コンニチハ')
    print(length * '-')
This gets it working on Python2:
--- print_wcwidth.py.orig 2019-10-03 15:56:34.498229625 -0700
+++ print_wcwidth.py 2019-10-03 15:57:17.929191395 -0700
@@ -9,6 +9,9 @@
 from six import text_type
 import ctypes.util
+import locale
+
+locale.setlocale(locale.LC_ALL, ('en_US', 'UTF-8'))
 libc_path = ctypes.util.find_library('c')
 libc = ctypes.cdll.LoadLibrary(libc_path)
 libc.wcwidth.argtypes = (ctypes.c_wchar,)
Unfortunately we can't just go setting locale in our code. But perhaps it gives us a hint as to how we can fix it.
Okay, I think this works and we can use it:
`locale.setlocale(locale.LC_ALL, '')`
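If setting the locale for the whole process is still a concern, one option is to switch only `LC_CTYPE` (the category that `wcwidth` depends on) for the duration of the width calculation and then restore it. The following is a minimal sketch of that idea, not something from the gist: `user_ctype_locale` is a hypothetical helper name, and silently keeping the current locale when the environment names an unavailable one is an assumption. Note that `setlocale` affects the whole process and is not thread-safe, so this only narrows the window, it does not isolate the change.

```python
import contextlib
import locale


@contextlib.contextmanager
def user_ctype_locale():
    """Temporarily adopt the environment's LC_CTYPE, then restore the old one.

    Hypothetical helper: the name and the fallback behavior are assumptions.
    """
    # Calling setlocale with no second argument only queries the current value.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, '')
        except locale.Error:
            # The environment names a locale that is not installed;
            # assume it is acceptable to keep whatever is active.
            pass
        yield
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)
```

Usage would look like `with user_ctype_locale(): n = libc.wcswidth(text, len(text))`, with `libc` set up as in the gist.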
Perf comparison between the custom `width` function and just using `libc.wcswidth` directly. `wcswidth` is much faster, which is to be expected.
ansibledev ▶ In [2]: %timeit print_width.width(u'コンニチハ')
3.91 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
ansibledev ▶ In [3]: %timeit print_width.libc.wcswidth(u'コンニチハ', 1024)
963 ns ± 95.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
From the wcswidth man page: "If a nonprintable wide character occurs among these characters, -1 is returned." That might be part of the difference with `kitchen.text.display.textual_width`. Maybe we should have a `width()` function as a front end that first tries to run `wcswidth()` on the whole string and then, if -1 is returned, falls back to slower code that steps through each character to determine the width.
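The front-end idea above could be sketched roughly as follows. This is an illustration under stated assumptions rather than a finished implementation: it targets Python 3 (so it checks for `str` instead of `six.text_type`), `display_width` is a hypothetical name, and counting non-printable characters as zero columns in the slow path is an assumed policy, not something the thread settled on. Results for non-ASCII text still depend on the process locale.

```python
import ctypes
import ctypes.util

_libc = ctypes.CDLL(ctypes.util.find_library('c'))
_libc.wcwidth.argtypes = (ctypes.c_wchar,)
_libc.wcwidth.restype = ctypes.c_int
_libc.wcswidth.argtypes = (ctypes.c_wchar_p, ctypes.c_size_t)
_libc.wcswidth.restype = ctypes.c_int


def display_width(text):
    """Return the column width of text, trying wcswidth first.

    Hypothetical front end: name and fallback policy are assumptions.
    """
    if not isinstance(text, str):
        raise TypeError('Value must be text type')

    # Fast path: one libc call for the whole string.
    total = _libc.wcswidth(text, len(text))
    if total >= 0:
        return total

    # Slow path: wcswidth returned -1, so at least one character is
    # non-printable. Step through each character and count the
    # non-printable ones as zero columns (assumed policy).
    total = 0
    for char in text:
        char_width = _libc.wcwidth(char)
        if char_width > 0:
            total += char_width
    return total
```

With this shape, the common case costs a single `wcswidth` call, and the per-character loop only runs for strings that contain something like a control character.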
Wild ass guess is that the reason this doesn't work on Python 2 is that Python isn't translating the unicode string into `wchar_t` correctly. Maybe comparing the ctypes code for python-3.0 vs python-2.7 will show whether that's correct.
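One cheap way to start checking that guess is to print the facts that feed into the `wchar_t` conversion and into `wcwidth` on each interpreter and compare them. The snippet below is only a diagnostic sketch; `wchar_diagnostics` is a hypothetical name, and it does not by itself prove or disprove the guess.

```python
import ctypes
import locale
import sys


def wchar_diagnostics():
    """Collect values worth comparing between Python 2 and Python 3 runs."""
    return {
        # Size of the platform's wchar_t as ctypes sees it (in bytes).
        'sizeof_wchar_t': ctypes.sizeof(ctypes.c_wchar),
        # 0xFFFF on narrow Python 2 builds, 0x10FFFF otherwise.
        'maxunicode': sys.maxunicode,
        # The active LC_CTYPE, which wcwidth and wcswidth depend on.
        'lc_ctype': locale.setlocale(locale.LC_CTYPE),
    }


if __name__ == '__main__':
    for key, value in sorted(wchar_diagnostics().items()):
        print('%s: %s' % (key, value))
```

If the two interpreters report the same `wchar_t` size but a different `lc_ctype`, that would point at the locale rather than at the string conversion.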