
@g-k
Created April 18, 2018 19:53
Alphanumeric ASCII homoglyphs / confusing character sets

ASCII subsets of visually similar characters.

Good answers covering Unicode homoglyphs: https://security.stackexchange.com/questions/128286/list-of-visually-similar-characters-for-detecting-spoofing-and-social-engineeri

The ASCII sets below were filtered from the Unicode list in codebox/homoglyph:
» # char_codes.txt is https://github.com/codebox/homoglyph/blob/3f61c31f4bb01c5312a7763dc6e226729417c8e7/raw_data/char_codes.txt
» python ~/to_ascii_char_codes.py < ~/Downloads/char_codes.txt 
['0', 'O', 'o']
['1', 'I', 'l']
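One possible way to use these two sets is sketched below. It assumes that mapping every character to a single representative of its set is good enough for your use case: compute a "skeleton" of each string and treat two different strings as confusable when their skeletons match. The names `CONFUSABLE_SETS`, `skeleton` and `is_confusable` are hypothetical and not part of codebox/homoglyph. For full Unicode coverage, Unicode TS #39 defines a similar skeleton over its own confusables data.

```python
# Confusable ASCII sets, copied from the script output above.
CONFUSABLE_SETS = [
    ['0', 'O', 'o'],
    ['1', 'I', 'l'],
]

# Map every member of a set to the set's first character (its representative).
_CANONICAL = {ch: group[0] for group in CONFUSABLE_SETS for ch in group}


def skeleton(text: str) -> str:
    """Replace each confusable character with its set's representative."""
    return ''.join(_CANONICAL.get(ch, ch) for ch in text)


def is_confusable(a: str, b: str) -> bool:
    """True if a and b differ but look alike under CONFUSABLE_SETS."""
    return a != b and skeleton(a) == skeleton(b)


if __name__ == '__main__':
    print(skeleton('google'))                 # g00g1e
    print(is_confusable('g00gle', 'google'))  # True
    print(is_confusable('paypal', 'paypa1'))  # True
    print(is_confusable('abc', 'abd'))        # False
```

Picking the first element as the representative is arbitrary. Any fixed choice works as long as every member of a set maps to the same character.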
:hwine suggested also looking at the pwgen -B filter for ambiguous characters:
const char *pw_ambiguous = "B8G6I1l0OQDS5Z2";

https://sourceforge.net/p/pwgen/code/ci/c25787fce93b2c99efd87219585c0531e80b6d1c/tree/pw_rand.c#l20
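A rough Python illustration of the filtering idea behind `pwgen -B` follows. It removes every character in the `pw_ambiguous` string above from the alphanumeric alphabet, then draws characters uniformly with the standard `secrets` module. This is a sketch, not pwgen's code or algorithm. `PW_AMBIGUOUS`, `unambiguous_alphabet` and `random_password` are hypothetical names.

```python
import secrets
import string

# Ambiguous characters, copied from pwgen's pw_ambiguous in pw_rand.c.
PW_AMBIGUOUS = "B8G6I1l0OQDS5Z2"


def unambiguous_alphabet(ambiguous: str = PW_AMBIGUOUS) -> str:
    """ASCII letters and digits with the ambiguous characters removed."""
    excluded = set(ambiguous)
    return ''.join(
        ch for ch in string.ascii_letters + string.digits if ch not in excluded
    )


def random_password(length: int = 12) -> str:
    """Draw `length` characters uniformly from the unambiguous alphabet."""
    alphabet = unambiguous_alphabet()
    return ''.join(secrets.choice(alphabet) for _ in range(length))


if __name__ == '__main__':
    print(len(unambiguous_alphabet()))  # 47 (62 alphanumerics minus 15)
    print(random_password())
```

Note that `pw_ambiguous` leaves lowercase `o` in the alphabet, whereas the codebox-derived set above groups it with `0` and `O`.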

to_ascii_char_codes.py:
import string
import sys


def main():
    alnum = frozenset(string.ascii_letters + string.digits)
    for line in sys.stdin:
        line = line.strip()
        # skip blank lines and comments
        if not line or line.startswith('#'):
            continue
        # each row is a comma-separated list of hex code points
        all_code_points = [int(code_point, 16) for code_point in line.split(",")]
        # file is sorted low to high, so stop once a row starts past ASCII
        # (this must check the unfiltered row; the filtered list is always < 0x7f)
        if all_code_points[0] > 0x7f:
            break
        # filter for ascii code points
        code_points = [c for c in all_code_points if c < 0x7f]
        if len(code_points) < 2:
            continue
        # filter for alphanumeric chars
        alnum_chars = [chr(c) for c in code_points if chr(c) in alnum]
        if not alnum_chars:
            continue
        # print sets of confusing chars
        print(alnum_chars)


if __name__ == "__main__":
    main()