@synkarius
Last active November 6, 2015 01:03
symbol matching algorithm comparison
ACTUAL COMMON ABBREVIATED SYMBOLS TEST
========================================
ALGO RANKS TARGET AS:
========================================
caster 1st (tied 2-way)
difflib/Levenshtein 1st
sift4 3rd (tied 4-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'isissr', 6)
('difflib: ', 'isissr', 0.26666666666666666)
('levenshtein: ', 'isissr', 0.3333333333333333)
('sift4: ', 'isissr', 22.0)
('caster', 'isbuf', 4)
('difflib: ', 'isbuf', 0.27586206896551724)
('levenshtein: ', 'isbuf', 0.27586206896551724)
('sift4: ', 'isbuf', 21.0)
('caster', 'islist', 4)
('difflib: ', 'islist', 0.26666666666666666)
('levenshtein: ', 'islist', 0.26666666666666666)
('sift4: ', 'islist', 22.0)
('caster', 'ismatd', 4)
('difflib: ', 'ismatd', 0.26666666666666666)
('levenshtein: ', 'ismatd', 0.26666666666666666)
('sift4: ', 'ismatd', 22.0)
('caster', 'issue', 5)
('difflib: ', 'issue', 0.3448275862068966)
('levenshtein: ', 'issue', 0.3448275862068966)
('sift4: ', 'issue', 19.0)
('caster', 'issue certificate shares', 24)
('difflib: ', 'issue certificate shares', 1.0)
('levenshtein: ', 'issue certificate shares', 1.0)
('sift4: ', 'issue certificate shares', 0.0)
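The "ALGO RANKS TARGET AS" lines can be reproduced from the raw scores. The sketch below is not the script that produced this output. It assumes the target is the abbreviation 'isctsh', that the perfect-match reference row is left out of the ranking, and that ties share the best rank. The helper name rank_with_ties is made up for illustration.

```python
# Scores copied from the first test above (perfect-match row excluded).
CASTER = {"isctsh": 6, "isissr": 6, "isbuf": 4,
          "islist": 4, "ismatd": 4, "issue": 5}
SIFT4 = {"isctsh": 22.0, "isissr": 22.0, "isbuf": 21.0,
         "islist": 22.0, "ismatd": 22.0, "issue": 19.0}


def rank_with_ties(scores, target, higher_is_better=True):
    """Return (rank, tie_size) for target; rank 1 is the best candidate."""
    target_score = scores[target]
    if higher_is_better:
        better = sum(1 for s in scores.values() if s > target_score)
    else:
        better = sum(1 for s in scores.values() if s < target_score)
    # tie_size counts the target itself, so 1 means no tie.
    tied = sum(1 for s in scores.values() if s == target_score)
    return better + 1, tied


print("caster", rank_with_ties(CASTER, "isctsh"))
print("sift4", rank_with_ties(SIFT4, "isctsh", higher_is_better=False))
```

Under those assumptions this gives rank 1 with a 2-way tie for caster and rank 3 with a 4-way tie for sift4, matching the summary at the top of this test.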
RANDOM GARBAGE TEST
========================================
ALGO RANKS TARGET AS:
========================================
caster 2nd
difflib/Levenshtein 1st (tied 2-way)
sift4 1st (tied 2-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'is_cat_shit', 7)
('difflib: ', 'is_cat_shit', 0.4)
('levenshtein: ', 'is_cat_shit', 0.4)
('sift4: ', 'is_cat_shit', 22.0)
('caster', 'ctqxr1', 3)
('difflib: ', 'ctqxr1', 0.2)
('levenshtein: ', 'ctqxr1', 0.2)
('sift4: ', 'ctqxr1', 24.0)
('caster', '321rsa', 3)
('difflib: ', '321rsa', 0.2)
('levenshtein: ', '321rsa', 0.2)
('sift4: ', '321rsa', 24.0)
('caster', 'i', 1)
('difflib: ', 'i', 0.08)
('levenshtein: ', 'i', 0.08)
('sift4: ', 'i', 23.0)
('caster', 'sh', 2)
('difflib: ', 'sh', 0.15384615384615385)
('levenshtein: ', 'sh', 0.15384615384615385)
('sift4: ', 'sh', 24.0)
('caster', 'issue certificate shares', 24) # caster: higher is better, no upper limit
('difflib: ', 'issue certificate shares', 1.0) # difflib: ratio out of 1.0, higher is better
('levenshtein: ', 'issue certificate shares', 1.0) # levenshtein: ratio out of 1.0, higher is better
('sift4: ', 'issue certificate shares', 0.0) # sift4: distance, closer to 0 is better
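For the difflib scale, a minimal sketch of how such a ratio can be computed with the standard library. This assumes the difflib rows come from SequenceMatcher.ratio() against the full phrase; the original script and its argument order are not shown here, so the helper name difflib_score and the call shape are illustrative only.

```python
import difflib

TARGET = "issue certificate shares"


def difflib_score(candidate, target=TARGET):
    """Similarity in [0, 1]: 2 * matched_chars / (len(candidate) + len(target))."""
    return difflib.SequenceMatcher(None, candidate, target).ratio()


# Short candidates from the garbage test, plus the perfect match.
for candidate in ("i", "sh", TARGET):
    print(candidate, difflib_score(candidate))
```

A one-character match against the 24-character phrase scores 2 * 1 / 25 = 0.08, which is why very short candidates sit near the bottom of this scale.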
REALISTIC TEST (MIXED ABBREV AND NOT)
========================================
ALGO RANKS TARGET AS:
========================================
caster 4th (tied 3-way)
difflib/Levenshtein 4th (tied 3-way)
sift4 4th (tied 2-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'Issue', 4)
('difflib: ', 'Issue', 0.27586206896551724)
('levenshtein: ', 'Issue', 0.27586206896551724)
('sift4: ', 'Issue', 20.0)
('caster', 'issue_list', 7)
('difflib: ', 'issue_list', 0.4117647058823529)
('levenshtein: ', 'issue_list', 0.4117647058823529)
('sift4: ', 'issue_list', 18.0)
('caster', 'issues', 6)
('difflib: ', 'issues', 0.4)
('levenshtein: ', 'issues', 0.4)
('sift4: ', 'issues', 19.0)
('caster', 'isbksh', 4)
('difflib: ', 'isbksh', 0.26666666666666666)
('levenshtein: ', 'isbksh', 0.26666666666666666)
('sift4: ', 'isbksh', 22.0)
('caster', 'cert', 4)
('difflib: ', 'cert', 0.2857142857142857)
('levenshtein: ', 'cert', 0.2857142857142857)
('sift4: ', 'cert', 24.0)
('caster', 'ctshrs', 6)
('difflib: ', 'ctshrs', 0.4)
('levenshtein: ', 'ctshrs', 0.4)
('sift4: ', 'ctshrs', 23.0)
('caster', 'certificate', 11)
('difflib: ', 'certificate', 0.6285714285714286)
('levenshtein: ', 'certificate', 0.6285714285714286)
('sift4: ', 'certificate', 23.0)
('caster', 'Certificate', 10)
('difflib: ', 'Certificate', 0.5714285714285714)
('levenshtein: ', 'Certificate', 0.5714285714285714)
('sift4: ', 'Certificate', 23.0)
# perfect match for reference
======================================================
('caster', 'issue certificate shares', 24)
('difflib: ', 'issue certificate shares', 1.0)
('levenshtein: ', 'issue certificate shares', 1.0)
('sift4: ', 'issue certificate shares', 0.0)
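The difflib and levenshtein rows agree everywhere except 'isissr' in the first test (0.2667 vs 0.3333). One possible explanation, offered as an assumption rather than a description of the original script: the levenshtein rows behave like a ratio in which a substitution costs 2, which reduces to 2 * LCS / (len(a) + len(b)), where LCS is the longest common subsequence length. difflib instead sums greedily chosen matching blocks, which can find fewer matching characters. The helper names below are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a, b):
    """Similarity in [0, 1] based on LCS; two empty strings count as identical."""
    total = len(a) + len(b)
    return 2.0 * lcs_length(a, b) / total if total else 1.0


TARGET = "issue certificate shares"
for candidate in ("isctsh", "isissr"):
    print(candidate, lcs_ratio(candidate, TARGET))
```

Under this assumption 'isissr' shares a 5-character subsequence with the phrase, giving 10 / 30, and 'isctsh' shares all 6 of its characters, giving 12 / 30, in line with the levenshtein rows above.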