symbol matching algorithm comparison
(GitHub gist synkarius/7712c82cf2c6942764b7, last active November 6, 2015 01:03)
ACTUAL COMMON ABBREVIATED SYMBOLS TEST
========================================
ALGO RANKS TARGET AS:
========================================
caster 1st (tied 2-way)
difflib/Levenshtein 1st
sift4 3rd (tied 4-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'isissr', 6)
('difflib: ', 'isissr', 0.26666666666666666)
('levenshtein: ', 'isissr', 0.3333333333333333)
('sift4: ', 'isissr', 22.0)
('caster', 'isbuf', 4)
('difflib: ', 'isbuf', 0.27586206896551724)
('levenshtein: ', 'isbuf', 0.27586206896551724)
('sift4: ', 'isbuf', 21.0)
('caster', 'islist', 4)
('difflib: ', 'islist', 0.26666666666666666)
('levenshtein: ', 'islist', 0.26666666666666666)
('sift4: ', 'islist', 22.0)
('caster', 'ismatd', 4)
('difflib: ', 'ismatd', 0.26666666666666666)
('levenshtein: ', 'ismatd', 0.26666666666666666)
('sift4: ', 'ismatd', 22.0)
('caster', 'issue', 5)
('difflib: ', 'issue', 0.3448275862068966)
('levenshtein: ', 'issue', 0.3448275862068966)
('sift4: ', 'issue', 19.0)
('caster', 'issue certificate shares', 24)
('difflib: ', 'issue certificate shares', 1.0)
('levenshtein: ', 'issue certificate shares', 1.0)
('sift4: ', 'issue certificate shares', 0.0)
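The difflib and levenshtein columns agree on every symbol except 'isissr'. The sketch below shows where that gap can come from.

It makes several assumptions:

- The difflib column is assumed to be `difflib.SequenceMatcher(...).ratio()`.
- The levenshtein column is assumed to be a ratio of the form (len(a) + len(b) - d) / (len(a) + len(b)), where d is an edit distance in which a substitution costs 2. This assumption reproduces the listed levenshtein values for 'isctsh', 'isissr' and 'isbuf'.
- The helper names `difflib_ratio`, `indel_distance` and `levenshtein_ratio` are hypothetical.

```python
import difflib

TARGET = "issue certificate shares"


def difflib_ratio(candidate, target=TARGET):
    # SequenceMatcher greedily takes the longest contiguous matching block,
    # then recurses on the pieces to its left and right.
    return difflib.SequenceMatcher(None, candidate, target).ratio()


def indel_distance(a, b):
    # Levenshtein-style dynamic programming where a substitution costs 2
    # (equivalent to one deletion plus one insertion).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            sub = prev[j - 1] + (0 if ca == cb else 2)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, sub))
        prev = cur
    return prev[-1]


def levenshtein_ratio(candidate, target=TARGET):
    # Assumed normalization: (total length - distance) / total length.
    total = len(candidate) + len(target)
    if total == 0:
        return 1.0
    return (total - indel_distance(candidate, target)) / total


for sym in ["isctsh", "isissr", "issue", TARGET]:
    print(sym, difflib_ratio(sym), levenshtein_ratio(sym))
```

The two methods diverge on 'isissr' (about 0.267 against 0.333):

- SequenceMatcher first locks onto the contiguous block 'iss', taken from the middle of 'isissr' and matched against the start of the target. After that it can only add the 'r'.
- The minimal edit distance instead finds a five-character common subsequence.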
RANDOM GARBAGE TEST
========================================
ALGO RANKS TARGET AS:
========================================
caster 2nd
difflib/Levenshtein 1st (tied 2-way)
sift4 1st (tied 2-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'is_cat_shit', 7)
('difflib: ', 'is_cat_shit', 0.4)
('levenshtein: ', 'is_cat_shit', 0.4)
('sift4: ', 'is_cat_shit', 22.0)
('caster', 'ctqxr1', 3)
('difflib: ', 'ctqxr1', 0.2)
('levenshtein: ', 'ctqxr1', 0.2)
('sift4: ', 'ctqxr1', 24.0)
('caster', '321rsa', 3)
('difflib: ', '321rsa', 0.2)
('levenshtein: ', '321rsa', 0.2)
('sift4: ', '321rsa', 24.0)
('caster', 'i', 1)
('difflib: ', 'i', 0.08)
('levenshtein: ', 'i', 0.08)
('sift4: ', 'i', 23.0)
('caster', 'sh', 2)
('difflib: ', 'sh', 0.15384615384615385)
('levenshtein: ', 'sh', 0.15384615384615385)
('sift4: ', 'sh', 24.0)
('caster', 'issue certificate shares', 24) # caster: higher is better, no upper bound
('difflib: ', 'issue certificate shares', 1.0) # difflib: ratio out of 1.0, higher is better
('levenshtein: ', 'issue certificate shares', 1.0) # levenshtein: ratio out of 1.0, higher is better
('sift4: ', 'issue certificate shares', 0.0) # sift4: distance, closer to 0 is better
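Unlike the other three columns, sift4 is a distance, so an exact match scores 0. Below is a Python port of the "simplest" published Sift4 variant, offered as a sketch.

- The gist does not say which Sift4 implementation or max offset produced the numbers above. The `max_offset=5` default and the name `sift4_simple` are assumptions, so this port is not guaranteed to reproduce every listed value.
- It does give 0 for the perfect match and 19 for 'issue', which agree with the listed scores.

```python
TARGET = "issue certificate shares"


def sift4_simple(s1, s2, max_offset=5):
    """Port of the 'simplest' Sift4 distance (lower means more similar).

    max_offset=5 is an assumed default, not taken from the gist.
    """
    if not s1:
        return len(s2) if s2 else 0
    if not s2:
        return len(s1)
    l1, l2 = len(s1), len(s2)
    c1 = c2 = 0    # cursors into s1 and s2
    lcss = 0       # common characters counted so far
    local_cs = 0   # length of the current common run
    while c1 < l1 and c2 < l2:
        if s1[c1] == s2[c2]:
            local_cs += 1
        else:
            lcss += local_cs
            local_cs = 0
            if c1 != c2:
                # Resynchronize both cursors instead of counting transpositions.
                c1 = c2 = max(c1, c2)
            # Look up to max_offset characters ahead in either string
            # for the character under the other string's cursor.
            for i in range(max_offset):
                if not (c1 + i < l1 or c2 + i < l2):
                    break
                if c1 + i < l1 and c2 < l2 and s1[c1 + i] == s2[c2]:
                    c1 += i
                    local_cs += 1
                    break
                if c2 + i < l2 and c1 < l1 and s1[c1] == s2[c2 + i]:
                    c2 += i
                    local_cs += 1
                    break
        c1 += 1
        c2 += 1
    lcss += local_cs
    return max(l1, l2) - lcss


print(sift4_simple("issue", TARGET), sift4_simple(TARGET, TARGET))
```

Sift4 scans both strings once with a bounded look-ahead window instead of filling a full dynamic-programming table. That is why it is cheap, and also why its scores can order near-misses differently from the edit-distance ratios.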
REALISTIC TEST (MIXED ABBREV AND NOT)
========================================
ALGO RANKS TARGET AS:
========================================
caster 4th (tied 3-way)
difflib/Levenshtein 4th (tied 3-way)
sift4 4th (tied 2-way)
========================================
('caster', 'isctsh', 6)
('difflib: ', 'isctsh', 0.4)
('levenshtein: ', 'isctsh', 0.4)
('sift4: ', 'isctsh', 22.0)
('caster', 'Issue', 4)
('difflib: ', 'Issue', 0.27586206896551724)
('levenshtein: ', 'Issue', 0.27586206896551724)
('sift4: ', 'Issue', 20.0)
('caster', 'issue_list', 7)
('difflib: ', 'issue_list', 0.4117647058823529)
('levenshtein: ', 'issue_list', 0.4117647058823529)
('sift4: ', 'issue_list', 18.0)
('caster', 'issues', 6)
('difflib: ', 'issues', 0.4)
('levenshtein: ', 'issues', 0.4)
('sift4: ', 'issues', 19.0)
('caster', 'isbksh', 4)
('difflib: ', 'isbksh', 0.26666666666666666)
('levenshtein: ', 'isbksh', 0.26666666666666666)
('sift4: ', 'isbksh', 22.0)
('caster', 'cert', 4)
('difflib: ', 'cert', 0.2857142857142857)
('levenshtein: ', 'cert', 0.2857142857142857)
('sift4: ', 'cert', 24.0)
('caster', 'ctshrs', 6)
('difflib: ', 'ctshrs', 0.4)
('levenshtein: ', 'ctshrs', 0.4)
('sift4: ', 'ctshrs', 23.0)
('caster', 'certificate', 11)
('difflib: ', 'certificate', 0.6285714285714286)
('levenshtein: ', 'certificate', 0.6285714285714286)
('sift4: ', 'certificate', 23.0)
('caster', 'Certificate', 10)
('difflib: ', 'Certificate', 0.5714285714285714)
('levenshtein: ', 'Certificate', 0.5714285714285714)
('sift4: ', 'Certificate', 23.0)
# perfect match for reference
======================================================
('caster', 'issue certificate shares', 24)
('difflib: ', 'issue certificate shares', 1.0)
('levenshtein: ', 'issue certificate shares', 1.0)
('sift4: ', 'issue certificate shares', 0.0)
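The "ALGO RANKS TARGET AS" summaries in this file are consistent with standard competition ranking over the candidate symbols, with the perfect-match reference line left out:

- The target's rank is one plus the number of strictly better candidates.
- "tied N-way" counts every candidate that shares its score.

The helper `rank_target` below is a hypothetical sketch of that rule, checked against this test's caster and sift4 numbers. It takes a direction flag because caster, difflib and Levenshtein are similarities, while sift4 is a distance.

```python
def rank_target(scores, target, higher_is_better=True):
    """Hypothetical helper: return (rank, tie_size) of `target` within `scores`.

    Competition ranking: rank = 1 + number of strictly better candidates;
    tie_size counts every candidate (target included) with an equal score.
    """
    t = scores[target]
    if higher_is_better:
        better = sum(1 for s in scores.values() if s > t)
    else:  # distances such as sift4: lower is better
        better = sum(1 for s in scores.values() if s < t)
    tied = sum(1 for s in scores.values() if s == t)
    return 1 + better, tied


# Candidate scores from the realistic test (reference line excluded).
caster = {"isctsh": 6, "Issue": 4, "issue_list": 7, "issues": 6, "isbksh": 4,
          "cert": 4, "ctshrs": 6, "certificate": 11, "Certificate": 10}
sift4 = {"isctsh": 22.0, "Issue": 20.0, "issue_list": 18.0, "issues": 19.0,
         "isbksh": 22.0, "cert": 24.0, "ctshrs": 23.0, "certificate": 23.0,
         "Certificate": 23.0}

print(rank_target(caster, "isctsh"))                         # (4, 3)
print(rank_target(sift4, "isctsh", higher_is_better=False))  # (4, 2)
```

These match the "4th (tied 3-way)" and "4th (tied 2-way)" entries above.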