Created April 12, 2022 15:22
Gist: Steboss89/f9d78469c8f18ccc32efbbd5aa620052
Second approach: use a regex
import re

# a test string containing the number phrases we want to find
test_text = "this is a string with one number and then twenty thousand numbers and three thousand thirty four and three thousand five hundred forty five numbers"
# first attempt: a simple alternation listing the short, "frequent" numbers first
# (this ordering is assumed; it is the reverse of the corrected regex below)
regex = r"\b(one|two|three|four|five|twenty|thirty four|forty five|three thousand|twenty thousand|three thousand thirty four|three thousand five hundred forty five)\b"
re.findall(regex, test_text)
# ['one', 'twenty', 'three', 'thirty four', 'three', 'five', 'forty five']
# the result is not what we were expecting: alternatives are tried left to right,
# so "twenty" matches before "twenty thousand" is ever tried
# reorder the alternatives from "rare" (long) numbers to "frequent" (short) ones
regex = r"\b(three thousand five hundred forty five|three thousand thirty four|twenty thousand|three thousand|forty five|thirty four|twenty|five|four|three|two|one)\b"
re.findall(regex, test_text)
# ['one', 'twenty thousand', 'three thousand thirty four', 'three thousand five hundred forty five']
# better result, we got what we expected; can we do even better?
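One possible answer to that closing question, offered as a sketch rather than as the gist author's own next step: instead of ordering the alternatives by hand, sort the phrases by length so that longer phrases are always tried before their shorter prefixes. The helper names `build_number_regex` and `find_number_phrases` are hypothetical and introduced only for this illustration.

```python
import re


def build_number_regex(phrases):
    """Compile an alternation that tries longer phrases first.

    Hypothetical helper: sorting by length (descending) guarantees that a
    phrase such as "twenty thousand" is tried before its prefix "twenty".
    """
    ordered = sorted(set(phrases), key=len, reverse=True)
    # re.escape keeps any regex metacharacters in a phrase literal
    alternation = "|".join(re.escape(phrase) for phrase in ordered)
    # non-capturing group, so findall returns the whole match
    return re.compile(r"\b(?:" + alternation + r")\b")


def find_number_phrases(text, phrases):
    """Return every number phrase found in text, preferring longer phrases."""
    return build_number_regex(phrases).findall(text)


# the phrases can now be listed in any order
number_phrases = [
    "one", "two", "three", "four", "five", "twenty",
    "thirty four", "forty five", "three thousand", "twenty thousand",
    "three thousand thirty four", "three thousand five hundred forty five",
]
test_text = (
    "this is a string with one number and then twenty thousand numbers "
    "and three thousand thirty four and three thousand five hundred "
    "forty five numbers"
)
print(find_number_phrases(test_text, number_phrases))
```

This still requires listing every phrase explicitly; it only removes the need to keep the list in the right order by hand.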