halfak/656d4370b4583c2bd2bbb6836c4008b2
Last active March 23, 2020 19:34
Extract count of idioms for Alan Turing
import time

import mwapi
from revscoring.dependencies import solve
from revscoring.languages import english
from articlequality.feature_lists import enwiki

# Fetch the current wikitext of the "Alan Turing" article from English Wikipedia.
session = mwapi.Session("https://en.wikipedia.org")
doc = session.get(action='query', prop='revisions', rvprop='content',
                  titles='Alan Turing', formatversion=2)
text = doc['query']['pages'][0]['revisions'][0]['content']

# Each solve() call below gets its own fresh cache dict holding only the
# revision text, so no timing benefits from values solved in an earlier call.

# 1. The idiom match count on its own.
start = time.time()
print(english.idioms.revision.matches,
      solve(english.idioms.revision.matches, cache={'datasource.revision.text': text}))
print("Extracting idioms took {0} seconds".format(time.time() - start))

# 2. The full wp10 feature list.
start = time.time()
print("Features", list(solve(enwiki.wp10, cache={'datasource.revision.text': text})))
print("Extracting features took {0} seconds".format(time.time() - start))

# Split the feature list by whether the feature name mentions idioms.
features_wo_idioms = [f for f in enwiki.wp10 if "idiom" not in str(f)]
idiom_features = [f for f in enwiki.wp10 if "idiom" in str(f)]

# 3. Everything except the idiom features.
start = time.time()
print("Features w/o idioms", list(solve(features_wo_idioms, cache={'datasource.revision.text': text})))
print("Extracting features w/o idioms took {0} seconds".format(time.time() - start))

# 4. Only the idiom features.
start = time.time()
print("Idiom features", list(solve(idiom_features, cache={'datasource.revision.text': text})))
print("Extracting idiom features took {0} seconds".format(time.time() - start))
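The script repeats the same start/print timing pattern four times. As a minimal sketch of how that pattern could be factored out, the helper below wraps it in a context manager. The name `timed` and the optional `results` dict are hypothetical and not part of the gist, and it uses `time.perf_counter()` rather than `time.time()` because it is intended for measuring intervals. The timed body is a stand-in computation so the sketch runs without `mwapi` or `revscoring` installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took; optionally record it in `results`.

    `timed` and `results` are hypothetical names used for illustration only.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print("{0} took {1} seconds".format(label, elapsed))


# Stand-in workload; in the script above this would be one of the solve() calls.
timings = {}
with timed("Summing a range", timings):
    total = sum(range(100000))
```

In the script, each `solve(...)` call would sit inside its own `with timed(...)` block, still with a fresh cache dict per call.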
$ python demo_idioms_performance.py
feature.len(<datasource.english.idioms.revision.matches>) 7.0
Extracting idioms took 0.7575492858886719 seconds
Features [130094.0, 50640.0, 209.0, 0.004127172195892575, 463.0, 0.009142969984202212, 235.0, 0.004640600315955766, 7.0, 0.0001382306477093207, 20.0, 0.00039494470774091627, 12.0, 0.00023696682464454977, 54.0, 0.0010663507109004739, 209.0, 0.004127172195892575, 206.0, 0.004067930489731438, 0.9856459330143541, 3.0, 5.924170616113744e-05, 84.0, 0.0016587677725118483, 1.0, 2.0, 1.9747235387045812e-05, 2.0, 1.9747235387045812e-05, 3.0, 5.924170616113744e-05, 1.317654028436019, 7.045776576879511, 124.0, 0.00849780701754386, 7.0, 0.00047971491228070173, 131.0, 0.008977521929824562]
Extracting features took 1.490027904510498 seconds
Features w/o idioms [130094.0, 50640.0, 209.0, 0.004127172195892575, 463.0, 0.009142969984202212, 235.0, 0.004640600315955766, 7.0, 0.0001382306477093207, 20.0, 0.00039494470774091627, 12.0, 0.00023696682464454977, 54.0, 0.0010663507109004739, 209.0, 0.004127172195892575, 206.0, 0.004067930489731438, 0.9856459330143541, 3.0, 5.924170616113744e-05, 84.0, 0.0016587677725118483, 1.0, 2.0, 1.9747235387045812e-05, 2.0, 1.9747235387045812e-05, 3.0, 5.924170616113744e-05, 1.317654028436019, 7.045776576879511, 124.0, 0.00849780701754386]
Extracting features w/o idioms took 0.6989080905914307 seconds
Idiom features [7.0, 0.00047971491228070173, 131.0, 0.008977521929824562]
Extracting idiom features took 0.9881906509399414 seconds
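
To put the timings above in proportion, the short sketch below computes how much of the separately measured time went to the four idiom features, and how the two separate runs compare with the single combined run. The numbers are copied from this one run; the variable names are illustrative, and a single run is only a rough indication.

```python
# Timings copied from the run above, in seconds.
all_features = 1.490027904510498
without_idioms = 0.6989080905914307
idioms_only = 0.9881906509399414

# Share of the separately measured time spent on the idiom features.
separate_total = without_idioms + idioms_only
idiom_share = idioms_only / separate_total

# How much longer the two separate runs took than the combined run.
saved_by_combined_run = separate_total - all_features

print("Idiom share of separate runs: {0:.1%}".format(idiom_share))
print("Separate runs exceed the combined run by {0:.3f} seconds".format(saved_by_combined_run))
```

The two separate runs add up to more than the combined run. One plausible reading, which is an assumption and not something the output itself shows, is that intermediate results shared by both feature groups are computed once in the combined run and twice across the separate runs.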