Segment extraction and expansion by words
Gist numpde/0c0d9469fa0ffaae05f4f1b2fdec2dfb, last active November 9, 2017 03:56
```python
#!/usr/bin/python3
# CC-BY-4.0

import sys
import argparse
from random import shuffle, randrange, choice

# Parse the command line before reading stdin, so that -h prints help
# immediately instead of waiting for input
parser = argparse.ArgumentParser(description='Generate a segment of certain word length from standard input.')
parser.add_argument('--length', type=int, help='contiguous segment length in words (all by default)')
parser.add_argument('--resample', type=int, help='resample from segment to get a text of this length')
parser.add_argument('--shuffle', action="store_true", help='shuffle output')
args = parser.parse_args()

# Collate lines, separating them by a space
s = ' '.join(sys.stdin.readlines())
# Remove non-text: keep letters, digits and whitespace (tabs, newlines)
s = "".join(c for c in s if (c.isalnum() or c.isspace()))
# Split into a list of words
s = s.split()

if args.length:
    L = args.length
    if L > len(s):
        parser.error("Not enough words to generate segment of desired length.")
    # Random start index such that s[i:i+L] lies within the list
    i = randrange(0, len(s) - L + 1)
    s = s[i:(i + L)]

if args.resample:
    # Draw words from the segment independently, with replacement
    s = [choice(s) for _ in range(args.resample)]

if args.shuffle:
    shuffle(s)

# Collate the list of words, separating by a space
s = ' '.join(s)
print(s)
```
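For testing or reuse, the same steps can be wrapped in a function that takes a word list and a random generator. This is a sketch, not part of the gist: the name `rndseg_words` and its parameters are hypothetical, and passing a seeded `random.Random` is an assumption added so that runs can be reproduced.

```python
import random
from typing import List, Optional


def rndseg_words(words: List[str],
                 length: Optional[int] = None,
                 resample: Optional[int] = None,
                 shuffle: bool = False,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical function form of the script's pipeline (not from the gist)."""
    if rng is None:
        rng = random.Random()
    s = list(words)  # copy, so the caller's list is never shuffled in place
    if length:
        if length > len(s):
            raise ValueError("Not enough words to generate segment of desired length.")
        # Random contiguous window of `length` words
        i = rng.randrange(0, len(s) - length + 1)
        s = s[i:i + length]
    if resample:
        # Independent draws with replacement from the segment
        s = [rng.choice(s) for _ in range(resample)]
    if shuffle:
        rng.shuffle(s)
    return s


words = "A B C D".split()
print(rndseg_words(words, length=3, rng=random.Random(0)))
```

Raising `ValueError` here plays the role of the script's error message; a caller can decide whether to report it or exit.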
Help and examples (output of --shuffle, --length and --resample is random and varies between runs)
python3 rndseg.py -h
Extract words
echo "A, B? C D." | python3 rndseg.py
A B C D
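The cleaning step keeps only alphanumeric characters and whitespace, so punctuation is deleted rather than turned into a separator. A minimal sketch of that step on its own, with `to_words` as a hypothetical helper name (the script does this inline):

```python
def to_words(text: str) -> list:
    """Hypothetical helper mirroring the script's cleaning step."""
    # Keep letters, digits and whitespace; drop everything else
    kept = "".join(c for c in text if c.isalnum() or c.isspace())
    # Split on runs of whitespace
    return kept.split()


print(to_words("A, B? C D."))              # ['A', 'B', 'C', 'D']
print(to_words("don't stop, well-known"))  # ['dont', 'stop', 'wellknown']
```

The second call shows a side effect of deleting punctuation: words joined by an apostrophe or hyphen are fused into a single token.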
Extract and shuffle words
echo "A, B? C D." | python3 rndseg.py --shuffle
D B C A
Extract a random contiguous segment of 3 words
echo "A, B? C D." | python3 rndseg.py --length=3
B C D
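A segment of `L` words can start at any index from `0` to `len(s) - L`, which is why the script calls `randrange(0, len(s)-L+1)`: the upper bound of `randrange` is exclusive. Listing every candidate for the four-word input makes the range concrete; `all_segments` is a hypothetical helper for illustration.

```python
def all_segments(words, L):
    # Every contiguous window of L words; start indices run from 0 to len(words) - L
    return [words[i:i + L] for i in range(len(words) - L + 1)]


print(all_segments(["A", "B", "C", "D"], 3))
# [['A', 'B', 'C'], ['B', 'C', 'D']]
```

Because `randrange` picks each start index with equal probability, both windows are equally likely, and `B C D` above is one of the two possible outputs.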
Extract a random contiguous segment of 3 words; shuffle words
echo "A, B? C D." | python3 rndseg.py --length=3 --shuffle
B C A
Extract a random contiguous segment of 3 words, then shuffle it in a second call (equivalent to the previous example)
echo "A, B? C D." | python3 rndseg.py --length=3 | python3 rndseg.py --shuffle
B C A
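The piped form behaves like `--length=3 --shuffle` because the first call's output is already clean: running the cleaning step again on words joined by spaces returns the same words, so the second call only shuffles the segment. A small check of that idempotence, using a hypothetical `to_words` helper that mirrors the script's filter:

```python
import random


def to_words(text):
    # Mirror of the script's cleaning step (hypothetical helper name)
    return "".join(c for c in text if c.isalnum() or c.isspace()).split()


first = to_words("A, B? C D.")[1:4]   # stand-in for one 3-word segment: B C D
second = to_words(" ".join(first))    # what the piped call reads back
assert second == first                # cleaning leaves clean output unchanged
random.shuffle(second)
print(" ".join(second))
```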
Extract a random contiguous segment of 3 words; generate a text of 10 words by sampling words from the segment
echo "A, B? C D." | python3 rndseg.py --length=3 --resample=10
B B B B B A C A C C
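Resampling draws each output word independently and with replacement from the segment, so words may repeat (as `B` does above) and some may be missing: for a 3-word segment, the chance that a particular word is absent from 10 draws is (2/3)^10 ≈ 1.7%. Since Python 3.6 the standard library also offers `random.choices(population, k=...)`, which performs the same kind of uniform draw with replacement in one call, though not the same sequence for a given seed. The sketch below compares it with the script's list comprehension; the fixed seed is an assumption used only to make runs repeatable.

```python
import random

segment = ["A", "B", "C"]   # e.g. the 3-word segment picked by --length=3
rng = random.Random(42)     # seeded for repeatability (not in the script)

# As in the script: one independent choice per output word
by_choice = [rng.choice(segment) for _ in range(10)]
# Single call drawing 10 words with replacement (Python 3.6+)
by_choices = rng.choices(segment, k=10)

print(" ".join(by_choice))
print(" ".join(by_choices))
```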