I have a text file with a list of quotes (example excerpt below). Some of the lines are header lines for the section of quotes following, which begin with 3 or more hyphens. I want an awk script that selects a random non-header line, determines its header (the nearest line above it that starts with 3 or more hyphens), and then outputs the header and the line.
---- Characteristics of Innovative People ----
Iterate between abstract and concrete thinking.
Acknowledge the limits of your knowledge.
Step out of your comfort zone.
---- ironies of automation ----
The designer who tries to eliminate the operator still leaves the operator to do the tasks which the designer cannot think how to automate.
The experienced operator makes the minimum number of actions, and the process output moves smoothly and quickly to the new level, while with an inexperienced operator it oscillates round the target value.
When manual take-over is needed there is likely to be something wrong with the process, so that unusual actions will be needed to control it, and one can argue that the operator needs to be more rather than less skilled, and less rather than more loaded, than average.
Efficient retrieval of knowledge from long-term memory depends on frequency of use.
Knowledge develops only through use and feedback about its effectiveness.
You can use the following awk script to achieve the desired functionality:
awk -v seed=$RANDOM 'BEGIN{srand(seed)} /^[-]{3,}/{header=$0; next} !/^$/{if(!(header in n)) order[nh++]=header; lines[header, n[header]++]=$0} END{h=order[int(rand()*nh)]; print h; print lines[h, int(rand()*n[h])]}' quotes.txt
Here's a breakdown of the script:
- -v seed=$RANDOM sets the seed for the random number generator.
- BEGIN{srand(seed)} seeds the random number generator.
- /^[-]{3,}/{header=$0; next} matches lines with three or more hyphens at the beginning and saves them as headers. The next command skips processing the rest of the script for these lines.
- !/^$/{if(!(header in n)) order[nh++]=header; lines[header, n[header]++]=$0} processes non-empty lines. The first time a quote appears under a header, the header is appended to the order array. The quote is then stored in lines under the key (header, index), and n[header] counts the quotes in that section.
- In the END block, the script selects a random header and a random line associated with the chosen header, then prints them.
To run the script, save the example excerpt in a file named quotes.txt, and then run the command in your terminal. The output will be a random header and a non-header line associated with that header.
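While testing, it can help to pass a fixed seed instead of $RANDOM so that the same pick comes back on every run. The following is a minimal sketch of that idea, not a definitive version: the function name pick_by_header and the shortened sample file are made up for illustration, and the pattern /^---/ is used as a plain-awk way of saying "three or more leading hyphens".

```shell
#!/bin/sh
# Hypothetical helper: pick a random header, then a random quote under it.
# Usage: pick_by_header SEED FILE
pick_by_header() {
    awk -v seed="$1" '
        BEGIN { srand(seed) }
        # A line starting with --- (three or more hyphens) is a header.
        /^---/ { header = $0; next }
        !/^$/ {
            # Register the header the first time a quote appears under it.
            if (!(header in n)) order[nh++] = header
            # Store the quote under the key (header, index).
            lines[header, n[header]++] = $0
        }
        END {
            if (nh == 0) exit 1            # nothing to pick from
            h = order[int(rand() * nh)]    # random section
            print h
            print lines[h, int(rand() * n[h])]   # random quote in that section
        }
    ' "$2"
}

# Shortened sample of the excerpt, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---- Characteristics of Innovative People ----
Iterate between abstract and concrete thinking.
Step out of your comfort zone.
---- ironies of automation ----
Knowledge develops only through use and feedback about its effectiveness.
EOF

# The same seed should give the same pick both times.
first=$(pick_by_header 42 "$sample")
second=$(pick_by_header 42 "$sample")
[ "$first" = "$second" ] && echo "repeatable"
printf '%s\n' "$first"
```

Which quote a given seed selects depends on the awk implementation, so only the repeatability is meaningful here, not the particular line printed.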
This might work, but the problem with this approach is that the sections with few quotes attributed to them will get disproportionately picked vs. the sections with many quotes attributed to them.
I understand your concern. To address it, you can modify the script as follows:
awk -v seed=$RANDOM 'BEGIN{srand(seed)} /^[-]{3,}/{header=$0; next} !/^$/{lines[count++]=$0; headers[count-1]=header} END{randIndex=int(rand()*count); print headers[randIndex]; print lines[randIndex]}' quotes.txt
This updated script works as follows:
- -v seed=$RANDOM sets the seed for the random number generator.
- BEGIN{srand(seed)} seeds the random number generator.
- /^[-]{3,}/{header=$0; next} matches lines with three or more hyphens at the beginning and saves them as headers. The next command skips processing the rest of the script for these lines.
- !/^$/{lines[count++]=$0; headers[count-1]=header} processes non-empty lines and stores them in the lines array. The associated headers are stored in the headers array at the corresponding index.
- In the END block, the script selects a random index and prints the associated header and line.
This updated script ensures that each quote has an equal chance of being picked, regardless of the number of quotes in each section. To run the script, save the example excerpt in a file named quotes.txt, and then run the command in your terminal. The output will be a random header and a non-header line associated with that header.
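One way to sanity-check the difference between picking a header first and picking a quote directly is to draw many times inside a single awk run and count how often each section comes up. This is only a rough sketch: the draw count, the variable names, and the placeholder quotes (q1, q2, ...) are arbitrary choices for illustration, and the exact counts will vary with the seed and the awk implementation. With 3 quotes in the first section and 5 in the second, picking a header first should give each section about half the draws, while picking a quote directly should give roughly 3/8 and 5/8.

```shell
#!/bin/sh
# Sample file with the section sizes of the excerpt: 3 quotes, then 5.
# The quote text is placeholder data.
quotes=$(mktemp)
cat > "$quotes" <<'EOF'
---- Characteristics of Innovative People ----
q1
q2
q3
---- ironies of automation ----
q4
q5
q6
q7
q8
EOF

# Output columns (tab-separated): header, header-first count, quote-first count.
tally=$(awk -v seed=1 -v draws=8000 '
    BEGIN { srand(seed) }
    /^---/ { header = $0; next }
    !/^$/ {
        if (!(header in size)) order[nh++] = header   # sections, in file order
        size[header]++
        owner[count++] = header                       # section of each quote
    }
    END {
        for (i = 0; i < draws; i++) {
            by_header[order[int(rand() * nh)]]++      # pick a section uniformly
            by_quote[owner[int(rand() * count)]]++    # pick a quote uniformly
        }
        for (i = 0; i < nh; i++)
            printf "%s\t%d\t%d\n", order[i], by_header[order[i]], by_quote[order[i]]
    }
' "$quotes")
printf '%s\n' "$tally"
```

The second column staying near an even split while the third tracks the section sizes is the effect described above: under header-first selection, each quote in the smaller section is picked more often than each quote in the larger one.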