NOTE: Windows users will use `scoop install` instead of `brew install`. Even if the docs say to use `choco`, `scoop` is a better package manager than Chocolatey!
- https://miller.readthedocs.io/en/latest/, with some examples:
- https://github.com/BurntSushi/xsv, with some examples:
- https://csvkit.readthedocs.io/, with some examples:
We did this in class!
htmltab --select .ReportResults "http://www.bigpumpkins.com/WeighoffResultsGPC.aspx?c=W&y=2022" --output watermelons.csv
- You can't use `head -n -1` on OS X, which is awful. You just need to do it manually or steal something from StackOverflow.
- No hints!
- GENERAL HINT: You can redirect the output into a CSV using `> output.csv` at the end of your command
- MILLER HINT: https://miller.readthedocs.io/en/latest/10min/#handling-field-names-with-spaces
- GENERAL HINT: It's okay for this to take two separate commands, with you doing the division manually.
- MILLER HINT: Using `then` is optional but kinda fun - https://miller.readthedocs.io/en/latest/10min/#chaining-verbs-together
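For the `head -n -1` problem on OS X noted in the hints above, one workaround that needs no extra tools is `sed '$d'`, which deletes the last line of its input. Here is a minimal sketch, assuming a made-up `raw.csv` whose final row is a footer you don't want. The file names and sample rows are hypothetical, not the real scraped data.

```shell
# Build a tiny stand-in file (made-up rows; the last line plays the unwanted footer).
printf '%s\n' 'Place,Weight (lbs)' '1,350' '2,320' 'TOTAL,670' > raw.csv

# '$' addresses the last line and 'd' deletes it; this works with both BSD (OS X) and GNU sed.
# The redirect at the end saves the result to a file, as in the GENERAL HINT.
sed '$d' raw.csv > trimmed.csv

cat trimmed.csv
```

Swap in your own file name for `raw.csv`; the `> trimmed.csv` part is the same redirect trick you can put at the end of any command.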
I guess the "GPC site" is the event where they showed off the watermelon. What are the top 3 events, and how many watermelons on the list are from each?
- CSVKIT HINT: You probably want `csvstat`. The output will look weird, but it's okay.
- XSV HINT: By default, `xsv`'s `frequency` calculates frequency for EVERY COLUMN. You'll probably only want to select one column.
- MILLER HINT: There are other ways to do it, but https://miller.readthedocs.io/en/latest/reference-verbs/#most-frequent
How many watermelons were over 300 pounds? (if automatically calculating this using the command-line tool doesn't work, maybe try manually counting)
- CSVKIT HINT: `csvsql` is a good one to try here. And column names with spaces are written with brackets around them, `[like this]`.
- MILLER HINT: https://miller.readthedocs.io/en/latest/10min/#handling-field-names-with-spaces
- XSV HINT: it doesn't automatically include the median in statistical calculations!
- XSV HINT: piping it to xsv flatten makes it look a lot nicer (xsv table works, too, but only if your screen is wide)
- MILLER HINT: https://miller.readthedocs.io/en/latest/reference-verbs/#stats1
If you put all of the watermelons into big piles for each country, how much would each country's pile weigh?
- CSVKIT HINT: Use `csvsql` and write some SQL! Be sure to remember that columns with spaces get `[]` around them (it also looks nicer if you pipe to `csvlook`)
- MILLER HINT: `--opprint` makes the output look a lot nicer.