What is your favorite thing you learned yesterday about spreadsheets or OpenRefine?
Carpentries Brown Bag - Querying data from database API services with Python Requests: https://github.com/amdevine/cbb-python-requests
Carpentries Brown Bag - Scripting with OpenRefine: https://docs.google.com/presentation/d/1fwhSSJRrFB21KJwKVcut20x1YHU_u8Hj2h-63QiNqQY/edit#slide=id.p
Lesson: https://datacarpentry.org/sql-ecology-lesson/00-sql-introduction/index.html
Data: https://doi.org/10.6084/m9.figshare.1314459
Cheat sheet: https://gist.github.com/amdevine/9460baa6f0fc6525b16724947cfb20b4
Import the plots (plots.csv) and species (species.csv) tables. Make sure that the plot_id field in the plots table has the INTEGER data type.
Lesson: https://datacarpentry.org/sql-ecology-lesson/01-sql-basic-queries/index.html
Write a query that returns the year, month, day, species_id and weight in mg.
Produce a table listing the data for all individuals in Plot 1 that weighed more than 75 grams, telling us the date, species id code, and weight (in kg).
Write a query that returns year, species_id, and weight in kg from the surveys table, sorted with the largest weights at the top.
Using the surveys table, write a query to display the three date fields, species_id, and weight in kilograms (rounded to two decimal places), for individuals captured in 1999, ordered alphabetically by the species_id.
Lesson: https://datacarpentry.org/sql-ecology-lesson/02-sql-aggregation/index.html
- Write a query that returns the total weight, average weight, minimum and maximum weights for all animals caught over the duration of the survey.
- Modify the query so that it outputs these values only for weights between 5 and 10.
Write queries that return:
- How many individuals were counted in each year in total
- How many were counted each year, for each different species
- The average weights of each species in each year
Can you get the answer to both 2 and 3 in a single query?
Write a query that returns, from the species table, the number of species in each taxa, only for the taxa with more than 10 species.
Run these three queries and note the results from each. Do you see a discrepency? What is causing this discrepency?
SELECT COUNT(*)
FROM surveys
WHERE sex != 'F';
SELECT COUNT(*)
FROM surveys
WHERE sex != 'M';
SELECT COUNT(*)
FROM surveys;
Lesson: https://datacarpentry.org/sql-ecology-lesson/03-sql-joins/index.html
Graphics illustrating inner and left joins: https://datacarpentry.org/python-ecology-lesson/05-merging-data/index.html
Write a query that returns the genus, the species name, and the weight of every individual captured at the site
Re-write the original query to keep all the entries present in the surveys table. How many records are returned by this query?
Count the number of records in the surveys table that have a NULL value in the species_id column.
Write a query that returns the number of animals caught of each genus in each plot. Order the results by plot number (ascending) and by descending number of individuals in each plot.
Write a query that finds the average weight of each rodent species (i.e., only include species with Rodent in the taxa field).
Write a query that returns 30 instead of NULL for values in the hindfoot_length column. Then modify your query to return the average hindfoot length for all records, assuming unknown values are 30. Finally, modify your query again to find the average hindfoot length for each species, assuming unknown values are 30.
Write a query that returns the number of animals caught of each genus in each plot, using IFNULL to assume that unknown species are all of the genus “Rodent”.
Write a query that returns genus names (no repeats), sorted from longest genus name down to shortest.