Notes:
-
I've tried to break these up into separate pieces, but it's not always possible: e.g. knowledge of data structures and subsetting are tightly intertwined.
-
Level of Bloom's taxonomy listed in square brackets, e.g. http://bit.ly/15gqPEx. Few categories currently assess components higher in the taxonomy.
-
basic data structures (vector, matrix, list and data frame):
-
list and describe their differences (dimensionality, homogeneous vs. heterogeneous) [knowledge]
-
pick the best data structure for a given problem [application]
-
recall functions to coerce data structures between different forms [knowledge], and recognise which coercions are lossy [comprehension]
-
match data types and the functions that identify them, and remember common gotchas (is.vector, is.numeric etc.) [comprehension]
-
-
`str`:
-
interpret the output of `str` [comprehension]
-
use `str` and subsetting to extract desired pieces from an arbitrary object (for example, extract the R-squared value from a linear model) [application]
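A minimal sketch of the R-squared objective above. The built-in `mtcars` data set and the `mpg ~ wt` model are illustrative assumptions, not part of the original notes; the point is that `str()` shows the model summary to be a list, so its pieces can be extracted with `$` or `[[`.

```r
# Fit a simple linear model on the built-in mtcars data (illustrative choice).
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

# str() reveals the list structure; max.level = 1 keeps the output short.
str(fit_summary, max.level = 1)

# The R-squared value is a named component of the summary list.
r_squared <- fit_summary$r.squared
r_squared_alt <- fit_summary[["r.squared"]]
```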
-
-
vectors:
-
recognise which types of data correspond to the four common atomic vector types (character, double, integer, logical) [knowledge]
-
recognise the use of `L` to create integer vectors [knowledge]
-
create new vectors with `c()`, and correctly predict vector type when multiple types are mixed (e.g. what is the type of `c(1, 1L, F)`?) [application]
-
create named vectors with `c()`, recognise how named vectors are printed and how to extract values with character subsetting [application]
-
employ implicit logical to numerical coercion to compute number and proportion of TRUEs in a vector (e.g. what proportion of values are missing?) [application]
-
predict how missing values propagate [application], and discuss why `is.na()` is necessary [synthesis]
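The vector objectives above could be exercised with something like the following sketch. The example values and the names `ages` and `x` are hypothetical.

```r
# Mixing types in c() coerces to the most flexible type:
# logical < integer < double < character.
mixed <- c(1, 1L, FALSE)
typeof(mixed)                    # "double"

# Named vectors support character subsetting.
ages <- c(alice = 31L, bob = 27L)
ages[["bob"]]                    # 27

# Logicals coerce to 0/1, so sum() counts TRUEs and mean() gives a proportion.
x <- c(4, NA, 7, NA, 1)
n_missing <- sum(is.na(x))       # 2
prop_missing <- mean(is.na(x))   # 0.4

# NA propagates through comparisons, so `x == NA` cannot find missing
# values; is.na() is needed instead.
x == NA                          # NA NA NA NA NA
```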
-
-
data frames:
-
use `data.frame()` to create a data frame from multiple vectors, and control the names of the generated columns [application]
-
describe the situations under which strings are coerced to factors, and recall how to use `I()`, `as.is = TRUE` or `stringsAsFactors = FALSE` to prevent conversion [knowledge]
-
combine two or more data frames with `cbind()` and `rbind()`, and describe what conditions must be true for the combination to work [knowledge]
-
use `head()`, `tail()`, `summary()` and `str()` to get an overview of a data frame [application]
-
describe how 1d and 2d subsetting of a data frame differ, and enumerate the circumstances under which subsetting a data frame will return a column instead of a data frame [comprehension]
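As a small sketch of the last objective, the following contrasts 1d and 2d subsetting of a data frame. The data frame `df` and its columns are made up for illustration.

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# 1d (list-style) subsetting always returns a data frame.
class(df["id"])                  # "data.frame"

# 2d (matrix-style) subsetting of a single column simplifies to a vector...
class(df[, "id"])                # "integer"

# ...unless drop = FALSE is supplied.
class(df[, "id", drop = FALSE])  # "data.frame"

# $ and [[ extract the column itself.
identical(df$id, df[["id"]])     # TRUE
```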
-
-
matrices
-
contrast 1d vector operations and 2d matrix operations (e.g. `names()` vs. `colnames()` & `rownames()`, `length()` vs. `nrow()` and `ncol()`) [analysis]
-
predict the output when a matrix is coerced into a vector (i.e. remember that R matrices are stored column-wise)
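A short sketch of column-wise storage, using an arbitrary 2 x 3 matrix chosen for illustration:

```r
m <- matrix(1:6, nrow = 2)
# m is:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

by_col <- as.vector(m)      # 1 2 3 4 5 6  (columns are read off first)
by_row <- as.vector(t(m))   # 1 3 5 2 4 6  (transpose first to get row order)

length(m)                   # 6
c(nrow(m), ncol(m))         # 2 3
```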
-
-
lists
-
create a new list with `list()`, and selectively name components [application]
-
convert a list into a vector with `unlist()`, and apply implicit coercion rules to predict type of output [application]
-
-
NULL
-
strings vs. factors vs. ordered factors
-
recall the key differences (cardinality, ordering) between strings, factors and ordered factors [knowledge]
-
select the most appropriate type for a given variable [analysis]
-
describe the operation of `drop = TRUE`, when it is needed, and remedies if you are using it frequently [application]
-
match data types with conversion and testing functions, and list common gotchas (e.g. converting an ordered factor to a factor) [knowledge]
-
-
know enough about floating point math to predict the output of `sqrt(2)^2 - 2 == 0` and spot potentially hazardous use of equality comparisons [application]
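A minimal sketch of the floating point objective above. The `1e-8` tolerance is an arbitrary choice made for illustration.

```r
x <- sqrt(2)^2 - 2

exact <- x == 0                       # FALSE: x is a tiny non-zero rounding error
tolerant <- isTRUE(all.equal(x, 0))   # TRUE: comparison within a tolerance
manual <- abs(x) < 1e-8               # TRUE: explicit tolerance
```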
-
types of subsetting
-
match the six types of subsetting objects with their results [knowledge]
-
compare and contrast the use of subsetting, `match()` and `%in%` when looking for matching values across two vectors [application]
-
use integer subsetting to order multidimensional structures [application]
-
apply De Morgan's rule to simplify a complicated double negation [application]
-
identify uses of `which()` that are redundant (i.e. you only need `which()` when you want the position of the nth TRUE) [analysis]
-
use repeated values in numeric indexing to create a "subset" that is larger than the original set [application]
-
use character subsetting to create a lookup table [application]
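Several of the subsetting objectives above can be sketched together. All data and names here (`lookup`, `codes`, `df`) are hypothetical.

```r
# A named character vector acts as a lookup table.
lookup <- c(m = "Male", f = "Female", u = NA)
codes <- c("m", "f", "u", "f", "f", "m")
labels <- unname(lookup[codes])
# "Male" "Female" NA "Female" "Female" "Male"

# Repeated integer indices give a result longer than the original.
x <- c(10, 20, 30)
expanded <- x[c(1, 1, 2, 3, 3, 3)]       # 10 10 20 30 30 30

# Integer subsetting with order() sorts the rows of a data frame.
df <- data.frame(k = c(3, 1, 2), v = c("c", "a", "b"))
sorted <- df[order(df$k), ]

# which() is redundant when the logical vector can be used directly...
identical(x[which(x > 15)], x[x > 15])   # TRUE
# ...though the two differ when NAs are present (which() drops them).
```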
-
-
understand how 1d subsetting generalises to 2d subsetting [comprehension]
-
describe the difference between simplifying and preserving subsetting (`[` vs `[[`, when `drop = FALSE` is necessary) [analysis]
-
understand the difference between `x$y` and `x[["y"]]` and know when to use each form [application]
-
use subsetting with assignment to change multiple values in a data structure at once [application]
-
use subsetting with assignment and NULL to remove elements from a list/data frame [application]
-
identify when subsetting + assignment will fail because the number of values to assign does not match the number of values in the subset [analysis]
-
use R's boolean operators to recreate English expressions (e.g. x is less than 50 and more than 25). Recall the difference between R's or (`|`, which is inclusive) and "or" in regular English. [application]
-
compare and contrast `&` and `|` with `&&` and `||` [analysis]
-
identify the correct function to read/write a data frame to/from disk (csv, tab delimited or fixed width file) [application]
-
use common arguments (`na.strings`, `sep`, `header`) to deal with files that have unusual structure [analysis]
-
recognise the lack of symmetry between `read.csv()` and `write.csv()`, and describe which options should be used by default [knowledge]
-
use `subset()` and `transform()` to reduce the amount of typing for common data manipulation operations [knowledge]
-
use `readRDS()`/`saveRDS()` to cache binary R objects that were expensive to compute [application]
-
understand what `save()` and `load()` do, how they differ from `readRDS()` and `saveRDS()` [knowledge] and when to use them instead of the single object variants [evaluation]
-
convert a simple script into parameterised functions [synthesis]
-
describe a simple R function in words [synthesis]
-
describe R's argument matching semantics (position, partial, exact) [knowledge], predict how they apply in a specific situation [application], and evaluate good and less-good use of the three different types [evaluation]
-
describe the parts of a function using correct terminology: body, formal arguments, return value [comprehension]
-
use scoping rules to predict how names are mapped to values [application]
-
describe short-circuiting and its impact on expressions like `is.null(x) || all(is.na(x))` or `TRUE || stop("!")`
-
execute a script of R code with `source()`
-
describe the structure of an if statement [comprehension]
-
use a for loop to repeat the same operation on different elements of a data structure [application]
-
convert a for loop to a while loop [analysis]
-
illustrate why `1:length(x)` is dangerous and suggest a safer way [application]
-
correct the indenting and spacing of a piece of poorly formatted source code [application]
-
describe what vectorisation means, distinguish internal from external vectorisation, and describe the performance consequences of each [knowledge]
-
use vectorised operations instead of for loops to perform simple mathematical operations (log, addition, subtraction etc.) [application]
-
use `lapply()`, `sapply()` and `apply()` to vectorise operations that are not already vectorised [analysis]
-
convert an `lapply()` call to a for loop [application]
-
recognise a for loop that can be rewritten to use `lapply()` [knowledge]
-
match common non-vectorised functions to their vectorised equivalents (e.g. `min()` and `pmin()`, `sum()` to `cumsum()` and `colSums()`) [knowledge]
-
describe basic recycling rules, and know how to avoid them when necessary [knowledge]
-
recognise and remedy simple syntax errors (missing quotes, missing parentheses etc.) [comprehension]
-
use `try()` to recover from an error [application]
-
interpret the output of `traceback()` to identify where an error occurred [application]
-
initiate an interactive debugger with `browser()` or `options(error = recover)` [application]
-
list the commands used to control `browser()`/`recover()` [knowledge]
-
use `options(warn = 2)` to convert warnings into errors for debugging
-
create a minimal reproducible example to get help from others [synthesis]
-
find help for a function, data set, and package [knowledge]
-
read and interpret the documentation of a function [analysis]
-
use Google to identify the name of a function that performs a given task
-
install a package with `install.packages()` [comprehension]
-
load a package with `library()` or `require()` [comprehension]
-
determine which packages are out of date [application]
-
understand the lifetime of the effects of `install.packages()` and `library()` [comprehension]
-
use `::` to refer to a function in a specific package
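A few of the control-flow and error-handling objectives in the list above (the `1:length(x)` hazard, rewriting a for loop with `lapply()`, short-circuiting and `try()`) might be sketched as follows. The list `values` and the error message are made up for illustration.

```r
# 1:length(x) counts down to 0 when x is empty; seq_along(x) does not.
x <- numeric(0)
risky <- 1:length(x)       # 1 0
safe <- seq_along(x)       # integer(0)

# A for loop and its lapply() equivalent.
values <- list(a = 1:3, b = 4:6)
totals_loop <- vector("list", length(values))
names(totals_loop) <- names(values)
for (i in seq_along(values)) {
  totals_loop[[i]] <- sum(values[[i]])
}
totals_apply <- lapply(values, sum)

# || short-circuits: the right-hand side is never evaluated here.
short <- TRUE || stop("!")

# try() lets execution continue after an error.
result <- try(stop("something went wrong"), silent = TRUE)
failed <- inherits(result, "try-error")
```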
Awesome work! Maybe mention that data frames have lists as their foundation (even showing `is.list(data.frame()) == TRUE`), as I think this clears up why we can't use matrices for much of the data we work with (because there is type heterogeneity across column vectors).
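The point in this comment could be demonstrated with a sketch along these lines. The two-column data frame is a made-up example.

```r
df <- data.frame(n = 1:2, s = c("a", "b"))

# A data frame is a list of equal-length columns.
df_is_list <- is.list(df)     # TRUE

# Each column keeps its own type.
sapply(df, class)

# A matrix must be homogeneous, so the numbers are coerced to strings.
m <- as.matrix(df)
matrix_type <- typeof(m)      # "character"
```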