- Program + Slides: https://user2019.r-project.org/talk_schedule/
- Collection during the conf: https://github.com/sowla/useR2019-materials
- Recordings: R Consortium YouTube (soon to be published)
- Julia Stewart Lowndes (open science): Recording + Slides
- Julie Josse (missing data): Recording + Slides
- Joe Cheng (Shiny): Recording + Slides
- Martin Morgan (BioC): Recording + Slides
- Bettina Grün (model based clustering): Recording + Slides
- Julien Cornebise (AI for good): Recording + Slides
- Slides: https://jules32.github.io/useR-2019-keynote/#1
- Video: https://www.youtube.com/watch?v=Z8PqwFPqn6Y
- Illustrations by @allison_horst
- Data Science = turning raw data into understanding
- Summary in a tweet
- new functions `pivot_longer()` and `pivot_wider()` in tidyr that make it easier to get data into the tidy data framework
- Slides: rstd.io/tidyhancements-2019
- Material: https://gist.github.com/hadley/eb5c97bfbf257d133a7337b33d9f02d1
- More info: https://twitter.com/hadleywickham/status/1148894754924564480
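
A minimal sketch of the reshaping idea, using the function names as released in tidyr 1.0.0 (`pivot_longer()` and `pivot_wider()`). The data frame and its column names are made up for illustration and do not come from the talk.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
wide <- data.frame(
  country = c("A", "B"),
  `2018` = c(10, 20),
  `2019` = c(11, 22),
  check.names = FALSE
)

# Wide -> long: one row per country-year observation.
long <- pivot_longer(
  wide,
  cols = c(`2018`, `2019`),
  names_to = "year",
  values_to = "value"
)

# Long -> wide again: the round trip restores one column per year.
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

The long form is usually what plotting and modelling functions expect; the wide form is often easier to read as a table.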
- advent of code for data science
- R package released soon
- material: https://github.com/isteves/ds-puzzles
teaching concepts around DS:
- how to name files
- use small test cases
- work with self-contained code
- use projects & version control
library(tidiesofmarch)
start_puzzle()
#'
#+
- facilitate ... making teaching assessment easy
- automate, because there are many classes and semesters
- distribute assignments to a repo (mirroring), add a team or an individual to a repository
usethis::ui_*
- one organisation per class: invite all students, keep one repo with templates
- wercker for automatic assessment
- feedback for style
- Slides: https://github.com/rundel/Presentations/blob/master/UseR2019/UseR2019.pdf
rundel/ghclass
- similar to github classrooms
- Slides + Material: https://twitter.com/rundel/status/1148934292589961216
- Summary in a tweet
- rstd.io/dsbox-slides
- three things: content, pedagogy, infrastructure = DS in a box
- back: rstudio-education/datascience-box, front: datasciencebox.org
- clientele: experienced teachers, new teachers,
- five design principles
- cherish day one = use RStudio Cloud
- start with cake = show the end result, then start manipulating
- skip baby steps = do drill exercises at home
- hide the veggies = broccoli is the analogy for regular expressions wrapped in web scraping
- leverage the ecosystem = indulge in the R ecosystem (ghclass, blogdown, xaringan), learn useful stuff.
- Slides + Material: https://twitter.com/minebocek/status/1148904275550121984
- Summary in a tweet
- different types of missing values: NA (forgotten to fill the form), imp (impossible to take measure), ...
- systematic values
- presenting alternatives to na.action = na.omit
- study NAs with VIM, naniar, FactoMineR
- handling missing values
- doing supervised learning with missing values
- modify estimation process to deal with missing values
- Imputation to get a complete dataset (e.g. with the mean)
- `misaem` package
- Consistency of supervised learning with missing values
- Theory: what is consistent
- Then test which algorithms work, e.g. with EM
- publish imputation algorithm
- use a lot of data and impute with any constant (e.g. the mean or a value out of range)
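
A small base R sketch contrasting the two options mentioned above: dropping incomplete rows with `na.omit` versus imputing a constant (here the observed mean) to keep a complete dataset. The data are invented for illustration; this is not the method of the `misaem` package.

```r
# Hypothetical data: outcome y, predictor x with two missing values.
d <- data.frame(
  x = c(1, 2, NA, 4, NA, 6),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)
)

# Option 1: rows with NA are dropped before fitting.
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
n_used_omit <- nobs(fit_omit)

# Option 2: impute a constant (the observed mean of x) and keep all rows.
d_imp <- d
d_imp$x[is.na(d_imp$x)] <- mean(d$x, na.rm = TRUE)
fit_imp <- lm(y ~ x, data = d_imp)
n_used_imp <- nobs(fit_imp)
```

Comparing `n_used_omit` and `n_used_imp` shows how many observations each approach keeps. Whether constant imputation is acceptable depends on the learner and the amount of data, as the notes above indicate.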
crul, webmockr, vcr
- ropensci has lots of packages that do http requests
- `crul`: replaces httr and curl (friendlier)
- mocking and caching
- forked from another language (perl?)
- `webmockr`: like unit testing / expectations > set what to match against, only allowing http requests that match a certain pattern
- `vcr`: speeds up your tests (caching)
- slides: https://scotttalks.info/user-http/#/intro
- packages:
- Summary in a tweet
- convenience functions for workflows
- 3 main functions:
create_package()
create_project()
create_from_github()
use_*() >> add or modify something in a project or package
- devtools and usethis uncoupling
- devtools (meta package), broken up in small packages, e.g. usethis
- interactive way : add to .Rprofile or attach with devtools
- programmatic use: use with correct namespace
use_git()
use_license()
check()
use_github()
install()
use_readme_rmd()
create_from_github()
pr_fetch({ISSUENUMBER})
pr_push()
pr_finish()
use_course("ZIP FILE") or github repo
use_zip(), less pedantic
use_course("USER/REPO")
- Slides + Material: https://github.com/jennybc/2019-07_useR-toulouse-usethis
- Summary in a tweet
- data masking: base::subset, base::transform, and lm formulas all use data masking; data.table does too.
- rlang package tidyeval
- `{{ arg }}` curly-curly
- inspired by glue
- `{{ var }}` shortcut for `!!enquo(var)`
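
A short sketch of the curly-curly operator next to the older `enquo()` / `!!` spelling. The helper names `mean_by()` and `mean_by_old()` are hypothetical; the example assumes a dplyr version that supports `{{ }}` (rlang 0.4.0 or later).

```r
library(dplyr)

# Hypothetical helper: the caller passes bare column names.
mean_by <- function(data, group, var) {
  data %>%
    group_by({{ group }}) %>%
    summarise(avg = mean({{ var }}))
}

# The same helper written with enquo() and !!.
mean_by_old <- function(data, group, var) {
  group <- enquo(group)
  var <- enquo(var)
  data %>%
    group_by(!!group) %>%
    summarise(avg = mean(!!var))
}

res_new <- mean_by(mtcars, cyl, mpg)
res_old <- mean_by_old(mtcars, cyl, mpg)
```

Both calls should return the same per-group averages; the curly-curly version just removes the separate quoting step.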
- rray: a stricter array class
- a matrix is a special case of an array
- base R drops dimensions of size 1 unless asked not to:
bag[,1, drop = FALSE]
- increasing dimensionality
- recycling dimensions
- github.com/r-lib/rray
- slides: https://github.com/DavisVaughan/2019-07-09_useR-2019-rray/blob/master/useR-2019-rray.pdf
- Summary in a tweet
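
The dimension-dropping behaviour that motivates a stricter array class can be seen in base R alone. In this sketch `bag` is a made-up 3 x 2 matrix standing in for the object in the notes; no rray functions are used.

```r
# A 3 x 2 matrix standing in for the `bag` object in the notes.
bag <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Default subsetting silently drops the dimension of size 1.
dropped <- bag[, 1]

# drop = FALSE keeps the result as a 3 x 1 matrix.
kept <- bag[, 1, drop = FALSE]

dim(dropped)  # NULL
dim(kept)     # 3 1
```

Code that expects a matrix can break when a selection happens to return one column, which is why the default is often considered surprising.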
- Field epidemiologists are essential for getting context!
- epidemiologists spend lots of time on data entry and data cleaning: frustrating, wasted time, repeated tasks.
- collect data + prepare report + operational decisions
- should spend the least amount of time during report preparation
Partners:
- @RECONEPI: https://www.repidemicsconsortium.org/
- @MSF
- #R4EPIs https://github.com/R4EPI
Done: take template reports + automate.
- package sitrep: gives epidemiologists templates: https://github.com/R4EPI/sitrep
- https://docs.google.com/presentation/d/1OeyEBEH9IHXtFtExiXk-JMxLsYWoNtQCRqh2YkpcnkM/mobilepresent#slide=id.p
- [Summary in a tweet](https://twitter.com/sinarueeger/status/1149323941984571392)
- how to convince ppl that
- package https://github.com/gowerc/diffdf inspired by SAS
- Summary in a tweet
KASA.AI
- uros 2020 in Vienna
- created at unconfUROS2018: https://github.com/uRosConf/voronoiTreemap
- Summary in a tweet
- cancer genomics viz
- package oncoplots
- https://rdrr.io/bioc/maftools/src/R/oncoplot.R
install.packages("pak")
Two main function families: pak::pkg_* and pak:::proj_*
- lazy installation: only update if needed
- caching
- install all dependencies in the same library
- report conflicts up front
- CRAN BioC
- GH
pak::pkg_install()
pak::pkg_remove()
pak::pkg_install()
pak::pkg_install("r-lib/usethis")
To achieve this:
R/
DESCRIPTION
.Rprofile
r-packages/
.Rbuildignore
do this:
pak:::proj_create
pak:::proj_install
pak:::proj_install_dev()
- Slides: https://github.com/gaborcsardi/pak-talk
- https://github.com/r-lib/pak
- Summary in a tweet
- Summary of developments in R's data.table package
- 69 contributors to r-datatable.com
- 15th most depended-on package
DT[i, j, by]
- on which rows
- what to do
- grouped by what?
- speeding up i
- auto indexing
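
A minimal sketch of the `DT[i, j, by]` form described above: which rows, what to do, grouped by what. The table and its column names are invented for illustration.

```r
library(data.table)

# Hypothetical table with a grouping column and a numeric column.
DT <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

# i:  keep rows where value > 1
# j:  compute the sum of value
# by: one result row per group
res <- DT[value > 1, .(total = sum(value)), by = group]
```

Here `res` has one row per group, with the sum taken only over the rows selected by `i`.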
Using altrep functionality (R 3.5+) for data read in and read out.
- 0.1 sec: instant
- 1 sec: some delay, keeps flow
- 10 sec: stops
- memory mapped
- multi-threaded
- some C function
- altrep : alternative representation (since R 3.5+) for on demand parsing
- adv: less memory
- disadv: hash lookup
- disadv: single threaded
vroom is fast, what's the price for this?
- operations after
- print, head, tail, sample, filter, agg > still faster
- ex: 1e6 x 25 (468 MB)
- data.table fastest
- vroom altrep full fastest
- vroom with select
vroom(PATH, col_select = list(medaillon, ...))
- with remove
vroom(PATH, col_select = -hack_license)
- on the fly renaming
- multiple dataset to one
vroom( c(PATH-1, PATH-2), id = "path")
(vroom altrep both faster)
vroom_fwf()
vroom_write() (incl. gzip extension)
- slides: https://speakerdeck.com/jimhester/vroom
- screen cast: bit.ly/vroom-yt
- package: vroom.r-lib.org
- Summary in a tweet
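
A small, self-contained sketch of the column selection and multi-file reading shown above. It writes two temporary CSV files first so that it runs anywhere; the file contents and the column names `id`, `name` and `score` are made up and stand in for the taxi data used in the talk.

```r
library(vroom)

# Two small temporary CSV files standing in for PATH-1 and PATH-2.
path_1 <- tempfile(fileext = ".csv")
path_2 <- tempfile(fileext = ".csv")
vroom_write(data.frame(id = 1:2, name = c("x", "y"), score = c(0.1, 0.2)),
            path_1, delim = ",")
vroom_write(data.frame(id = 3:4, name = c("z", "w"), score = c(0.3, 0.4)),
            path_2, delim = ",")

# Read only some columns.
some <- vroom(path_1, delim = ",", col_select = c(id, name))

# Read everything except one column.
without_score <- vroom(path_1, delim = ",", col_select = -score)

# Read several files into one table, recording the source file in `path`.
both <- vroom(c(path_1, path_2), delim = ",", id = "path")
```

Selecting columns at read time means the unselected columns never need to be parsed, which is where much of the saving comes from.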
- Model deployment similar to Travis CI
- DevOps: union of people, process and products, all together delivering value (bit.ly/WhatIs-DevOps)
- Uses YAML file like Travis CI
- try here: azure.com/pipelines (for free)
- https://github.com/revodavid/RMLops
- https://github.com/revodavid/RMLops/blob/master/user2019slides.pdf
- Summary in a tweet
- authentication: verifying an identity/credentials
- authorisation: verifying access rights / permissions
- package `sealr`, inspired by passport.js
- Demo: https://frie.codes/rstatsmemes/
- Slides: https://frie.codes/user2019_slides/#1
- Summary in a tweet
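
The difference between authentication and authorisation can be sketched in a few lines of base R. Everything here is hypothetical (the user store, the tokens, the roles, and the function names `authenticate()` and `authorise()`); it illustrates the two concepts only and is not the API of `sealr`.

```r
# Hypothetical in-memory user store; names, tokens and roles are made up.
users <- list(
  alice = list(token = "secret-a", role = "admin"),
  bob   = list(token = "secret-b", role = "reader")
)

# Authentication: does the presented token belong to the claimed user?
authenticate <- function(user, token) {
  !is.null(users[[user]]) && identical(users[[user]]$token, token)
}

# Authorisation: is this (already authenticated) user allowed to do the action?
authorise <- function(user, action) {
  allowed <- list(admin = c("read", "write"), reader = "read")
  action %in% allowed[[users[[user]]$role]]
}

authenticate("bob", "secret-b")  # TRUE: identity verified
authorise("bob", "write")        # FALSE: readers may not write
```

A request is normally authenticated first and authorised second, since permissions only make sense once the identity is known.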
datakind.org
usethis::ui_*
- library(ghclass)
- harvesting data
- datasciencebox.org
- github: rstudio-education/datascience-box
- exploration (wrangling, dataviz), making conclusions + playground
- learnr r package (interactive tutorials)
- https://cran.r-project.org/web/packages/genogeographer/index.html
- what is monkey patching?