| Base R | Tidyverse | What it does and why tidyverse | Comment |
|---|---|---|---|
| read.csv() | read_csv() | reads a CSV file; much faster than base R, shows a progress bar for large files, and parses column types automatically | see also read_delim(), read_tsv() and readxl::read_xlsx() |
| sort(), order() | arrange() | sorts the rows of a data frame by one or more columns | see also order_by() |
| mtcars$mpg = … | mutate() | modifies or adds a column | see also transmute(), which drops existing variables |
| mtcars[,c("mpg", "am")], subset() | select(), rename() | selects or renames columns | see also pull() |
| mtcars[mtcars$am == 1,], subset() | filter() | selects rows based on a criterion | |
| aggregate() | summarise(), summarize(), do() | reduces grouped values to a single value | see also variants like summarize_if() |
| ifelse() | if_else(), case_when() | standard vectorized if-else, but stricter about types than the base version | see also near() |
| unique() | distinct() | finds unique rows in a data frame, and is much faster | |
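
The mapping in the table can be sketched side by side on the built-in `mtcars` data set. This is a minimal illustration, assuming `dplyr` is installed; the derived column `kpl` and the object names are made up for the example and are not part of either API.

```r
library(dplyr)

# Base R: keep manual cars, add a derived column, sort, then pick columns
manual_base <- mtcars[mtcars$am == 1, ]
manual_base$kpl <- manual_base$mpg * 0.425144   # miles per US gallon -> km per litre
manual_base <- manual_base[order(-manual_base$kpl), c("mpg", "kpl", "cyl")]

# dplyr: the same steps as a pipeline of verbs
manual_tidy <- mtcars %>%
  filter(am == 1) %>%
  mutate(kpl = mpg * 0.425144) %>%
  arrange(desc(kpl)) %>%
  select(mpg, kpl, cyl)

# aggregate() vs group_by() + summarise()
mean_base <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mean_tidy <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# unique() vs distinct() on a pair of columns
combos_base <- unique(mtcars[, c("cyl", "am")])
combos_tidy <- distinct(mtcars, cyl, am)

# ifelse() silently coerces mixed types; if_else() refuses them
loose <- ifelse(c(TRUE, FALSE), "manual", 0)    # the number becomes a string
strict_failed <- tryCatch(
  { if_else(c(TRUE, FALSE), "manual", 0); FALSE },
  error = function(e) TRUE
)
```

The pipeline form reads top to bottom in the order the steps happen, which is most of the practical argument for the tidyverse column of the table.
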

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
packages <- c(
  'dplyr', 'plyr', 'Rcpp', 'chron', 'base64enc', 'data.table', 'reshape2',
  'shiny', 'ggplot2', 'rstan', 'RMySQL', 'RPostgreSQL', 'ggmap', 'mapproj',
  'curl', 'RGtk2', 'rattle', 'httr', 'devtools', 'RODBC', 'ibmdbR', 'rgdal',
  'rmarkdown'
)
install.packages(
  packages,
  repos = 'http://cran.r-project.org/',

atomicwrites==1.2.1
attrs==18.2.0
backcall==0.1.0
beautifulsoup4==4.6.3
bleach==3.0.2
bokeh==0.13.0
certifi==2018.11.29
chardet==3.0.4
Click==7.0
cycler==0.10.0

BigQuery, Google’s managed data warehouse for analytics.
Google Stackdriver, Google’s monitoring, logging, and diagnostics system.
Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. In this lab we will explore the Cloud Dataprep UI to build an ecommerce transformation pipeline that will run at a scheduled interval and output results back into BigQuery.
This query will process 5.63 GB when run.
When connecting devices to Google Cloud Platform, you will need to specify which communication protocol your devices will use. The choices are MQTT, HTTP, or both.
MQTT (Message Queue Telemetry Transport) is an industry-standard IoT protocol. It is a publish/subscribe (pub/sub) messaging protocol.
The publish/subscribe model is event-driven. Messages are pushed to clients that are subscribed to the topic. The broker is the hub of communication. Clients publish messages to the broker, and the broker pushes messages out to subscribers.
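
The broker's role described above can be sketched in a few lines of base R. This is an in-process toy, not an MQTT client: a real broker talks to devices over the network, and the names `new_broker`, `subscribe`, `publish` and the topic strings are hypothetical, chosen only for illustration.

```r
# A toy broker: keeps a list of callbacks per topic and pushes each
# published message to every callback registered for that topic.
new_broker <- function() {
  subscribers <- list()  # topic name -> list of callback functions

  subscribe <- function(topic, callback) {
    subscribers[[topic]] <<- c(subscribers[[topic]], list(callback))
    invisible(NULL)
  }

  publish <- function(topic, message) {
    for (callback in subscribers[[topic]]) {
      callback(message)  # push: the broker calls the subscriber
    }
    invisible(length(subscribers[[topic]]))  # number of deliveries
  }

  list(subscribe = subscribe, publish = publish)
}

# Usage: one subscriber on a temperature topic, none on humidity
broker <- new_broker()
received <- character()
broker$subscribe("sensors/temperature", function(msg) {
  received <<- c(received, msg)
})

delivered_temp <- broker$publish("sensors/temperature", "21.5")
delivered_hum  <- broker$publish("sensors/humidity", "40")
```

The publisher never references a subscriber directly; it only names a topic. That decoupling is what lets devices come and go without the other side being reconfigured.
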
library(readr)

# readr comes with two useful functions for exporting data:
# write_csv() and write_tsv().
# Both save dates and date-times in ISO 8601 format so they are easily parsed elsewhere.
write_csv(pu2, "pu2.csv")
