Mr. Yap mryap

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
mryap / install-packages.R
Created September 28, 2016 11:28
Install your favourite packages in one go.
packages <- c(
'dplyr', 'plyr', 'Rcpp', 'chron', 'base64enc', 'data.table', 'reshape2',
'shiny', 'ggplot2', 'rstan', 'RMySQL', 'RPostgreSQL', 'ggmap', 'mapproj',
'curl', 'RGtk2', 'rattle', 'httr', 'devtools', 'RODBC', 'ibmdbR', 'rgdal',
'rmarkdown'
)
install.packages(
packages,
repos = 'http://cran.r-project.org/',
mryap / base2tidy.md
Last active December 12, 2018 11:34
Base R to Tidyverse
| Base R | Tidyverse | What it does and why tidyverse | Comment |
| --- | --- | --- | --- |
| `read.csv()` | `read_csv()` | reads in a CSV file, but it's much faster, shows a progress bar for large files, and can automatically parse data types | see also `read_delim()`, `read_tsv()` and `readxl::read_xlsx()` |
| `sort()`, `order()` | `arrange()` | sort column(s) within a data frame | see also `order_by()` |
| `mtcars$mpg = …` | `mutate()` | modify a column | see also `transmute()`, which drops existing variables |
| `mtcars[,c("mpg", "am")]`, `subset()` | `select()`, `rename()` | select or rename columns | see also `pull()` |
| `mtcars[mtcars$am == 1,]`, `subset()` | `filter()` | select rows based on a criterion | |
| `aggregate()` | `summarise()`, `summarize()`, `do()` | reduce grouped values to a single value | see also variants like `summarize_if()` |
| `ifelse()` | `if_else()`, `case_when()` | standard vectorized if-else, but stricter than the base version | see also `near()` |
| `unique()` | `distinct()` | finds unique rows in a data frame, but it's much faster | |

An h1 header

Paragraphs are separated by a blank line.

2nd paragraph. Italic, bold, and monospace. Itemized lists look like:

  • this one
  • that one
atomicwrites==1.2.1
attrs==18.2.0
backcall==0.1.0
beautifulsoup4==4.6.3
bleach==3.0.2
bokeh==0.13.0
certifi==2018.11.29
chardet==3.0.4
Click==7.0
cycler==0.10.0

BigQuery, Google’s managed data warehouse for analytics.

Google Stackdriver, Google’s monitoring, logging, and diagnostics system

Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. In this lab we will explore the Cloud Dataprep UI to build an ecommerce transformation pipeline that will run at a scheduled interval and output results back into BigQuery.

This query will process 5.63 GB when run.

mryap / iiot-google-cloud-platform.md
Last active April 4, 2019 12:51
Industrial Internet of Things on Google Cloud Platform

Communicating with devices

When connecting devices to Google Cloud Platform, you will need to specify which communication protocol your devices will use. The choices are MQTT, HTTP, or both.

MQTT Protocol

MQTT (Message Queuing Telemetry Transport) is an industry-standard IoT protocol. It is a publish/subscribe (pub/sub) messaging protocol.

The publish/subscribe model is event-driven. Messages are pushed to clients that are subscribed to the topic. The broker is the hub of communication. Clients publish messages to the broker, and the broker pushes messages out to subscribers.
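The flow described above can be sketched with a toy, in-memory broker. This is only an illustration of the pub/sub pattern, not an MQTT client: the `Broker` class, its `subscribe` and `publish` methods, and the topic names are hypothetical and do not correspond to any real MQTT or Google Cloud API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical in-memory stand-in for a broker; no networking, no real MQTT.
Callback = Callable[[str, str], None]


class Broker:
    """Hub of communication: keeps track of which callbacks follow which topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # A client registers interest in a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        # The broker pushes the message to every subscriber of the topic
        # and returns how many subscribers received it.
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


received = []
broker = Broker()

# Two subscribers follow the same (made-up) topic.
broker.subscribe("devices/temp", lambda t, p: received.append(("dashboard", t, p)))
broker.subscribe("devices/temp", lambda t, p: received.append(("logger", t, p)))

# A device publishes once; the broker fans the message out to both subscribers.
delivered = broker.publish("devices/temp", "21.5")

# Nobody subscribed to this topic, so the message reaches no one.
ignored = broker.publish("devices/humidity", "40")
```

The publisher never addresses a subscriber directly. It only names a topic, and the broker decides who receives the message, which is what makes the model event-driven.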

mryap / intro.md
Last active May 6, 2021 07:48
Predict Visitor Purchases with a Classification Model with BigQuery ML

Question: Of all the visitors to our website, what percentage made a purchase?

#standardSQL
WITH visitors AS(
SELECT
COUNT(DISTINCT fullVisitorId) AS total_visitors
FROM `data-to-insights.ecommerce.web_analytics`
),

purchasers AS(
mryap / sampling_saves_time.ipynb
Created April 5, 2020 14:36
Sampling_Saves_Time.ipynb
mryap / export_data.md
Last active May 6, 2021 07:45
Exporting Data with readr package
# readr comes with two useful functions for exporting data:
# write_csv() and write_tsv().
# Both save dates and date-times in ISO 8601 format so they are easily parsed elsewhere.
library(readr)

# pu2 is assumed to be an existing data frame
write_csv(pu2, "pu2.csv")