Can I use CSV files? What about streaming production data? So I have this SAS file...
You can ingest data from any source you like. The built-in import stage supports many common formats (link), and it is easy to add more (link).
If you wish to use your live production data, write a package and add an import adapter (link).
What is a mungebit and why are you making up words?
A mungebit is the correct mathematical abstraction for wrangling a data set so that you won't have to ask a software or data engineer to make it "production ready" or to put it in "the data pipeline." It means the data wrangling that takes up 90% of a data scientist's time can shrink to 10%.
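As a rough illustration of the idea, the sketch below assumes that a mungebit is a transformation that learns what it needs the first time it runs on training data, then replays exactly that transformation on new data at prediction time. This way the same code serves both the experiment and production. `make_mean_imputer` is a hypothetical name, not the API of any mungebits package.

```r
# Hypothetical sketch: make_mean_imputer is illustrative, not a real package API.
make_mean_imputer <- function(column) {
  trained <- FALSE    # flips to TRUE after the first (training) run
  fill_value <- NULL  # state learned during training
  function(data) {
    if (!trained) {
      # Training run: learn the mean of the column, ignoring missing values.
      fill_value <<- mean(data[[column]], na.rm = TRUE)
      trained <<- TRUE
    }
    # Every run: replace missing values with the mean learned in training.
    data[[column]][is.na(data[[column]])] <- fill_value
    data
  }
}

impute_x <- make_mean_imputer("x")
train <- impute_x(data.frame(x = c(1, NA, 3)))  # learns mean 2 and fills the NA
scored <- impute_x(data.frame(x = NA_real_))    # reuses the training mean
```

The closure carries the trained state, so production scoring needs no separate reimplementation of the cleaning step.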
// -*- C++ -*-
//===--------------------------- __debug ----------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===// |
2010-07-29 19:56 526202880 s3://arxiv/pdf/arXiv_pdf_0001_001.tar
2010-07-29 20:08 138854400 s3://arxiv/pdf/arXiv_pdf_0001_002.tar
2010-07-29 20:14 525742080 s3://arxiv/pdf/arXiv_pdf_0002_001.tar
2010-07-29 20:33 156743680 s3://arxiv/pdf/arXiv_pdf_0002_002.tar
2010-07-29 20:38 525731840 s3://arxiv/pdf/arXiv_pdf_0003_001.tar
2010-07-29 20:52 187607040 s3://arxiv/pdf/arXiv_pdf_0003_002.tar
2010-07-29 20:58 525731840 s3://arxiv/pdf/arXiv_pdf_0004_001.tar
2010-07-29 21:11 44851200 s3://arxiv/pdf/arXiv_pdf_0004_002.tar
2010-07-29 21:14 526305280 s3://arxiv/pdf/arXiv_pdf_0005_001.tar
2010-07-29 21:27 234711040 s3://arxiv/pdf/arXiv_pdf_0005_002.tar |
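# A ternary operator in R: `cond ? a : b`.
# Install robertzk/super (used to fall back to the built-in `?`) if it is missing.
if (!requireNamespace("super", quietly = TRUE)) devtools::install_github("robertzk/super")
`?` <- function(expr1, expr2) {
  if (missing(expr2)) {
    # One argument, as in `?mean`: defer to the original help operator.
    super::super(expr1)
  } else {
    expr2 <- substitute(expr2)
    # Not of the form `a : b`, so this is not a ternary; defer to the original `?`.
    if (!(is.call(expr2) && identical(expr2[[1]], as.name(":")))) {
      super::super(expr1, expr2)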
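The snippet above is cut off in this listing. Below is a self-contained sketch of the same technique. It avoids the `super` package dependency by forwarding non-ternary uses to `utils::`?`` directly, which is a substitution for illustration and not the author's code. It relies on `?` having the lowest precedence in R's grammar, so `cond ? a : b` arrives as `e1 = cond` and `e2 = a:b`. Only the selected branch is evaluated.

```r
# Standalone sketch: falls back to utils::`?` (help) where the original uses super::super.
`?` <- function(e1, e2) {
  # Rebuild the original call so it can be forwarded to the help operator.
  help_call <- sys.call()
  help_call[[1]] <- quote(utils::`?`)
  if (missing(e2)) return(eval(help_call, parent.frame()))
  # `?` binds more loosely than `:`, so `cond ? a : b` arrives as e1 = cond, e2 = a:b.
  branches <- substitute(e2)
  if (!(is.call(branches) && identical(branches[[1]], as.name(":")))) {
    return(eval(help_call, parent.frame()))
  }
  # Evaluate only the selected branch, in the caller's environment.
  # isTRUE() means an NA condition selects the second branch.
  if (isTRUE(e1)) eval(branches[[2]], parent.frame()) else eval(branches[[3]], parent.frame())
}

x <- 5
label <- (x > 3 ? "big" : "small")  # parentheses needed: `<-` binds tighter than `?`
```

Because `<-` also binds more tightly than `?`, writing `label <- x > 3 ? "big" : "small"` without parentheses would assign `TRUE` to `label` rather than the selected branch.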
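#!/usr/bin/env bash
set -o nounset
set -o errexit
# Default environment: everything after the first "-" in the invoked path
# (e.g. a script run as ./deploy-staging defaults to "staging").
readonly default_env="${0#*-}"
# An explicit first argument overrides the default environment.
readonly ENV=${1:-$default_env}
readonly JAR_NAME="your-analytics.jar"
readonly UPLOAD_JAR="$(dirname "$0")/../target/scala-2.10/$JAR_NAME"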
mattr_accessor - https://github.com/rails/rails/blob/master/railties/lib/rails/generators.rb#L24
levenshtein_distance for suggestions - https://github.com/rails/rails/blob/master/railties/lib/rails/generators.rb#L160
# https://github.com/rails/rails/blob/master/railties/lib/rails/generators.rb#L258
# Rescue from LoadError
# https://github.com/rails/rails/blob/master/railties/lib/rails/generators.rb#L334
[1] pry(main)> ["a","b","c","d"].to_sentence(last_word_connector: " and ")
=> "a, b, c and d"
# https://github.com/rails/rails/blob/master/railties/lib/rails/generators.rb#L162 |
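The Levenshtein-distance suggestion technique noted above ("did you mean ...?" for a mistyped name) can be sketched in R with base `utils::adist()`, which computes edit distances. `suggest()`, its `max_distance` threshold, and the candidate names are hypothetical illustrations and do not mirror Rails' own logic.

```r
# Hypothetical helper: suggest() and max_distance are illustrative, not Rails' code.
suggest <- function(input, candidates, max_distance = 2) {
  # utils::adist() returns a 1 x n matrix of Levenshtein distances; drop() makes it a vector.
  distances <- drop(utils::adist(input, candidates))
  is_close <- distances <= max_distance
  # Closest candidates first.
  candidates[is_close][order(distances[is_close])]
}

generators <- c("model", "migration", "controller", "scaffold")
suggest("migraton", generators)
```

A small threshold keeps suggestions relevant: a swapped pair of letters such as "modle" costs 2 edits and still matches "model", while unrelated names are filtered out.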
# Deterministic fake data: the same (id, model_version, type) always yields the same row.
fn2 <- function(id, model_version, type = 'loan_id') {
  do.call(rbind, lapply(id, function(id) {
    # Seed the RNG from the first 6 hex digits of a hash of the inputs.
    seed <- as.integer(paste0("0x", substr(digest::digest(paste(id, model_version, type)), 1, 6)))
    set.seed(seed)
    data.frame(id = id, x = runif(1), y = rnorm(1))
  }))
}
cache <- function(uncached_function, prefix, key, salt) {
  cached_function <- new("function")
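The `cache()` helper above is cut off, so its use of `prefix`, `key`, and `salt` isn't shown. The general idea is to wrap a function so that repeated calls with the same arguments reuse a stored result. Below is a minimal in-memory sketch of that idea. `memoize` and its `deparse()`-based key are hypothetical and are not the original design.

```r
# Hypothetical memoizer; unrelated to the prefix/key/salt design of cache() above.
memoize <- function(f) {
  store <- new.env(parent = emptyenv())  # maps a key string to a stored result
  function(...) {
    # Key the cache on a text rendering of the arguments (fine for small inputs).
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = store, inherits = FALSE)) {
      assign(key, f(...), envir = store)
    }
    get(key, envir = store, inherits = FALSE)
  }
}

n_calls <- 0
slow_square <- function(x) {
  n_calls <<- n_calls + 1  # count real evaluations
  x^2
}
fast_square <- memoize(slow_square)
fast_square(3)
fast_square(3)  # served from the cache
fast_square(4)
```

Deterministic generators like `fn2` are good candidates for this kind of wrapping, because the same inputs always produce the same output.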
> methods(median)
[1] median.default
> methods(class = 'factor')
 [1] [.factor [[.factor [[<-.factor [<-.factor all.equal.factor as.character.factor as.data.frame.factor as.Date.factor as.list.factor
[10] as.logical.factor as.POSIXlt.factor as.quoted.factor* as.vector.factor droplevels.factor format.factor is.na<-.factor length<-.factor levels<-.factor
[19] Math.factor Ops.factor plot.factor* print.factor relevel.factor* relist.factor* rep.factor summary.factor Summary.factor
[28] xtfrm.factor
Non-visible functions are asterisked
> trace(run_model, edit = TRUE) |
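`trace(run_model, edit = TRUE)` opens the function in an editor so you can temporarily insert debugging code. For scripted use, `trace()` also accepts a `tracer` expression that runs on entry. The sketch below counts calls to `add_one`, a hypothetical stand-in for `run_model`, and then removes the trace with `untrace()`.

```r
add_one <- function(x) x + 1  # hypothetical stand-in for run_model
n_traced <- 0
# Run the tracer expression on entry to add_one; print = FALSE silences the "Tracing ..." message.
trace("add_one", tracer = quote(n_traced <<- n_traced + 1), print = FALSE)
add_one(1)
add_one(2)
untrace("add_one")  # restore the original, untraced function
add_one(3)          # not counted
```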
```R
> getGroup("==", recursive = TRUE)
[[1]]
[1] "Compare"

[[2]]
[1] "Ops"
```
This tells you which group generic an operator belongs to. That is useful when you want to override arithmetic or other operators and don't know where to look in the documentation.
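`getGroup()` comes from the methods package and reports S4 group generics. For S3 classes the same idea applies: a single `Ops.<class>` method handles every operator in the `Ops` group, including `==` and the rest of the comparisons, and `.Generic` names the operator that was called (see `?groupGeneric`). The sketch below assumes a hypothetical `money` class.

```r
# Hypothetical S3 class used only for illustration.
money <- function(x) structure(x, class = "money")

Ops.money <- function(e1, e2) {
  op <- get(.Generic)  # .Generic is the operator name, e.g. "+" or "=="
  if (missing(e2)) return(money(op(unclass(e1))))  # unary operators such as -a
  result <- op(unclass(e1), unclass(e2))
  # Arithmetic keeps the class; comparisons return plain logicals.
  if (.Generic %in% c("+", "-", "*", "/")) money(result) else result
}

a <- money(5)
b <- money(7)
a + b   # still a "money" object
a == b
```

Calling `unclass()` before applying the operator avoids dispatching back into `Ops.money` recursively.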
function(model, data, factor_variable, top = 4, stat = function(y) median(y, na.rm = TRUE)) {
  library(productivus)
  # Unwrap a tundra container: munge the data with it, then pull out the fitted model.
  if (is(model, 'tundraContainer')) {
    data <- model$munge(data)
    model <- model$output$model
  }
  stopifnot(is(model, 'gbm'))
  # Keep up to `top` levels of the factor variable, in table() order (not by frequency).
  tbl <- names(table(data[[factor_variable]]))
  tops <- tbl[seq_len(min(top, length(tbl)))]
~{as.character(data[[factor_variable]])} |