Guilherme Marthe GuiMarthe

GuiMarthe / kill_tmux.sh
Created October 24, 2017 21:27
kill all tmux sessions
# Kill every tmux session by name. `tmux kill-server` is a one-command alternative.
tmux ls | cut -d: -f1 | xargs -I{} tmux kill-session -t {}
@GuiMarthe
GuiMarthe / table_sizes.sql
Created November 22, 2017 16:36
Query for estimated table sizes in MB on Oracle databases
-- Tables + Size MB
select owner, table_name, round((num_rows*avg_row_len)/(1024*1024)) MB
from all_tables
where owner not like 'SYS%' -- Exclude system tables.
and num_rows > 0 -- Ignore empty Tables.
order by MB desc -- Biggest first.
;
@GuiMarthe
GuiMarthe / little_modify.R
Created December 20, 2017 13:38
a simple gist I made for understanding how to use purrr's modify_depth function.
library(purrr)

# Drop the NA values from every vector in the list.
list(
  c(1, 2, 3, 4, NA),
  c(5, 6, 7, NA, NA),
  c(12, 12, 12, NA, NA),
  c(3, NA)
) %>% modify_depth(1, ~ keep(.x = ., .p = ~ !is.na(.)))
@GuiMarthe
GuiMarthe / multiclass_decision_tree.R
Created January 22, 2018 19:12
A multiclass decision tree example
library(rpart)
library(tidyverse)
library(ggdendro)
ggplot(data = iris,
       aes(Sepal.Length, Petal.Length, color = Species)) +
  geom_point()
dt <- rpart(Species ~ Sepal.Length + Petal.Length,
            data = iris,
@GuiMarthe
GuiMarthe / add_lagged.R
Created April 11, 2018 16:36
A small R function that adds lagged copies of a variable (lags 1 through n) to a data frame. It could be extended to handle more than one variable.
library(dplyr)
library(purrr)

# Add columns <var>_lag_1 ... <var>_lag_n holding the lagged values of `var`.
add_lagged <- function(df, var, n = 1) {
  var <- enquo(var)
  names <- map_chr(seq_len(n), ~ paste0(quo_name(var), "_lag_", .x))
  lagged_cols <- map2(seq_len(n), names, ~ transmute(df, !!.y := lag(!!var, n = .x))) %>%
    bind_cols()
  bind_cols(df, lagged_cols)
}
col
1
2
2
2
2
2
2
3
2
@GuiMarthe
GuiMarthe / pandas_caching_decorator.py
Last active November 15, 2023 19:10
This decorator caches a pandas.DataFrame returning function. It saves the pandas.DataFrame in a parquet file in the cache_dir.
import pandas as pd
from pathlib import Path
from functools import wraps
def cache_pandas_result(cache_dir, hard_reset: bool):
    '''
    This decorator caches a pandas.DataFrame returning function.
    It saves the pandas.DataFrame in a parquet file in the cache_dir.
    It uses the following naming scheme for the caching files:
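The preview above is cut off before the decorator body, so the following is only a minimal sketch of how such a parquet cache could work, not the gist's actual code. The file name `<function name>.parquet`, the meaning of `hard_reset` (recompute and overwrite), and the fact that call arguments are ignored when choosing the cache file are all assumptions made for illustration. Writing parquet also assumes that a parquet engine such as pyarrow is installed.

```python
from functools import wraps
from pathlib import Path

import pandas as pd


def cache_pandas_result(cache_dir, hard_reset: bool):
    """Cache the DataFrame returned by the decorated function as a parquet file.

    Assumption: the cache file is named "<function name>.parquet". The naming
    scheme of the original gist is not visible in the listing.
    """
    cache_dir = Path(cache_dir)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Assumption: arguments are not part of the cache key.
            cache_file = cache_dir / f"{func.__name__}.parquet"
            if cache_file.exists() and not hard_reset:
                # Cache hit: skip the function call and read the saved frame.
                return pd.read_parquet(cache_file)
            # Cache miss (or forced reset): compute, save, and return.
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            result.to_parquet(cache_file)
            return result

        return wrapper

    return decorator
```

Decorating a slow loader with `@cache_pandas_result("cache", hard_reset=False)` would then run it once and serve later calls from the `cache` directory. Passing `hard_reset=True` forces a recomputation in this sketch.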
@GuiMarthe
GuiMarthe / subsampling_by_kde.R
Created July 20, 2019 20:36
A simple procedure for subsampling one distribution so that it looks like another, with one method based on binning and another on kernel density estimation (KDE). The binning idea came from this Cross Validated (stats.stackexchange) question: https://stats.stackexchange.com/questions/286062/distribution-matching-by-subsampling. The KDE method came from other studies of mine.
library(tidyverse)
library(broom)

df <-
  tibble(
    label = factor(c(rep("group1", 8E4), rep("group2", 1E4))),
    var = c(
      rnorm(n = 8E4, mean = 2, sd = 5),
      c(rnorm(n = 5E3, mean = -2, sd = 0.5), rnorm(n = 5E3, mean = 1, sd = 0.5))
    )
  )
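The binning idea mentioned in the description can be sketched in plain Python as below. This is not a translation of the gist's R code, which is truncated in the listing. It is one possible reading of "distribution matching by subsampling": each source point is kept with a probability proportional to the ratio of the target's share to the source's share in that point's bin. The function names `bin_index` and `subsample_to_match` are hypothetical, and the sketch assumes the target is not constant and that its range overlaps the source.

```python
import random
from collections import Counter


def bin_index(x, lo, width, n_bins):
    """Map x to a bin index in [0, n_bins - 1], clamping at the edges."""
    i = int((x - lo) / width)
    return min(max(i, 0), n_bins - 1)


def subsample_to_match(source, target, n_bins=30, seed=0):
    """Subsample `source` so that its histogram roughly follows `target`."""
    rng = random.Random(seed)
    lo, hi = min(target), max(target)
    width = (hi - lo) / n_bins

    # Source points outside the target's range can never be kept.
    inside = [x for x in source if lo <= x <= hi]

    src_counts = Counter(bin_index(x, lo, width, n_bins) for x in inside)
    tgt_counts = Counter(bin_index(x, lo, width, n_bins) for x in target)

    # Target share divided by source share, per bin that holds source points.
    ratio = {
        b: (tgt_counts[b] / len(target)) / (src_counts[b] / len(inside))
        for b in src_counts
    }
    max_ratio = max(ratio.values())

    # Rejection step: the bin with the highest ratio keeps all of its points,
    # every other bin keeps a proportionally smaller fraction.
    return [
        x
        for x in inside
        if rng.random() < ratio[bin_index(x, lo, width, n_bins)] / max_ratio
    ]
```

A bin that holds target points but no source points cannot be reproduced by subsampling, so the match is only approximate there. Fewer, wider bins trade resolution for more stable ratios.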
@GuiMarthe
GuiMarthe / www.bclplaw.com.litigation.R
Created September 16, 2019 20:03
A nice chart by @hrbrmstr on twitter that I want to save for later
library(ggalt)
library(hrbrthemes)
library(tidyverse)
structure(list(district = structure(13:1, .Label = c("E.D. New York",
"D. New Jersey", "W.D. Wisconsin", "D. Delaware", "S.D. Florida",
"N.D. Illinois", "M.D. Florida", "S.D. New York", "D. Connecticut",
"D. Maryland", "N.D. California", "N.D. Georgia", "C.D. California"
), class = "factor"), `2017` = c(0.14, 0.16, 0.14, 0.01, 0.01,
0.04, 0.04, 0.04, 0.03, 0.01, 0.01, 0.06, 0.03), `2018` = c(0.26,