GuiMarthe’s gists

GuiMarthe / kill_tmux.sh

Created October 24, 2017 21:27

kill all tmux sessions

tmux ls | cut -d: -f1 | xargs -n 1 tmux kill-session -t

GuiMarthe / table_sizes.sql

Created November 22, 2017 16:36

Query for table size estimates in MB on Oracle databases

	-- Tables + Size MB
	select owner, table_name, round((num_rows*avg_row_len)/(1024*1024)) MB
	from all_tables
	where owner not like 'SYS%' -- Exclude system tables.
	and num_rows > 0 -- Ignore empty Tables.
	order by MB desc -- Biggest first.
	;
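
The estimate is rows times average row length, converted from bytes to MB, so it is only as fresh as the table statistics. A small Python sketch of the same arithmetic; the function name and the sample numbers are illustrative assumptions, not values from a real database:

```python
BYTES_PER_MB = 1024 * 1024


def estimate_table_mb(num_rows: int, avg_row_len: int) -> int:
    """Estimate table size in MB from row count and average row length.

    Mirrors round((num_rows * avg_row_len) / (1024 * 1024)) from the query.
    Python's round() rounds exact halves to the nearest even integer, so a
    value landing exactly on .5 may differ from Oracle's ROUND.
    """
    return round(num_rows * avg_row_len / BYTES_PER_MB)


# Hypothetical statistics: 1,000,000 rows averaging 100 bytes each,
# i.e. 100,000,000 bytes, which is roughly 95 MB.
print(estimate_table_mb(1_000_000, 100))
```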

GuiMarthe / little_modify.R

Created December 20, 2017 13:38

a simple gist I made for understanding how to use purrr's modify_depth function.

	library(purrr)

	list(
	  c(1, 2, 3, 4, NA),
	  c(5, 6, 7, NA, NA),
	  c(12, 12, 12, NA, NA),
	  c(3, NA)
	) %>% modify_depth(1, ~ keep(.x = ., .p = ~ !is.na(.)))

GuiMarthe / multiclass_decision_tree.R

Created January 22, 2018 19:12

A multiclass decision tree example

	library(rpart)
	library(tidyverse)
	library(ggdendro)

	ggplot(data = iris,
	       aes(Sepal.Length, Petal.Length, color = Species)) +
	  geom_point()

	dt <- rpart(Species ~ Sepal.Length + Petal.Length,
	            data = iris,

GuiMarthe / add_lagged.R

Created April 11, 2018 16:36

Little function I created in R for adding all lagged values up to n of a variable to a df. Can be improved for handling more than one variable.

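	library(dplyr)  # transmute, lag, bind_cols, %>%
	library(purrr)  # map, map2
	library(rlang)  # enquo, quo_name, !!, :=

	# Add columns <var>_lag_1 ... <var>_lag_n to df.
	add_lagged <- function(df, var, n = 1) {
	  var <- enquo(var)
	  names <- map(1:n, ~ paste0(quo_name(var), '_lag_', .))

	  lagged_cols <- map2(1:n, names, ~ df %>% transmute(!!.y := lag(!!var, n = .x))) %>%
	    bind_cols()

	  df %>% bind_cols(lagged_cols)
	}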
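
The description mentions that the function could be extended to more than one variable. A rough pandas sketch of that idea follows. It is a Python analogue rather than the author's R code, and the `variables` argument and the toy data are assumptions made for illustration:

```python
import pandas as pd


def add_lagged(df: pd.DataFrame, variables: list[str], n: int = 1) -> pd.DataFrame:
    """Return a copy of df with <var>_lag_<k> columns for k = 1..n."""
    out = df.copy()
    for var in variables:
        for k in range(1, n + 1):
            # shift(k) moves values down k rows and leaves NaN in the first k.
            out[f"{var}_lag_{k}"] = df[var].shift(k)
    return out


# Hypothetical toy data with two variables to lag.
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
print(add_lagged(toy, ["x", "y"], n=2))
```

Looping over the variable names keeps the column naming identical to the single-variable version (`x_lag_1`, `x_lag_2`, and so on).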

GuiMarthe / fun.csv

Last active June 27, 2018 21:33

	col
	1
	2
	2
	2
	2
	2
	2
	3
	2

GuiMarthe / pandas_caching_decorator.py

Last active November 15, 2023 19:10

This decorator caches a pandas.DataFrame returning function. It saves the pandas.DataFrame in a parquet file in the cache_dir.

	import pandas as pd
	from pathlib import Path
	from functools import wraps

	def cache_pandas_result(cache_dir, hard_reset: bool):
	    '''
	    This decorator caches a pandas.DataFrame returning function.
	    It saves the pandas.DataFrame in a parquet file in the cache_dir.
	    It uses the following naming scheme for the caching files:
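
The preview above is cut off, so here is a minimal sketch of how such a decorator could work. It is not the gist's actual implementation: the name `cache_dataframe`, the default for `hard_reset`, and the `<function name>.parquet` naming scheme are all assumptions.

```python
from functools import wraps
from pathlib import Path

import pandas as pd


def cache_dataframe(cache_dir, hard_reset: bool = False):
    """Cache the DataFrame returned by a function as a parquet file.

    Assumed naming scheme: <cache_dir>/<function name>.parquet, so calls
    with different arguments share one cache file.
    """
    cache_dir = Path(cache_dir)

    def decorator(func):
        cache_file = cache_dir / f"{func.__name__}.parquet"

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Reuse the cached file unless a refresh was requested.
            if cache_file.exists() and not hard_reset:
                return pd.read_parquet(cache_file)
            result = func(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            # to_parquet needs a parquet engine such as pyarrow installed.
            result.to_parquet(cache_file)
            return result

        return wrapper

    return decorator
```

Decorating a slow loader with `@cache_dataframe("cache")` would run it once and read the parquet file on later calls; passing `hard_reset=True` recomputes and overwrites the file.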

GuiMarthe / subsampling_by_kde.R

Created July 20, 2019 20:36

A simple procedure for subsampling one distribution so that it looks like another. One method uses binning and the other uses kernel density estimation (KDE). The binning idea came from this Stack Exchange question: https://stats.stackexchange.com/questions/286062/distribution-matching-by-subsampling. The KDE method came from other studies of mine.

	library(tidyverse)
	library(broom)


	df <-
	  tibble(
	    label = factor(c(rep("group1", 8E4), rep("group2", 1E4))),
	    var = c(
	      rnorm(n = 8E4, mean = 2, sd = 5),
	      rnorm(n = 5E3, mean = -2, sd = 0.5),
	      rnorm(n = 5E3, mean = 1, sd = 0.5)
	    )
	  )
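
The rest of the R code is not shown here, so the following is only a plain-Python sketch of what the binning method could look like. It is not the author's implementation. The function name, the equal-width bins, and the rule for choosing the sample size are assumptions; the simulated groups reuse the parameters from the R snippet.

```python
import random
from collections import defaultdict


def subsample_to_match(source, target, n_bins=20, seed=0):
    """Subsample `source` so its histogram roughly follows that of `target`.

    Assumes `target` is not constant, so the bin width is positive.
    """
    rng = random.Random(seed)
    lo, hi = min(target), max(target)
    width = (hi - lo) / n_bins

    def bin_of(x):
        # Clamp so that x == hi falls in the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    # Group source values by bin, ignoring values outside the target's range.
    source_bins = defaultdict(list)
    for x in source:
        if lo <= x <= hi:
            source_bins[bin_of(x)].append(x)

    # Share of the target that falls in each bin.
    target_share = defaultdict(float)
    for x in target:
        target_share[bin_of(x)] += 1 / len(target)

    # Largest sample size for which every bin can be filled without replacement.
    total = min(len(source_bins[b]) / share for b, share in target_share.items())

    sample = []
    for b, share in target_share.items():
        k = min(int(total * share), len(source_bins[b]))
        sample.extend(rng.sample(source_bins[b], k))
    return sample


# Simulated data with the same parameters as the R snippet above.
rng = random.Random(42)
group1 = [rng.gauss(2, 5) for _ in range(80_000)]
group2 = ([rng.gauss(-2, 0.5) for _ in range(5_000)]
          + [rng.gauss(1, 0.5) for _ in range(5_000)])
matched = subsample_to_match(group1, group2)
print(len(matched), "values kept from group1")
```

Sampling without replacement caps the result at whatever the scarcest bin allows, so a target bin with no source values yields an empty sample.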

GuiMarthe / www.bclplaw.com.litigation.R

Created September 16, 2019 20:03

A nice chart by @hrbrmstr on twitter that I want to save for later

	library(ggalt)
	library(hrbrthemes)
	library(tidyverse)

	structure(list(district = structure(13:1, .Label = c("E.D. New York",
	"D. New Jersey", "W.D. Wisconsin", "D. Delaware", "S.D. Florida",
	"N.D. Illinois", "M.D. Florida", "S.D. New York", "D. Connecticut",
	"D. Maryland", "N.D. California", "N.D. Georgia", "C.D. California"
	), class = "factor"), `2017` = c(0.14, 0.16, 0.14, 0.01, 0.01,
	0.04, 0.04, 0.04, 0.03, 0.01, 0.01, 0.06, 0.03), `2018` = c(0.26,

Guilherme Marthe GuiMarthe