Edwin M. Black-Milk
🐇 turtles all the way down…
@jennybc
jennybc / 2014-10-12_stop-working-directory-insanity.md
Last active May 1, 2025 19:00
Stop the working directory insanity

There are packages for this now!

2017-08-03: Since I wrote this in 2014, the universe, specifically Kirill Müller (https://github.com/krlmlr), has provided better solutions to this problem. I now recommend that you use one of these two packages:

  • rprojroot: This is the main package with functions to help you express paths in a way that will "just work" when developing interactively in an RStudio Project and when you render your file.
  • here: A lightweight wrapper around rprojroot that anticipates the most likely scenario: you want to write paths relative to the top-level directory, defined as an RStudio project or Git repo. TRY THIS FIRST.

I love these packages so much I wrote an ode to here.

I use these packages now instead of what I describe below. I'll leave this gist up for historical interest. 😆

@wrouesnel
wrouesnel / main.py
Last active May 23, 2021 12:52
Python argparse with config file fallback. This is great for scripting up daemon-like tools (note PyDev template syntax - replace as needed)
#!/usr/bin/env python
# encoding: utf-8
'''
${module}
'''
import sys
import os
from os import path
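
The gist preview shows only the module header; the pattern named in the description is usually a two-pass parse: a throwaway parser extracts --config first, then the real parser is seeded with defaults merged from the file. A minimal sketch in Python 3, not the gist's actual code; the --host/--port options and the [daemon] section are made-up names:

#!/usr/bin/env python3
# Sketch only: command-line values win, then config-file values,
# then the hard-coded defaults below.
import argparse
import configparser

# Pass 1: pull out --config and swallow everything else.
conf_parser = argparse.ArgumentParser(add_help=False)
conf_parser.add_argument("-c", "--config", help="path to an INI config file")
args, remaining_argv = conf_parser.parse_known_args()

defaults = {"host": "localhost", "port": 8080}
if args.config:
    config = configparser.ConfigParser()
    config.read(args.config)
    if config.has_section("daemon"):  # hypothetical section name
        defaults.update(dict(config["daemon"]))

# Pass 2: the real parser, with the merged defaults applied.
parser = argparse.ArgumentParser(parents=[conf_parser])
parser.set_defaults(**defaults)
parser.add_argument("--host")
parser.add_argument("--port", type=int)
print(parser.parse_args(remaining_argv))

Note that argparse re-applies type conversion to string defaults, so values read from the INI file still come out as the declared types.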
@zkamvar
zkamvar / Makevars
Created July 14, 2015 20:54
My R Makevars file to ensure that I build R packages with openmp
# This file's location is ~/.R/Makevars
# Default variables (no omp support):
# CXX=clang++
# CC=clang
# I followed the instructions at http://hpc.sourceforge.net/ to install gcc 4.9
CC=/usr/local/bin/gcc
CXX=/usr/local/bin/g++
FC=/usr/local/bin/gfortran
F77=/usr/local/bin/gfortran
@jmindek
jmindek / gist:62c50dd766556b7b16d6
Last active January 31, 2024 15:48
DISTINCT ON like functionality for Redshift

DISTINCT over a column list -> for the rows returned, keep only the unique combinations of the selected columns. Think of it as concatenating all the column values in each row of the projection and returning only the rows whose concatenated string is unique.

test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
 parent_id | child_id |              id
-----------+----------+------------------------------
   1000040 |      103 | 1000040|2645405726|0001|103
@snurhussein
snurhussein / PSPPConvert.md
Last active October 31, 2021 23:49
Using PSPP to convert sav files to csv

Converting SPSS files to csv with PSPP

  • Install and open PSPP
  • Use the File menu to open your file (it probably has a .sav extension)
  • Go to File > New > Syntax to open PSPP's command line window

Enter:

@thomasjungblut
thomasjungblut / gist.R
Last active November 14, 2019 04:57
XGBoost Validation and Early Stopping in R
library(xgboost)

train <- read.csv("train.csv")
bound <- floor(nrow(train) * 0.9)

train <- train[sample(nrow(train)), ]              # shuffle the rows
df.train <- train[1:bound, ]                       # first 90% for training
df.validation <- train[(bound + 1):nrow(train), ]  # last 10% for validation

train.y <- df.train$TARGET
validation.y <- df.validation$TARGET

# xgb.DMatrix wants a numeric matrix, not a data.frame,
# and the label column must be dropped from the features
dtrain <- xgb.DMatrix(data = data.matrix(df.train[, names(df.train) != "TARGET"]),
                      label = train.y)
@whophil
whophil / jupyter-style.ipynb
Last active November 27, 2021 09:40
Pretty style for Jupyter notebooks using Google web-fonts. Apply to all your notebooks using %run magic.
@aakansh9
aakansh9 / xgboost_extra.R
Last active January 27, 2020 20:55
Xgboost cross validation functions for time series data + gridsearch functions in R
# CV based on general traininds and testinds list
# useful for time-series based split
xgb.ts.cv <- function (params = list(), data, nrounds, nfold, label = NULL,
missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
verbose = T, print.every.n = 1L, early.stop.round = NULL,
maximize = NULL, traininds, testinds, ...)
{
if (typeof(params) != "list") {
@ispmarin
ispmarin / confusion_matrix_spark.py
Created June 3, 2016 14:26
Confusion Matrix, precision and recall check for PySpark
# pairs of predictions and labels for the metrics check
rdd = sc.parallelize(
    [
        (0., 1.),
        (0., 0.),
        (0., 0.),
        (1., 1.),
        (1., 0.),
        (1., 0.),
        (1., 1.),
        (1., 1.),
    ]
)
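
Assuming the pairs are ordered (prediction, label), which is what MLlib's MulticlassMetrics expects, a minimal sketch of pulling the confusion matrix, precision, and recall out of such an RDD (this uses the standard pyspark.mllib API, not necessarily the gist's exact code):

from pyspark.mllib.evaluation import MulticlassMetrics

metrics = MulticlassMetrics(rdd)  # rdd holds (prediction, label) pairs

print(metrics.confusionMatrix().toArray())  # rows = actual class, columns = predicted
print(metrics.precision(1.0))               # precision for class 1.0
print(metrics.recall(1.0))                  # recall for class 1.0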
@eddies
eddies / setup-notes.md
Created July 29, 2016 08:00
Spark 2.0.0 and Hadoop 2.7 with s3a setup

Standalone Spark 2.0.0 with s3

Tested with:

  • Spark 2.0.0 pre-built for Hadoop 2.7
  • Mac OS X 10.11
  • Python 3.5.2

Goal

Use s3 within pyspark with minimal hassle.
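
A minimal smoke test of the result, assuming hadoop-aws and a matching aws-java-sdk jar are on the classpath; the credential values and the bucket/file are placeholders, and the fs.s3a.* keys are the standard Hadoop 2.7 settings:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3a-smoke-test")
         .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")  # placeholder
         .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")  # placeholder
         .getOrCreate())

# Read something small to confirm the s3a wiring works;
# the bucket and file are hypothetical.
df = spark.read.csv("s3a://some-bucket/some-file.csv", header=True)
df.show()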