Chris Kennedy ck37

@sellorm
sellorm / render_with_jobs.R
Created April 15, 2021 20:54
Render an Rmarkdown document in the RStudio jobs pane.
render_with_jobs <- function(){
  rstudioapi::verifyAvailable()
  jobs_file <- tempfile(tmpdir = "/tmp", fileext = ".R")
  rmd_to_render <- rstudioapi::selectFile(caption = "Choose an Rmd file...",
                                          filter = "Rmd files (*.Rmd)")
  if (is.null(rmd_to_render)){
    stop("You must choose an Rmd file to proceed!")
  }
  cat(paste0('rmarkdown::render("', rmd_to_render, '")'), file = jobs_file)
  rstudioapi::jobRunScript(path = jobs_file,
@aditya-malte
aditya-malte / smallberta_pretraining.ipynb
Created February 22, 2020 13:41
smallBERTa_Pretraining.ipynb
@doraneko94
doraneko94 / roc_auc_ci.py
Last active January 4, 2025 08:56
Calculating confidence interval of ROC-AUC.
from sklearn.metrics import roc_auc_score
from math import sqrt
def roc_auc_ci(y_true, y_score, positive=1):
    AUC = roc_auc_score(y_true, y_score)
    N1 = sum(y_true == positive)
    N2 = sum(y_true != positive)
    Q1 = AUC / (2 - AUC)
    Q2 = 2*AUC**2 / (1 + AUC)
    SE_AUC = sqrt((AUC*(1 - AUC) + (N1 - 1)*(Q1 - AUC**2) + (N2 - 1)*(Q2 - AUC**2)) / (N1*N2))
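The preview above stops at the standard error, so the interval itself is not shown. What follows is a minimal sketch of how the same standard-error formula could be turned into a confidence interval. Several things here are assumptions and are not taken from the gist: the names `auc_by_pairs` and `hanley_mcneil_ci`, the normal-approximation multiplier `z=1.96` (roughly a 95% level), the clipping of the bounds to [0, 1], and the pairwise AUC computation, which is used only so the example runs without scikit-learn.

```python
from math import sqrt


def auc_by_pairs(y_true, y_score, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This stands in for sklearn's roc_auc_score
    so the sketch has no third-party dependency.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(y_true, y_score, positive=1, z=1.96):
    """Return (auc, lower, upper) using the SE formula from the snippet above."""
    auc, n1, n2 = auc_by_pairs(y_true, y_score, positive)
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n1 - 1) * (q1 - auc**2)
           + (n2 - 1) * (q2 - auc**2)) / (n1 * n2)
    se = sqrt(max(var, 0.0))  # guard against tiny negative rounding error
    # Assumption: symmetric normal interval, clipped to the valid AUC range.
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return auc, lower, upper


auc, lower, upper = hanley_mcneil_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, lower, upper)
```

With only four observations the interval is very wide, which is expected: the standard error shrinks as the product of the two class sizes grows.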
@JohnMount
JohnMount / confEc2RServer.bash
Last active September 6, 2021 09:49
Configure an Amazon EC2 instance to serve a tunneled RStudio Server instance (from a Unix client)
#!/bin/bash
# on local
pempath="$1"
ec2target="$2"
ssh -T -i "${pempath}" -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no ubuntu@${ec2target} << 'EOBLOCK'
# on remote machine
sudo apt-get -y update
sudo apt-get -y upgrade
@mrecos
mrecos / Purrr Grid Search Parallel.R
Last active December 24, 2017 20:46
A bit of code for conducting parallelized random grid-search of randomForest hyperparameters using purrr::map() and futures (for multicore/multisession). This is a bit of a proof-of-concept as there are plenty of ways to iterate over a grid and do CV. Also, especially with randomForest, this is very memory inefficient. However, the approach may …
### ------- Load Packages ---------- ###
library("purrr")
library("future")
library("dplyr")
library("randomForest")
library("rsample")
library("ggplot2")
library("viridis")
### ------- Helper Functions for map() ---------- ###
# breaks CV splits into train (analysis) and test (assessment) sets
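The description of this gist outlines a parallelized random grid search: sample parameter combinations from a grid, score each one, and keep the best. A rough Python sketch of that pattern is shown here for illustration only. It is not the gist's R code: `sample_grid`, `toy_score`, and `random_grid_search` are hypothetical names, the `mtry` and `ntree` values are made up, and `toy_score` is a stand-in for fitting a model on each cross-validation split and averaging the error.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def sample_grid(grid, n_samples, seed=0):
    """Draw up to n_samples distinct parameter combinations from the full grid."""
    combos = [dict(zip(grid, values)) for values in product(*grid.values())]
    rng = random.Random(seed)
    return rng.sample(combos, min(n_samples, len(combos)))


def toy_score(params):
    # Hypothetical stand-in for a cross-validated error; lower is better.
    return (params["mtry"] - 3) ** 2 + abs(params["ntree"] - 500) / 100


def random_grid_search(grid, n_samples, score_fn, max_workers=4, seed=0):
    """Score sampled candidates concurrently and return the best one."""
    candidates = sample_grid(grid, n_samples, seed)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


grid = {"mtry": [1, 2, 3, 4], "ntree": [100, 500, 1000]}
best_params, best_score = random_grid_search(grid, n_samples=12, score_fn=toy_score)
print(best_params, best_score)
```

Threads are used here only to keep the sketch short. For CPU-bound model fitting in Python, a process pool (`concurrent.futures.ProcessPoolExecutor`) would likely be the closer analogue to the multicore futures mentioned in the description.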
@ledell
ledell / h2o_rf_sigopt_demo_iris.R
Last active June 3, 2017 04:56
Demo of how to use the SigOpt API with H2O in R
# Set API Key
Sys.setenv(SIGOPT_API_TOKEN="HERE")
# Start a local H2O cluster for training models
library(h2o)
h2o.init(nthreads = -1)
# Load a dataset
data(iris)
y <- "Species"
@malkitsingh
malkitsingh / instructions.txt
Created February 2, 2017 17:37
Step by step guide to install nominatim server
I followed these two blog posts to install the server:
1. http://koo.fi/blog/2015/03/19/openstreetmap-nominatim-server-for-geocoding/#Compile_Nominatim
This is the main post I followed; it explains the various steps.
2. https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04
This explains how to set up swap files and install tiles if needed.
I will use Ubuntu 14.04 LTS as the platform. Just a basic install with ssh server. We will install Apache to serve http requests. Make sure you have enough disk space and RAM to hold the data and serve it efficiently. I used the Finland extract, which was about a 200 MB download. The resulting database was 26 GB after importing, indexing and adding Wikipedia data. The Wikipedia data probably actually took more disk space than the OSM data. My server has 4 GB RAM, which seems to be enough for this small data set.
1. Software requirements
@mcburton
mcburton / jupyter-on-a-supercomputer.md
Last active May 30, 2025 03:47
A short(ish) guide on how to get Jupyter Notebooks up and running on the Bridges supercomputer.

Running Jupyter on a Supercomputer

This is a quick guide for getting a Jupyter Notebook up and running on Bridges, a supercomputer managed by the Pittsburgh Supercomputing Center. Bridges is a new machine designed to accommodate non-traditional uses of High Performance Computing (HPC) resources like data science and digital humanities. Bridges is available through XSEDE, which is the system that manages access to multiple supercomputing resources. Through XSEDE, Bridges is available to researchers or educators at US academic or non-profit research institutions (see the XSEDE eligibility policies). Allocations are free, but the application process is somewhat difficult to understand and is filled with jargon and acronyms that take time to learn. See the XSEDE getting started guide for more information about getting acc

@dgrapov
dgrapov / plotly_select_DT.R
Last active September 10, 2020 01:25
ggplot2 to plotly to shiny to box/lasso select to DT
#plotly box or lasso select linked to
# DT data table
# using Wage data
# the out group: is sex:Male, region:Middle Atlantic +
library(ggplot2)
library(plotly)
library(dplyr)
library(ISLR)