Chris Kennedy ck37

Running Jupyter on a Supercomputer

This is a quick guide to getting a Jupyter Notebook up and running on Bridges, a supercomputer managed by the Pittsburgh Supercomputing Center. Bridges is a new machine designed to accommodate non-traditional uses of High Performance Computing (HPC) resources, such as data science and digital humanities. Bridges is available through XSEDE, the system that manages access to multiple supercomputing resources. Through XSEDE, Bridges is available to researchers or educators at US academic or non-profit research institutions (see the XSEDE eligibility policies). Allocations are free, but the application process is somewhat difficult to understand and filled with jargon and acronyms that take time to learn. See the XSEDE getting started guide for more information about getting acc

	render_with_jobs <- function(){
	  rstudioapi::verifyAvailable()
	  jobs_file <- tempfile(tmpdir = "/tmp", fileext = ".R")
	  rmd_to_render <- rstudioapi::selectFile(caption = "Choose an Rmd file...",
	                                          filter = "Rmd files (*.Rmd)")
	  if (is.null(rmd_to_render)){
	    stop("You must choose an Rmd file to proceed!")
	  }
	  cat(paste0('rmarkdown::render("', rmd_to_render, '")'), file = jobs_file)
	  rstudioapi::jobRunScript(path = jobs_file,

	from sklearn.metrics import roc_auc_score
	from math import sqrt

	def roc_auc_ci(y_true, y_score, positive=1):
	    AUC = roc_auc_score(y_true, y_score)
	    N1 = sum(y_true == positive)
	    N2 = sum(y_true != positive)
	    Q1 = AUC / (2 - AUC)
	    Q2 = 2 * AUC**2 / (1 + AUC)
	    # Hanley-McNeil standard error of the AUC
	    SE_AUC = sqrt((AUC * (1 - AUC) + (N1 - 1) * (Q1 - AUC**2) + (N2 - 1) * (Q2 - AUC**2)) / (N1 * N2))
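The preview above stops once the standard error is computed. As a rough sketch of how such a calculation might be finished, the block below turns the Hanley-McNeil standard error into an interval. It is not the original author's code: the names `auc_rank` and `roc_auc_ci_sketch`, the `z=1.96` default (a normal-approximation 95% interval), and the clipping to [0, 1] are all assumptions made for illustration. The AUC is computed by direct pairwise comparison so the example runs without scikit-learn.

```python
from math import sqrt


def auc_rank(y_true, y_score, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper, standing in for sklearn's roc_auc_score.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_auc_ci_sketch(y_true, y_score, positive=1, z=1.96):
    """Hanley-McNeil standard error with a normal-approximation interval.

    z=1.96 is an assumed default giving roughly 95% coverage.
    """
    auc = auc_rank(y_true, y_score, positive)
    n1 = sum(t == positive for t in y_true)  # number of positives
    n2 = len(y_true) - n1                    # number of negatives
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    se = sqrt((auc * (1 - auc)
               + (n1 - 1) * (q1 - auc**2)
               + (n2 - 1) * (q2 - auc**2)) / (n1 * n2))
    # Clip to [0, 1] because the normal approximation can overshoot.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

Calling `roc_auc_ci_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns a (lower, upper) tuple. With only four observations the interval is very wide, which is expected for a sample that small.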

	#!/bin/bash

	# on local
	pempath="$1"
	ec2target="$2"

	ssh -T -i "${pempath}" -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no "ubuntu@${ec2target}" << 'EOBLOCK'
	# on remote machine
	sudo apt-get -y update
	sudo apt-get -y upgrade

	### ------- Load Packages ---------- ###
	library("purrr")
	library("future")
	library("dplyr")
	library("randomForest")
	library("rsample")
	library("ggplot2")
	library("viridis")
	### ------- Helper Functions for map() ---------- ###
	# breaks CV splits into train (analysis) and test (assessment) sets

	# Set API Key
	Sys.setenv(SIGOPT_API_TOKEN="HERE")

	# Start a local H2O cluster for training models
	library(h2o)
	h2o.init(nthreads = -1)

	# Load a dataset
	data(iris)
	y <- "Species"

	I followed these two blog posts to install the server:
	1. http://koo.fi/blog/2015/03/19/openstreetmap-nominatim-server-for-geocoding/#Compile_Nominatim
	This is the main post I followed; it explains the various steps.
	2. https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04
	This explains how to set up swap files and install tiles if needed.

	I will use Ubuntu 14.04 LTS as the platform. Just a basic install with ssh server. We will install Apache to serve http requests. Make sure you have enough disk space and RAM to hold the data and serve it efficiently. I used the Finland extract, which was about a 200 MB download. The resulting database was 26 GB after importing, indexing and adding Wikipedia data. The Wikipedia data probably actually took more disk space than the OSM data. My server has 4 GB RAM, which seems to be enough for this small data set.

	1. Software requirements

	#plotly box or lasso select linked to
	# DT data table
	# using Wage data
	# the out group: is sex:Male, region:Middle Atlantic +


	library(ggplot2)
	library(plotly)
	library(dplyr)
	library(ISLR)