@tgjc
tgjc / useful_pandas_snippets.py
Created June 11, 2018 15:18 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
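The preview stops before the code for this last comment. A minimal sketch of one way to do the coercion, assuming pandas' `errors='coerce'` option is what the comment refers to; the sample DataFrame and the `n_bad` counter are hypothetical and only there to make the example runnable:

```python
import pandas as pd

# Hypothetical sample data; 'Column Name' mirrors the placeholder used above.
df = pd.DataFrame({'Column Name': ['1', '2.5', 'abc', None]})

# errors='coerce' replaces anything that cannot be parsed as a number with NaN
# instead of raising an error.
numeric = pd.to_numeric(df['Column Name'], errors='coerce')

# Count how many values ended up as NaN (unparseable or already missing).
n_bad = int(numeric.isna().sum())
```

Checking `n_bad` before overwriting the original column is a cheap way to notice when coercion would silently discard more data than expected.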
@tgjc
tgjc / gist:3f62450ae0f7e54e115eb26a93a05fb3
Created August 17, 2016 04:46
R dplyr: rename variables using string functions
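DATA %>% rename_(.dots=setNames(names(.), tolower(gsub("FROM", "TO", names(.)))))
## Note: rename_() is deprecated; with dplyr >= 1.0.0 use: DATA %>% rename_with(~ tolower(gsub("FROM", "TO", .x)))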
## Taken from: http://stackoverflow.com/questions/30382908/r-dplyr-rename-variables-using-string-functions
@tgjc
tgjc / not_in_lms.R
Created March 18, 2016 02:36
Script to read in HR data and filter using anti-joins
# Notes:
# 1. Rename the staff number field in each spreadsheet to "emp_id".
# 2. When adding new spreadsheets, convert emp_id to character so that anti_join() works.
# 3. If time allows, automate cleaning of column names and drop unused columns from hr-list.csv.
library(dplyr)
library(readr)
@tgjc
tgjc / string-split.txt
Created September 2, 2015 22:40
Why I hate Excel
##Split text in the form "surname, firstname" and concatenate
to (firstname surname)
##Surname
=LEFT(TEXT,(FIND(",",TEXT)-1))
##First name
=RIGHT(TEXT,LEN(TEXT)-FIND(",",TEXT)-1)
##Concatenate
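For comparison, the same split-and-join can be sketched outside Excel. This is a Python illustration, not the truncated Excel formula; the function name `flip_name` is hypothetical:

```python
def flip_name(text):
    """Turn 'surname, firstname' into 'firstname surname'."""
    # partition() splits on the first comma only, like FIND(",", TEXT) above.
    surname, comma, first = text.partition(",")
    if not comma:
        # No comma found: return the input unchanged apart from outer spaces.
        return text.strip()
    return f"{first.strip()} {surname.strip()}"
```

For example, `flip_name("Smith, John")` gives `"John Smith"`.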
@tgjc
tgjc / gantt.R
Last active August 29, 2015 14:23
Simple R script for parsing a vector of date-times and returning dates only
# Read data and extract start / finish columns
setwd("/Volumes/emr/Testing/")
raw_file <- read.csv("theo_Epic GANTT 20150605.csv")
raw_start <- raw_file[,"Start"]
raw_finish <- raw_file[,"Finish"]
# Parse date-time and remove times
library(lubridate)  # provides dmy(), day(), month(), year()
raw_start2 <- dmy(raw_start)
new_start <- paste(day(raw_start2), month(raw_start2), year(raw_start2), sep = "/")