https://tilburgsciencehub.com

Hannes Datta hannesdatta
hannesdatta / rolling.do
Last active November 21, 2019 11:27
Rolling window merge in Stata
/*
===================================
ROLLING WINDOW AGGREGATION IN STATA
===================================
Problem:
--------
Assume you have a data set with time/datestamps, noting
hannesdatta / mongo_chunk.py
Created November 29, 2019 12:07 — forked from kenju254/mongo_chunk.py
Python script that exports Mongo collections of any size into progressively numbered JSON chunks. Run `python mongo_chunk.py --help` for a list of available options.
#!/usr/bin/python
import argparse
from pymongo import MongoClient as Client
from bson import BSON
from bson import json_util
import json
import os
# mongo client
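The chunking idea from the description can be sketched without a database. This is a minimal illustration, not the gist's actual code: the function name `write_json_chunks`, the `chunk_0001.json` naming scheme, and the default chunk size are assumptions, and any iterable of dicts stands in for a pymongo cursor. Real Mongo documents would probably also need `bson.json_util.dumps` instead of `json.dump`, because values such as ObjectId are not plain JSON.

```python
import json
import os
from itertools import islice


def write_json_chunks(docs, out_dir, chunk_size=1000, prefix="chunk"):
    """Write an iterable of JSON-serializable dicts into numbered files.

    Returns the list of file paths written. Only `chunk_size` documents
    are held in memory at a time, so `docs` may be an arbitrarily long
    iterator (for example, a database cursor).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    iterator = iter(docs)
    paths = []
    while True:
        # Pull the next batch lazily; an empty batch means we are done.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        # Hypothetical naming scheme: chunk_0001.json, chunk_0002.json, ...
        path = os.path.join(out_dir, "%s_%04d.json" % (prefix, len(paths) + 1))
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(chunk, handle)
        paths.append(path)
    return paths
```

Calling `write_json_chunks(cursor, "export", chunk_size=5000)` would then produce one file per 5000 documents, with the last file holding the remainder.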
hannesdatta / proc_auxilary.R
Created December 2, 2019 13:50
Auxiliary functions to report regression results in R
psignstars <- function(x) {
sapply(x, function(p) ifelse(p < .01, "***", ifelse(p < .05, "**", ifelse(p < .1, "*", " "))))
}
# Function to run regression model (#mmix, with spec formula)
regmodel <- function(formula=list(~1+I(country_class=='linc') + as.factor(category) + as.factor(brand)),
dat, model = 'lm') {
lmerctrl = lmerControl(optimizer ="Nelder_Mead", check.conv.singular="ignore")
hannesdatta / download_from_dropbox.py
Last active August 14, 2024 07:33
Python script to download entire folder/directory structure from a (shared) Dropbox folder to a local computer
################################################################
# DOWNLOAD ENTIRE FOLDER STRUCTURE FROM DROPBOX TO LOCAL DRIVE #
################################################################
# Instructions:
# (1) install dropbox API using pip
# > pip install dropbox
# (2) Create application to make requests to the Dropbox API
# - Go to: https://dropbox.com/developers/apps
hannesdatta / script.R
Last active December 12, 2019 13:56
Customized column names when aggregating in data.table
# PROBLEM:
# I would like to name the new column in the aggregated DT
# "mean_price", but I cannot figure out how to do this.
# Here is someone with a related issue: https://stackoverflow.com/questions/12391950/select-assign-to-data-table-when-variable-names-are-stored-in-a-character-vect
# Do you know how to resolve this issue?
# EXAMPLE:
hannesdatta / deprecated-classify-labels.R
Last active July 23, 2020 06:05
Classifying music labels into major and independent labels
This gist has been replaced by an R package with an updated list of labels.
Get it on GitHub: https://github.com/hannesdatta/musicMetadata
LEGACY CODE
#################################################
# #
# Classify music labels #
# into major labels (Sony, Warner, Universal), #
hannesdatta / proc_unitroots.R
Last active February 12, 2020 21:44
Augmented Dickey-Fuller Test / Enders procedure when the data generating process is unknown (e.g., inclusion of deterministic trend, or not)
####################################
# #
# UNIT ROOT TESTS #
# IN THE ABSENCE OF #
# KNOWLEDGE ON THE ACTUAL #
# DATA GENERATING PROCESS #
# #
# Enders 1995, #
# Applied Econometric Time Series #
# pp. 254 - 258 and #
hannesdatta / clean_artistnames.R
Created March 24, 2020 15:56
Clean clear-text artist names by removing collaborations and secondary artists
require(stringi)
spelling_variants <- function(x, remove_collabs=F, remove_parentheses=T) {
qualifiers = c(" feat .*", " feat[.].*", " ft.*", " ft[.].*"," featuring.*"," vs[.].*"," vs.*"," versus.*"," with.*","[-].*"," / .*",
"/.*","[|].*", "[[].*[]]", "[)].*", ";.*","[+].*","[&] .*","[&].*",",.*"," and .*", " con .*", " e .*", " et .*",
" x .*")
# remove articles (a, the)
ret = gsub(" a ", "", tolower(stri_trim_both(x)))
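The same idea, cutting a name off at the first collaboration marker, can be sketched in Python with the standard `re` module. This is an illustration under assumptions, not a port of the R function: `primary_artist` is a hypothetical name, and the marker list is a small subset of the `qualifiers` vector above.

```python
import re

# Illustrative subset of collaboration markers (assumption: each marker is
# surrounded by whitespace). Everything from the marker onwards is dropped.
_QUALIFIERS = re.compile(
    r"\s+(?:feat\.?|ft\.?|featuring|vs\.?|versus|with|and|x|&)\s+.*$",
    flags=re.IGNORECASE,
)
# Bracketed additions such as "(Remastered 2011)" or "[Live]".
_PARENTHESES = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")


def primary_artist(name, remove_parentheses=True):
    """Return the lower-cased primary artist of a clear-text artist name."""
    cleaned = name.strip().lower()
    if remove_parentheses:
        cleaned = _PARENTHESES.sub("", cleaned)
    cleaned = _QUALIFIERS.sub("", cleaned)
    return cleaned.strip()
```

For example, `primary_artist("Jay-Z feat. Rihanna")` returns `"jay-z"`. As with the R list, a marker such as " and " also truncates band names that legitimately contain it (for example "Florence and the Machine"), so the result is best treated as a spelling variant for matching rather than as a canonical name.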
hannesdatta / bulk-tinyurl-readme.txt
Last active June 12, 2020 08:56
Bulk-transform URLs to short(ened) URLs via tinyurl
# TRANSFORMS A SET OF URLS TO TINY-URLS (SHORTENED URLS)
# adapted from https://www.geeksforgeeks.org/python-url-shortener-using-tinyurl-api/
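A bulk conversion along these lines can be sketched with the standard library only. The sketch assumes TinyURL's `api-create.php` endpoint (the one used in the tutorial linked above) is still available and returns the short URL as plain text; `build_request_url`, `shorten`, and `shorten_all` are hypothetical names, and there is no retry or rate-limit handling.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the plain-text TinyURL endpoint from the linked tutorial.
API_ENDPOINT = "http://tinyurl.com/api-create.php"


def build_request_url(long_url):
    """Return the API request URL with `long_url` percent-encoded."""
    return API_ENDPOINT + "?" + urlencode({"url": long_url})


def shorten(long_url, timeout=10):
    """Ask TinyURL for a short URL (performs a network request)."""
    with urlopen(build_request_url(long_url), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def shorten_all(long_urls, shortener=shorten):
    """Map each input URL to its shortened form, keeping input order."""
    return {url: shortener(url) for url in long_urls}
```

Passing the shortener as a parameter keeps `shorten_all` testable offline: a stub function can replace the network call.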
hannesdatta / convert-dates-in-data.table.R
Created September 27, 2020 11:53
Fast conversion of `character` data columns to Date using data.table
# Quick conversion of `character` date columns to Date format using data.table
# fread(..., colClasses = c(date='Date')) is slow for large data sets, especially when
# the number of unique dates is small, but the number of cross-sectional units is large.
# The idea is to convert only the UNIQUE date strings with as.Date,
# and then merge them back into the original data.table.
library(data.table)
data.table.date <- function(dt, datecol) {
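The unique-value trick described in the comments is not specific to data.table. A minimal Python sketch of the same idea follows; it is not the gist's implementation, `parse_dates_via_unique` is a hypothetical name, and the input is assumed to hold ISO-formatted strings (`YYYY-MM-DD`).

```python
from datetime import date


def parse_dates_via_unique(values):
    """Parse ISO date strings, converting each distinct string only once."""
    values = list(values)  # allow generators; we need two passes
    # Pass 1: parse every distinct string exactly once.
    lookup = {text: date.fromisoformat(text) for text in set(values)}
    # Pass 2: map the parsed dates back onto the original order.
    return [lookup[text] for text in values]
```

With a million rows but only a few hundred distinct dates, the expensive parsing step runs a few hundred times and the rest is dictionary lookups, which mirrors the merge-back step in the data.table version.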