Tal Yarkoni tyarkoni

Building fun things at MidJourney.

tyarkoni / silly_pie_chart.py

Created April 1, 2018 20:01

code for silly pie chart (by request)

	import matplotlib.pyplot as plt
	import seaborn as sns
	%matplotlib inline

	rows = [
	    ('Doing research', 50),
	    ('Having meetings', 6),
	    ('Begging funding agencies for money so I can keep my job', 3),
	    ('Doing paperwork', 2),
	    ('Reviewing papers', 2),

tyarkoni / p_hacked_effect_sizes.py

Created January 25, 2018 23:02

Illustrating the effects of p-hacking on observed effect sizes

	import numpy as np
	from scipy import stats
	import matplotlib.pyplot as plt
	%matplotlib inline

	def run_study(step_size=50, max_n=200, num_tests=10, alpha=0.05):
	    ''' Run a single study in increments of N until we either (a) achieve
	    significance, or (b) hit a maximum sample size. To model p-hacking, we
	    conduct num_tests independent tests after each increment of sampling. '''
	    X = np.zeros((0, num_tests))
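
The preview above stops after the first line of the function body. As a rough illustration of the procedure the docstring describes, here is a minimal sketch; it is not the gist's actual code. The name `run_study_sketch`, the `rng` parameter, the use of one-sample t-tests against zero, and the choice to report the test with the smallest p-value are all assumptions made for the example. Every true effect is zero, so any "significant" result is a false positive.

```python
import numpy as np
from scipy import stats


def run_study_sketch(step_size=50, max_n=200, num_tests=10, alpha=0.05, rng=None):
    """Hypothetical completion: sample in batches, stop at the first
    significant test or when max_n observations have been collected."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.zeros((0, num_tests))
    while X.shape[0] < max_n:
        # Collect another batch for every test (all true effects are zero)
        X = np.vstack([X, rng.standard_normal((step_size, num_tests))])
        # One-sample t-test of each column against zero (an assumption)
        _, p = stats.ttest_1samp(X, 0.0, axis=0)
        if p.min() < alpha:
            break
    best = int(np.argmin(p))
    # Standardized effect size (mean / SD) of the test that gets reported
    d = X[:, best].mean() / X[:, best].std(ddof=1)
    return X.shape[0], p[best], d


rng = np.random.default_rng(42)
results = [run_study_sketch(rng=rng) for _ in range(500)]
# Share of null studies that end up "significant"
false_positive_rate = np.mean([p < 0.05 for _, p, _ in results])
# Average absolute effect size among the "significant" studies
mean_abs_d_sig = np.mean([abs(d) for _, p, d in results if p < 0.05])
```

Under these assumptions, `false_positive_rate` should land far above the nominal `alpha`, and the effect sizes that reach significance are nonzero even though the true effect is exactly zero.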

tyarkoni / simulate_matching.py

Created April 3, 2016 15:59

Matching on unreliable variables produces residual confounding

	'''
	A small simulation to demonstrate that matching trials does not solve the
	problem of residual confounding. For a description of the original problem, see
	http://dx.doi.org/10.1371/journal.pone.0152719
	Here we simulate a situation where we match trials from two conditions that
	differ in Y on an indicator M. By hypothesis, there is no difference in Y in
	the population after controlling for M. But because of measurement error,
	matching on M will, on average, leave a residual mean difference in the Y's.
	Raising the reliability of M (REL_M) will decrease this difference, and setting
	it to 1.0 will eliminate it completely, demonstrating that matching works just
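
The docstring preview is cut off, but the mechanism it describes can be sketched independently. The following is a minimal illustration, not the gist's code: the function name `matched_difference`, the shift `delta`, the caliper width, and the nearest-neighbour matching with replacement are all assumptions chosen for the example. Y depends only on the true score behind M, so any difference that survives matching on the observed M is residual confounding.

```python
import numpy as np


def matched_difference(rel_m, n=20000, delta=1.0, caliper=0.05, seed=0):
    """Mean Y difference (B minus A) after matching trials on observed M.

    Hypothetical sketch; rel_m is the within-condition reliability of M
    and must be in (0, 1].
    """
    rng = np.random.default_rng(seed)
    # True scores: condition B is shifted by delta relative to condition A
    t_a = rng.standard_normal(n)
    t_b = rng.standard_normal(n) + delta
    # Y depends only on the true score, never on condition itself
    y_a = t_a + rng.standard_normal(n)
    y_b = t_b + rng.standard_normal(n)
    # Observed M = true score + error; error variance follows from rel_m
    err_sd = np.sqrt((1.0 - rel_m) / rel_m)
    m_a = t_a + err_sd * rng.standard_normal(n)
    m_b = t_b + err_sd * rng.standard_normal(n)
    # Nearest-neighbour match (with replacement) of each A trial to a B trial
    order = np.argsort(m_b)
    m_b_sorted = m_b[order]
    pos = np.clip(np.searchsorted(m_b_sorted, m_a), 1, n - 1)
    left_closer = (m_a - m_b_sorted[pos - 1]) < (m_b_sorted[pos] - m_a)
    nearest = np.where(left_closer, pos - 1, pos)
    # Keep only pairs whose observed M values agree within the caliper
    keep = np.abs(m_b_sorted[nearest] - m_a) < caliper
    return y_b[order][nearest][keep].mean() - y_a[keep].mean()


diff_unreliable = matched_difference(0.5)
diff_perfect = matched_difference(1.0)
```

In this setup the expected residual difference is `(1 - rel_m) * delta`, so it shrinks as reliability rises and vanishes (up to sampling noise) when `rel_m` is 1.0.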

tyarkoni / t1_t2_correlation_sim.py

Last active February 23, 2016 02:15

Simulates correlation between effect sizes of original studies and replication studies

	import numpy as np
	import scipy.stats as ss
	import matplotlib.pyplot as plt

	g1_d_mu = 0.4
	g1_d_sd = 0.4
	prop_null = 0.3
	n_subs = 20
	n_studies = 400
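
Only the parameter block is visible in the preview, so the sketch below fills in one plausible reading of it and should not be taken as the gist's implementation. It assumes that a proportion `prop_null` of studies have a true effect of zero, that the remaining true effects are drawn from a normal distribution with mean `g1_d_mu` and SD `g1_d_sd`, and that each study compares two groups of `n_subs` subjects. The helper `observed_d` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

g1_d_mu = 0.4
g1_d_sd = 0.4
prop_null = 0.3
n_subs = 20
n_studies = 400

# Assumed mixture: some studies are true nulls, the rest have normal effects
is_null = rng.random(n_studies) < prop_null
true_d = np.where(is_null, 0.0, rng.normal(g1_d_mu, g1_d_sd, n_studies))


def observed_d(true_d, n_subs, rng):
    """Simulate one two-group experiment per study and return Cohen's d."""
    treat = rng.standard_normal((len(true_d), n_subs)) + true_d[:, None]
    ctrl = rng.standard_normal((len(true_d), n_subs))
    pooled_sd = np.sqrt((treat.var(axis=1, ddof=1) + ctrl.var(axis=1, ddof=1)) / 2)
    return (treat.mean(axis=1) - ctrl.mean(axis=1)) / pooled_sd


# The original and the replication share a true effect but not sampling error
d_orig = observed_d(true_d, n_subs, rng)
d_rep = observed_d(true_d, n_subs, rng)
r = np.corrcoef(d_orig, d_rep)[0, 1]
```

With samples this small, sampling error is of the same order as the spread of true effects, so `r` falls well short of 1 even though every replication targets exactly the same true effect as its original.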

tyarkoni / predict_from_text.py

Last active March 10, 2020 02:10

simple example predicting binary outcome from text features with sklearn

	from sklearn.datasets import fetch_20newsgroups
	from sklearn.feature_extraction.text import TfidfVectorizer
	from sklearn.linear_model import LogisticRegression
	from sklearn.pipeline import Pipeline
	import pandas as pd
	import numpy as np

	# Grab just two categories from the 20 newsgroups dataset
	categories = ['sci.space', 'rec.autos']
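
The preview ends before any model is built. As a rough sketch of the general approach named in the description (TF-IDF features feeding a logistic regression inside a `Pipeline`), the example below uses a tiny made-up corpus in place of the 20 newsgroups data so that it runs without a download. The texts, labels, and step names are assumptions for illustration only and say nothing about how the gist itself continues.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the two newsgroup categories
texts = [
    "the rocket reached orbit after launch",
    "nasa plans a new mission to the moon",
    "the shuttle docked with the space station",
    "astronauts orbit the earth in the station",
    "my car needs a new engine and tires",
    "the dealer sold me a used sedan",
    "change the oil and check the brakes",
    "this truck has a powerful engine",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = space, 0 = autos

# Vectorize the text and fit the classifier in a single pipeline
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)

# Predicted labels and class probabilities for unseen text
new_texts = ["the rocket launch to orbit", "new tires for my car"]
preds = model.predict(new_texts)
probs = model.predict_proba(new_texts)
```

Keeping the vectorizer inside the pipeline means it is fit only on the training texts, which matters once the data is split for cross-validation.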