- Test for normality:
- Shapiro-Wilk: the null hypothesis is that the data are normally distributed. If the p-value is below alpha (0.05, or whatever significance level you chose), the null hypothesis is rejected (the data are non-normal)
- With large samples the test is biased by sample size (even trivial departures from normality become statistically significant), so accompany it with a Q-Q plot
- Anderson-Darling
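A minimal sketch of both normality checks using SciPy; the sample here is simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=200)  # simulated data for illustration

# Shapiro-Wilk: small p-value -> reject the null of normality
w_stat, p_value = stats.shapiro(sample)

# Anderson-Darling: compare the statistic against the critical values
ad = stats.anderson(sample, dist="norm")

# Q-Q plot data (pair with matplotlib for the visual check)
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
```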
- Comparison of distributions (no assumption of normality)
- Kolmogorov-Smirnov test
- Compares the empirical CDFs of two samples: a D statistic close to 1 indicates the distributions are different, close to 0 indicates they are similar
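A quick two-sample KS sketch with SciPy, using simulated data (one sample shifted so the CDFs visibly differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(0, 1, 500)
b = rng.normal(1, 1, 500)  # shifted by one standard deviation

# D is the maximum vertical distance between the two empirical CDFs
d_stat, p_value = stats.ks_2samp(a, b)
```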
- Wilcoxon’s signed-rank test
- Compares two paired samples; tests whether the median of the paired differences is zero
- Mann-Whitney U Test: Similar to Wilcoxon, but samples don't have to be paired
- Null Hypothesis: Both groups have the same distribution
- If U is close to 0 (or to n1*n2), the groups are very different; if the groups are similar, U will be close to n1*n2/2, where n1 and n2 are the number of points in datasets 1 and 2
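A sketch of both rank tests with SciPy on simulated data; the "before/after" framing is an assumed example of paired measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
before = rng.normal(10, 2, 30)
after = before + rng.normal(0.5, 1, 30)  # paired: same subjects re-measured

# Wilcoxon signed-rank requires paired samples
w_stat, w_p = stats.wilcoxon(before, after)

# Mann-Whitney U treats the samples as independent (no pairing needed)
u_stat, u_p = stats.mannwhitneyu(before, after)
n1, n2 = len(before), len(after)
# similar groups give U near n1*n2/2; U near 0 (or n1*n2) signals a difference
```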
- Permutation Tests: can be used with many different comparison metrics. Repeatedly resample the data with shuffled labels to build an empirical null distribution of the metric, then compare it against the metric computed on the true labels
- e.g., the Kolmogorov-Smirnov statistic can serve as the permutation metric
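A bare-bones permutation test sketch using the difference of means as the metric (any metric, including the KS statistic, could be swapped in); the data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 40)
b = rng.normal(0.8, 1.0, 40)

observed = a.mean() - b.mean()          # metric on the true labels
pooled = np.concatenate([a, b])

n_perm = 5000
count = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)  # shuffle the labels
    diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = (count + 1) / (n_perm + 1)    # empirical two-sided p-value
```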
- Zipf's Law/Zipfian Distribution
- The most frequently used word occurs twice as often as the 2nd most frequent, three times as often as the 3rd most frequent, etc. (frequency proportional to 1/rank)
- Describes the distribution of words in a corpus
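A tiny illustration of the 1/rank relationship; the top-word count is an assumed placeholder, not real corpus data:

```python
# Under Zipf's law, the frequency of the word at rank r is proportional to 1/r.
top_count = 1200  # assumed count of the most frequent word (hypothetical)
expected = [top_count / rank for rank in range(1, 6)]
# rank 2 occurs half as often as rank 1, rank 3 a third as often, etc.
```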
- Distance between 2 probability distributions
- Jensen-Shannon divergence
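A short sketch with SciPy; note that `scipy.spatial.distance.jensenshannon` returns the JS *distance* (the square root of the divergence). The two distributions are made-up examples:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])

# base 2 bounds the distance in [0, 1]; square it to get the divergence
js_distance = jensenshannon(p, q, base=2)
js_divergence = js_distance ** 2
```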
- Difference between statistical and probabilistic inference:
- Statistical Inference: Estimate parameters/underlying distribution based on analysis of data
- Probabilistic Inference: Computing joint or marginal distributions based on a known distribution
Last active: July 22, 2016 20:38
General notes about statistics (distributions, tests, etc.)