- Test for normality:
- Shapiro-Wilk: the null hypothesis is that the data are normally distributed. If the p-value is below alpha (0.05, or whatever significance level you are using), the null hypothesis is rejected (the data are unlikely to be normally distributed)
- With large samples the test becomes very sensitive: even trivial departures from normality will be statistically significant, so accompany the test with a Q-Q plot
- Anderson-Darling
- Comparison of distributions (no assumption of normality)
- Kolmogorov-Smirnov test
- Compares the empirical CDFs of two samples. The D statistic is the largest absolute difference between them: a value close to 1 indicates the distributions are different, a value close to 0 indicates they are close to one another
- Wilcoxon’s signed-rank test
- Compares two paired samples: tests whether the median of the paired differences is zero
- Mann-Whitney U Test: Similar to Wilcoxon, but samples don't have to be paired
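
To make the Kolmogorov-Smirnov D value above concrete, here is a minimal pure-Python sketch of the two-sample statistic. The function name `ks_statistic` and the toy samples are illustrative assumptions, not part of the original notes. In practice you would likely use `scipy.stats.ks_2samp`, which also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS D: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical toy data: identical samples vs. completely separated samples.
same = ks_statistic([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])
print(same, disjoint)
```

Identical samples give D = 0 and samples with no overlap give D = 1, matching the "close to 0" / "close to 1" reading in the notes.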
from sklearn import metrics

# ds has columns A, B, C: group by A, then use B and C as inputs in the
# MSE calculation
grouped = ds.groupby('A')
mse = grouped.apply(lambda x: metrics.mean_squared_error(x['B'], x['C']))
plot_df <- df %>% group_by(feature) %>%
  do(
    plots = ggplot(data = .) + aes(x = xcol, y = ycol) +
      geom_point() + ggtitle(.$feature)
  )
# show plots
plot_df$plots
- df.dtypes : lists the type of each column in the dataframe (no parentheses: it is an attribute, not a method)
license: gpl-3.0
height: 500
border: yes
ds %>% group_by(group1, group2) %>%
  summarise(
    summary_value = some_function
  ) %>% arrange(desc(summary_value)) %>% group_by(group1) %>%
  mutate(rank = row_number())
license: gpl-3.0
height: 500
scrolling: no
border: no
- CDC WONDER
- mortality data
- birth data
- environment
- population data
- Pennsylvania State Data Center
- County level data (mostly census data) for PA
- Census Data
- County Adjacency: county adjacency data from the US Census Bureau
- County Health Rankings
- conda info --envs : lists all environments
- source activate <env name> : activate an environment
- source deactivate : deactivate an environment
- conda list : list all packages installed
- conda create --name <env name> python=3 astroid babel : create new environment, specify version of Python, and install packages
- WINDOWS NOTE: source is not recognized. When deactivating and activating in the Anaconda command prompt, skip source and just type deactivate or activate depending on what you are trying to do.
- conda env export > environment.yml : export conda environment requirements list to a file
- conda env remove -n ENV_NAME : delete environment
- sudo -i : elevate to super user
- du : get breakdown of disk usage of all subdirectories
- df -h : get breakdown of disk space usage per filesystem, in human-readable units
- ls -a : show all files in directory (including hidden files)
- rsync : copy files from one server to another (similar to scp but more functionality)
- Set up rsync with sudo
- rsync -az -e "ssh" --rsync-path="sudo rsync" user@servername:/pulled-source-directory /local-directory/
- rsync [source] [destination]