- df.dtypes : lists the data type of each column in the DataFrame (it is an attribute, not a method, so no parentheses)
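A minimal sketch of the note above, assuming pandas is installed. The column names `count` and `price` are made up for illustration, and the dtypes are set explicitly so the result does not depend on platform defaults.

```python
import pandas as pd

# Hypothetical two-column frame; dtypes are pinned explicitly.
df = pd.DataFrame({
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "price": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
})

# dtypes is an attribute (a Series indexed by column name), so there are
# no call parentheses; writing df.dtypes() would raise a TypeError.
column_types = df.dtypes
print(column_types)
```

Because the result is an ordinary Series, a single column's type can be looked up with `column_types["price"]`.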
ds %>% group_by(group1, group2) %>%
  summarise(
    summary_value = some_function
  ) %>% arrange(desc(summary_value)) %>% group_by(group1) %>%
  mutate(rank = row_number())
license: gpl-3.0
height: 500
border: yes
library(dplyr)
library(ggplot2)

# one scatter plot per value of `feature`, stored in the list-column `plots`
plot_df <- df %>% group_by(feature) %>%
  do(
    plots = ggplot(data = .) + aes(x = xcol, y = ycol) +
      geom_point() + ggtitle(.$feature[1])
  )
# show plots
plot_df$plots
- Test for normality:
  - Shapiro-Wilk: the null hypothesis is that the data are normally distributed. If the p-value is below alpha (0.05, or whatever significance level you are using), the null hypothesis is rejected (the data are non-normal)
    - With large samples the test picks up even trivial deviations from normality and will almost always come out statistically significant, so accompany it with a Q-Q plot
  - Anderson-Darling
- Comparison of distributions (no assumption of normality)
  - Kolmogorov-Smirnov test
    - Compares the CDFs of two sample sets: a D value close to 1 indicates the distributions are different, a value close to 0 indicates they are close to one another
  - Wilcoxon’s signed-rank test
    - Compares medians from two paired sample sets
  - Mann-Whitney U test: similar to Wilcoxon, but the samples don't have to be paired
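To make the D value from the Kolmogorov-Smirnov note concrete, here is a rough standard-library sketch that computes the two-sample statistic as the largest gap between the two empirical CDFs. The function name `ks_statistic` and the sample data are illustrative assumptions. It returns only D, not a p-value; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs are step functions, so the largest gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


d_same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])         # identical samples
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])      # partial overlap
d_disjoint = ks_statistic([1, 2, 3, 4], [10, 11, 12, 13])  # no overlap
print(d_same, d_shifted, d_disjoint)
```

Identical samples give D = 0, fully separated samples give D = 1, and partially overlapping samples fall in between.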
# ds has columns A, B, C: group by A, then use B and C as inputs in the
# MSE calculation
from sklearn import metrics

grouped = ds.groupby('A')
mse = grouped.apply(lambda x: metrics.mean_squared_error(x['B'], x['C']))
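A small worked version of the per-group MSE pattern, assuming pandas is installed. The data are made up, and the MSE is written out by hand as the mean of squared differences rather than calling `metrics.mean_squared_error`, so the sketch needs only pandas.

```python
import pandas as pd

# Hypothetical data: A is the grouping key, B the observed values, C the predictions.
ds = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [1.0, 2.0, 3.0, 4.0],
    "C": [1.0, 4.0, 2.0, 3.0],
})

# Select only B and C before apply so the grouping column is not passed
# into the function; the result is a Series indexed by the values of A.
mse = ds.groupby("A")[["B", "C"]].apply(
    lambda g: ((g["B"] - g["C"]) ** 2).mean()
)
print(mse)
```

Group `x` has squared errors 0 and 4, and group `y` has 1 and 1, so the per-group means can be checked by hand.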
- save_as_text : don't use this unless you only want to read the text in the file. Otherwise it will cause issues if you want to go back later and revise or filter the dictionary
- If you import a dictionary and then alter it, the corpus must also be updated, as outlined here (Q8)
- You have to limit the number of features in large datasets, otherwise the memory consumption is huge
  - This is regardless of whether the corpus is loaded in RAM or serialized
- Iterations argument : refers to the number of iterations in the EM step
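The notes on limiting features and keeping the corpus in sync can be illustrated with a library-free sketch. It does not use the actual dictionary API these notes refer to; the helper names `build_vocab` and `to_bow` and the toy documents are assumptions made for illustration. If the notes concern gensim, the comparable built-in step would be `Dictionary.filter_extremes`.

```python
from collections import Counter

# Toy tokenized documents (made up for illustration).
docs = [
    ["topic", "model", "corpus"],
    ["topic", "model", "memory"],
    ["topic", "dictionary"],
]

# Document frequency: in how many documents each token appears.
doc_freq = Counter(token for doc in docs for token in set(doc))


def build_vocab(doc_freq, keep_n):
    """Keep the keep_n most frequent tokens and assign compact integer ids."""
    kept = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))[:keep_n]
    return {token: idx for idx, token in enumerate(kept)}


def to_bow(doc, vocab):
    """Bag-of-words as sorted (token_id, count) pairs; unknown tokens are dropped."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Shrinking the vocabulary reassigns ids, so any corpus built against the
# old ids is stale and has to be rebuilt from the filtered vocabulary.
vocab = build_vocab(doc_freq, keep_n=2)
corpus = [to_bow(doc, vocab) for doc in docs]
print(vocab, corpus)
```

The feature count, and with it the width of any model built on the corpus, is capped by `keep_n`, whether the corpus is held in memory or streamed from disk.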
------------------------------------------------------------------
-- alter column name
ALTER TABLE `xyz` CHANGE `manufacurerid` `manufacturerid` INT;
------------------------------------------------------------------
-- export database
------------------------------------------------------------------
mysqldump db table > filename.out
------------------------------------------------------------------
-- import database
license: gpl-3.0
height: 500
scrolling: no
border: no