- Data Pipelines: Airflow
- MS Excel
- Programming/Tools: Python, PySpark, SQL
- Version Control: GitHub/Bitbucket
- Data Wrangling and Feature Engineering
- ML
- Regression
- Logistic regression (in Python)
- NLP
- Decision trees
- Classification
- Clustering
- Data preparation techniques
- Boosted algorithms
- Hyperparameter tuning
- Model evaluation metrics
- Presentation & Storytelling/Communication: Tableau, Power BI, Python libraries
- Analytics and Modeling: analyze data, run tests, and create explanatory models to gather new insights and predict possible outcomes.
- A/B Testing
- Statistics
- To help make recommendations and decisions: maximum likelihood estimators, distributions, and statistical tests
- Tied to ML Algorithms: Calculus and linear algebra
- Descriptive statistics (using Python): mean, median, mode, variance, standard deviation.
- Probability distributions, sample and population, CLT, skewness and kurtosis
- Inferential statistics: hypothesis testing, confidence intervals
- Data Visualization
- Break down complex data into smaller, digestible pieces and use a variety of visual aids (charts, graphs, etc.)
- Effectively communicate key messages and get buy-in for proposed solutions
- Big Data: Hadoop, Apache Spark / PySpark
- Deep Learning: CNN, RNN
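
As a small illustration of the "Descriptive statistics (using Python)" item above, here is a minimal sketch using only the standard-library `statistics` module. The `sample` values are made up for the example, and the population variants (`pvariance`, `pstdev`) are an assumption; use `variance` and `stdev` instead if the data is a sample from a larger population.

```python
import statistics

# Hypothetical data, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(sample),          # arithmetic average
    "median": statistics.median(sample),      # middle value of the sorted data
    "mode": statistics.mode(sample),          # most frequent value
    "variance": statistics.pvariance(sample), # population variance
    "std_dev": statistics.pstdev(sample),     # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For larger datasets, libraries such as NumPy or pandas offer equivalent functions, but the definitions are the same.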
Last active
November 20, 2021 06:18
Data Science skills