- Data analytics - examining data to uncover:
  - patterns
  - correlations
  - insights
- Helps organizations:
  - Optimize operations
  - Predict trends
  - Improve performance
- Crucial for:
  - shaping strategies
  - improving outcomes
- Data Types:
  - Structured: stored in predefined formats, e.g.:
    - tables
    - spreadsheets
  - Unstructured: no predefined data model, e.g.:
    - text documents
    - images
- Data Sources:
  - Internal:
    - Company DBs
    - CRM systems
    - financial records
  - External:
    - Public datasets
    - social media
    - APIs
- Data timing:
  - Historical
  - Real-time
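
To make the structured/unstructured distinction concrete, here is a minimal Python sketch using only the standard library. The sample CSV rows, the review text, and the helper names `read_structured` and `word_frequencies` are hypothetical. Structured rows can be filtered by column name right away, while free text has to be tokenized before it can be counted or queried.

```python
import csv
import io
from collections import Counter

# Structured data: a CSV table with a fixed schema (hypothetical sample rows).
CSV_TEXT = """region,month,sales
North,2024-01,120
South,2024-01,95
North,2024-02,130
"""

# Unstructured data: free text with no predefined fields (hypothetical sample).
REVIEW_TEXT = "Fast delivery, great price. Delivery was fast and the price was fair."


def read_structured(text: str) -> list[dict[str, str]]:
    """Each row maps column names to values, so fields are directly addressable."""
    return list(csv.DictReader(io.StringIO(text)))


def word_frequencies(text: str) -> Counter:
    """Unstructured text must first be split into tokens before it can be counted."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return Counter(w for w in words if w)


rows = read_structured(CSV_TEXT)
total_north = sum(int(r["sales"]) for r in rows if r["region"] == "North")
print(total_north)  # 250
print(word_frequencies(REVIEW_TEXT).most_common(3))
```

The tokenizer only splits on whitespace and strips basic punctuation; real text analysis usually needs more careful preprocessing.
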

Type | Input data | Method | Goal |
---|---|---|---|
Descriptive | Historical data | Summarization | Understand past events |
Diagnostic | Historical data | Statistical analysis and modeling | Explain why events occurred |
Predictive | Historical data | Machine learning, regression analysis | Forecast future trends |
Prescriptive | Current data | Predictive insights, optimization algorithms | Recommend actions |
Exploratory | All data | Visualization, summary statistics | Discover patterns and form hypotheses |
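
As an illustration of how descriptive, predictive, and prescriptive analytics build on each other, the sketch below chains all three on a small, made-up sales series. The data, the unit costs, and the candidate stock levels are assumptions. The forecast uses the naive drift method (last value plus the average month-over-month change) as a simple stand-in for the regression or machine-learning models listed in the table, and the "optimization" is a brute-force search over a few options.

```python
from statistics import mean

# Hypothetical monthly unit sales (historical data).
history = [100, 110, 125, 130, 145, 150]

# Descriptive: summarize what happened.
average_sales = mean(history)

# Predictive: naive drift forecast = last value + average month-over-month change.
changes = [later - earlier for earlier, later in zip(history, history[1:])]
forecast = history[-1] + mean(changes)

# Prescriptive: pick the stock level with the lowest cost if demand equals the forecast.
# Both unit costs below are illustrative assumptions.
HOLDING_COST = 1.0   # cost per unsold unit
SHORTAGE_COST = 4.0  # cost per unit of unmet demand


def cost_if_demand_is(stock: float, demand: float) -> float:
    unsold = max(stock - demand, 0)
    unmet = max(demand - stock, 0)
    return HOLDING_COST * unsold + SHORTAGE_COST * unmet


options = [140, 150, 160, 170]
best_stock = min(options, key=lambda s: cost_if_demand_is(s, forecast))

print(f"average: {average_sales:.1f}, forecast: {forecast}, recommended stock: {best_stock}")
```

A diagnostic step (explaining why sales rose) is omitted here; it would typically compare this series against candidate explanatory variables.
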

Type | Subtype | Name | Description | Notes |
---|---|---|---|---|
Calculation | Central tendency | Mean | Sum of all values divided by their count | Sensitive to outliers |
Calculation | Central tendency | Median | Middle value of the sorted data (average of the two middle values when the count is even) | Less affected by outliers |
Calculation | Central tendency | Mode | Most frequent value | A dataset can have more than one mode |
Calculation | Dispersion | Variance | Average of squared differences from the mean | In squared units; the sample variance divides by n - 1 |
Calculation | Dispersion | Standard deviation | Square root of the variance | In the same units as the data |
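
The definitions in the table can be checked with a short Python sketch. The helper functions are hypothetical, written to mirror the table's wording; the standard library's `statistics` module (`pvariance` for the population variance, `variance` for the sample version dividing by n - 1) is used only to cross-check them. The data set is made up.

```python
import math
import statistics
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    # Odd count: the middle element; even count: average of the two middle elements.
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


def variance(values, sample=False):
    m = mean(values)
    squared_diffs = sum((x - m) ** 2 for x in values)
    # Population variance divides by n; sample variance divides by n - 1.
    return squared_diffs / (len(values) - 1 if sample else len(values))


def std_dev(values, sample=False):
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(mean(data), median(data), modes(data), variance(data), std_dev(data))

# Cross-check against the standard library.
assert math.isclose(variance(data), statistics.pvariance(data))
assert math.isclose(variance(data, sample=True), statistics.variance(data))
```

For `data` above, the mean is 5, the median is 4.5 (the average of the middle values 4 and 5), the only mode is 4, the population variance is 4, and the standard deviation is 2.
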

Chart | Description |
---|---|
Histograms | Display the frequency distribution of a dataset by dividing the range of values into bins and counting the number of data points in each bin. Useful for understanding the distribution and identifying skewness. |
Box Plots | Summarize data using a five-number summary (minimum, first quartile, median, third quartile, maximum) and mark outliers as individual points. Useful for assessing spread and spotting outliers. |
Scatter Plots | Plot data points on a Cartesian plane to examine the relationship between two variables. Useful for identifying correlations, trends, and potential outliers. |
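
Histograms and box plots are both built from simple counts and quantiles. The sketch below computes histogram bin counts and a box plot's five-number summary without a plotting library. It assumes equal-width bins and Tukey's common convention: points more than 1.5 × IQR beyond the quartiles are flagged as outliers, and the reported minimum and maximum are the most extreme non-outlier values. Quartile conventions differ between tools; this sketch uses `statistics.quantiles(..., method="inclusive")`. The helper names and the sample data are hypothetical.

```python
import statistics


def histogram_counts(values, num_bins):
    """Count values per equal-width bin spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: put everything in the first bin
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in values:
        # The maximum sits on the last bin's right edge; clamp it into that bin.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    return counts


def five_number_summary(values):
    """Quartiles plus whisker ends, with 1.5 * IQR outliers listed separately."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {"min": min(inliers), "q1": q1, "median": q2, "q3": q3,
            "max": max(inliers), "outliers": outliers}


DATA = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # hypothetical, with one extreme value
print(histogram_counts(DATA, 4))
print(five_number_summary(DATA))
```

With this sample, nine values land in the first bin and the value 40 sits alone in the last one, a right skew the histogram would make visible; the box plot summary flags the same value as an outlier.
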

Uncover insights:

Category | Subcategory | Example | Usage |
---|---|---|---|
Visualizations | charts | scatter plots | reveal correlations |
Statistical Methods | statistical techniques | regression analysis | model relationships |
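
Scatter plots and regression analysis both describe how two variables move together. A minimal sketch, assuming hypothetical advertising-spend and sales figures: Pearson's correlation coefficient measures the strength of a linear relationship, and an ordinary least-squares line models it. The function names are illustrative; on Python 3.10 and later, `statistics.correlation` and `statistics.linear_regression` provide built-in equivalents.

```python
import math


def _means(xs, ys):
    return sum(xs) / len(xs), sum(ys) / len(ys)


def pearson_r(xs, ys):
    """Covariance scaled by both spreads; ranges from -1 to 1."""
    mx, my = _means(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical errors."""
    mx, my = _means(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values
sales = [3.1, 4.9, 7.2, 8.8, 11.0]    # hypothetical values
r = pearson_r(ad_spend, sales)
slope, intercept = least_squares_line(ad_spend, sales)
print(f"r = {r:.3f}; fitted line: sales = {slope:.2f} * spend + {intercept:.2f}")
```

A correlation close to 1 shows a strong linear association, but on its own it does not establish that one variable causes the other.
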

Type | Skill | Purpose |
---|---|---|
Analysis | Critical thinking | identify patterns and trends |
Analysis | Problem solving | |
Communication | Presenting | clarity and efficiency |
Communication | Narratives | compelling, data-driven storytelling |

Type | Subtype | Name | Purpose |
---|---|---|---|
Approach | Descriptive | Dashboards | Real-time reports |
Approach | Descriptive | Data Aggregation | Explain events |
Approach | Descriptive | Reports | Explain past events |
Approach | Descriptive | Visualizations | Presentation |
Approach | Diagnostic | Statistical Analysis | Explain causation |
Approach | Diagnostic | Statistical Modeling | Explain causation |
Approach | Predictive | Machine Learning | Forecasting |
Approach | Predictive | Regression Analysis | Forecasting |
Approach | Prescriptive | Optimization Algorithms | Recommendations |
Approach | Prescriptive | Predictive Insights | Recommendations |
Tool | Big Data | Hadoop | Processing |
Tool | Big Data | Spark | Processing |
Tool | Cloud | AWS Redshift | Scalability |
Tool | Cloud | Google BigQuery | Scalability |
Tool | ETL | Informatica | Integration |
Tool | ETL | Talend | Integration |
Tool | Lang | Python | |
Tool | Lang | R | |
Tool | Lang | SQL | DDL, DML |
Tool | Lib | ggplot2 | |
Tool | Lib | Pandas | |
Tool | Viz | Power BI | Dashboards |
Tool | Viz | Tableau | Dashboards |
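
The ETL and SQL rows can be illustrated with a small extract-transform-load sketch. It uses Python's built-in `csv` and `sqlite3` modules, with an in-memory SQLite database standing in for a warehouse such as Redshift or BigQuery; it does not show how Informatica or Talend work internally. The CSV content, table schema, and cleaning rules are assumptions. The `CREATE TABLE` statement is DDL, the `INSERT` is DML, and the final `GROUP BY` query is a descriptive data aggregation.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent casing, stray spaces, and a missing value.
RAW_CSV = """order_id,region,amount
1, north ,120.50
2,South,80
3,NORTH,99.5
4,south,
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names, parse numbers, drop rows missing an amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((int(row["order_id"]), row["region"].strip().title(), float(amount)))
    return clean


def load(conn, records):
    """Load: create the target table (DDL) and insert the records (DML)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Dropping rows with a missing amount is only one possible cleaning rule; a real pipeline might instead log, impute, or quarantine such rows.
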

Role | Activity | Result | Responsibility |
---|---|---|---|
Data Engineer | Build, maintain | data pipelines, data architectures | data quality, data availability |
Data Analyst | Analyze | Reports | support for business decisions |
Data Scientist | Develop | models, algorithms | problem solutions |
BI Analyst | Report | dashboards | business decision-making support |
Analytics Consultant | Advise | data collection and analysis practices | solutions to data issues |
Data Ethicist | Ensure | data collection and analysis practices | ethical standards compliance |
Data Researcher | Research | data science technologies | Advancement |