@tsilvs
Created April 14, 2025 17:34
Data Analytics Notes

Intro

  • Data analytics - examining data to uncover:
    • patterns
    • correlations
    • insights
  • Helps organizations:
    • Optimize operations
    • Predict trends
    • Improve performance
  • Crucial for:
    • strategies
    • outcomes

Core Concepts

  • Data Types:
    • Structured: in predefined formats, e.g.:
      • tables
      • spreadsheets
    • Unstructured: no predefined format, e.g.:
      • text documents
      • images
  • Data Sources:
    • Internal:
      • Company DBs
      • CRM systems
      • financial records
    • External:
      • Public datasets
      • social media
      • APIs
  • Data timings:
    • Historical
    • Real-time
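
The structured/unstructured distinction above can be illustrated with a minimal sketch. The sample strings, the `order_id` and `amount` column names, and the regular expression are all made up for illustration. Structured data can be read by column name, while unstructured text needs pattern matching (or heavier techniques) to pull values out.

```python
import csv
import io
import re

# Structured: a predefined format (hypothetical CSV with named columns).
structured = "order_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Unstructured: free text with no schema, so values must be searched for.
unstructured = "Customer paid 19.99 for order 1, then another 5.00 later."
amounts = [float(m) for m in re.findall(r"\d+\.\d{2}", unstructured)]

print(rows)
print(total)
print(amounts)
```

The regular expression here only recognizes numbers with exactly two decimals; real unstructured sources usually need more robust extraction.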

Analysis Types

| Type | Object | Method | Reason |
|---|---|---|---|
| Descriptive | Historical Data | Summary | Understand past events |
| Diagnostic | Historical Data | Statistical analysis and modeling | Explain why events occurred |
| Predictive | Historical Data | Machine learning, Regression analysis | Forecast future trends |
| Prescriptive | Current Data | Predictive Insights, Optimization Algorithms | Recommend actions |
| Exploratory | All data | | |

| Type | Subtype | Name | Description | Peculiarities |
|---|---|---|---|---|
| Calc | Central Tendency | Mean | average of all values | Skewed by outliers |
| Calc | Central Tendency | Median | middle value of the data sorted in ascending or descending order | Less affected by outliers |
| Calc | Central Tendency | Mode | most frequent value | |
| Calc | Dispersion | Variance | average of squared differences from the mean | |
| Calc | Dispersion | Standard Deviation | square root of variance | |
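
As a quick check of the measures above, here is a minimal sketch using Python's standard `statistics` module. The `orders` sample is hypothetical and includes one deliberate outlier to show how the mean and median react differently. It uses the population variants (`pvariance`, `pstdev`), which divide by the number of values; the sample variants (`variance`, `stdev`) divide by one less.

```python
import statistics

# Hypothetical sample: daily order counts with one outlier (95).
orders = [12, 15, 11, 15, 14, 13, 95]

mean = statistics.mean(orders)            # pulled upward by the outlier
median = statistics.median(orders)        # middle value of the sorted data
mode = statistics.mode(orders)            # most frequent value
variance = statistics.pvariance(orders)   # mean of squared differences from the mean
std_dev = statistics.pstdev(orders)       # square root of the variance

print(mean, median, mode, variance, std_dev)
```

With this sample the mean lands well above every value except the outlier, while the median stays among the typical values.
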

| Name | Description |
|---|---|
| Histograms | Display the frequency distribution of a dataset by dividing the range of values into bins and counting the number of data points in each bin. Useful for understanding the distribution and identifying skewness. |
| Box Plots | Summarize data using a five-number summary (minimum, first quartile, median, third quartile, maximum) and display outliers. They help identify data spread and outliers. |
| Scatter Plots | Plot data points on a Cartesian plane to examine the relationship between two variables. Useful for identifying correlations, trends, and potential outliers. |
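
The numbers behind a histogram and a box plot can be computed without any plotting library. The sketch below is illustrative only: the `values` sample, the helper names, and the bin width are made up. The 1.5 × IQR rule for flagging outliers is a common convention and an assumption here, since the notes do not say how outliers are defined.

```python
import statistics
from collections import Counter

# Hypothetical sample values; not taken from the notes.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 9, 21]

def histogram(data, bin_width):
    """Count values per bin, keyed by each bin's lower edge (empty bins are omitted)."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

bins = histogram(values, bin_width=5)
summary = five_number_summary(values)

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
_, q1, _, q3, _ = summary
iqr = q3 - q1
outliers = [x for x in values if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(bins)
print(summary)
print(outliers)
```

Quartile definitions differ between tools, so other software may report slightly different Q1 and Q3 for the same data.
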

Patterns and Trends Identification

Uncover insights

| Category | Subcategory | Example | Usage |
|---|---|---|---|
| Visualizations | charts | scatter plots | reveal correlations |
| Statistical Methods | statistical techniques | regression analysis | model relationships |
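
To make "reveal correlations" and "model relationships" concrete, the sketch below computes a Pearson correlation coefficient and fits a straight line by ordinary least squares. The data (`ad_spend`, `units_sold`) and the function names are hypothetical. Simple linear regression is only one of many regression techniques and is chosen here for brevity.

```python
from math import sqrt

# Hypothetical paired observations: ad spend (x) and units sold (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [2.0, 4.1, 6.2, 7.9, 10.3]

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept

r = pearson_r(ad_spend, units_sold)
slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 6.0 + intercept  # predicted y for an unseen x = 6.0

print(r, slope, intercept, forecast)
```

A coefficient near 1 or -1 indicates a strong linear relationship. It does not by itself establish causation, which is the concern of diagnostic analysis.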

Soft Skills

| Type | Name | Purpose |
|---|---|---|
| Analysis | Critical thinking | identify patterns and trends |
| Analysis | Problem solving | |
| Communication | Presenting | clarity and efficiency |
| Communication | Narratives | data-driven, compelling |

Hard Skills

| Type | Subtype | Name | Purpose |
|---|---|---|---|
| Approach | Descriptive | Dashboards | Real-time reports |
| Approach | Descriptive | Data Aggregation | Explain events |
| Approach | Descriptive | Reports | Explain |
| Approach | Descriptive | Visualizations | Presentation |
| Approach | Diagnostic | Statistical Analysis | Explain causation |
| Approach | Diagnostic | Statistical Modeling | Explain causation |
| Approach | Predictive | Machine Learning | Forecasting |
| Approach | Predictive | Regression Analysis | Forecasting |
| Approach | Prescriptive | Optimization Algorithms | Recommendations |
| Approach | Prescriptive | Predictive Insights | Recommendations |
| Tool | Big Data | Hadoop | Processing |
| Tool | Big Data | Spark | Processing |
| Tool | Cloud | AWS Redshift | Scalability |
| Tool | Cloud | Google BigQuery | Scalability |
| Tool | ETL | Informatica | Integration |
| Tool | ETL | Talend | Integration |
| Tool | Lang | Python | |
| Tool | Lang | R | |
| Tool | Lang | SQL | DDL, DML |
| Tool | Lib | ggplot2 | |
| Tool | Lib | pandas | |
| Tool | Viz | Power BI | Dashboards |
| Tool | Viz | Tableau | Dashboards |
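
The SQL row above mentions DDL and DML. A minimal sketch of the difference, using Python's built-in `sqlite3` module with an in-memory database, is shown below. The `sales` table, its columns, and the rows are hypothetical. The final `SELECT ... GROUP BY` also illustrates data aggregation; some references class `SELECT` as DML and others as a separate query language (DQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL (Data Definition Language): defines structure.
conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")

# DML (Data Manipulation Language): changes the rows themselves.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)
conn.execute("UPDATE sales SET amount = amount + 10 WHERE region = 'south'")

# Query with aggregation: total amount per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

SQL dialects differ between engines such as SQLite, Redshift, and BigQuery, so treat the statements as illustrative rather than portable.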

Fields

Retail

Purposes

Inventory Optimization

Experience Personalization

Finance

Purposes

Risk Management

Fraud Detection

Investment Optimization

Careers

| Role | Activity | Result | Responsibility |
|---|---|---|---|
| Data Engineer | Build, maintain | data pipelines, data architectures | data quality, data availability |
| Data Analyst | Analyze | reports | support for business decisions |
| Data Scientist | Develop | models, algorithms | solutions to problems |
| BI Analyst | Report | dashboards | business decision-making support |
| Analyst Consultant | Advise | data collection and analysis practices | solutions to data issues |
| Data Ethicist | Ensure | data collection and analysis practices | ethical standards compliance |
| Data Researcher | Research | data science technologies | advancement |