Skip to content

Instantly share code, notes, and snippets.

@ricklentz
Created June 3, 2017 02:39
Show Gist options
  • Save ricklentz/620424c9fa2cad7e4337a1997ca9eb78 to your computer and use it in GitHub Desktop.
Save ricklentz/620424c9fa2cad7e4337a1997ca9eb78 to your computer and use it in GitHub Desktop.
Strategic Information Notes
make decisions to be competitive
attributes:
integrated (view of the global state)
truthful (accurate and consistent with business rules)
accessable (inforamtion consumers)
credible (no contradictions)
timely (when needed)
reasons for failure in early decision support systems:
lack flexibility to adapt
sourced from operation data
definitions:
Data Warehouse is a stand-alone information environment that provides strategic information in a timely and flexible manner (consolidates historical data for analysis)
def 2: re: W.H. Inmon, "A data warehouse is a subject-oriented, integrated, time-variant, and nonvoatile collection of data in support of management's decision-making process"
def 3: Raloh Kimball, A copy of transaction data specifically structured for query and analysis
Data Warehousing refers to the set of technologies, methodologies and activities involved in buidling, exploiting and maintaining the data warehouse (process of constructing data warehouses)
Business Intelligence - enterprise-wide complex of technologies and methodolgies to provide strategic inforamtion from relational data stored in a data warehouse
Big Data - expands BI in two dimensions (non-structured data), penetrates all areas of human activity (business, government, science, social...
Why is a Data Warehouse...
subject-oriented?
decision maker centric, modeling and analysis activites support subject based organization of information
contrasts against systems that perform modeling and analysis for daily operations or transaction processing
integrated?
integrated multiple, heterogeneous data sources (flat files, databases, web sources)
may need to be cleaned and integrated to get
consistent naming conventions and structure
converted and transformed
time-variant?
warehouse time horizon is significantly longer than that of operational systems
v.s. operational database: current data
provide information from a historical perspective E.g. 10 years
every key structure (explicity or implicitly) contains an element of time indicating when data was collected
not required of data in operational database, simply optional
non-volatile:
physically separate store of data
only two operations needed (initial loading, retrieval)
data is read only, not updated
no transactional databases (therefore no need for recovery or concurrency control mechanisms)
Operational vs Warehouse differences:
Ops are most granular
Warehouse can have multiple levels of granularity
once staged in the warehouse, it is validated, reformatted, reorganized, summarized, restructured, and supplemented with other data
result is main source of information for report generation, analysis, and presentation through ad hoc reports, portals, and dashboards
Difference between Data Marts and Data Warehouse:
D.W. is enterprise-wide, multiple business processes
D.M. one department, single business process
Types of Data Warehouses:
Centralized - Enterprise-level information stored at the lowest level of detail in single centralized structure
Independent Data Marts - Department data stored independently in own schema. Data definitions and standards not unified
Hub-and-Spoke - Consists of enterprise-level data warehouse and set of DMs that depend on it
Data-Mart Bus - Logically integrated DMs that share common dimensions
Data Warehouse structure (components):
Sources for incoming warehouse data
Data staging: extract, transform (clean, standardize, reorganize), and load
Data Storage: read-only separate from operational systems (relational or proprietary multidimensional database)
Information Delivery (reports, query tools, multidimensional analysis tools, executive information systems and data mining tools)
Metadata: database catalogs
Management and Control: Data movement and components (by using the metadata component)
Metadata in the Data Warehouse:
Operational - info to track data to its operational source
ETL - what process was used to extract and transform the data
End User - navigational map for the end-user
Data Warehouse:
Mainstream technology
Facilitates expansion
Vendor solutions and products
Data Warehouse Trends:
Real-time, zero latency refresh time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment