Created
June 3, 2017 02:39
-
-
Save ricklentz/620424c9fa2cad7e4337a1997ca9eb78 to your computer and use it in GitHub Desktop.
Strategic Information Notes
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
make decisions to be competitive | |
attributes: | |
integrated (view of the global state) | |
truthful (accurate and consistent with business rules) | |
accessable (inforamtion consumers) | |
credible (no contradictions) | |
timely (when needed) | |
reasons for failure in early decision support systems: | |
lack flexibility to adapt | |
sourced from operation data | |
definitions: | |
Data Warehouse is a stand-alone information environment that provides strategic information in a timely and flexible manner (consolidates historical data for analysis) | |
def 2: re: W.H. Inmon, "A data warehouse is a subject-oriented, integrated, time-variant, and nonvoatile collection of data in support of management's decision-making process" | |
def 3: Raloh Kimball, A copy of transaction data specifically structured for query and analysis | |
Data Warehousing refers to the set of technologies, methodologies and activities involved in buidling, exploiting and maintaining the data warehouse (process of constructing data warehouses) | |
Business Intelligence - enterprise-wide complex of technologies and methodolgies to provide strategic inforamtion from relational data stored in a data warehouse | |
Big Data - expands BI in two dimensions (non-structured data), penetrates all areas of human activity (business, government, science, social... | |
Why is a Data Warehouse... | |
subject-oriented? | |
decision maker centric, modeling and analysis activites support subject based organization of information | |
contrasts against systems that perform modeling and analysis for daily operations or transaction processing | |
integrated? | |
integrated multiple, heterogeneous data sources (flat files, databases, web sources) | |
may need to be cleaned and integrated to get | |
consistent naming conventions and structure | |
converted and transformed | |
time-variant? | |
warehouse time horizon is significantly longer than that of operational systems | |
v.s. operational database: current data | |
provide information from a historical perspective E.g. 10 years | |
every key structure (explicity or implicitly) contains an element of time indicating when data was collected | |
not required of data in operational database, simply optional | |
non-volatile: | |
physically separate store of data | |
only two operations needed (initial loading, retrieval) | |
data is read only, not updated | |
no transactional databases (therefore no need for recovery or concurrency control mechanisms) | |
Operational vs Warehouse differences: | |
Ops are most granular | |
Warehouse can have multiple levels of granularity | |
once staged in the warehouse, it is validated, reformatted, reorganized, summarized, restructured, and supplemented with other data | |
result is main source of information for report generation, analysis, and presentation through ad hoc reports, portals, and dashboards | |
Difference between Data Marts and Data Warehouse: | |
D.W. is enterprise-wide, multiple business processes | |
D.M. one department, single business process | |
Types of Data Warehouses: | |
Centralized - Enterprise-level information stored at the lowest level of detail in single centralized structure | |
Independent Data Marts - Department data stored independently in own schema. Data definitions and standards not unified | |
Hub-and-Spoke - Consists of enterprise-level data warehouse and set of DMs that depend on it | |
Data-Mart Bus - Logically integrated DMs that share common dimensions | |
Data Warehouse structure (components): | |
Sources for incoming warehouse data | |
Data staging: extract, transform (clean, standardize, reorganize), and load | |
Data Storage: read-only separate from operational systems (relational or proprietary multidimensional database) | |
Information Delivery (reports, query tools, multidimensional analysis tools, executive information systems and data mining tools) | |
Metadata: database catalogs | |
Management and Control: Data movement and components (by using the metadata component) | |
Metadata in the Data Warehouse: | |
Operational - info to track data to its operational source | |
ETL - what process was used to extract and transform the data | |
End User - navigational map for the end-user | |
Data Warehouse: | |
Mainstream technology | |
Facilitates expansion | |
Vendor solutions and products | |
Data Warehouse Trends: | |
Real-time, zero latency refresh time | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment