What exactly is a DW
A dW is a subject-oriented, integrated, time variant and non –volatile collection in support of management’s decision making process.
Subject Oriented
A dW is organized around major subjects of customer, suppliers, products and sales, it excludes data that is not useful in decision making process
Integrated
A dW is usually constructed by integrating multiple records eg. Relation dB, flat-files
Time-Variant
Provides information from a historical perspective ( past 5-10 years)
Non –Volatile
Does not require transaction processing, recovery and concurrency control. It requires initial loading and access.
What is data Warehousing
It is the process of constructing and using data warehouses.
How are organizations using information from dW
- Increasing customer focus
o Analysis of customer buying patterns
- Repositioning Products and Managing Product Portfolios
o Compare the performance of sales by quarter or territory.
- Analysing operations and looking for sources of profit
- Managing customer relationships
- Making environmental corrections
- Managing the cost of corporate assets
DWing is useful from the Point of View of heterogeneous database integration
- to integrate data from diverse databases.
It is popular because
- Update driven rather than query driven approach
o Information is integrated in advance and stored in warehouse for direct quering and analysis.
o Brings high performance to the integrated dB system
o Query processing in DW does not interfere with processing at local store
o DW can store and integrate historical information and support multidimensional queries.
Different Between Operational dB system and DW
ODBS: Major task: To perform online transaction and query processing- these are called on-line transaction processing (OLTP) systems.
- Cover most of day to day operations of organization eg. Purchasing, inventory etc.
DW- Serve users or knowledge workers in the role of data analysis and decision making. These are known as Online Analytical Processing (OLTP) systems.
Why not perform OLAP directly on operational dB rather than having a DW
- An Operational dB is designed for day-to-day operation. DW queries are more complex.
- In OP. dB concurrency control mechanism are required. But we need concurrency in DB to process the queries concurrently.
- Decision support systems require historical data whereas Op. Database do not maintain historical data
- Op. dB contain only raw data such as transactions which need to be consolidated before analysis.
But such Op. databases are coming which also support data warehousing.
Data Warehousing Architecture
A dW is a subject-oriented, integrated, time variant and non –volatile collection in support of management’s decision making process.
Subject Oriented
A dW is organized around major subjects of customer, suppliers, products and sales, it excludes data that is not useful in decision making process
Integrated
A dW is usually constructed by integrating multiple records eg. Relation dB, flat-files
Time-Variant
Provides information from a historical perspective ( past 5-10 years)
Non –Volatile
Does not require transaction processing, recovery and concurrency control. It requires initial loading and access.
What is data Warehousing
It is the process of constructing and using data warehouses.
How are organizations using information from dW
- Increasing customer focus
o Analysis of customer buying patterns
- Repositioning Products and Managing Product Portfolios
o Compare the performance of sales by quarter or territory.
- Analysing operations and looking for sources of profit
- Managing customer relationships
- Making environmental corrections
- Managing the cost of corporate assets
DWing is useful from the Point of View of heterogeneous database integration
- to integrate data from diverse databases.
It is popular because
- Update driven rather than query driven approach
o Information is integrated in advance and stored in warehouse for direct quering and analysis.
o Brings high performance to the integrated dB system
o Query processing in DW does not interfere with processing at local store
o DW can store and integrate historical information and support multidimensional queries.
Different Between Operational dB system and DW
ODBS: Major task: To perform online transaction and query processing- these are called on-line transaction processing (OLTP) systems.
- Cover most of day to day operations of organization eg. Purchasing, inventory etc.
DW- Serve users or knowledge workers in the role of data analysis and decision making. These are known as Online Analytical Processing (OLTP) systems.
Why not perform OLAP directly on operational dB rather than having a DW
- An Operational dB is designed for day-to-day operation. DW queries are more complex.
- In OP. dB concurrency control mechanism are required. But we need concurrency in DB to process the queries concurrently.
- Decision support systems require historical data whereas Op. Database do not maintain historical data
- Op. dB contain only raw data such as transactions which need to be consolidated before analysis.
But such Op. databases are coming which also support data warehousing.
Data Warehousing Architecture
Views regarding design of dW
Top-down View: It allows selection of relevant information necessary for dW.
Data source View: Exposes the info. Being captured, stored and managed by operational systems.
Data Warehouse View: It includes fact tables and dimension tables. It represents the info that is stored inside the dW.
Business Query View: Perspective of data in dW from the POV of the end user.
Skills required in Building a dW
Business skills: How the system stores and manages the data. How to build extractors- transfer data from operational system to dW. How to build refresh software- to update the dW data.
Technology Skills: How to make assessment from quantitative information
Program Management Skills: Need to interface with many technologies vendors and end users.
Process of DW design
See the figure
Steps in WH design process
Choose a business process to model eg. Orders, invoices, sales etc.
Choose the grain of the business process ( grain: fundamental level of data to be represented in the fact table) ( a fact table is a large central table containing the bulk of the data with no redundancy)
Choose the dimension that will apply to each fact table. Record eg. Time, item, customer, supplier.
See figure:
Choose the measures that will populate each fact table. Record eg. Dollars_sold and units_sold