What is the meaning of a data warehouse?
A data warehouse is a data repository that stores large quantities of historical data, used mostly to create reports that help businesses identify their strengths and weaknesses. For example, demographic data about the people of a region can help businesses identify the needs and preferences of that region’s population. A data warehouse acts as central storage for data that comes from one or more heterogeneous sources, including older data of historical value. One of its most important uses is supporting business intelligence and data analytics applications.
Introduction to data warehouse
A data warehouse is defined as storage for huge volumes of data. This data is mostly transactional data that is used for making critical business decisions. The data in a data warehouse is derived from various sources, and data analytics techniques are applied to it to extract useful insights from the raw data.
The data stored in the data warehouse is collected from the sales, marketing, finance, product management and various other departments of an organization. The transactional data which consists of day-to-day online transactions is also stored in the data warehouse. All this data can be used for analysis.
Data warehousing therefore acts as a back-end engine for business intelligence tools, which present reports and dashboards to business users. Data warehouses are used in domains such as banking and insurance, marketing, health care, and e-commerce.
The common data warehouse characteristics which are set by William Inmon are defined as follows.
- Integrated: The data warehouse consolidates data from various disparate sources into a consistent logical format at a single storage location.
- Subject-oriented: The data warehouse data pertains to a single subject such as sales, customer, order and so forth.
- Non-volatile: The stored data is archived data that is not updated in real time the way transactional data is.
- Time-variant: The data stored in the data warehouse has a time stamp attached to it and typically covers data collected over several years.
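The integrated, non-volatile, and time-variant properties can be illustrated with a minimal Python sketch. The two source systems, their field names, and the `warehouse_sales` list are hypothetical and exist only for illustration; a real warehouse would use a database and an ETL tool rather than in-memory lists.

```python
from datetime import date

# Hypothetical records from two source systems that use
# different field names and different units for the same facts.
crm_rows = [{"cust": "C1", "amount_usd": 120.0, "day": "2023-01-15"}]
shop_rows = [{"customer_id": "C2", "amount_cents": 4550, "date": "2023-01-16"}]

def from_crm(row):
    """Convert a CRM record into the common warehouse format."""
    return {
        "customer_id": row["cust"],
        "amount_usd": row["amount_usd"],
        "sale_date": date.fromisoformat(row["day"]),
    }

def from_shop(row):
    """Convert a web-shop record (amounts in cents) into the common format."""
    return {
        "customer_id": row["customer_id"],
        "amount_usd": row["amount_cents"] / 100,
        "sale_date": date.fromisoformat(row["date"]),
    }

# Integrated: both sources end up in one consistent format.
# Time-variant: every row carries its date.
# Non-volatile: rows are only appended, never updated in place.
warehouse_sales = []
warehouse_sales.extend(from_crm(r) for r in crm_rows)
warehouse_sales.extend(from_shop(r) for r in shop_rows)
```

After loading, every row in `warehouse_sales` has the same three keys regardless of which source it came from.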
Key characteristics of data warehouse
- The data stored in a data warehouse is structured for easy access and high-speed performance.
- The data in a data warehouse is present in large volumes and it consists of large amounts of historical data.
- Ad Hoc and predefined queries are most commonly answered using data available in data warehouses.
- It helps in the decision-making process.
- It gives better insight into the current business scenario and offers guidance for future planning.
OLTP vs data warehousing environment
The major differences between Online Transaction Processing (OLTP), performed on traditional database systems, and Online Analytical Processing (OLAP), performed on a data warehouse, are listed below.
- Workload: The data warehouse accommodates data analysis and ad hoc queries and is optimized for a wide variety of queries and analytical operations, whereas an OLTP system supports only predefined operations.
- Data modifications: In OLTP systems, the contents of the database are updated often by issuing SQL statements, whereas the contents of a data warehouse are not updated often, since it acts as a repository of historical data.
- Schema design: The data warehouse uses a de-normalized schema to optimize queries and analytics, whereas an OLTP system uses fully normalized schemas to optimize insert, delete, and update performance.
- Historical data: The data warehouse supports reporting and analysis on historical data, whereas an OLTP system stores only recent transactional data.
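The difference in workload can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are made up for illustration, and a single in-memory table stands in for both kinds of system here; in practice the OLTP database and the warehouse would be separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("North", 2022, 100.0),
        ("North", 2023, 150.0),
        ("South", 2022, 80.0),
        ("South", 2023, 120.0),
    ],
)

# OLTP-style workload: a short, predefined operation that modifies one row.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (110.0, 1))

# Warehouse-style workload: a read-only query that scans and
# aggregates the whole history.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The update touches a single row by key, while the aggregate reads every row, which is why the two workloads are usually tuned and stored differently.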
Data warehousing application types
The data warehouse serves business intelligence (BI) applications. The following are the types of data warehousing applications:
- Information processing.
- Analytical processing.
- Data mining.
Information processing
The data present in the data warehouse is processed using well-known data analytics and statistical techniques, and the final results are communicated to business users in the form of charts, tables, graphs, or reports.
Analytical processing
Data in a data warehouse is represented in the form of a multi-dimensional data cube. The following operations can be performed on the cube.
(i) Slice-and-dice: A slice selects a single value for one of the dimensions, and a dice selects values on two or more dimensions to form a smaller sub-cube. For example, slicing on the year 2010 can be used to determine the sales of various products in various regions in that year.
(ii) Drill-down: The data in a data warehouse is stored at multiple levels of abstraction. Drill-down is used to view that data at a more detailed level.
For example, sales data can be stored at country level -> regional level -> state level -> district level -> store level along the location dimension. The drill-down operation can be used to move from state-level sales to store-level sales by moving down the hierarchy.
(iii) Roll-up: Roll-up is the opposite of drill-down. It is used to view data at a higher level of abstraction, aggregating the data by moving up the concept hierarchy. For example, sales data can be rolled up from state level to country level.
(iv) Pivot: In pivoting, the dimensional data is analyzed by rotating the cube. For example, the row dimension is swapped with the column dimension and vice versa.
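The cube operations above can be sketched in plain Python. The cell values, the dimension names, and the `aggregate` helper are hypothetical; real OLAP engines store precomputed aggregates instead of recomputing them from individual cells.

```python
from collections import defaultdict

# Hypothetical cube cells: one row per combination of dimension values.
cells = [
    {"product": "Pen", "country": "India", "state": "Kerala", "year": 2010, "sales": 10},
    {"product": "Pen", "country": "India", "state": "Goa", "year": 2010, "sales": 5},
    {"product": "Ink", "country": "India", "state": "Kerala", "year": 2011, "sales": 7},
    {"product": "Ink", "country": "India", "state": "Goa", "year": 2010, "sales": 3},
]

def aggregate(rows, dims):
    """Sum sales grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

# Slice: fix one dimension (year) to a single value.
slice_2010 = [r for r in cells if r["year"] == 2010]

# Dice: restrict two or more dimensions to get a smaller sub-cube.
dice = [r for r in cells if r["year"] == 2010 and r["product"] == "Pen"]

# Drill-down: view sales at the more detailed state level.
by_state = aggregate(cells, ["country", "state"])

# Roll-up: aggregate the same data up to the country level.
by_country = aggregate(cells, ["country"])

# Pivot: swap the row and column dimensions of a two-dimensional view.
product_by_year = aggregate(cells, ["product", "year"])
year_by_product = {(year, product): v for (product, year), v in product_by_year.items()}
```

Roll-up and drill-down use the same `aggregate` helper and differ only in how many levels of the location hierarchy are kept in the grouping key.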
Data mining
Data mining is used to derive useful insights by applying various descriptive and predictive modelling techniques to the data stored in the data warehouse. It is also known as Knowledge Discovery in Databases (KDD).
Data mining uses the patterns and associations found in past data to forecast the future. It is therefore data-driven rather than user-driven. Knowledge is discovered with the help of associations, hidden patterns, predictions, and classification.
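As one small illustration of association discovery, the sketch below counts how often items occur together and computes the standard support and confidence measures. The `baskets` data is invented, and this is only a toy version of association analysis, not a full mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets pulled from warehouse sales history.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(baskets)

def confidence(antecedent, consequent):
    """Fraction of baskets with the antecedent that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]
```

For example, `confidence("butter", "bread")` measures how often a basket that contains butter also contains bread.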
Multi-dimensional data model and schemas
Data in a data warehouse is modelled in the form of a data cube. It stores precomputed data, which helps in Online Analytical Processing (OLAP). The data can be stored at different levels of abstraction along each dimension. The abstraction levels available along each dimension are represented by a concept hierarchy. For example, the concept hierarchy for the time dimension consists of levels such as year -> quarter -> month -> week -> day.
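A concept hierarchy for the time dimension can be sketched as a function that maps each day to its ancestors at the coarser levels. The `time_levels` and `roll_up` helpers and the sales figures are hypothetical, and the week level is left out of this sketch because calendar weeks can span two months.

```python
from datetime import date

def time_levels(d):
    """Map a date to its values at each level of a year -> quarter -> month -> day hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "year": d.year,
        "quarter": f"{d.year}-Q{quarter}",
        "month": f"{d.year}-{d.month:02d}",
        "day": d.isoformat(),
    }

# Hypothetical sales recorded at the most detailed (day) level.
daily_sales = {
    date(2023, 1, 15): 100,
    date(2023, 2, 3): 50,
    date(2023, 4, 20): 70,
}

def roll_up(sales, level):
    """Aggregate day-level sales to the requested level of the hierarchy."""
    totals = {}
    for day, amount in sales.items():
        key = time_levels(day)[level]
        totals[key] = totals.get(key, 0) + amount
    return totals
```

Calling `roll_up(daily_sales, "quarter")` groups the January and February sales together, and `roll_up(daily_sales, "year")` collapses everything into a single total.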
The three schemas used to organize data in a data warehouse are listed below:
- Star schema - It consists of a single fact table at the center and many dimension tables arranged in a radial pattern around the fact table. It is used when data is gathered with respect to a single subject.
- Snowflake schema - It consists of a single fact table but more than one dimension table for a particular dimension. The dimension tables are normalized, so each dimension is represented by more than one table.
- Fact-constellation schema - It consists of multiple fact tables and several dimension tables related to the fact tables. It is used when data is gathered around more than one subject.
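A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), columns, and rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/where" of each sale.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, state TEXT);

-- The single fact table sits at the center and references each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Ink');
INSERT INTO dim_store VALUES (1, 'Kerala'), (2, 'Goa');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.0);
""")

# A typical star-schema query: join the fact table to one dimension
# and aggregate the measure by a dimension attribute.
sales_by_state = conn.execute("""
    SELECT s.state, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.state
    ORDER BY s.state
""").fetchall()
```

In a snowflake variant, `dim_store` would itself be split further, for example into a store table that references a separate state table, and a fact-constellation variant would add a second fact table sharing the same dimension tables.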
Data warehousing benefits
- Helps organizations make informed decisions.
- Increases ROI.
- Provides visualization which helps in easy interpretation.
- Maintains historical data.
Data warehousing disadvantages
- Creating a data warehouse is a tedious task.
- It incurs high maintenance costs.
- The data warehouse administrator must be a skilled professional.
- Data integration is difficult.
Context and Applications
This topic is important for postgraduate and undergraduate courses, particularly for a Bachelor's in Computer Science Engineering and an Associate of Science in Computer Science.
Practice Problems
Question 1: Which of the following operations are carried out in a data warehouse?
- Data mining
- Analytical processing
- Information processing
- All the above
Answer: Option D is correct.
Explanation: Analytical processing, information processing, and data mining are the three applications of the data warehouse.
Question 2: A data warehouse does not require recovery, concurrency controls, and transaction processing.
- Cannot say
- True
- False
- Can be True or False
Answer: Option B is correct.
Explanation: A data warehouse does not require recovery, concurrency control, and transaction processing mechanisms because it is physically stored separately from the operational database.
Question 3: _____________ supports knowledge discovery by finding hidden patterns, constructing analytical models, and performing classification, prediction, and association analysis.
- Information processing
- Analytical processing
- Data mining
- None of these
Answer: Option C is correct.
Explanation: Data mining supports knowledge discovery by finding hidden patterns, constructing analytical models, and performing classification, prediction, and association analysis. Visualization tools are used to present the mining results.
Question 4: Data transformation and data cleaning are major steps in improving the quality of data and of data mining results.
- False
- True
- Can be True or False
- Cannot say
Answer: Option B is correct.
Explanation: The statement is true. Both data transformation and data cleaning are vital steps in improving the quality of data and of data mining results.
Question 5: Which of the following are schemas used in a data warehouse?
- Star
- Snowflake
- Fact-constellation
- All the above
Answer: Option D is correct.
Explanation: Data in a data warehouse is organized using star, snowflake or fact-constellation schema.