Characteristics of Data Warehouse 🏗️🌟

Not every large database is a Data Warehouse. According to the classic definition, usually attributed to Bill Inmon, a DW must have four specific characteristics: it is subject-oriented, integrated, time-variant, and non-volatile.



1. Subject-Oriented

A regular (operational) database is Process-Oriented: it is organized around application functions such as "Process a Sale." A Data Warehouse is Subject-Oriented.

  • Specific Business Topics: It organizes data around subjects like "Product," "Customer," or "Promotion" rather than the functional operations of the company.
  • Irrelevant Data Exclusion: It deliberately excludes data that is not useful for decision-making (e.g., it will store a customer's total annual spend but not their temporary website session cookies).
  • Focused Discovery: By grouping all "Sales" data from different regions into one subject, analysts can compare store performance globally without switching databases.
  • Simplified Access: Users don't need to understand the complex internal logic of 50 different apps; they just look at the "Subject" they care about.
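The idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two regional extracts, their field names, and the helper functions `build_sales_subject` and `total_by_store` are all hypothetical, not part of any real system.

```python
from collections import defaultdict

# Hypothetical extracts from two regional order systems (names are illustrative).
EU_ORDERS = [
    {"store": "Berlin", "amount": 120.0, "session_cookie": "abc123"},
    {"store": "Paris", "amount": 80.0, "session_cookie": "def456"},
    {"store": "Berlin", "amount": 30.0, "session_cookie": "ghi789"},
]
US_ORDERS = [
    {"store": "Austin", "amount": 200.0, "session_cookie": "jkl012"},
]

# Only fields useful for decision-making are kept in the subject area.
SALES_SUBJECT_FIELDS = ("store", "amount")


def build_sales_subject(*sources):
    """Merge rows from every source into one 'Sales' subject, keeping analytic fields only."""
    subject = []
    for rows in sources:
        for row in rows:
            subject.append({field: row[field] for field in SALES_SUBJECT_FIELDS})
    return subject


def total_by_store(subject):
    """Compare store performance across all regions from the single subject area."""
    totals = defaultdict(float)
    for row in subject:
        totals[row["store"]] += row["amount"]
    return dict(totals)


sales = build_sales_subject(EU_ORDERS, US_ORDERS)
print(total_by_store(sales))
```

The analyst queries one "Sales" subject and never sees the session cookies or the layout of the regional systems.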

2. Integrated

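Integration is often considered the most important characteristic of a Data Warehouse, because data from many source systems must be made consistent before it can be analyzed together.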

  • Consistency of Units: One database might measure distance in "Miles" and another in "Kilometers." The DW integrates them into a single standard (e.g., all Miles).
  • Conflict Resolution: If two source systems have different addresses for the same customer, the DW uses integration rules to decide which one is the "Truth."
  • Uniform Formatting: It ensures that all dates follow the same format (e.g., YYYY-MM-DD) even if the source systems used 10 different styles.
  • Naming Standardization: Different apps might call an item "Price," "Cost," and "MSRP." The DW maps them all to one clear, integrated attribute: "Unit_Price."
  • Breaking Silos: By integrating HR, Sales, and Finance data, the company can finally answer questions like: "Do employees with higher sales targets have higher stress-related leaves?"
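A minimal sketch of three of the integration rules above (unit conversion, date formatting, and name standardization) might look like this. The record layout, the list of date formats, and the function names `to_iso_date` and `integrate` are assumptions made for illustration; a real warehouse would usually apply such rules in its ETL tooling.

```python
from datetime import datetime

KM_PER_MILE = 1.609344  # exact, by definition of the international mile

# Source-specific column names that all map to the warehouse attribute "Unit_Price".
PRICE_ALIASES = ("Price", "Cost", "MSRP")

# Date styles the hypothetical source systems are assumed to use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def to_iso_date(text):
    """Try each known source format and return the date as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def integrate(record):
    """Convert one source record into the warehouse's standard shape."""
    price = next(record[name] for name in PRICE_ALIASES if name in record)
    distance = record["distance"]
    if record["distance_unit"] == "km":
        distance = distance / KM_PER_MILE  # standardize on miles
    return {
        "Unit_Price": price,
        "distance_miles": round(distance, 2),
        "order_date": to_iso_date(record["date"]),
    }


# Two source systems describe the same fact in different conventions.
from_app_a = integrate({"Price": 9.99, "distance": 10, "distance_unit": "mi", "date": "2024-03-05"})
from_app_b = integrate({"MSRP": 9.99, "distance": 16.09344, "distance_unit": "km", "date": "05/03/2024"})
print(from_app_a == from_app_b)
```

After integration both records have the same attribute names, units, and date format, so they can be compared directly.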

3. Time-Variant

Historical data is the heartbeat of a DW. It allows the business to see the "Story" of their data over years.

  • Long-Term Horizon: While an operational DB might archive or delete data after 90 days to save space, a DW typically keeps it for years (commonly 5 to 10 or more).
  • Snapshot Logic: The DW stores "Snapshots" of data. It knows exactly what your address was in 2018, even if you changed it in 2022.
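  • Implicit vs. Explicit Time: Every record in a DW is tied to a point or period in time. The time element is either explicit (a Date/Time stamp on the record) or implicit (e.g., the record belongs to a month-end snapshot), so you can always identify when that specific data was true.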
  • Trend Analysis: Because data remains for years, businesses can perform "Seasonality Analysis"—like predicting that demand for umbrellas will spike 20% every June.
  • Comparative Analysis: Managers can easily compare "Q1 2026 vs Q1 2025" sales because the historical data is perfectly preserved and aligned.
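The "Snapshot Logic" bullet can be illustrated with an as-of lookup. This sketch assumes each history row carries explicit `valid_from` and `valid_to` dates (a common convention, but only one of several); the sample addresses and the function name `address_as_of` are hypothetical.

```python
from datetime import date

# Hypothetical address history: each row records the period during which it was true.
# valid_to=None marks the row that is currently in effect.
ADDRESS_HISTORY = [
    {"customer_id": 1, "address": "12 Oak St",
     "valid_from": date(2015, 4, 1), "valid_to": date(2022, 6, 15)},
    {"customer_id": 1, "address": "98 Pine Ave",
     "valid_from": date(2022, 6, 15), "valid_to": None},
]


def address_as_of(history, customer_id, when):
    """Return the address that was valid for the customer on the given day, or None."""
    for row in history:
        if row["customer_id"] != customer_id:
            continue
        started = row["valid_from"] <= when
        not_ended = row["valid_to"] is None or when < row["valid_to"]
        if started and not_ended:
            return row["address"]
    return None


print(address_as_of(ADDRESS_HISTORY, 1, date(2018, 1, 1)))  # 12 Oak St
print(address_as_of(ADDRESS_HISTORY, 1, date(2023, 1, 1)))  # 98 Pine Ave
```

Because the 2022 change was stored as a new row instead of overwriting the old one, a question about 2018 still gets the 2018 answer.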

4. Non-Volatile

In a regular (operational) database, rows are constantly being updated and deleted. In a DW, data is loaded (Insert) and then queried (Read).

  • Operational Stability: Once data enters the warehouse, it is treated as "Permanent." It is not overwritten when the source data changes in the production DB; a change arrives as a new record instead.
  • Read-Only Optimization: Since loaded data rarely changes, queries seldom contend with writers. The system needs far less row-level locking and concurrency control than a transactional database, which helps it serve thousands of concurrent readers.
  • Auditability: Because old values aren't deleted, you have a perfect audit trail. You can reconstruct exactly what the business looked like at any point in history.
  • Simplified Recovery: If a mistake is made during an "Analytical Run," the data itself is still safe and stable in its original non-volatile state.
  • Mass Loading: Changes are usually applied in bulk (batches) rather than single-record updates, which preserves the stability of the analytical environment.
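As a rough sketch of the non-volatile behavior described above, the toy table below accepts only batch inserts and reads. The class `AppendOnlyTable`, its methods, and the sample rows are hypothetical, and the sketch assumes batches are loaded in chronological order.

```python
from datetime import date


class AppendOnlyTable:
    """A toy warehouse table that supports only batch inserts and reads."""

    def __init__(self):
        self._rows = []

    def load_batch(self, rows, load_date):
        """Append a batch; existing rows are never modified or removed."""
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def history(self, customer_id):
        """All versions ever loaded for a customer, in load order."""
        return [r for r in self._rows if r["customer_id"] == customer_id]

    def latest(self, customer_id):
        """Most recently loaded version for a customer, or None."""
        versions = self.history(customer_id)
        return versions[-1] if versions else None


table = AppendOnlyTable()
table.load_batch([{"customer_id": 1, "city": "Austin"}], date(2024, 1, 31))
# The source system later updates the customer in place;
# the warehouse receives the change as a new row in the next batch.
table.load_batch([{"customer_id": 1, "city": "Denver"}], date(2024, 2, 29))

print(len(table.history(1)))
print(table.latest(1)["city"])
```

The table deliberately exposes no update or delete method, so the January row survives the February change and remains available for audit.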

Key Insight

Because it is Non-Volatile, a Data Warehouse needs far less of the transaction-control overhead (such as row-level locking) that regular databases rely on, which helps make it much faster for reading.


Summary

  • Subject-Oriented: Organized by business topics (Sales, Customers).
  • Integrated: One consistent format for all company data.
  • Time-Variant: Stores history to allow comparison across time.
  • Non-Volatile: Data is stable—no updates or deletions of historical records.
