Home > Topics > Data Mining and Business Intelligence > Design and Development of Data Warehouse

Data Warehouse Design 🏛️📐

Building a Data Warehouse is like building a skyscraper; you need a blueprint (Schema). The way you arrange your tables determines how fast your reports will run.

💎

Fact Table

Center

🌐

Dimensions

Outer

⭐

Star

Schema 1

❄️

Snowflake

Schema 2

Loading stats…

1. The Fact and Dimension Model

In a DW, we divide data into two types to make analysis easier:

Fact Table: The center of the model. It contains the Numbers (Quantifiable data) and "Foreign Keys" that link to dimensions.
- Additive Facts: Numbers that can be summed up (e.g., Total Sales).
- Non-Additive Facts: Numbers that cannot be summed (e.g., Unit Price, Percentage).
- Granularity: The level of detail in the fact table (e.g., "One row per day" vs. "One row per individual sale").
Dimension Table: The surrounding tables. They contain the Context (Descriptive data).
- Contextual Details: Attributes like Brand Name, Color, Customer City, and Date.
- Filtering & Grouping: Dimensions are used to filter facts (e.g., "Show me Sales where Product = 'Laptop'").

2. Star Schema

The simplest and most common design in data warehousing.

Direct Connectivity: One central Fact table connected directly to multiple Dimension tables. There are no "Sub-dimensions."
De-normalized Structure: Because dimensions are not broken down, the system contains redundant data (e.g., the city "Mumbai" might be repeated 1,000 times), but this makes it faster.
Query Performance: Since the computer only has to perform a single "Join" between the Fact and any Dimension, reports generate almost instantly.
Simplicity: It is extremely easy for business users and tools like Power BI to understand and navigate.

3. Snowflake Schema

A more complex, highly organized version of the Star schema.

Normalized Dimensions: In this schema, dimension tables are broken down into smaller sub-tables (e.g., the 'Product' table links to a 'Supplier' table, which links to a 'City' table).
Reduced Redundancy: By normalizing data, you ensure that a city's name is stored only once in the entire database, saving storage space.
Data Integrity: Because data is structured in multiple levels, it is easier to maintain and update without creating inconsistencies.
Complex Joins: The trade-off is speed. To get a simple report, the computer might have to join 5 or 6 tables, which can be slow for massive datasets.

4. Galaxy Schema (Fact Constellation)

For very large companies, a single Fact table isn't enough.

Multiple Fact Tables: A Galaxy schema contains multiple Fact tables (e.g., "Sales Fact" and "Inventory Fact") that share the same Dimension tables.
Shared Dimensions: Both Sales and Inventory might use the same "Time" and "Product" dimensions.
Cross-Functional Analysis: This allows a manager to see how "Sales Trends" are affecting "Warehouse Inventory" levels in one single report.
Complex Architecture: This is the most realistic schema used by global giants like Walmart or Amazon.

Star Schema (Speed)

Simple, 1-level structure.
Very fast querying.
Uses more disk space.
De-normalized data.

Snowflake Schema (Space)

Complex, multi-level.
Slower query speed.
Saves disk space.
Highly normalized data.

Loading comparison…

Summary

Fact Tables store purely numerical data for analysis.
Dimension Tables store descriptions of those numbers.
Star Schema is the gold standard for Speed.
Snowflake Schema is the gold standard for Data Efficiency.

Quiz Time! 🎯

Test Your Knowledge

Question 1 of 5

1. In a Data Warehouse schema, the 'Sales Amount' would be stored in a:

Dimension Table

Fact Table

Logic Table

Backup Table

Loading quiz…