HDFS Architecture
1. Definition
HDFS Architecture follows a master-slave pattern consisting of a single NameNode (master) managing cluster metadata and multiple DataNodes (slaves) storing actual data blocks.
2. Core Components
2.1 NameNode (Master)
Role: Manages the file system namespace and regulates client access to files.
Responsibilities:
- Namespace Management: Maintain the file system directory tree
- Block Mapping: Track file-to-block and block-to-DataNode mappings
- Access Control: Check permissions and regulate client access
- Cluster Monitoring: Process heartbeats and block reports from DataNodes
- Replication Management: Instruct DataNodes to replicate or delete blocks
Metadata Stored:
- File names and directory structure
- File-to-block mapping (which blocks belong to which file)
- Block-to-DataNode mapping (which DataNodes store which blocks)
- File permissions and ownership
- Modification timestamps
Important: NameNode does NOT store actual data, only metadata.
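The metadata listed above can be sketched as plain in-memory maps. This is an illustrative Python model, not the actual NameNode implementation; the `INode` class and map names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class INode:                      # hypothetical name for a namespace entry
    path: str
    owner: str
    permissions: str              # e.g. "rw-r--r--"
    mtime: float                  # modification timestamp
    block_ids: list = field(default_factory=list)   # file-to-block mapping

# NameNode-style in-memory maps: metadata only, never file data
namespace = {}        # path -> INode
block_locations = {}  # block_id -> set of DataNode ids

namespace["/sales.csv"] = INode("/sales.csv", "alice", "rw-r--r--",
                                1700000000.0,
                                block_ids=["blk_1073741825", "blk_1073741826"])
block_locations["blk_1073741825"] = {"dn1", "dn3", "dn5"}
block_locations["blk_1073741826"] = {"dn2", "dn4", "dn6"}

def datanodes_for(path):
    """Resolve file -> blocks -> DataNodes, as the NameNode does for reads."""
    return [sorted(block_locations[b]) for b in namespace[path].block_ids]
```

Note that both mappings live in NameNode memory; the block data itself exists only on the DataNodes.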
2.2 DataNode (Slave)
Role: Stores actual data blocks and serves read/write requests from clients.
Responsibilities:
- Store Blocks: Maintain block data on local disk
- Serve Requests: Handle read and write operations from clients
- Block Reports: Periodically report all blocks to NameNode
- Heartbeats: Send heartbeat signals to NameNode every 3 seconds
- Execute Commands: Follow instructions from NameNode for replication and deletion
Storage:
DataNode Local Disk:
/hadoop/data/current/
├── blk_1073741825 (128 MB)
├── blk_1073741825.meta (checksum)
├── blk_1073741826 (128 MB)
├── blk_1073741826.meta (checksum)
└── ...
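The `.meta` files shown above hold checksums used to detect block corruption. A minimal sketch of the idea (real HDFS uses CRC32C over 512-byte chunks; this simplified version checksums the whole block with CRC32):

```python
import os
import tempfile
import zlib

def write_block(path, data):
    """Write a block file plus a checksum sidecar, like blk_N and blk_N.meta."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".meta", "w") as f:
        f.write(str(zlib.crc32(data)))

def verify_block(path):
    """Recompute the checksum and compare against the stored .meta value."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".meta") as f:
        expected = int(f.read())
    return zlib.crc32(data) == expected

tmp = tempfile.mkdtemp()
blk = os.path.join(tmp, "blk_1073741825")
write_block(blk, b"example block bytes")
ok = verify_block(blk)
```

DataNodes run this kind of verification both when serving reads and in a periodic background scan, reporting corrupt blocks to the NameNode for re-replication.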
3. Master-Slave Architecture
| Aspect | NameNode (Master) | DataNode (Slave) |
|---|---|---|
| Stores | Metadata only (namespace, block mappings) | Actual data blocks |
| Count per cluster | One (plus a standby in HA setups) | Many (10s to 1000s) |
| Hardware | Reliable, high-memory machine | Commodity hardware |
| Failure impact | Cluster unavailable (single point of failure) | Blocks re-replicated automatically |
| Communication | Receives heartbeats and block reports | Sends heartbeats (3 s) and block reports (6 h) |
4. Block Management
4.1 Block Creation Process
Workflow:
Client Writes File → NameNode Creates Metadata →
NameNode Selects DataNodes → Client Writes to DataNode Pipeline →
DataNodes Replicate Blocks → NameNode Updates Block Mapping
Detailed Steps:
- Client Request: Client requests to write file "sales.csv"
- NameNode Response: NameNode creates namespace entry, selects 3 DataNodes for first block
- Pipeline Write: Client writes to DataNode1 → DataNode1 writes to DataNode2 → DataNode2 writes to DataNode3
- Acknowledgment: DataNode3 acknowledges to DataNode2 → DataNode2 to DataNode1 → DataNode1 to Client
- Next Block: Process repeats for subsequent blocks
- Completion: Client notifies NameNode when file write is complete
4.2 Block Replication Strategy
Rack-Aware Placement:
For 3x replication:
- 1st replica: Same node as the writer (if the writer runs on a DataNode), otherwise a random node
- 2nd replica: Different rack from 1st replica
- 3rd replica: Same rack as 2nd replica, different node
Why Different Racks?
A rack-level failure (typically a failed top-of-rack switch) takes every DataNode in that rack offline at once. Spreading replicas across at least two racks means no single rack failure can make a block unavailable, while keeping two replicas on one rack limits cross-rack traffic during the write.
Example:
File Block Distribution:
Block 1:
- Copy 1: Rack1-DataNode1 (writer location)
- Copy 2: Rack2-DataNode3 (different rack)
- Copy 3: Rack2-DataNode4 (same rack as copy 2)
Benefits:
✅ If Rack1 fails, data available on Rack2
✅ If Rack2 switch fails, data available on Rack1
✅ Write bandwidth: Only 1 copy crosses racks
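The placement rules above can be sketched as a small function. This is an illustrative simplification of HDFS's default rack-aware policy, assuming a `topology` map of rack name to node list and a writer that runs on a DataNode:

```python
import random

def place_replicas(topology, writer_node):
    """Sketch of the default 3x rack-aware policy:
    1st replica on the writer's node, 2nd on a different rack,
    3rd on the same rack as the 2nd but a different node."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer_node
    remote_racks = [r for r in topology if r != rack_of[first]]
    second_rack = random.choice(remote_racks)
    second = random.choice(topology[second_rack])
    third = random.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas(topology, "dn1")   # e.g. ["dn1", "dn3", "dn4"]
```

Note how the result always matches the example distribution above: one replica stays with the writer, and the other two land together on a remote rack.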
5. Communication Mechanisms
5.1 Heartbeats
Purpose: NameNode monitors DataNode health.
Frequency: Every 3 seconds from each DataNode.
Contents:
- DataNode is alive and functioning
- Storage capacity (total, used, remaining)
- Number of data transfers in progress
Failure Detection:
Normal: Heartbeat received every 3 seconds
Missing: No heartbeat for 10 minutes (600 seconds)
Action: NameNode marks DataNode as dead
Recovery: Re-replicate blocks from dead node
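The failure-detection logic above amounts to a timestamp comparison. A minimal sketch, using the timings from the notes (3-second heartbeats, 10-minute timeout):

```python
HEARTBEAT_INTERVAL = 3    # seconds, per the notes above
DEAD_TIMEOUT = 600        # 10 minutes without a heartbeat

def dead_datanodes(last_heartbeat, now):
    """Return DataNodes whose last heartbeat is older than the timeout."""
    return [dn for dn, t in last_heartbeat.items() if now - t > DEAD_TIMEOUT]

# dn2's last heartbeat is 605 s stale -> marked dead
last_heartbeat = {"dn1": 1000.0, "dn2": 395.0, "dn3": 997.0}
dead = dead_datanodes(last_heartbeat, now=1000.0)
```

Once a node lands in the dead list, the NameNode schedules re-replication of every block that node held.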
5.2 Block Reports
Purpose: DataNode informs NameNode about all blocks it stores.
Frequency: Every 6 hours (default).
Contents:
- List of all block IDs stored on DataNode
- Block length and generation stamp
- Storage location on disk
Use Cases:
- NameNode builds complete block-to-DataNode mapping
- Detect missing blocks (corruption)
- Identify over-replicated blocks
- Verify replication factor compliance
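The use cases above boil down to counting replicas per block across all reports. A sketch of that reconciliation, assuming block reports arrive as a map of DataNode id to the set of block ids it stores:

```python
def reconcile(block_reports, replication_factor=3):
    """Compare block reports against the target replication factor,
    returning (under-replicated, over-replicated) block id sets."""
    counts = {}
    for blocks in block_reports.values():
        for b in blocks:
            counts[b] = counts.get(b, 0) + 1
    under = {b for b, c in counts.items() if c < replication_factor}
    over = {b for b, c in counts.items() if c > replication_factor}
    return under, over

reports = {
    "dn1": {"blk_1", "blk_2"},
    "dn2": {"blk_1", "blk_2"},
    "dn3": {"blk_1"},            # a replica of blk_2 was lost here
    "dn4": {"blk_1"},            # extra copy of blk_1
}
under, over = reconcile(reports)
```

Under-replicated blocks trigger new copies on healthy DataNodes; over-replicated ones trigger deletions.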
6. Read Operation Architecture
Step-by-Step Process:
Client → NameNode: request block locations → NameNode → Client: DataNode list per block → Client reads each block directly from the nearest DataNode → Client assembles the file
Detailed Example:
Client wants to read "sales.csv" (256 MB = 2 blocks)
Step 1: Client → NameNode: "Open sales.csv for reading"
Step 2: NameNode → Client:
Block 1: [DataNode1, DataNode3, DataNode5]
Block 2: [DataNode2, DataNode4, DataNode6]
Step 3: Client selects DataNode1 (closest) for Block 1
Step 4: Client reads Block 1 from DataNode1
Step 5: Client selects DataNode4 (closest) for Block 2
Step 6: Client reads Block 2 from DataNode4
Step 7: Client assembles file from blocks
Step 8: Client → NameNode: "Close sales.csv"
Data Locality: Client prefers DataNodes on same machine → same rack → different rack.
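That locality preference is a simple distance ordering. An illustrative sketch (the `rack_of` map and node names are assumptions for the example):

```python
def rank_replicas(replicas, client_host, client_rack, rack_of):
    """Order replica DataNodes by locality: same host, then same rack,
    then off-rack -- the preference described above."""
    def distance(dn):
        if dn == client_host:
            return 0              # local node: no network transfer
        if rack_of[dn] == client_rack:
            return 1              # same rack: cheap intra-rack transfer
        return 2                  # different rack: crosses a rack switch
    return sorted(replicas, key=distance)

rack_of = {"dn1": "rack1", "dn3": "rack1", "dn5": "rack2"}
order = rank_replicas(["dn5", "dn3", "dn1"], client_host="dn1",
                      client_rack="rack1", rack_of=rack_of)
```

The client then reads from the first DataNode in the ordering and falls back to the next one if that read fails.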
7. Write Operation Architecture
Step-by-Step Process:
1. Client → NameNode: "Create file sales.csv"
2. NameNode: Creates namespace entry, selects DataNodes
3. NameNode → Client: "Write Block 1 to [DN1, DN2, DN3]"
4. Client → DN1: Write block data
5. DN1 → DN2: Pipeline replication
6. DN2 → DN3: Pipeline replication
7. DN3 → DN2 → DN1 → Client: Acknowledgment
8. Repeat steps 3-7 for remaining blocks
9. Client → NameNode: "Close file"
Pipeline Replication:
Client writes to 1st DataNode only
1st DataNode simultaneously:
- Writes to local disk
- Forwards to 2nd DataNode
2nd DataNode simultaneously:
- Writes to local disk
- Forwards to 3rd DataNode
3rd DataNode:
- Writes to local disk
- Sends acknowledgment back
Result: All 3 copies are written concurrently as data streams through the pipeline
Efficiency: The client sends the data once; the DataNodes replicate it among themselves
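The store-and-forward behavior above can be simulated with a short recursive function. A toy sketch (not the real protocol, which streams packets rather than whole blocks):

```python
def pipeline_write(block, pipeline, disks):
    """Simulate the write pipeline: each DataNode stores the block locally,
    forwards it downstream, and passes the acknowledgment back upstream."""
    head, rest = pipeline[0], pipeline[1:]
    disks.setdefault(head, []).append(block)      # write to local disk
    if rest:
        ack = pipeline_write(block, rest, disks)  # forward to next DataNode
        return ack + [head]                       # ack flows back upstream
    return [head]                                 # tail node acks first

disks = {}
acks = pipeline_write("blk_1", ["dn1", "dn2", "dn3"], disks)
```

The acknowledgment list comes back in reverse pipeline order (tail first), matching step 7 above, and each DataNode ends up with its own copy of the block.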
8. Fault Tolerance Mechanisms
8.1 DataNode Failure
Scenario: DataNode crashes or becomes unreachable.
Detection & Recovery:
- Detection: NameNode notices missing heartbeats (10 minutes)
- Marking: DataNode marked as dead
- Analysis: NameNode identifies under-replicated blocks
- Re-replication: Instructs healthy DataNodes to create new replicas
- Completion: All blocks restored to replication factor
Timeline: Minutes to hours depending on data volume.
8.2 NameNode Failure
Problem: Single point of failure.
Solutions:
- Secondary NameNode: Periodically merges the edit log into the fsimage checkpoint, speeding recovery — but it is NOT a hot standby
- HDFS High Availability (Hadoop 2+): Active and Standby NameNodes share the edit log via JournalNodes, with automatic failover coordinated by ZooKeeper
- Metadata Backup: Write fsimage and edit log to multiple directories, including a remote NFS mount
Exam Pattern Questions and Answers
Question 1: "Explain HDFS architecture with its components." (10 Marks)
Answer:
Introduction (1 mark):
HDFS follows a master-slave architecture consisting of a single NameNode managing metadata and multiple DataNodes storing the actual data blocks, designed for distributed storage of very large files across commodity hardware.
NameNode - Master (3 marks):
NameNode is the master server managing file system namespace and regulating client access. It maintains directory tree, file-to-block mapping, block-to-DataNode mapping, file permissions, and timestamps. NameNode stores only metadata in memory, not actual data. It coordinates with DataNodes through heartbeats (every 3 seconds) and block reports (every 6 hours), monitoring cluster health and managing block replication.
DataNode - Slave (3 marks):
DataNodes are slave servers storing actual data blocks on local disks and serving read/write requests. Each cluster has multiple DataNodes (10s to 1000s) running on commodity hardware. They periodically send heartbeats to NameNode indicating health status and block reports listing all stored blocks. DataNodes execute NameNode commands for block replication, deletion, and data transfers.
Block Management (2 marks):
HDFS splits files into 128 MB blocks distributed across DataNodes with rack-aware placement. For 3x replication, first replica on writer node, second on different rack, third on same rack as second. This strategy balances fault tolerance (survives rack failures) with network efficiency (only 1 copy crosses racks during write).
Communication (1 mark):
NameNode monitors DataNodes through heartbeats every 3 seconds and receives block reports every 6 hours. Missing heartbeats for 10 minutes triggers automatic re-replication of blocks from failed DataNode, ensuring fault tolerance without manual intervention.
Question 2: "Describe HDFS write operation process." (6 Marks)
Answer:
Client Request (1 mark):
Client application contacts NameNode requesting to create or write a file. NameNode performs a permissions check, creates the namespace entry, then returns the list of DataNodes selected to store the first block's replicas.
Pipeline Creation (2 marks):
Client establishes write pipeline with first DataNode, which connects to second DataNode, which connects to third DataNode. This forms a linear pipeline for replication where each DataNode both receives and forwards data simultaneously.
Data Transfer (2 marks):
Client streams block data to first DataNode. First DataNode writes to local disk while simultaneously forwarding packets to second DataNode. Second DataNode similarly writes locally and forwards to third DataNode. Third DataNode writes to disk and sends acknowledgment back through pipeline.
Completion (1 mark):
Acknowledgments flow back from third to second to first to client. Process repeats for subsequent blocks. After all blocks written, client notifies NameNode to close file. NameNode updates metadata marking file as complete. This pipeline approach enables efficient replication while client writes to only one DataNode.
Summary
Key Points for Revision:
- Architecture: Master (NameNode) - Slave (DataNodes) pattern
- NameNode: Metadata management, namespace, block mapping
- DataNode: Block storage, read/write serving, heartbeats
- Heartbeats: Every 3 seconds, failure detection after 10 minutes
- Block Reports: Every 6 hours, comprehensive block inventory
- Replication: Rack-aware placement (different rack + same rack)
- Write: Pipeline replication for efficiency
- Read: Direct from DataNode using data locality
Draw an architecture diagram showing the NameNode at the top, multiple DataNodes below, and communication arrows (heartbeats, block reports, client read/write). Always mention the specific timings (3-second heartbeat, 6-hour block report, 10-minute failure detection).