HDFS Architecture
1. Definition
HDFS Architecture follows a master-slave pattern consisting of a single NameNode (master) managing cluster metadata and multiple DataNodes (slaves) storing actual data blocks.
2. Core Components
2.1 NameNode (Master)
Role: Manages the file system namespace and regulates client access to files.
Responsibilities:
- Namespace Management: Maintain the file system directory tree
- Block Mapping: Track file-to-block and block-to-DataNode mappings
- Access Control: Check permissions and regulate client access
- Cluster Monitoring: Process heartbeats and block reports from DataNodes
- Replication Management: Instruct DataNodes to replicate or delete blocks
Metadata Stored:
- File names and directory structure
- File-to-block mapping (which blocks belong to which file)
- Block-to-DataNode mapping (which DataNodes store which blocks)
- File permissions and ownership
- Modification timestamps
Important: NameNode does NOT store actual data, only metadata.
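The metadata listed above can be sketched as plain in-memory maps. This is an illustrative Python model, not the actual NameNode implementation; the `INode` class and map names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class INode:                      # hypothetical name for a namespace entry
    path: str
    owner: str
    permissions: str              # e.g. "rw-r--r--"
    mtime: float                  # modification timestamp
    block_ids: list = field(default_factory=list)   # file-to-block mapping

# NameNode-style in-memory maps: metadata only, never file data
namespace = {}        # path -> INode
block_locations = {}  # block_id -> set of DataNode ids

namespace["/sales.csv"] = INode("/sales.csv", "alice", "rw-r--r--",
                                1700000000.0,
                                block_ids=["blk_1073741825", "blk_1073741826"])
block_locations["blk_1073741825"] = {"dn1", "dn3", "dn5"}
block_locations["blk_1073741826"] = {"dn2", "dn4", "dn6"}

def datanodes_for(path):
    """Resolve file -> blocks -> DataNodes, as the NameNode does for reads."""
    return [sorted(block_locations[b]) for b in namespace[path].block_ids]
```

Note that both mappings live in NameNode memory; the block data itself exists only on the DataNodes.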
2.2 DataNode (Slave)
Role: Stores actual data blocks and serves read/write requests from clients.
Responsibilities:
- Store Blocks: Maintain block data on local disk
- Serve Requests: Handle read and write operations from clients
- Block Reports: Periodically report all blocks to NameNode
- Heartbeats: Send heartbeat signals to NameNode every 3 seconds
- Execute Commands: Follow instructions from NameNode for replication and deletion
Storage:
DataNode Local Disk:
/hadoop/data/current/
├── blk_1073741825 (128 MB)
├── blk_1073741825.meta (checksum)
├── blk_1073741826 (128 MB)
├── blk_1073741826.meta (checksum)
└── ...
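The `.meta` files shown above hold checksums used to detect block corruption. A minimal sketch of the idea (real HDFS uses CRC32C over 512-byte chunks; this simplified version checksums the whole block with CRC32):

```python
import os
import tempfile
import zlib

def write_block(path, data):
    """Write a block file plus a checksum sidecar, like blk_N and blk_N.meta."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".meta", "w") as f:
        f.write(str(zlib.crc32(data)))

def verify_block(path):
    """Recompute the checksum and compare against the stored .meta value."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".meta") as f:
        expected = int(f.read())
    return zlib.crc32(data) == expected

tmp = tempfile.mkdtemp()
blk = os.path.join(tmp, "blk_1073741825")
write_block(blk, b"example block bytes")
ok = verify_block(blk)
```

DataNodes run this kind of verification both when serving reads and in a periodic background scan, reporting corrupt blocks to the NameNode for re-replication.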
3. Master-Slave Architecture
| Aspect | NameNode (Master) | DataNode (Slave) |
|---|---|---|
| Stores | Metadata only (namespace, block mappings) | Actual data blocks |
| Count per cluster | One (plus a standby in HA setups) | Many (10s to 1000s) |
| Hardware | Reliable, high-memory machine | Commodity hardware |
| Failure impact | Cluster unavailable (single point of failure) | Blocks re-replicated automatically |
| Communication | Receives heartbeats and block reports | Sends heartbeats (3 s) and block reports (6 h) |
4. Block Management
4.1 Block Creation Process
Workflow:
Client Writes File → NameNode Creates Metadata →
NameNode Selects DataNodes → Client Writes to DataNode Pipeline →
DataNodes Replicate Blocks → NameNode Updates Block Mapping
Detailed Steps:
- Client Request: Client requests to write file "sales.csv"
- NameNode Response: NameNode creates namespace entry, selects 3 DataNodes for first block
- Pipeline Write: Client writes to DataNode1 → DataNode1 writes to DataNode2 → DataNode2 writes to DataNode3
- Acknowledgment: DataNode3 acknowledges to DataNode2 → DataNode2 to DataNode1 → DataNode1 to Client
- Next Block: Process repeats for subsequent blocks
- Completion: Client notifies NameNode when file write is complete
4.2 Block Replication Strategy
Rack-Aware Placement:
For 3x replication:
- 1st replica: Same node as the writer (if the writer runs on a DataNode), otherwise a random node
- 2nd replica: Different rack from 1st replica
- 3rd replica: Same rack as 2nd replica, different node
Why Different Racks?
A rack-level failure (typically a failed top-of-rack switch) takes every DataNode in that rack offline at once. Spreading replicas across at least two racks means no single rack failure can make a block unavailable, while keeping two replicas on one rack limits cross-rack traffic during the write.
Example:
File Block Distribution:
Block 1:
- Copy 1: Rack1-DataNode1 (writer location)
- Copy 2: Rack2-DataNode3 (different rack)
- Copy 3: Rack2-DataNode4 (same rack as copy 2)
Benefits:
✅ If Rack1 fails, data available on Rack2
✅ If Rack2 switch fails, data available on Rack1
✅ Write bandwidth: Only 1 copy crosses racks
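The placement rules above can be sketched as a small function. This is an illustrative simplification of HDFS's default rack-aware policy, assuming a `topology` map of rack name to node list and a writer that runs on a DataNode:

```python
import random

def place_replicas(topology, writer_node):
    """Sketch of the default 3x rack-aware policy:
    1st replica on the writer's node, 2nd on a different rack,
    3rd on the same rack as the 2nd but a different node."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer_node
    remote_racks = [r for r in topology if r != rack_of[first]]
    second_rack = random.choice(remote_racks)
    second = random.choice(topology[second_rack])
    third = random.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]

topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas(topology, "dn1")   # e.g. ["dn1", "dn3", "dn4"]
```

Note how the result always matches the example distribution above: one replica stays with the writer, and the other two land together on a remote rack.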
5. Communication Mechanisms
5.1 Heartbeats
Purpose: NameNode monitors DataNode health.
Frequency: Every 3 seconds from each DataNode.
Contents:
- DataNode is alive and functioning
- Storage capacity (total, used, remaining)
- Number of data transfers in progress
Failure Detection:
Normal: Heartbeat received every 3 seconds
Missing: No heartbeat for 10 minutes (600 seconds)
Action: NameNode marks DataNode as dead
Recovery: Re-replicate blocks from dead node
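The failure-detection logic above amounts to a timestamp comparison. A minimal sketch, using the timings from the notes (3-second heartbeats, 10-minute timeout):

```python
HEARTBEAT_INTERVAL = 3    # seconds, per the notes above
DEAD_TIMEOUT = 600        # 10 minutes without a heartbeat

def dead_datanodes(last_heartbeat, now):
    """Return DataNodes whose last heartbeat is older than the timeout."""
    return [dn for dn, t in last_heartbeat.items() if now - t > DEAD_TIMEOUT]

# dn2's last heartbeat is 605 s stale -> marked dead
last_heartbeat = {"dn1": 1000.0, "dn2": 395.0, "dn3": 997.0}
dead = dead_datanodes(last_heartbeat, now=1000.0)
```

Once a node lands in the dead list, the NameNode schedules re-replication of every block that node held.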
5.2 Block Reports
Purpose: DataNode informs NameNode about all blocks it stores.
Frequency: Every 6 hours (default).
Contents:
- List of all block IDs stored on DataNode
- Block length and generation stamp
- Storage location on disk
Use Cases:
- NameNode builds complete block-to-DataNode mapping
- Detect missing blocks (corruption)
- Identify over-replicated blocks
- Verify replication factor compliance
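The use cases above boil down to counting replicas per block across all reports. A sketch of that reconciliation, assuming block reports arrive as a map of DataNode id to the set of block ids it stores:

```python
def reconcile(block_reports, replication_factor=3):
    """Compare block reports against the target replication factor,
    returning (under-replicated, over-replicated) block id sets."""
    counts = {}
    for blocks in block_reports.values():
        for b in blocks:
            counts[b] = counts.get(b, 0) + 1
    under = {b for b, c in counts.items() if c < replication_factor}
    over = {b for b, c in counts.items() if c > replication_factor}
    return under, over

reports = {
    "dn1": {"blk_1", "blk_2"},
    "dn2": {"blk_1", "blk_2"},
    "dn3": {"blk_1"},            # a replica of blk_2 was lost here
    "dn4": {"blk_1"},            # extra copy of blk_1
}
under, over = reconcile(reports)
```

Under-replicated blocks trigger new copies on healthy DataNodes; over-replicated ones trigger deletions.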
6. Read Operation Architecture
Step-by-Step Process:
Client → NameNode: request block locations → NameNode → Client: DataNode list per block → Client reads each block directly from the nearest DataNode → Client assembles the file
Detailed Example:
Client wants to read "sales.csv" (256 MB = 2 blocks)
Step 1: Client → NameNode: "Open sales.csv for reading"
Step 2: NameNode → Client:
Block 1: [DataNode1, DataNode3, DataNode5]
Block 2: [DataNode2, DataNode4, DataNode6]
Step 3: Client selects DataNode1 (closest) for Block 1
Step 4: Client reads Block 1 from DataNode1
Step 5: Client selects DataNode4 (closest) for Block 2
Step 6: Client reads Block 2 from DataNode4
Step 7: Client assembles file from blocks
Step 8: Client → NameNode: "Close sales.csv"
Data Locality: Client prefers DataNodes on same machine → same rack → different rack.
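That locality preference is a simple distance ordering. An illustrative sketch (the `rack_of` map and node names are assumptions for the example):

```python
def rank_replicas(replicas, client_host, client_rack, rack_of):
    """Order replica DataNodes by locality: same host, then same rack,
    then off-rack -- the preference described above."""
    def distance(dn):
        if dn == client_host:
            return 0              # local node: no network transfer
        if rack_of[dn] == client_rack:
            return 1              # same rack: cheap intra-rack transfer
        return 2                  # different rack: crosses a rack switch
    return sorted(replicas, key=distance)

rack_of = {"dn1": "rack1", "dn3": "rack1", "dn5": "rack2"}
order = rank_replicas(["dn5", "dn3", "dn1"], client_host="dn1",
                      client_rack="rack1", rack_of=rack_of)
```

The client then reads from the first DataNode in the ordering and falls back to the next one if that read fails.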
7. Write Operation Architecture
Step-by-Step Process:
1. Client → NameNode: "Create file sales.csv"
2. NameNode: Creates namespace entry, selects DataNodes
3. NameNode → Client: "Write Block 1 to [DN1, DN2, DN3]"
4. Client → DN1: Write block data
5. DN1 → DN2: Pipeline replication
6. DN2 → DN3: Pipeline replication
7. DN3 → DN2 → DN1 → Client: Acknowledgment
8. Repeat steps 3-7 for remaining blocks
9. Client → NameNode: "Close file"
Pipeline Replication:
Client writes to 1st DataNode only
1st DataNode simultaneously:
- Writes to local disk
- Forwards to 2nd DataNode
2nd DataNode simultaneously:
- Writes to local disk
- Forwards to 3rd DataNode
3rd DataNode:
- Writes to local disk
- Sends acknowledgment back
Result: All 3 copies are written concurrently as data streams through the pipeline
Efficiency: The client sends the data once; the DataNodes replicate it among themselves
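The store-and-forward behavior above can be simulated with a short recursive function. A toy sketch (not the real protocol, which streams packets rather than whole blocks):

```python
def pipeline_write(block, pipeline, disks):
    """Simulate the write pipeline: each DataNode stores the block locally,
    forwards it downstream, and passes the acknowledgment back upstream."""
    head, rest = pipeline[0], pipeline[1:]
    disks.setdefault(head, []).append(block)      # write to local disk
    if rest:
        ack = pipeline_write(block, rest, disks)  # forward to next DataNode
        return ack + [head]                       # ack flows back upstream
    return [head]                                 # tail node acks first

disks = {}
acks = pipeline_write("blk_1", ["dn1", "dn2", "dn3"], disks)
```

The acknowledgment list comes back in reverse pipeline order (tail first), matching step 7 above, and each DataNode ends up with its own copy of the block.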
8. Fault Tolerance Mechanisms
8.1 DataNode Failure
Scenario: DataNode crashes or becomes unreachable.
Detection & Recovery:
- Detection: NameNode notices missing heartbeats (10 minutes)
- Marking: DataNode marked as dead
- Analysis: NameNode identifies under-replicated blocks
- Re-replication: Instructs healthy DataNodes to create new replicas
- Completion: All blocks restored to replication factor
Timeline: Minutes to hours depending on data volume.
8.2 NameNode Failure
Problem: Single point of failure.
Solutions:
- Secondary NameNode: Periodically merges the edit log into the fsimage checkpoint, speeding recovery — but it is NOT a hot standby
- HDFS High Availability (Hadoop 2+): Active and Standby NameNodes share the edit log via JournalNodes, with automatic failover coordinated by ZooKeeper
- Metadata Backup: Write fsimage and edit log to multiple directories, including a remote NFS mount
Exam Pattern Questions and Answers
Question 1: "Explain HDFS architecture with its components." (10 Marks)
Answer:
Introduction (1 mark):
HDFS follows a master-slave architecture consisting of a single NameNode managing metadata and multiple DataNodes storing the actual data blocks, designed for distributed storage of very large files across commodity hardware.
NameNode - Master (3 marks):
NameNode is the master server managing file system namespace and regulating client access. It maintains directory tree, file-to-block mapping, block-to-DataNode mapping, file permissions, and timestamps. NameNode stores only metadata in memory, not actual data. It coordinates with DataNodes through heartbeats (every 3 seconds) and block reports (every 6 hours), monitoring cluster health and managing block replication.
DataNode - Slave (3 marks):
DataNodes are slave servers storing actual data blocks on local disks and serving read/write requests. Each cluster has multiple DataNodes (10s to 1000s) running on commodity hardware. They periodically send heartbeats to NameNode indicating health status and block reports listing all stored blocks. DataNodes execute NameNode commands for block replication, deletion, and data transfers.
Block Management (2 marks):
HDFS splits files into 128 MB blocks distributed across DataNodes with rack-aware placement. For 3x replication, first replica on writer node, second on different rack, third on same rack as second. This strategy balances fault tolerance (survives rack failures) with network efficiency (only 1 copy crosses racks during write).
Communication (1 mark):
NameNode monitors DataNodes through heartbeats every 3 seconds and receives block reports every 6 hours. Missing heartbeats for 10 minutes triggers automatic re-replication of blocks from failed DataNode, ensuring fault tolerance without manual intervention.
Question 2: "Describe HDFS write operation process." (6 Marks)
Answer:
Client Request (1 mark):
Client application contacts NameNode requesting to create or write a file. NameNode performs a permissions check, creates the namespace entry, then returns the list of DataNodes selected to store the first block's replicas.
Pipeline Creation (2 marks):
Client establishes write pipeline with first DataNode, which connects to second DataNode, which connects to third DataNode. This forms a linear pipeline for replication where each DataNode both receives and forwards data simultaneously.
Data Transfer (2 marks):
Client streams block data to first DataNode. First DataNode writes to local disk while simultaneously forwarding packets to second DataNode. Second DataNode similarly writes locally and forwards to third DataNode. Third DataNode writes to disk and sends acknowledgment back through pipeline.
Completion (1 mark):
Acknowledgments flow back from third to second to first to client. Process repeats for subsequent blocks. After all blocks written, client notifies NameNode to close file. NameNode updates metadata marking file as complete. This pipeline approach enables efficient replication while client writes to only one DataNode.
Summary
Key Points for Revision:
- Architecture: Master (NameNode) - Slave (DataNodes) pattern
- NameNode: Metadata management, namespace, block mapping
- DataNode: Block storage, read/write serving, heartbeats
- Heartbeats: Every 3 seconds, failure detection after 10 minutes
- Block Reports: Every 6 hours, comprehensive block inventory
- Replication: Rack-aware placement (different rack + same rack)
- Write: Pipeline replication for efficiency
- Read: Direct from DataNode using data locality
Draw an architecture diagram showing the NameNode at the top, multiple DataNodes below, and communication arrows (heartbeats, block reports, client read/write). Always mention the specific timings (3-second heartbeat, 6-hour block report, 10-minute failure detection).