HDFS Internals
1. Definition
HDFS Internals refer to the underlying mechanisms and data structures that enable HDFS to manage metadata, track blocks, ensure data integrity, and maintain fault tolerance.
2. Metadata Storage
2.1 FsImage (File System Image)
Definition: Snapshot of entire file system namespace stored on disk.
Contents:
- Complete directory structure
- File-to-block mapping
- File permissions and quotas
- Block replication information
Location: ${dfs.namenode.name.dir}/current/fsimage_*
Size: Typically a few hundred MB to a few GB, depending on namespace size.
Example Entry:
/user/hadoop/sales.csv
- Size: 1 GB
- Blocks: [blk_1001, blk_1002, blk_1003, blk_1004, blk_1005, blk_1006, blk_1007, blk_1008]
- Replication: 3
- Permissions: rw-r--r--
- Owner: hadoop
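To make the structure concrete, here is a minimal Java sketch of how such a namespace entry could be modeled. The class and field names are illustrative, not HDFS's actual internal types; the real on-disk fsimage is a compact binary file, inspectable with the Offline Image Viewer (hdfs oiv).

// Hypothetical model of one namespace entry, mirroring the example above.
import java.util.List;

class NamespaceEntry {
    String path;          // /user/hadoop/sales.csv
    long sizeBytes;       // 1 GB
    List<String> blocks;  // [blk_1001, ..., blk_1008]
    short replication;    // 3
    String permissions;   // rw-r--r--
    String owner;         // hadoop
}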
2.2 Edit Logs
Definition: Transaction log recording all changes to file system namespace.
Purpose: Capture modifications between fsimage checkpoints.
Operations Logged:
- File creation/deletion
- Directory operations
- Permission changes
- Replication factor modifications
Location: ${dfs.namenode.name.dir}/current/edits_*
Example Entries:
OP_ADD: /user/hadoop/data.csv created
OP_DELETE: /user/hadoop/temp.txt deleted
OP_MKDIR: /user/hadoop/reports created
OP_SET_REPLICATION: /user/hadoop/critical.csv replication=5
Workflow:
NameNode Starts → Loads fsimage → Applies edit logs →
In-Memory Namespace Ready → New changes append to edit logs
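Conceptually, the edit log is a write-ahead log: each operation is appended and flushed to disk before the in-memory namespace changes. A minimal Java sketch of that pattern (simplified; not the actual FSEditLog class):

import java.io.FileWriter;
import java.io.IOException;

class SimpleEditLog {
    private final FileWriter out;

    SimpleEditLog(String logPath) throws IOException {
        out = new FileWriter(logPath, true);   // append-only
    }

    // Record the operation durably BEFORE applying it in memory.
    void logEdit(String op, String target) throws IOException {
        out.write(op + ": " + target + System.lineSeparator());
        out.flush();                           // real HDFS also syncs to disk
    }
}

// Usage: new SimpleEditLog("edits_inprogress").logEdit("OP_MKDIR", "/user/hadoop/reports");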
2.3 Checkpoint Process
Problem: Edit logs grow indefinitely, increasing startup time.
Solution: Periodically merge edit logs with fsimage.
Process:
Secondary NameNode downloads fsimage + edit logs → Applies edit transactions to fsimage →
Uploads merged fsimage to NameNode → NameNode replaces old fsimage, starts fresh edit log
Frequency: Every hour OR every 1 million transactions (whichever comes first).
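A toy model of the merge itself, treating the namespace as a map and each edit as an (op, path) record. This is a sketch under the assumption of only add/mkdir/delete operations, not the NameNode's real implementation:

import java.util.List;
import java.util.Map;

class Checkpointer {
    // Replay logged transactions on top of the loaded fsimage; the result
    // would then be serialized as fsimage_<txid> and the edit log truncated.
    static Map<String, String> merge(Map<String, String> fsimage, List<String[]> edits) {
        for (String[] e : edits) {                    // e = {op, path}
            switch (e[0]) {
                case "OP_ADD", "OP_MKDIR" -> fsimage.put(e[1], e[0]);
                case "OP_DELETE"          -> fsimage.remove(e[1]);
            }
        }
        return fsimage;
    }
}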
3. Block Management Internals
3.1 Block ID Generation
Structure: 64-bit block ID paired with a generation stamp.
Components:
- Block ID: Monotonically increasing 64-bit sequence number
- Generation Stamp: Version number, incremented when a block is reopened for append or recovery
Example: blk_1073741825_1001
- Block ID: 1073741825
- Generation Stamp: 1001
Purpose: Unique identification prevents conflicts during replication and recovery.
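A simplified Java generator showing how the two parts combine (the class and starting values are illustrative, not HDFS's internals):

import java.util.concurrent.atomic.AtomicLong;

class BlockIdGenerator {
    private final AtomicLong nextBlockId = new AtomicLong(1073741825L);
    private final AtomicLong genStamp    = new AtomicLong(1001L);

    // New block: next sequence number + current generation stamp.
    String newBlockName() {
        return "blk_" + nextBlockId.getAndIncrement() + "_" + genStamp.get();
    }

    // Bumped when a block is reopened for append or recovery, so stale
    // replicas (carrying an old stamp) can be detected and discarded.
    long bumpGenerationStamp() {
        return genStamp.incrementAndGet();
    }
}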
3.2 Block Scanner
Function: Periodically verifies data integrity on DataNodes.
Process:
- Read block data from disk
- Compute checksum
- Compare with stored checksum
- Report corrupted blocks to NameNode
Frequency: Every 3 weeks per block (default).
Corruption Handling:
Corrupted Block Detected → Report to NameNode →
NameNode marks block corrupt → Instructs re-replication from good replica →
Delete corrupted block → Maintain replication factor
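The scan itself is a chunk-by-chunk comparison. A minimal sketch using Java's built-in CRC32C, assuming the stored checksums were already loaded from the block's metadata file:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32C;

class BlockVerifier {
    // True if every 512-byte chunk matches its stored checksum.
    static boolean verify(Path blockFile, long[] storedChecksums) throws IOException {
        byte[] chunk = new byte[512];
        int i = 0;
        try (InputStream in = Files.newInputStream(blockFile)) {
            int n;
            while ((n = in.read(chunk)) > 0) {
                CRC32C crc = new CRC32C();
                crc.update(chunk, 0, n);
                if (crc.getValue() != storedChecksums[i++]) {
                    return false;              // report corruption to NameNode
                }
            }
        }
        return true;
    }
}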
4. Memory Management
4.1 NameNode RAM Usage
Metadata Size: Approximately 150 bytes per block.
Calculation:
NameNode RAM ≈ Number of blocks × 150 bytes
Number of blocks ≈ Total data size / Block size (128 MB default)
Example Calculation:
Cluster storing 10 PB:
10 PB = 10,000 TB = 10,000,000 GB
Blocks = 10,000,000 GB / 0.128 GB per block = 78,125,000 blocks
RAM needed = 78,125,000 × 150 bytes ≈ 12 GB
With per-replica metadata at 3× replication (approximated as a 1.5× factor):
Total RAM ≈ 12 GB × 1.5 = 18 GB
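The same estimate in executable form (the 1.5× replication factor mirrors the rule of thumb above, not an exact HDFS formula):

class NameNodeRamEstimate {
    public static void main(String[] args) {
        long dataBytes  = 10_000_000L * 1_000_000_000L;   // 10 PB, decimal units
        long blockBytes = 128L * 1_000_000;               // 128 MB blocks
        long blocks     = dataBytes / blockBytes;         // 78,125,000
        double ramGb    = blocks * 150.0 / 1e9;           // ≈ 11.7 GB
        System.out.printf("blocks=%d ram≈%.1f GB total≈%.1f GB%n",
                blocks, ramGb, ramGb * 1.5);              // total ≈ 17.6 GB
    }
}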
5. Data Integrity
5.1 Checksums
Mechanism: CRC-32C (Cyclic Redundancy Check) computed for each 512-byte chunk.
Storage: Checksum file stored alongside block file.
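Overhead (worked example, from the 512-byte chunk size and 4-byte CRC-32C values):
128 MB block / 512 bytes per chunk = 262,144 chunks
262,144 chunks × 4 bytes = 1 MB of checksum data per block (~0.8% overhead)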
Verification Points:
- Write Time: Client computes checksum, DataNode verifies
- Read Time: DataNode sends data + checksum, client verifies
- Background: Periodic block scanner verification
Corruption Detection:
Read Operation:
Client requests block → DataNode reads from disk →
Computes checksum → Compares with stored checksum →
If mismatch: Report corruption, read from replica →
If match: Send data to client
6. Safe Mode
Definition: Read-only startup state where HDFS verifies block replication.
Purpose: Ensure cluster health before allowing writes.
Entry Conditions:
- NameNode startup
- Manual activation by administrator
- Critical metadata inconsistencies
Exit Conditions:
- 99.9% of blocks meet minimum replication (default)
- 30 seconds elapsed since threshold met
During Safe Mode:
- ✅ Read operations allowed
- ❌ Write operations blocked
- ✅ Block reports processed
- ❌ No block deletions/replications
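A simplified sketch of the exit check. The 99.9% threshold and 30-second extension correspond to the dfs.namenode.safemode.threshold-pct and dfs.namenode.safemode.extension settings; the logic below is a toy model, not the NameNode's code:

class SafeModeMonitor {
    static final double THRESHOLD = 0.999;    // 99.9% of blocks reported
    static final long EXTENSION_MS = 30_000;  // linger 30 s past threshold

    private long thresholdMetAt = -1;

    // Called as DataNode block reports arrive.
    boolean canLeaveSafeMode(long reportedBlocks, long totalBlocks, long nowMs) {
        if ((double) reportedBlocks / totalBlocks < THRESHOLD) {
            thresholdMetAt = -1;              // fell back below threshold
            return false;
        }
        if (thresholdMetAt < 0) {
            thresholdMetAt = nowMs;           // threshold first reached
        }
        return nowMs - thresholdMetAt >= EXTENSION_MS;
    }
}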
Example:
NameNode Startup:
[2024-01-15 10:00:00] Entering safe mode
[2024-01-15 10:00:05] Loading fsimage (500 MB)
[2024-01-15 10:00:10] Replaying edits (1000 transactions)
[2024-01-15 10:00:15] Building block map from DataNode reports
[2024-01-15 10:02:30] 99.95% blocks replicated (threshold: 99.9%)
[2024-01-15 10:03:00] Leaving safe mode - cluster ready
Exam Pattern Questions and Answers
Question 1: "Explain fsimage and edit logs in HDFS with checkpoint process." (8 Marks)
Answer:
FsImage (2 marks): FsImage is a snapshot of the entire file system namespace stored on the NameNode's disk, containing the complete directory structure, file-to-block mappings, permissions, and replication information. It represents the file system state at a specific point in time, and its size ranges from a few hundred MB to a few GB depending on namespace size.
Edit Logs (2 marks): Edit logs are transaction logs recording all modifications to the file system namespace between fsimage checkpoints. They capture operations like file creation, deletion, directory operations, and permission changes. As the NameNode processes requests, changes are appended to the edit logs, ensuring durability of metadata modifications.
Checkpoint Process (4 marks): Checkpointing merges the edit logs with the fsimage to prevent indefinite growth of the edit logs. The Secondary NameNode downloads the current fsimage and edit logs from the NameNode, applies the edit log transactions to the fsimage to create an updated snapshot, and uploads the new fsimage back to the NameNode. This occurs every hour or every 1 million transactions, whichever comes first. The NameNode then replaces the old fsimage with the new one and starts a fresh edit log, keeping restart times short.
Question 2: "How does HDFS ensure data integrity?" (4 Marks)
Answer:
Checksums (2 marks): HDFS computes CRC-32C checksums for each 512-byte chunk of data. During writes, clients compute checksums and DataNodes verify them before storing. Checksum files are stored alongside block files on disk.
Verification (2 marks): Data integrity is verified at three points: during writes (client-computed checksums verified by the DataNode), during reads (the DataNode sends data plus checksums and the client verifies them), and through the background block scanner, which checks each block every 3 weeks by default. If corruption is detected, the NameNode is notified, re-replication from a good replica is triggered, and the corrupted block is deleted, maintaining data reliability without user intervention.
Summary
Key Points for Revision:
- FsImage: Namespace snapshot on disk
- Edit Logs: Transaction log of modifications
- Checkpointing: Merge edits with fsimage hourly or per 1M transactions
- Block ID: 64-bit with generation stamp
- Checksums: CRC-32C for 512-byte chunks
- Safe Mode: Read-only startup, exits at 99.9% replication
- Memory: ~150 bytes per block in NameNode RAM
Always mention specific numbers: 150 bytes per block, CRC-32C, 512-byte chunks, 99.9% threshold for safe mode, hourly checkpoints. Explain the complete checkpoint workflow showing Secondary NameNode's role.