HDFS Daemons

1. Definition

HDFS Daemons are background processes that run continuously on Hadoop cluster nodes to provide HDFS storage and management functionality.

2. Three Main Daemons

2.1 NameNode Daemon

Role: Master daemon managing file system metadata.

Runs On: Master node (single server).

Responsibilities:

Maintain file system namespace in RAM
Track file-to-block mappings
Monitor DataNode health via heartbeats
Manage block replication
Handle client metadata requests
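
The metadata the NameNode keeps in memory can be pictured as two lookup tables: one from files to blocks, one from blocks to the DataNodes holding them. The following is a minimal Python sketch of that idea, not the actual Hadoop data structures; the class name NamespaceSketch, its methods, and the sample paths and block IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class NamespaceSketch:
    """Toy in-memory model of NameNode metadata (illustrative only)."""
    # file path -> ordered list of block IDs
    file_to_blocks: Dict[str, List[str]] = field(default_factory=dict)
    # block ID -> names of DataNodes that hold a replica
    block_to_datanodes: Dict[str, Set[str]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: List[str]) -> None:
        """Register a file and the blocks it is split into."""
        self.file_to_blocks[path] = list(block_ids)
        for block_id in block_ids:
            self.block_to_datanodes.setdefault(block_id, set())

    def record_replica(self, block_id: str, datanode: str) -> None:
        """Note that a DataNode reported holding a replica of a block."""
        self.block_to_datanodes[block_id].add(datanode)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Answer a client metadata request: where is each block of a file?"""
        return [
            (block_id, sorted(self.block_to_datanodes[block_id]))
            for block_id in self.file_to_blocks[path]
        ]


ns = NamespaceSketch()
ns.add_file("/logs/app.log", ["blk_1", "blk_2"])
ns.record_replica("blk_1", "dn1")
ns.record_replica("blk_1", "dn2")
ns.record_replica("blk_2", "dn3")
locations = ns.locate("/logs/app.log")
print(locations)
```

Note that the sketch holds only metadata: the block contents themselves never pass through it, which mirrors the division of labour between the NameNode and the DataNodes.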

Critical Nature: Single point of failure (SPOF) unless NameNode High Availability is configured.

2.2 DataNode Daemon

Role: Slave daemon storing actual data blocks.

Runs On: Worker nodes (multiple servers).

Responsibilities:

Store data blocks on local disks
Serve read/write requests
Send heartbeats to NameNode (every 3 seconds by default)
Send block reports to NameNode (every 6 hours by default)
Perform block operations (create, delete, replicate)
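
Heartbeat-based health monitoring can be sketched as a table of last-seen timestamps on the NameNode side. This is a simplified Python illustration, not Hadoop code; the HeartbeatMonitor class is hypothetical. The dead-node timeout is an assumption not stated in these notes: it uses 2 x recheck interval + 10 x heartbeat interval with an assumed 300-second recheck interval, which gives 630 seconds.

```python
HEARTBEAT_INTERVAL_S = 3  # heartbeat interval stated in the notes

# Assumption (not from the notes): a DataNode is declared dead after
# 2 * recheck interval + 10 * heartbeat interval without a heartbeat.
RECHECK_INTERVAL_S = 300
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    """Hypothetical NameNode-side tracker of DataNode liveness."""

    def __init__(self, timeout_s=DEAD_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}  # DataNode name -> time of last heartbeat (seconds)

    def heartbeat(self, datanode, now_s):
        """Record that a heartbeat arrived from a DataNode."""
        self.last_seen[datanode] = now_s

    def dead_nodes(self, now_s):
        """Return DataNodes whose last heartbeat is older than the timeout."""
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


monitor = HeartbeatMonitor()
monitor.heartbeat("dn2", 0)  # dn2 reports once, then goes silent
for t in range(0, 901, HEARTBEAT_INTERVAL_S):
    monitor.heartbeat("dn1", t)  # dn1 keeps reporting every 3 seconds
dead = monitor.dead_nodes(now_s=900)
print(dead)
```

In this run dn1 stays alive because its last heartbeat is recent, while dn2 has been silent for longer than the assumed timeout.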

Failure Tolerance: Individual failures handled automatically.
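
One way to picture the automatic handling: when a DataNode stops responding, the NameNode finds blocks that now have too few live replicas and schedules new copies on other nodes. The Python sketch below is illustrative only. The function plan_re_replication is hypothetical, the replication factor of 3 is an assumption not stated in these notes, and targets are picked alphabetically, whereas real HDFS placement also considers rack topology.

```python
def plan_re_replication(block_to_datanodes, live_nodes, replication=3):
    """Return {block_id: [target DataNodes]} for under-replicated blocks.

    replication=3 is an assumed replication factor for this sketch.
    """
    plan = {}
    for block_id, holders in sorted(block_to_datanodes.items()):
        live_holders = holders & live_nodes
        missing = replication - len(live_holders)
        if missing > 0:
            # Candidate targets: live nodes that do not already hold the block.
            candidates = sorted(live_nodes - live_holders)
            plan[block_id] = candidates[:missing]
    return plan


blocks = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
}
live = {"dn1", "dn3", "dn4"}  # dn2 has failed
plan = plan_re_replication(blocks, live)
print(plan)
```

Both blocks lose one replica when dn2 fails, so each is assigned one new target among the surviving nodes.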

2.3 Secondary NameNode Daemon

Role: Checkpoint helper (NOT a backup NameNode).

Runs On: Separate server.

Responsibilities:

Download the current fsimage and edit logs from the NameNode
Merge them to create a new fsimage
Upload the new fsimage back to the NameNode
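
The merge step can be sketched as replaying logged operations on top of a snapshot. In the Python illustration below, the fsimage is modelled as a plain dictionary and the edit log as a list of tuples; these formats, the operation names, and the function apply_edits are hypothetical simplifications, not the real on-disk formats.

```python
def apply_edits(fsimage, edit_log):
    """Replay an edit log on a copy of the fsimage and return the new image."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for op in edit_log:
        kind = op[0]
        if kind == "create":
            _, path, blocks = op
            new_image[path] = list(blocks)
        elif kind == "delete":
            _, path = op
            new_image.pop(path, None)
        elif kind == "rename":
            _, src, dst = op
            new_image[dst] = new_image.pop(src)
        else:
            raise ValueError(f"unknown edit: {kind}")
    return new_image


old_image = {"/a.txt": ["blk_1"], "/b.txt": ["blk_2"]}
edits = [
    ("create", "/c.txt", ["blk_3"]),
    ("delete", "/a.txt"),
    ("rename", "/b.txt", "/d.txt"),
]
new_image = apply_edits(old_image, edits)
print(new_image)
```

Once the merged image is in place, the replayed edits no longer need to be kept, which is why checkpointing limits edit log growth.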

Frequency: By default, every hour OR after 1 million transactions, whichever comes first.
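
The trigger condition is a simple OR of two thresholds. A minimal Python sketch follows; the function should_checkpoint and its parameter names are hypothetical. In a Hadoop configuration these thresholds correspond to the properties dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.

```python
CHECKPOINT_PERIOD_S = 3600   # one hour, as stated in the notes
CHECKPOINT_TXNS = 1_000_000  # one million transactions, as stated in the notes


def should_checkpoint(seconds_since_last, txns_since_last,
                      period_s=CHECKPOINT_PERIOD_S, txns=CHECKPOINT_TXNS):
    """Return True when either the time or the transaction threshold is met."""
    return seconds_since_last >= period_s or txns_since_last >= txns


print(should_checkpoint(3600, 0))          # time threshold reached
print(should_checkpoint(10, 1_000_000))    # transaction threshold reached
print(should_checkpoint(1800, 500_000))    # neither threshold reached
```

A busy cluster therefore checkpoints more often than once an hour, because the transaction threshold is reached first.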

Common Misconception: It is NOT a hot standby for NameNode.

Exam Pattern Questions and Answers

Question 1: "Explain HDFS daemons and their roles." (6 Marks)

Answer:

NameNode Daemon (2 marks): The NameNode is the master daemon. It runs on the master node and manages the file system namespace in RAM. It maintains the directory structure, file-to-block mappings, and block-to-DataNode mappings. The NameNode monitors DataNode health through heartbeats and manages block replication across the cluster.

DataNode Daemon (2 marks): The DataNode is the slave daemon. It runs on worker nodes and stores the actual data blocks on local disks. It serves client read/write requests, sends heartbeats every 3 seconds and block reports every 6 hours to the NameNode, and executes block operations (create, delete, replicate) as instructed by the NameNode.

Secondary NameNode (2 marks): The Secondary NameNode performs checkpointing: it downloads the fsimage and edit logs from the NameNode, merges them into an updated fsimage, and uploads the result back. This happens hourly or after 1 million transactions, whichever comes first, and it reduces NameNode startup time and limits edit log growth. It is not a backup or standby NameNode.

Summary

NameNode: Master, metadata, single instance
DataNode: Slave, data blocks, multiple instances
Secondary NameNode: Checkpoint creation, NOT backup

Exam Tip

Clarify that Secondary NameNode is NOT a backup or standby - it only helps with checkpointing.

Quiz Time! 🎯

Test Your Knowledge

1. NameNode stores:

Actual data blocks
Only metadata
Both metadata and data
Only logs
