
HDFS Daemons

1. Definition

HDFS Daemons are background processes that run continuously on Hadoop cluster nodes to provide HDFS storage and management functionality.


2. Three Main Daemons

2.1 NameNode Daemon

Role: Master daemon managing file system metadata.

Runs On: Master node (single server).

Responsibilities:

  1. Maintain file system namespace in RAM
  2. Track file-to-block mappings
  3. Monitor DataNode health via heartbeats
  4. Manage block replication
  5. Handle client metadata requests

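Critical Nature: Single point of failure (SPOF) unless HDFS High Availability (an active/standby NameNode pair, available since Hadoop 2) is configured.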
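
The two mappings the NameNode keeps in memory can be pictured with a small toy model. This is only an illustrative sketch: the class name ToyNameNode, its methods, and the block and DataNode names are hypothetical and do not correspond to real Hadoop classes or APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyNameNode:
    """Toy in-memory metadata store (hypothetical, not Hadoop code)."""

    # file path -> ordered list of block IDs
    file_to_blocks: Dict[str, List[str]] = field(default_factory=dict)
    # block ID -> names of the DataNodes that hold a replica
    block_to_datanodes: Dict[str, Set[str]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: List[str]) -> None:
        """Record a new file and register its blocks with no replicas yet."""
        self.file_to_blocks[path] = list(block_ids)
        for block_id in block_ids:
            self.block_to_datanodes.setdefault(block_id, set())

    def report_replica(self, block_id: str, datanode: str) -> None:
        """Note that a DataNode reported holding a replica of a block."""
        self.block_to_datanodes.setdefault(block_id, set()).add(datanode)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Answer a client metadata request: where is each block of a file?"""
        return [
            (block_id, sorted(self.block_to_datanodes[block_id]))
            for block_id in self.file_to_blocks[path]
        ]

    def under_replicated(self, replication: int) -> List[str]:
        """List blocks that have fewer replicas than the target factor."""
        return sorted(
            block_id
            for block_id, nodes in self.block_to_datanodes.items()
            if len(nodes) < replication
        )
```

For example, after add_file("/logs/a.txt", ["blk_1", "blk_2"]) and a few report_replica calls, locate("/logs/a.txt") returns each block with the DataNodes that hold it. Note that the model stores no file contents at all, which mirrors the NameNode's metadata-only role.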


2.2 DataNode Daemon

Role: Slave daemon storing actual data blocks.

Runs On: Worker nodes (multiple servers).

Responsibilities:

  1. Store data blocks on local disks
  2. Serve read/write requests
  3. Send heartbeats to NameNode (every 3 seconds by default)
  4. Send block reports (every 6 hours by default)
  5. Perform block operations (create, delete, replicate)

Failure Tolerance: Individual DataNode failures are handled automatically; the NameNode re-replicates the affected blocks from the surviving replicas.
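
Heartbeat-based failure detection can be sketched as follows. This is a simplified illustration, not Hadoop's implementation: the HeartbeatMonitor class and the 30-second dead_after_s value used in the example are assumptions chosen for readability, and real clusters use a longer, configurable timeout.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 3.0  # interval stated in the notes above


class HeartbeatMonitor:
    """Tracks the last heartbeat per DataNode (hypothetical helper)."""

    def __init__(self, dead_after_s: float) -> None:
        # dead_after_s is an illustrative threshold, not an HDFS default
        self.dead_after_s = dead_after_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float) -> None:
        """Record that a DataNode checked in at time `now` (seconds)."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return DataNodes silent for longer than the threshold."""
        return sorted(
            datanode
            for datanode, seen in self.last_seen.items()
            if now - seen > self.dead_after_s
        )
```

With dead_after_s set to 10 * HEARTBEAT_INTERVAL_S, a DataNode that stops sending heartbeats shows up in dead_nodes() once more than 30 seconds have passed since its last heartbeat, at which point the blocks it held would be scheduled for re-replication.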


2.3 Secondary NameNode Daemon

Role: Checkpoint helper (NOT a backup NameNode).

Runs On: Separate server.

Responsibilities:

  1. Download fsimage and edit logs
  2. Merge them to create new fsimage
  3. Upload to NameNode

Frequency: Every hour or every 1 million transactions, whichever comes first (default settings).

Common Misconception: It is NOT a hot standby for NameNode.
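
The checkpoint trigger and the merge step can be sketched as below. This is a toy model under stated assumptions: the fsimage is represented as a plain dictionary and the edit log as a list of (operation, path, blocks) tuples, and the function names checkpoint_due and merge are hypothetical. Real fsimage and edit log files use Hadoop's own on-disk formats.

```python
from typing import Dict, List, Tuple

CHECKPOINT_PERIOD_S = 3600    # "every hour" from the notes above
CHECKPOINT_TXNS = 1_000_000   # "1 million transactions" from the notes above

# (operation, path, block IDs); a toy stand-in for edit log records
Edit = Tuple[str, str, List[str]]


def checkpoint_due(seconds_since_last: int, txns_since_last: int) -> bool:
    """A checkpoint is due when either threshold is reached."""
    return (
        seconds_since_last >= CHECKPOINT_PERIOD_S
        or txns_since_last >= CHECKPOINT_TXNS
    )


def merge(fsimage: Dict[str, List[str]], edit_log: List[Edit]) -> Dict[str, List[str]]:
    """Replay the edit log on a copy of the fsimage and return the new image."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edit_log:
        if op == "create":
            new_image[path] = list(blocks)
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_image
```

The merge works on a copy, so the old fsimage stays intact until the new one is ready to upload. Because the merged image already contains every logged change, the NameNode can start from it without replaying a long edit log.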


Exam Pattern Questions and Answers

Question 1: "Explain HDFS daemons and their roles." (6 Marks)

Answer:

NameNode Daemon (2 marks): The NameNode is the master daemon, running on the master node and holding the file system namespace in RAM. It maintains the directory structure, file-to-block mappings, and block-to-DataNode mappings. It monitors DataNode health through heartbeats and manages block replication across the cluster.

DataNode Daemon (2 marks): The DataNode is the slave daemon, running on worker nodes and storing the actual data blocks on local disks. It serves client read/write requests, sends heartbeats every 3 seconds and block reports every 6 hours to the NameNode, and executes block operations as instructed.

Secondary NameNode (2 marks): The Secondary NameNode performs checkpointing: it downloads the fsimage and edit logs from the NameNode, merges them into an updated fsimage, and uploads the result back. This happens hourly or after 1 million transactions, which keeps the edit log small and shortens NameNode startup time.


Summary

  1. NameNode: Master, metadata, single instance
  2. DataNode: Slave, data blocks, multiple instances
  3. Secondary NameNode: Checkpoint creation, NOT backup


Exam Tip

State explicitly that the Secondary NameNode is NOT a backup or standby; it only helps with checkpointing.

