Secondary NameNode
1. Definition
Secondary NameNode is a helper daemon that performs periodic checkpointing of NameNode metadata by merging fsimage with edit logs.
2. Purpose and Role
Primary Function: Create checkpoints to reduce NameNode startup time.
NOT a Backup: A common misconception; it is NOT a standby or backup NameNode and cannot take over if the NameNode fails.
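Why Needed: On startup, the NameNode loads its last fsimage and then replays every transaction recorded in the edit logs after it. Edit logs grow continuously, so without checkpointing the replay could involve millions of edits and take hours.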
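To make the startup cost concrete, here is a minimal Python sketch of how a NameNode could decide what to replay: load the newest fsimage, then replay only the edit segments that contain transactions after it. It assumes the Hadoop 2+ file-name pattern fsimage_<txid> and edits_<first>-<last>, with shortened txids like the ones used in this section (real HDFS pads txids to 19 digits and also keeps an in-progress edits segment, both ignored here). startup_plan is a hypothetical helper, not Hadoop code.

```python
import re
from typing import List, Tuple

# Hadoop 2+ style names (txids shortened here; real HDFS pads them to 19 digits).
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def startup_plan(files: List[str]) -> Tuple[str, List[str], int]:
    """Hypothetical helper: pick the fsimage to load and the edits to replay.

    Returns (fsimage file, edit segments to replay, number of transactions).
    """
    images = [(int(m.group(1)), f) for f in files if (m := FSIMAGE_RE.match(f))]
    if not images:
        raise ValueError("no fsimage found")
    image_txid, image_file = max(images)  # newest checkpoint wins

    segments = []
    for f in files:
        m = EDITS_RE.match(f)
        # Keep only segments that contain transactions newer than the image.
        if m and int(m.group(2)) > image_txid:
            segments.append((int(m.group(1)), int(m.group(2)), f))
    segments.sort()

    # Count only the transactions after the image's txid.
    replay = sum(end - max(start, image_txid + 1) + 1 for start, end, _ in segments)
    return image_file, [f for _, _, f in segments], replay


files = ["fsimage_00000000000",
         "edits_00000000001-00001000000",
         "edits_00001000001-00001000500"]
print(startup_plan(files))                            # no checkpoint yet: 1,000,500 txns to replay
print(startup_plan(files + ["fsimage_00001000000"]))  # after a checkpoint: 500 txns
```

The only difference between the two calls is the presence of a newer fsimage; that single file is what turns a million-transaction replay into a few hundred.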
3. Checkpoint Process
Detailed Steps:
Secondary NameNode:
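1. "NameNode, roll your edit log": NameNode closes the current edits segment and starts a new one, so client writes continue during the checkpoint
2. Downloads over HTTP: fsimage_00000000000 (e.g. 1 GB) + edits_00000000001-00001000000 (e.g. 500 MB)
3. Loads fsimage into memory
4. Applies all 1 million edit log transactions
5. Saves new fsimage_00001000000, reflecting the namespace as of transaction 1,000,000 (its size is not simply 1 GB + 500 MB)
6. Uploads it: "NameNode, here's your new checkpoint"
NameNode:
7. Receives the new fsimage
8. Uses it as the current fsimage
9. Keeps logging new transactions (1,000,001 onward) in the edits segment it started in step 1
10. Older fsimage and edits files become eligible for purging (retention is configurable)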
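The merge in steps 3 to 5 can be modelled with a toy namespace. This Python sketch assumes a namespace stored as a dict of path to replication factor and edits as (txid, op, path, value) tuples with hypothetical op names; real fsimage and edit log files are binary and far richer. It shows the key rule: only edits newer than the image's txid are applied, and the result is a new image stamped with the last applied txid.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy edit record: (txid, op, path, replication). Op names are hypothetical.
Edit = Tuple[int, str, str, int]


@dataclass
class FsImage:
    txid: int  # last transaction reflected in this image
    namespace: Dict[str, int] = field(default_factory=dict)  # path -> replication


def apply_edit(ns: Dict[str, int], edit: Edit) -> None:
    """Apply one edit to the in-memory namespace."""
    _, op, path, value = edit
    if op in ("create", "set_replication"):
        ns[path] = value
    elif op == "delete":
        ns.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image: FsImage, edits: List[Edit]) -> FsImage:
    """Merge an fsimage with its edit log, as in steps 3 to 5."""
    ns = dict(image.namespace)  # step 3: load image (a copy; the old image is untouched)
    last = image.txid
    for edit in sorted(edits):  # step 4: replay in txid order
        if edit[0] <= image.txid:
            continue  # already reflected in the image
        apply_edit(ns, edit)
        last = edit[0]
    return FsImage(txid=last, namespace=ns)  # step 5: new image stamped with last txid


old = FsImage(txid=0, namespace={"/data": 3})
edits = [(1, "create", "/data/a.log", 3),
         (2, "set_replication", "/data/a.log", 2),
         (3, "create", "/data/b.log", 3),
         (4, "delete", "/data/a.log", 0)]
new = checkpoint(old, edits)
print(new)  # FsImage(txid=4, namespace={'/data': 3, '/data/b.log': 3})
```

Because the new image stores the final namespace rather than the history of changes, /data/a.log is simply absent from it; this is why a checkpoint's size is not the old fsimage plus the edit log.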
4. Configuration
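Checkpoint Triggers (whichever comes first):
- Time-based: every 3600 seconds (1 hour) by default, set by dfs.namenode.checkpoint.period
- Transaction-based: every 1,000,000 uncheckpointed edits by default, set by dfs.namenode.checkpoint.txns; the Secondary NameNode polls the NameNode for this count every dfs.namenode.checkpoint.check.period (60 seconds by default)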
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.namenode.checkpoint.period</name>
    <value>3600</value> <!-- seconds -->
  </property>
  <property>
    <name>dfs.namenode.checkpoint.txns</name>
    <value>1000000</value> <!-- transactions -->
  </property>
</configuration>
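The two triggers combine with a logical OR. Here is a minimal Python sketch of that decision rule, assuming a caller that polls it periodically with a fresh count of uncheckpointed transactions; should_checkpoint is a hypothetical name, and this is not the actual SecondaryNameNode source.

```python
import time
from typing import Optional

CHECKPOINT_PERIOD_S = 3600   # dfs.namenode.checkpoint.period default
CHECKPOINT_TXNS = 1_000_000  # dfs.namenode.checkpoint.txns default


def should_checkpoint(last_checkpoint_time: float,
                      uncheckpointed_txns: int,
                      now: Optional[float] = None,
                      period_s: float = CHECKPOINT_PERIOD_S,
                      txns: int = CHECKPOINT_TXNS) -> bool:
    """Hypothetical decision rule: checkpoint when EITHER trigger fires.

    Assumes the caller polls this periodically with a fresh transaction count.
    """
    now = time.time() if now is None else now
    time_due = now - last_checkpoint_time >= period_s
    txns_due = uncheckpointed_txns >= txns
    return time_due or txns_due


print(should_checkpoint(0, 10, now=1800))         # False: 30 min, few edits
print(should_checkpoint(0, 1_000_000, now=1800))  # True: transaction trigger
print(should_checkpoint(0, 10, now=3600))         # True: time trigger
```

Passing now explicitly keeps the rule deterministic and easy to test; in a real poller it would default to the current clock.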
5. Secondary NameNode vs Standby NameNode
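- Purpose: Secondary NameNode only creates checkpoints; Standby NameNode is a hot standby for high availability (HA) that also creates checkpoints
- Failover: Secondary NameNode cannot take over when the NameNode fails; Standby NameNode can be promoted to active
- Metadata freshness: Secondary NameNode's copy is only as recent as its last checkpoint; Standby NameNode continuously applies edits from shared storage (e.g. JournalNodes)
- Deployment: Secondary NameNode is used in non-HA clusters; in an HA cluster the Standby NameNode performs checkpointing, so no Secondary NameNode is run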
Exam Pattern Questions and Answers
Question 1: "Explain the role of Secondary NameNode with checkpoint process." (6 Marks)
Answer:
Role (2 marks): Secondary NameNode performs periodic checkpointing by merging NameNode's fsimage and edit logs. It is NOT a backup or standby NameNode but a helper daemon that reduces NameNode startup time by preventing edit logs from growing indefinitely.
Checkpoint Process (4 marks): By default, a checkpoint is triggered every hour or after 1 million uncheckpointed transactions, whichever comes first. The Secondary NameNode first asks the NameNode to roll its edit log, so new transactions go to a fresh edits file. It then downloads the current fsimage and the closed edit logs, loads the fsimage into memory, applies all edit log transactions to produce an updated fsimage, and uploads the new fsimage to the NameNode, which adopts it as its current checkpoint. On restart, the NameNode therefore only loads the recent fsimage and replays the edits logged since the checkpoint, which can cut startup time from hours to minutes.
Summary
- Role: Checkpoint creation (NOT backup)
- Frequency: Hourly OR 1M transactions (defaults), whichever comes first
- Process: Roll edits → Download → Merge → Upload
- Benefit: Faster NameNode startup
Always state that the Secondary NameNode is NOT a backup; this is a very common misconception. Also explain why checkpointing is necessary.