HDFS Access Options
1. Definition
HDFS access options are the methods and interfaces available to users and applications for reading, writing, and managing files in HDFS.
2. Command Line Interface (CLI)
Most Common Method: the hadoop fs or hdfs dfs shell commands (hdfs dfs works only with HDFS, while hadoop fs works with any filesystem Hadoop supports).
Basic Operations:
# List files
hadoop fs -ls /user/hadoop
# Upload file
hadoop fs -put local-file.txt /user/hadoop/
# Download file
hadoop fs -get /user/hadoop/hdfs-file.txt local-file.txt
# View file
hadoop fs -cat /user/hadoop/data.txt
# Delete file
hadoop fs -rm /user/hadoop/temp.txt
Use Case: Interactive file management, shell scripts, administration tasks.
3. Java API
For: Custom applications and MapReduce programs.
Example:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Create file and write a string in DataOutput's UTF format
Path file = new Path("/user/hadoop/output.txt");
FSDataOutputStream out = fs.create(file);
out.writeUTF("Hello HDFS");
out.close();
// Read the file back with the matching readUTF call
FSDataInputStream in = fs.open(file);
String content = in.readUTF();
in.close();
Use Case: Building Hadoop applications, custom data processing.
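Beyond create and open, the same FileSystem handle covers listing and deletion. A minimal sketch (the /user/hadoop paths are illustrative):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListDelete {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // List directory contents, like hadoop fs -ls
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop"))) {
            System.out.println(status.getPath() + " " + status.getLen() + " bytes");
        }
        // Delete a file; the second argument enables recursive delete (only matters for directories)
        fs.delete(new Path("/user/hadoop/temp.txt"), false);
        fs.close();
    }
}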
4. Web Interface (HTTP)
WebHDFS: RESTful HTTP interface for HDFS, served on the NameNode's web port (9870 by default in Hadoop 3.x; 50070 in Hadoop 2.x).
Example:
# List directory
curl "http://namenode:9870/webhdfs/v1/user/hadoop?op=LISTSTATUS"
# Read file (-L follows the redirect to the DataNode serving the data)
curl -L "http://namenode:9870/webhdfs/v1/user/hadoop/data.txt?op=OPEN"
# Create file (two steps: the NameNode answers with a 307 redirect,
# then the file data is PUT to the DataNode URL from the Location header)
curl -i -X PUT "http://namenode:9870/webhdfs/v1/user/hadoop/file.txt?op=CREATE"
curl -i -X PUT -T local-file.txt "<Location-URL-from-previous-response>"
Use Case: Remote access, cross-platform integration, web applications.
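Because WebHDFS is plain HTTP plus JSON, any language with an HTTP client can talk to it without Hadoop libraries. A minimal sketch using Java 11's built-in HttpClient (host and port are assumptions; substitute your NameNode's web address):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebHdfsList {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Same LISTSTATUS call as the curl example above
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://namenode:9870/webhdfs/v1/user/hadoop?op=LISTSTATUS"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON "FileStatuses" listing
    }
}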
5. HDFS UI (Browser)
NameNode Web UI: http://namenode:9870 (Hadoop 3.x default port)
Features:
- Browse HDFS directory structure
- View file contents
- Check cluster status
- Monitor DataNode health
- View block locations
Use Case: Visual exploration, monitoring, debugging.
6. NFS Gateway
Allows: Mounting HDFS as a standard NFS volume, so clients can use ordinary filesystem tools and system calls.
Setup:
# Mount HDFS through the NFS gateway (assumed here to run on the NameNode host)
mount -t nfs -o vers=3,proto=tcp,nolock namenode:/ /mnt/hdfs
# Now use it like a regular filesystem
cp /local/file.txt /mnt/hdfs/user/hadoop/
ls /mnt/hdfs/user/hadoop/
Use Case: Legacy applications, users familiar with traditional file systems.
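The point of the gateway is that applications need no HDFS-specific code at all. A minimal sketch reading through the /mnt/hdfs mount point from the example above, using only standard Java file I/O:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadViaNfsMount {
    public static void main(String[] args) throws Exception {
        // Ordinary file I/O; the program never knows HDFS sits behind the mount
        Path file = Paths.get("/mnt/hdfs/user/hadoop/file.txt");
        for (String line : Files.readAllLines(file)) {
            System.out.println(line);
        }
    }
}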
7. Comparison of Access Methods
- CLI: shell commands (hadoop fs / hdfs dfs); best for interactive use, scripts, and administration
- Java API: FileSystem class; best for custom Hadoop applications and MapReduce jobs
- WebHDFS: HTTP REST calls; best for remote, cross-platform integration
- Web UI: browser on port 9870; best for browsing, monitoring, and debugging
- NFS Gateway: standard NFS mount; best for legacy applications expecting a traditional filesystem
Exam Pattern Questions and Answers
Question 1: "Explain different ways to access HDFS with examples." (8 Marks)
Answer:
Command Line Interface (CLI) (2 marks): Users access HDFS through hadoop fs or hdfs dfs commands for basic file operations. Examples include hadoop fs -ls /user/hadoop to list files, hadoop fs -put file.txt /user/hadoop/ to upload files, and hadoop fs -get /user/hadoop/data.txt local.txt to download files. This method is suitable for interactive file management and shell scripting.
Java API (2 marks): Developers use Java FileSystem API for programmatic access in custom applications and MapReduce programs. Code creates FileSystem object, then uses methods like create(), open(), delete() for file operations. This provides full control over HDFS operations and is essential for building Hadoop applications.
Web Interface (WebHDFS) (2 marks): RESTful HTTP interface enables remote access using HTTP requests. For example, curl "http://namenode:9870/webhdfs/v1/user/hadoop?op=LISTSTATUS" lists directory contents. This allows cross-platform integration and access from non-Java applications and web browsers.
HDFS UI and NFS Gateway (2 marks): NameNode Web UI (http://namenode:9870) provides visual interface for browsing files, monitoring cluster, and viewing block locations. NFS Gateway allows mounting HDFS as network file system using standard NFS mount command, enabling legacy applications to access HDFS as traditional filesystem without code changes.
Summary
Key Access Methods:
- CLI: hadoop fs commands (interactive, scripts)
- Java API: Programmatic access (applications)
- WebHDFS: HTTP REST API (remote, cross-platform)
- Web UI: Browser interface (monitoring, browsing)
- NFS Gateway: Mount as filesystem (legacy apps)
For access methods questions, provide specific command/code examples for each method. Mention appropriate use cases (CLI for admin, Java API for development, WebHDFS for integration).