HDFS
Install the Remote File Systems plugin
This functionality relies on the Remote File Systems plugin, which you need to install and enable.
Press Ctrl+Alt+S to open settings and then select Plugins.
Open the Marketplace tab, find the Remote File Systems plugin, and click Install (restart the IDE if prompted).
Connect to an HDFS server
In the Big Data Tools window, click and select HDFS.
In the Big Data Tools dialog that opens, specify the connection parameters:
Name: the name of the connection, used to distinguish it from other connections.
In Configuration source, select one of:
Optionally, you can set up:
Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.
Enable connection: deselect if you want to disable this connection. By default, the newly created connections are enabled.
Hadoop user name: enter a username to log in to the server. If not specified, the HADOOP_USER_NAME environment variable is used. If this variable is not defined, the user.name property is used. If Kerberos is enabled, it overrides any of these three values.
Enable tunneling (Only NameNode operation): creates an SSH tunnel to the remote host. It can be useful if the target server is in a private network but an SSH connection to a host in that network is available. SSH tunneling currently works only for the following NameNode operations: listing files and getting meta info.
Select the checkbox and specify a configuration of an SSH connection (click ... to create a new SSH configuration).
Under Extended Connection Settings, you can set up:
Root path: a path on the target server to be the root for the HDFS connection.
Operation timeout (s): enter a timeout (in seconds) for operations performed on the remote storage, such as getting file info, listing or deleting objects. The default value is 15 seconds.
Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
When the connection is successfully established, the Driver home path field shows the target IP address of the connection, including the port number. Example: hdfs://127.0.0.1:65224/.
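For comparison, the address in the example above corresponds to the fs.defaultFS value that a configuration file for the same server would carry. A minimal sketch, assuming the NameNode listens on 127.0.0.1:65224 (the host and port are copied from the example and are illustrative only):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed address, copied from the example above; replace with your NameNode host and port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:65224/</value>
  </property>
</configuration>
```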
Samples of Hadoop File System configuration files
Type | Sample configuration |
---|---|
HDFS |
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://example.com:9000/</value>
  </property>
</configuration>
|
S3 |
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>sample_access_key</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>sample_secret_key</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://example.com/</value>
  </property>
</configuration>
|
WebHDFS |
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.webhdfs.impl</name>
    <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:50070/</value>
  </property>
</configuration>
|
WebHDFS and Kerberos |
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.webhdfs.impl</name>
    <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:50070</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>Kerberos</value>
  </property>
  <property>
    <name>dfs.web.authentication.kerberos.principal</name>
    <value>testuser@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
|
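The WebHDFS samples above use port 50070, which is the default NameNode HTTP port in Hadoop 2.x. Hadoop 3.x changed this default to 9870, so a configuration for such a cluster might look like the following sketch. The host name master.example.com is the same placeholder used above, and the port is only the Hadoop 3.x default; check the dfs.namenode.http-address setting of your cluster for the actual value.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.webhdfs.impl</name>
    <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  </property>
  <!-- Assumed port: 9870 is the default NameNode HTTP port in Hadoop 3.x -->
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:9870/</value>
  </property>
</configuration>
```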