DataSpell 2021.3 Help

Amazon EMR

The Big Data Tools plugin lets you monitor clusters and nodes in the Amazon EMR data processing platform.

Create a connection to AWS EMR

  1. In the Big Data Tools window, click Add a connection and select AWS EMR under the Data Processing Platforms section.

  2. The Big Data Tools Connection dialog opens.

    Configure AWS EMR connection

    Mandatory parameters:

    • Name: the name of the connection that distinguishes it from other connections.

    Optionally, you can set up:

    • Specify either a custom endpoint or a region. You can select a region from the list or let DataSpell auto-detect it.

    • Authentication type: the authentication method. You can use your account credentials (the default) or enter the access and secret keys. You can also use a named profile from the default AWS credentials file (~/.aws/credentials on Linux or macOS, or C:\Users\<USERNAME>\.aws\credentials on Windows). If needed, you can specify any profile from a custom credentials file.

    • Enable connection: deselect this option to disable the connection. By default, newly created connections are enabled.

    • HTTP Proxy: select whether to use the IDE proxy settings or to specify custom proxy settings.

    • Click Open SSH Key Settings to create an SSH connection authenticated with a private key file. In the EMR SSH Keystore dialog, specify the private key of your Amazon EC2 key pair.

  3. Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
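A named profile is simply a bracketed section of the INI-style AWS credentials file. The Python sketch below, which uses only the standard library, illustrates that layout and how a tool might read the key pair for one profile. The helpers `read_profile` and `default_credentials_path`, the profile name `emr-dev`, and the key values are hypothetical placeholders; this is not how DataSpell itself reads the file.

```python
import configparser
import os


def read_profile(text: str, profile: str = "default") -> dict:
    """Return the key pair stored under `profile` in credentials-file text.

    The AWS credentials file is INI-formatted: each [section] is a named
    profile holding aws_access_key_id and aws_secret_access_key.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    section = parser[profile]
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }


def default_credentials_path() -> str:
    # Typically ~/.aws/credentials on Linux or macOS and
    # C:\Users\<USERNAME>\.aws\credentials on Windows.
    return os.path.join(os.path.expanduser("~"), ".aws", "credentials")


# Placeholder contents (hypothetical keys); never commit real keys.
SAMPLE = """
[default]
aws_access_key_id = AKIAEXAMPLEDEFAULT
aws_secret_access_key = secretDefaultExample

[emr-dev]
aws_access_key_id = AKIAEXAMPLEEMRDEV
aws_secret_access_key = secretEmrDevExample
"""

if __name__ == "__main__":
    print(read_profile(SAMPLE, "emr-dev")["aws_access_key_id"])
    print(default_credentials_path())
```

Pointing the connection at a custom credentials file works the same way: the file keeps this layout, only its location changes.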

At any time, you can open the connection settings in one of the following ways:

  • Go to the Tools | Big Data Tools Settings page of the IDE settings (Ctrl+Alt+S).

  • Click settings on the AWS EMR tool window toolbar.

Once you have established a connection to the server, the AWS EMR tool window appears.

The window consists of several tabs for monitoring clusters:

Cluster info

This tab shows details about the selected cluster. Start typing a parameter name in the Search field, and the matching parameters are highlighted in the list.

Obtain more info

  • You can preview the cluster details in the web interface. Click Browse the cluster details, or open the Subnet, Master Security Group, or Core and Tasks Security Group.

  • Click Open an SFTP connection to establish an SFTP connection to the target server, then specify the path to the config file in your file system.

  • You can preview EMR logs for the selected cluster. Click Open EMR logs to open the logs in the Big Data Tools tool window.

    EMR logs in the Big Data Tools window
  • To view the JSON representation of the selected cluster configuration, click View JSON representation.
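To make the JSON view easier to navigate, the following sketch assumes it mirrors the shape of the EMR DescribeCluster API response (an assumption; the tool window may present it differently). It uses a hand-written sample with made-up IDs and a hypothetical `summarize_cluster` helper to pull out the state, subnet, security groups, and log location that this tab links to.

```python
import json

# Hand-written sample in the shape of an EMR DescribeCluster response.
# All IDs and names are made up for illustration.
CLUSTER_JSON = """
{
  "Cluster": {
    "Id": "j-EXAMPLE12345",
    "Name": "analytics-dev",
    "Status": {"State": "WAITING"},
    "LogUri": "s3://example-bucket/emr-logs/",
    "Ec2InstanceAttributes": {
      "Ec2SubnetId": "subnet-0example",
      "EmrManagedMasterSecurityGroup": "sg-0master",
      "EmrManagedSlaveSecurityGroup": "sg-0coretask"
    }
  }
}
"""


def summarize_cluster(raw: str) -> dict:
    """Pick out the fields that the Cluster info tab links to (hypothetical helper)."""
    cluster = json.loads(raw)["Cluster"]
    ec2 = cluster.get("Ec2InstanceAttributes", {})
    return {
        "id": cluster["Id"],
        "state": cluster["Status"]["State"],
        "subnet": ec2.get("Ec2SubnetId"),
        "master_sg": ec2.get("EmrManagedMasterSecurityGroup"),
        # In the EMR API, the "slave" group is the one applied to core and task nodes.
        "core_task_sg": ec2.get("EmrManagedSlaveSecurityGroup"),
        "log_uri": cluster.get("LogUri"),
    }


if __name__ == "__main__":
    for key, value in summarize_cluster(CLUSTER_JSON).items():
        print(f"{key}: {value}")
```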

Cluster steps

This tab shows application steps, their IDs, and their execution status. Start typing a step ID or name in the Search field, and the matching step is selected in the list.
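The exact matching rules of the Search field are not described here. As a rough mental model, the sketch below assumes a case-insensitive substring match on step ID or name, over records shaped like the EMR ListSteps API output. The sample steps and the `find_steps` helper are hypothetical.

```python
from typing import Dict, List

# Sample step records in the shape returned by the EMR ListSteps API.
STEPS: List[Dict] = [
    {"Id": "s-1AAAEXAMPLE", "Name": "Setup hadoop debugging", "Status": {"State": "COMPLETED"}},
    {"Id": "s-2BBBEXAMPLE", "Name": "Spark ETL job", "Status": {"State": "RUNNING"}},
    {"Id": "s-3CCCEXAMPLE", "Name": "Spark report", "Status": {"State": "FAILED"}},
]


def find_steps(steps: List[Dict], query: str) -> List[Dict]:
    """Return steps whose ID or name contains `query`, ignoring case."""
    needle = query.casefold()
    return [
        step
        for step in steps
        if needle in step["Id"].casefold() or needle in step["Name"].casefold()
    ]


if __name__ == "__main__":
    for step in find_steps(STEPS, "spark"):
        print(step["Id"], step["Name"], step["Status"]["State"])
```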

Manage steps

  • Click Browse the step details to preview the application step in the web interface.

  • You can add more steps of different types. Click More steps and select a step type to add. Then, specify its parameters.

    Add an application to Steps
  • Click Clone a step to duplicate the selected step.

  • To view the JSON representation of the selected cluster configuration, click View JSON representation.
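Steps added this way correspond to step definitions in the EMR API. The sketch below builds one in the shape that the AddJobFlowSteps API accepts, running spark-submit through command-runner.jar, and shows why a cloned step should be a deep copy. The helper names, bucket path, and arguments are hypothetical, and the sketch does not claim to reproduce what More steps or Clone a step do internally.

```python
import copy
from typing import Dict, List


def spark_step(name: str, script_uri: str, args: List[str]) -> Dict:
    """Build a step definition in the shape the EMR AddJobFlowSteps API accepts.

    command-runner.jar runs the given command (here spark-submit) on the cluster.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if this step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
        },
    }


def clone_step(step: Dict, new_name: str) -> Dict:
    """Duplicate a step definition; deepcopy keeps the nested Args list independent."""
    clone = copy.deepcopy(step)
    clone["Name"] = new_name
    return clone


if __name__ == "__main__":
    original = spark_step("Daily ETL", "s3://example-bucket/jobs/etl.py", ["--date", "2022-01-01"])
    rerun = clone_step(original, "Daily ETL (rerun)")
    rerun["HadoopJarStep"]["Args"][-1] = "2022-01-02"
    print(original["HadoopJarStep"]["Args"][-1])  # prints 2022-01-01
```

With a shallow copy, editing the clone's arguments would silently change the original step as well.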

Cluster instances

This tab shows details about the instances of the selected cluster. Start typing an instance name in the Search field, and the matching instance is selected.

View instances

  • You can preview the instance details in the web interface. Click Browse the instance details. You can also click Manage visibility of instance parameters to show or hide particular instance parameters.

  • Click Open an SFTP connection to establish an SFTP connection to the target server, then specify the path to the config file in your file system.

  • To view the JSON representation of the selected cluster configuration, click View JSON representation.
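The Open an SFTP connection action relies on the EC2 key pair configured in the EMR SSH Keystore dialog. For comparison, the sketch below builds the equivalent OpenSSH sftp command line for a cluster node, assuming the default EMR login user `hadoop` and a placeholder host name and key path; the helper functions are hypothetical. It also checks the key file permissions, since OpenSSH on Linux and macOS refuses private keys that group or other users can access.

```python
import os
import shlex
import stat
from typing import List


def sftp_command(key_path: str, host: str, user: str = "hadoop") -> List[str]:
    """Build an OpenSSH sftp command line that authenticates with an EC2 key pair.

    `hadoop` is the default login user on EMR cluster nodes; `host` is the
    node's public DNS name.
    """
    return ["sftp", "-i", os.path.expanduser(key_path), f"{user}@{host}"]


def key_permissions_ok(key_path: str) -> bool:
    """On POSIX systems, OpenSSH rejects private keys accessible by group or others."""
    mode = os.stat(os.path.expanduser(key_path)).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    cmd = sftp_command("~/keys/emr-dev.pem", "ec2-198-51-100-10.compute-1.amazonaws.com")
    print(shlex.join(cmd))
```

Running chmod 400 on the key file, as AWS recommends for EC2 key pairs, makes `key_permissions_ok` return True.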

Cluster applications

This tab shows running applications. Click Browse the application details to preview the application's status in the web interface.

Last modified: 08 April 2022