IntelliJ IDEA 2022.3 Help

Big Data Tools

The Big Data Tools plugin is available for IntelliJ IDEA 2019.2 and later. It provides specific capabilities to monitor and process data with Zeppelin, AWS S3, Apache Spark, Apache Kafka, Apache Hive, Apache Flink, Google Cloud Storage, MinIO, Linode, DigitalOcean Spaces, Microsoft Azure, and Hadoop Distributed File System (HDFS).

You can create new or edit existing local or remote Zeppelin notebooks, execute code paragraphs, preview the resulting tables and graphs, and export the results to various formats.

The plugin supports many IDE features for working with notebooks:

Coding assistance for Scala

Notebook features

Getting started with Big Data Tools in IntelliJ IDEA

The basic workflow for big data processing in IntelliJ IDEA includes the following steps:

Configure your environment

  1. Install the required plugins.

  2. Create a new project in IntelliJ IDEA.

  3. Configure a connection to the target server.

  4. Work with your notebooks and data files.
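Before configuring the connection in the IDE, it can help to confirm that the target server is reachable. The following is a minimal sketch, not part of the plugin: it assumes a Zeppelin server at `http://localhost:8080` (a placeholder address) that exposes the Zeppelin REST API `version` endpoint. The helper names `api_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: a Zeppelin server listening on this placeholder address.
ZEPPELIN_URL = "http://localhost:8080"


def api_url(base_url: str, endpoint: str) -> str:
    """Build a REST API URL such as http://host:8080/api/version."""
    return urljoin(base_url.rstrip("/") + "/", "api/" + endpoint.lstrip("/"))


def fetch_json(url: str, timeout: float = 5.0):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print("Server responded:", fetch_json(api_url(ZEPPELIN_URL, "version")))
    except (OSError, ValueError) as error:
        # OSError covers connection failures; ValueError covers non-JSON replies.
        print("Cannot use", ZEPPELIN_URL, "-", error)
```

If the request fails, check the server address and port before entering them in the connection settings.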

Work with notebooks

  1. Create and edit a notebook.

  2. Execute the notebook.

  3. Analyze your data.

Get familiar with the user interface

When you install the Big Data Tools plugin for IntelliJ IDEA, the following user interface elements appear:

Big Data Tools window

The Big Data Tools window appears in the rightmost group of the tool windows. The window displays the list of the configured servers, notebooks, and files structured by folders. Even when no connections are configured, you can see the available types of servers to connect to.

Basic operations on notebooks are available from the context menu.

You can navigate through the directories and preview columnar structures of .csv, .parquet, .avro, and .orc files.
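To illustrate what a columnar preview conveys, here is a rough sketch that lists the columns of a CSV file with a guessed type for each. It is not how the plugin works internally. The function names and the sample data are made up for illustration, and the type inference is deliberately simple.

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in non_empty:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def preview_columns(text, sample_rows=100):
    """Return (column name, inferred type) pairs from the first rows of CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    return [
        (name, infer_type([row[i] for row in rows if i < len(row)]))
        for i, name in enumerate(header)
    ]


# Hypothetical sample data.
SAMPLE = "name,files,ratio\nlogs,120,0.5\nevents,48,0.25\n"

for column, type_name in preview_columns(SAMPLE):
    print(f"{column}: {type_name}")
```

Formats such as .parquet, .avro, and .orc store their schema in the file itself, so a viewer can read column names and types directly instead of guessing them.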

Basic operations on data files are available from the context menu. You can also move files by dragging them to the target directory on the target server.

For the basic operations with the servers, use the window toolbar:

Add connection

Add a new connection to a server.

Delete connection

Delete the selected connection.

Search in notebooks

For Zeppelin: find a note in your Zeppelin server.

For storages: navigate to a file.

Refresh Connection

Refresh connections to all configured servers.

Connection settings

Open the connection settings for the selected server.

Open in Editor

For storages only: open the storage in a separate editor tab.

If you have any questions regarding the Big Data Tools plugin, click the Support link and select one of the available options. You can join the support Slack channel, submit a ticket in the YouTrack system, or copy the support email to send your question.

Notebook editor

In the notebook editor, you can add and execute Python, Scala, and SQL code paragraphs. When editing a code paragraph, you can use all the coding assistance features available for its language. Code warnings and errors are highlighted in the corresponding code constructs and marked in the scrollbar. The results of paragraph execution are shown in the preview area below each paragraph.
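As an illustration of a Python paragraph whose result can be shown as a table, the sketch below prints text in the format used by Zeppelin's `%table` display system: tab-separated cells, one row per line, with the first row as the header. The helper `to_zeppelin_table` and the sample rows are hypothetical. In a notebook, the paragraph would normally also start with an interpreter directive such as `%python`, which is omitted here so that the code runs as plain Python.

```python
def to_zeppelin_table(header, rows):
    """Format data as %table text: tab-separated cells, one row per line."""
    lines = ["\t".join(str(cell) for cell in header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "%table " + "\n".join(lines)


# Hypothetical sample data.
rows = [("parquet", 12), ("orc", 7), ("csv", 31)]

# Outside a Zeppelin interpreter, this output is just plain text.
print(to_zeppelin_table(("format", "files"), rows))
```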

Use the notebook editor toolbar for the basic operations with notebooks:

Run all

Executes all paragraphs in the notebook.

Stop execution

Stops execution of the notebook paragraphs.

Clear all outputs

Clears output previews for all paragraphs.

Additional actions

Select Export Note Code to HTML to save the note as an HTML file. Select Alter Code Visibility to hide code sections in paragraphs (by default, both code and result sections are shown). Select Show State Viewer Window to open State Viewer.

Interpreter bindings

Opens the Interpreter Bindings dialog to configure interpreters for the selected notebook.

Open in a browser

Click this button to open the notebook in the browser or copy a link to it.

Navigate

Allows you to jump to a particular paragraph of a notebook.

Minimap

Shows the minimap for quick navigation through the notebook.

The toolbar of a local note contains a list of available Zeppelin servers, so you can select the server on which to execute the note.

The notebook editor toolbar also shows the status of the last paragraph execution (Finished, Aborted, or Failed) and the synchronization status of State Viewer.

Monitoring tool windows

These windows appear when you have connected to a Spark or Hadoop server.

Building your Big Data project

With the dedicated project type and the corresponding wizard, you can create a project in IntelliJ IDEA that has all the required Spark dependencies. You can start developing Spark applications without any additional configuration.

Create a dedicated project

  1. Select File | New | Project from the main menu.

  2. Select Big Data from the options on the left, then ensure that the Spark type of project is specified. Click Next.

  3. On the next page of the wizard, specify your project's name, location, build system (SBT, Gradle or Maven), JDK, and artifact coordinates.

    Fields in the More settings section are populated automatically based on the project name. You can change them if needed.

  4. Click Finish to complete the task.

For more details on developing Spark applications, see the Spark Programming Guide.

Last modified: 07 February 2023