PyCharm 2020.2 Help

Big Data tools

The Big Data Tools plugin is available for PyCharm 2020.1 and later. It provides capabilities to monitor and process data with AWS S3, Spark, Google Cloud Storage, MinIO, Linode, DigitalOcean Spaces, Microsoft Azure, and Hadoop Distributed File System (HDFS).

You can create new Zeppelin notebooks or edit existing ones, local or remote, execute code paragraphs, preview the resulting tables and graphs, and export the results to various formats.

Big data tools UI overview

Getting started with Big Data Tools in PyCharm

The basic workflow for big data processing in PyCharm includes the following steps:

Configure your environment

  1. Install the Big Data Tools plugin.

  2. Create a new project in PyCharm.

  3. Configure a connection to the target server.

  4. Work with your data files.

Work with notebooks

  1. Create and edit a notebook.

  2. Execute the notebook.

  3. Analyze your data, for example, with a code paragraph like the one sketched below.
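
A typical analysis step is a short code paragraph executed in a Zeppelin notebook. The following is a minimal sketch of a PySpark paragraph; the bucket path and column name are hypothetical, and the spark and z variables are provided by the Zeppelin PySpark interpreter.

    %pyspark
    # Minimal sketch of an analysis paragraph (hypothetical path and column names).
    # 'spark' (SparkSession) and 'z' (ZeppelinContext) are provided by the interpreter.
    df = spark.read.option("header", "true").csv("s3a://my-bucket/events.csv")
    z.show(df.groupBy("country").count())  # rendered as a table in the preview area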

Get familiar with the user interface

When you install the Big Data Tools plugin for PyCharm, the following user interface elements appear:

Big Data Tools window

The Big Data Tools window appears in the rightmost group of the tool windows. It displays the list of configured servers and files, structured by folders.

Basic operations on notebooks are available from the context menu.

Big Data Tools window

You can navigate through the directories and preview columnar structures of .csv and .parquet files.

Basic operations on data files are available from the context menu. You can also move files by dragging them to a directory on the target server.

Data files in the BDT window
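
A similar columnar preview can be produced from a notebook paragraph. The sketch below assumes a PySpark paragraph and a hypothetical HDFS path; the spark and z variables come from the Zeppelin interpreter.

    %pyspark
    # Sketch: inspect the columnar structure of a Parquet file (hypothetical path).
    df = spark.read.parquet("hdfs:///data/sales.parquet")
    df.printSchema()      # prints column names and types
    z.show(df.limit(20))  # first rows rendered as a table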

For basic operations on servers, use the window toolbar:

Add connection: Adds a new connection to a server.
Delete connection: Deletes the selected connection.
Zeppelin connection search: Opens a window to search across all available Zeppelin connections.
Refresh connection: Refreshes connections to all configured servers.
Connection settings: Opens the connection settings for the selected server.

Notebook editor

Zeppelin notebook editor

In the notebook editor, you can add and execute Scala, SQL, and Python code paragraphs. When editing a code paragraph, you can use all the coding assistance features available for that language. Code warnings and errors are highlighted in the corresponding code constructs and marked on the scrollbar. The results of paragraph execution are shown in the preview area below each paragraph.
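
Each paragraph typically starts with an interpreter directive that selects its language, for example %spark for Scala, %sql for SQL, or %pyspark for Python. The following is a hypothetical Python paragraph sketch; the spark and z variables are provided by the Zeppelin PySpark interpreter.

    %pyspark
    # Sketch of a Python paragraph: the %pyspark line selects the interpreter
    # configured in Interpreter Bindings; other paragraphs can start with %spark or %sql.
    numbers = spark.range(10).toDF("n")
    z.show(numbers.selectExpr("n", "n * n AS square"))  # shown in the preview area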

Use the notebook editor toolbar for basic operations on notebooks:

Run all: Executes all paragraphs in the notebook, all paragraphs above the selected paragraph, or all paragraphs below the selected paragraph.
Stop execution: Stops execution of the notebook paragraphs.
Clear all outputs: Clears output previews for all paragraphs.
Interpreter bindings: Opens the Interpreter Bindings dialog to configure interpreters for the selected notebook.
Open in a browser: Opens the notebook in the browser.
Go to a paragraph: Allows you to jump to a particular paragraph of the notebook.
Minimap: Shows the minimap for quick navigation through the notebook.

The notebook editor toolbar also shows the status of the last paragraph execution, for example, Execution with errors occurred.

Monitoring tool windows

These windows appear when you have connected to a Spark or Hadoop server.

Spark monitoring: jobs