Run notebooks and analyze data

To preview and analyze data sets, you need to run the executable paragraphs of your notebook.

Running notebooks

You can run paragraphs one by one or all at once. When executing a paragraph, keep code dependencies in mind. For example, if the current paragraph relies on variables that are initialized in a previous paragraph, execute that previous paragraph first.
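The dependency rule above amounts to running paragraphs in an order where every variable is defined before it is used. The following Python sketch only models that rule and is not an IntelliJ IDEA feature; you still choose the execution order yourself. It assumes each paragraph can be described by hypothetical `defines` and `uses` sets of variable names. It then uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to compute an order that respects them.

```python
from graphlib import TopologicalSorter

# Hypothetical notebook: for each paragraph, the variables it initializes
# ("defines") and the variables it reads ("uses").
paragraphs = {
    "report": {"defines": set(), "uses": {"clean_df"}},
    "clean": {"defines": {"clean_df"}, "uses": {"df"}},
    "load": {"defines": {"df"}, "uses": set()},
}


def execution_order(paragraphs):
    """Return paragraph names so that every variable is defined before it is used."""
    # Map each variable to the paragraph that initializes it.
    definer = {var: name for name, p in paragraphs.items() for var in p["defines"]}
    sorter = TopologicalSorter()
    for name, p in paragraphs.items():
        # A paragraph must run after every other paragraph that defines a variable
        # it uses; variables defined outside the notebook (or by the paragraph
        # itself) add no dependency.
        deps = {definer[var] for var in p["uses"] if definer.get(var, name) != name}
        sorter.add(name, *deps)
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(sorter.static_order())


print(execution_order(paragraphs))  # ['load', 'clean', 'report']
```

Because "report" needs `clean_df` and "clean" needs `df`, the only valid order is load, clean, report, whatever order the paragraphs appear in.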

Click the corresponding icon on the notebook editor toolbar to execute all paragraphs of the notebook, or all paragraphs above or below the current one. The execution progress is shown on the toolbar.
Click the Run icon in the gutter to execute a particular paragraph of the notebook.

Once the execution completes, its status is shown on the toolbar and in the gutter:

Success: the execution has finished successfully. You can click the status icon to execute the paragraph again.
Failure: the execution has failed.
Aborted: the execution has been aborted.

After a successful execution, preview the output that is shown below the paragraph code.


The Spark job link appears in the preview area when the paragraph contains an RDD operation that starts a Spark job, for example, a call to the count or saveAsTextFile method. Click this link to open the Spark Monitoring tool window and review the job's completion status, event timeline, and DAG visualization.

You can select the code of a Spark job in a notebook and extract it into a Scala file for further use.

Extract a Spark job

Select a Spark job code fragment in the notebook.
Right-click the selected code and select Refactor | Extract Spark Job from the context menu.
Specify the Scala filename and its location in your file system, then confirm your choice. The specified file with the extracted job appears in a separate editor tab.

Refresh interpreters

While you execute the code of your notebook, you might need to restart an interpreter on the target Zeppelin server. IntelliJ IDEA provides several ways to do this:

Click the corresponding icon on the notebook toolbar.
Right-click the Run icon in the gutter and select Restart Interpreter.
Right-click any paragraph in the editor and select Restart Interpreter from the context menu.

When you execute SQL statements or run the show method of a Zeppelin or Spark object, the results are shown in the Table and Chart tabs of the preview area.

Viewing outputs

If your notebook processes data collections, you can preview the output in both tabular and graphical form. To manage how the output is presented, select the table, graph, or split view. Hover over the right side of the paragraph output to see the corresponding controls.

Organize data in the table

Click a column header to sort its values.
Click the corresponding icon to filter data in the selected column.
Click the corresponding icon to split the table into pages. Toggle this button and specify the number of table rows to display on a page: 10, 15, 30, or 100.
Click the corresponding icon and select the columns to show in the table.
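To make the combined effect of these table controls concrete, the Python sketch below applies them in one possible order: filter, sort, select columns, then split into pages. The function `organize` and its keyword arguments are hypothetical stand-ins for the UI controls, not an API of IntelliJ IDEA or Zeppelin. The order in which the IDE actually applies the controls is also an assumption. Only the page sizes come from the list above.

```python
import math

# Page sizes offered by the table pagination control.
PAGE_SIZES = (10, 15, 30, 100)


def organize(rows, where=None, sort_by=None, descending=False, columns=None, page_size=None):
    """Model the table controls: filter, sort, choose columns, then paginate.

    rows is a list of dicts, one dict per table row. All keyword arguments are
    hypothetical stand-ins for the UI controls.
    """
    if where is not None:
        # where is a (column, predicate) pair, like a filter on one column.
        column, predicate = where
        rows = [r for r in rows if predicate(r[column])]
    if sort_by is not None:
        # Like clicking a column header to sort its values.
        rows = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
    if columns is not None:
        # Keep only the selected columns.
        rows = [{c: r[c] for c in columns} for r in rows]
    if page_size is None:
        return [rows]  # pagination off: a single page with every row
    if page_size not in PAGE_SIZES:
        raise ValueError(f"page size must be one of {PAGE_SIZES}")
    # The last page holds the remainder, so the page count rounds up.
    page_count = math.ceil(len(rows) / page_size)
    return [rows[i * page_size:(i + 1) * page_size] for i in range(page_count)]


rows = [{"id": i, "city": "Berlin" if i % 2 else "Paris", "sales": i * 10} for i in range(40)]
pages = organize(rows, where=("city", lambda c: c == "Berlin"), sort_by="sales",
                 descending=True, columns=["id", "sales"], page_size=15)
print([len(p) for p in pages], pages[0][0])  # [15, 5] {'id': 39, 'sales': 390}
```

Twenty rows pass the filter, so with 15 rows per page the table has two pages, and the last page holds the remaining five rows.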

Export tables

Click the corresponding icon to save the table to a .csv file.
Enter the filename and click Save.

The default chart type is defined by the chart settings on the server. However, you can change the predefined chart type and configure the chart.

Configure charts

Click the corresponding icon to change the initial settings of the chart.
Click the icon of a chart type to plot a new chart of that type. For example, click the scatter chart icon to add a new scatter chart.
Drag the columns you want to plot to the corresponding fields.
Click the Add new series link to add more series to the chart. Then drag the required columns to the target fields to set the axes.

Export charts

Click the corresponding icon to save the generated chart as a .png file.
Enter the filename and click Save.

Configure chart settings

To define how the chart looks, click the corresponding icon on the chart toolbar (on the right side of the output area).
Select the contrast or default theme. Click the corresponding icon to modify the theme colors. You can also clone the theme and customize the copy later.
Review the modified settings in the preview area and save the changes.

Viewing runtime data with State Viewer

State Viewer lets you preview local variables and SQL schemas of the current Zeppelin session. It establishes a protocol between the Zeppelin server and the IDE, provides runtime details about the variables, and offers smart coding assistance.

Use State Viewer

In the Zeppelin connection settings, make sure the Enable State Viewer integration checkbox is selected.
If you want to fine-tune State Viewer settings, click Show State Viewer Settings.
Open any notebook on the target Zeppelin server and execute any paragraph to collect data.
Alternatively, you can create a local Zeppelin notebook in your project and link it to the configured connection.
Once the paragraph is executed, the State Viewer tool window opens. The State Viewer sync status is shown on the notebook toolbar.
In the State Viewer tool window, you can preview the values of the variables and expand hierarchical data. You can right-click any variable to open a context menu and inspect the variable in a separate window with the Inspect ... command, or preview its value in text form (View Text).
At any time, you can click the refresh icon in the State Viewer tool window to sync with the server.
If you closed the State Viewer tool window, you can quickly reopen it: click the corresponding icon at the top of your Zeppelin notebook and select Open State Viewer Window.

With the code assistance that State Viewer provides, you can complete the exact names of columns in SQL tables and Scala DataFrames. You can also check column names for errors, for example, references to columns that do not exist. Start typing any pattern that matches a column name, and code completion suggests the matching columns.
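The Python sketch below illustrates the two kinds of assistance described above, assuming the column names of a table are already known from the collected schema. It is not State Viewer's implementation. The functions `complete` and `unknown_columns`, the sample schema, and the case-insensitive substring matching are all assumptions made for illustration.

```python
# Hypothetical column names of a SQL table or Scala DataFrame, as State Viewer
# might have collected them after a paragraph execution.
schema = ["customer_id", "customer_name", "country", "order_total"]


def complete(pattern, columns):
    """Suggest the column names that contain the typed pattern (case-insensitive)."""
    p = pattern.lower()
    return [c for c in columns if p in c.lower()]


def unknown_columns(referenced, columns):
    """Return the referenced names that match no column in the schema."""
    known = {c.lower() for c in columns}
    return [c for c in referenced if c.lower() not in known]


print(complete("cust", schema))  # ['customer_id', 'customer_name']
print(unknown_columns(["Country", "order_totl"], schema))  # ['order_totl']
```

The misspelled `order_totl` is flagged because no collected column has that name, which is the kind of error the check above is meant to catch.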

State Viewer settings

When you create a Zeppelin connection, you can configure State Viewer settings in the connection window, in the Show State Viewer Settings section. If you have already established a Zeppelin connection, you can quickly access State Viewer settings by clicking the corresponding icon in the State Viewer tool window.

Common Introspector Settings
Collect variables: select to view variable values in State Viewer.
Collect SQL info: select to view SQL data.
Enable debug mode: select to show the State Viewer log icon in the State Viewer tool window. Use it to view State Viewer service information, errors, and logs.

Variables Introspector Settings
Collect variables only on manual refresh: select if you want State Viewer to collect and display variables only when you click the refresh icon in the State Viewer tool window. Otherwise, State Viewer updates variables on each paragraph execution.
Do not collect variables from other notes: collect variables only from the note that you run. If this option is disabled, State Viewer collects and displays all variables of your code state, which can include data from other notebooks.
Timeout for variables collecting (ms): if collecting variables takes longer than this timeout, State Viewer stops collecting them and shows only the variables collected within the timeout.
Collect single variable timeout (ms): if collecting a variable takes longer than this timeout, only the variable name and part of its value (the first elements of an array or the attributes of a class) are shown.
Limit size for collection: the maximum number of elements of a collection (an array, a list, and so on) to display.
Strings max length: the number N of leading characters of string values to display.
Count last 'res' variables: the maximum number of paragraph executions to display in the State Viewer tool window.
Introspection depth: the maximum depth of objects to display as a hierarchy. Anything deeper is represented as a string.

SQL Introspector Settings
Collect SQL schema timeout (ms): the timeout for collecting SQL schemas.
Collection strategy: select one of the following options.
All data in state — after paragraph execution: after a paragraph is executed, collect information about all SQL schemas in your program state.
Current paragraph data — after execution, all data — on refresh: collect schemas from the current paragraph after it is executed, and collect all schemas from the program state on manual refresh.
Only on refresh: collect all schemas only on manual refresh.
Schema pattern filter: specify a database and/or table name pattern to collect.
Collect only temp tables: collect data from temporary tables only.
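To show how the Introspection depth, Limit size for collection, and Strings max length settings together shape what you see for a nested variable, here is a Python sketch. It is not State Viewer's implementation. The function `introspect`, its parameters `max_depth`, `limit`, and `max_len`, and their default values are hypothetical. The function truncates strings, shortens collections, and turns anything below the maximum depth into a plain string, following the setting descriptions above.

```python
def introspect(value, max_depth=2, limit=3, max_len=8, _depth=0):
    """Model how display limits shape the preview of a variable.

    max_depth stands in for Introspection depth, limit for Limit size for
    collection, and max_len for Strings max length. The defaults are arbitrary.
    """
    if isinstance(value, str):
        # Only the first max_len characters of a string are shown.
        return value[:max_len]
    if isinstance(value, (dict, list, tuple)) and _depth >= max_depth:
        # Past the maximum depth, the rest of the object is shown as a string.
        return str(value)
    if isinstance(value, dict):
        # Only the first `limit` entries of a collection are shown.
        return {k: introspect(v, max_depth, limit, max_len, _depth + 1)
                for k, v in list(value.items())[:limit]}
    if isinstance(value, (list, tuple)):
        return [introspect(v, max_depth, limit, max_len, _depth + 1)
                for v in value[:limit]]
    # Scalars such as numbers are shown as they are.
    return value


data = {"name": "Zeppelin notebook", "scores": [1, 2, 3, 4, 5], "nested": {"a": {"b": [1, 2]}}}
print(introspect(data))
# {'name': 'Zeppelin', 'scores': [1, 2, 3], 'nested': {'a': "{'b': [1, 2]}"}}
```

In this model, raising `max_depth` expands `nested` one more level instead of showing it as a string. Raising `limit` reveals more of `scores`.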

Troubleshooting

If the execution of the notebook or a particular paragraph has failed, review the error message and try the typical troubleshooting actions below:

Problem	Recommended action
The notebook toolbar is not available, and a warning message is shown.	Click the Try Reconnect link to connect the notebook to the server.
The server connection is lost, and the server status icon shows that the server is disconnected.	Click the corresponding icon to reestablish the connection to the server.
The interpreter session has expired. For example, the error message reports that the Spark session has expired.	Click the corresponding control on the notebook toolbar and restart the problematic interpreter.

Last modified: 20 January 2023