DataSpell 2023.3 Help

Configure notebooks

Using DataSpell, you can connect to a Zeppelin server or create notebooks in your project.

Connect to a Zeppelin server

  1. In the Big Data Tools window, click Add a connection and select Zeppelin.

  2. In the Big Data Tools dialog that opens, specify the connection parameters:

    Connection Settings
    • Name: a name for the connection that distinguishes it from other connections.

    • URL: the URL of the Zeppelin server.

    • User name and Password: enter credentials of a Zeppelin user or select Log in as anonymous.

    Optionally, you can set up:

    • Enable connection: deselect if you want to disable this connection. Newly created connections are enabled by default.

    • Zeppelin version: enter a Zeppelin version or leave the field blank to autodetect the version used on the server.

    • Enable HTTP basic authentication: select to connect using HTTP authentication with the specified username and password.

    • Proxy: connect through an HTTP or SOCKS proxy. Select whether to use the IDEA HTTP Proxy settings or custom settings with the specified host name, port, login, and password.

    • Enable tunneling: creates an SSH tunnel to the remote host. This can be useful if the target server is in a private network but an SSH connection to a host in that network is available.

      Select the checkbox and specify a configuration of an SSH connection (click ... to create a new SSH configuration).

    • Notifications: select Enable cell execution notification if you want to be notified when the execution time of a cell exceeds the specified interval (60 seconds by default).

  3. Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
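
If you also want to check a server outside the IDE, a short script can query it directly. The following is a minimal sketch and not part of DataSpell: it assumes that the server exposes the Zeppelin REST API at /api/version and that the JSON response carries the version under body.version, which may differ between Zeppelin releases. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin


def version_url(base_url: str) -> str:
    """Return the assumed version endpoint for a Zeppelin base URL."""
    # Normalize the trailing slash so urljoin keeps any path prefix.
    return urljoin(base_url.rstrip("/") + "/", "api/version")


def parse_version(payload: str) -> str:
    """Extract the version from a response body (assumed shape)."""
    data = json.loads(payload)
    if data.get("status") != "OK":
        raise ValueError(f"unexpected status: {data.get('status')!r}")
    return data["body"]["version"]


def fetch_version(base_url: str, timeout: float = 5.0) -> str:
    """Query a running server; raises URLError if it is unreachable."""
    with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))
```

Calling fetch_version with the same URL that you entered in the URL field should return the server version if the server is reachable and allows anonymous access to this endpoint.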

Configure notebook dependencies

Once you have established a connection to a Zeppelin server, you can start working with your notebooks. Before you do, it is good practice to make sure that all the libraries and packages required for execution on that server are installed and available.

  1. In the main menu, go to File | Project Structure.

  2. In the Project Structure dialog, select Modules under Project Settings. Then select one of the configured connections in the list of modules and double-click System Dependencies.

  3. Inspect the list of the added libraries. Click the list and start typing to search for a particular library.

  4. If needed, modify the list of libraries:

    • Click the Add button to add a new library.

    • Click the Specify Documentation URL button and specify the URL of the external documentation.

    • Click the Exclude button to select the items that you want DataSpell to ignore (folders, archives, and folders within archives), and click OK.

    • Click the Remove button to remove the selected ordinary item from the library or to restore the selected excluded items. The items themselves will stay in the library.

Manage Zeppelin interpreters

You can configure interpreters on a Zeppelin server. Once an interpreter is added, it is available for all notes on this server.

Configure Zeppelin interpreters

  1. Open the interpreter settings in one of the following ways:

    • Click Interpreter Settings on the notebook toolbar.

    • Right-click a Zeppelin server in the Big Data Tools tool window and select Open Interpreter Settings from the context menu.

  2. Preview the list of the available interpreters in the Interpreter Settings window.

    Note that the list of interpreters is identical to the list shown in the Interpreter Bindings dialog for Zeppelin 0.8 and earlier. For Zeppelin 0.9, Interpreter Bindings shows only the interpreters in use. To filter the list of interpreters, type the target name in the Search field.

    You can use the following actions on the interpreter toolbar:

    • Refresh: updates the list of interpreters.

    • Add an interpreter: opens a dialog to add a new interpreter. You can add the new interpreter to an existing group of interpreters and configure its settings.

    • Delete the selected interpreter: deletes the selected interpreter.

    • Restart the interpreter: restarts the selected interpreter.

    • Manage repositories: opens a dialog to add, remove, and modify interpreter repositories.

  3. Preview the settings of the target interpreter.

    • When an interpreter has resolved all dependencies and is ready for use, its status is shown as Ready.

    • If the selected interpreter is the root of an interpreter group, you can see the interpreters that are included in this group. For example, the spark group consists of %spark, %spark.sql, %spark.pyspark, %spark.ipyspark, %spark.r, %spark.ir, %spark.shiny, and %spark.kotlin.

    • Select the SHARED, SCOPED, or ISOLATED interpreter binding mode. In shared mode, every note that uses this interpreter shares a single interpreter instance. Scoped and isolated modes can be applied per user or per note. In scoped per-note mode, each note creates a new interpreter instance in the same interpreter process. In isolated per-note mode, each note creates a new interpreter process.

    • Select the Set permission checkbox and specify the owner names if you want to restrict access to the selected interpreter.

    • Select the Connect to existing process checkbox and specify the Host and Port of the process on the target server.

    • You can add interpreter Properties or modify the predefined set of properties and their values. Properties are exported as environment variables on the system if the property name consists of upper-case characters, numbers, or underscores ([A-Z_0-9]). Otherwise, the property is set as a common interpreter property. See more details in the Apache Zeppelin documentation.

      For example, you can add the zeppelin.SparkInterpreter.precode property and put code into the Value field that runs when the interpreter is initialized. What this code defines can then be used in a note after the interpreter is initialized.

    • In the Dependencies area, add any library you want to use with the selected interpreter. If needed, specify the files that should be excluded.

  4. Click Refresh to update the list of the interpreters. To restart the selected interpreter, click Restart the interpreter.
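
Two of the settings in step 3 follow simple rules that can be sketched in code: the naming rule that decides whether a property is exported as an environment variable, and the way the per-note binding modes split notes across interpreter processes and instances. The following is an illustrative model only, not Zeppelin or DataSpell code. The function names, the numeric ids, and the SPARK_HOME sample name in the usage note are assumptions made for the example.

```python
import re

# Names made only of upper-case letters, digits, and underscores.
ENV_NAME = re.compile(r"[A-Z_0-9]+")


def property_kind(name: str) -> str:
    """Classify a property name by the [A-Z_0-9] naming rule."""
    if ENV_NAME.fullmatch(name):
        return "environment variable"
    return "interpreter property"


def allocate(mode: str, notes: list[str]) -> dict[str, tuple[int, int]]:
    """Map each note to a (process id, instance id) pair, per-note binding.

    The ids are hypothetical counters that only show what is shared.
    """
    if mode not in {"shared", "scoped", "isolated"}:
        raise ValueError(f"unknown binding mode: {mode!r}")
    result = {}
    for index, note in enumerate(notes):
        if mode == "shared":
            result[note] = (0, 0)      # one process, one instance for all notes
        elif mode == "scoped":
            result[note] = (0, index)  # one process, a new instance per note
        else:
            result[note] = (index, 0)  # a new process per note
    return result
```

For instance, property_kind("SPARK_HOME") classifies the name as an environment variable, while property_kind("zeppelin.SparkInterpreter.precode") classifies it as an interpreter property because the name contains lower-case letters and dots. Calling allocate("scoped", ["note1", "note2"]) places both notes in process 0 with separate instances.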

Manage repositories

  1. To open Repository Settings, click Manage repositories on the interpreter toolbar.

    You can refresh the list of the repositories (Refresh), add a new repository (New repository), and remove the selected repository (Remove the selected repository).

  2. To add a new repository, click New repository and fill in the repository settings:

    Mandatory parameters:

    • Id: a unique name of the repository

    • Url: address of the repository

    Optionally, you can set up:

    • Name: a username to access the repository

    • Password: a password to access the repository

    • Host: an HTTP or HTTPS server where the repository resides

    • Port: a port of the repository server

    • Name and Password: user credentials to access the repository server
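
The split between mandatory and optional repository parameters can be sketched as a small data structure. This is a hypothetical illustration, not the format that DataSpell or Zeppelin uses to store repositories: the class name, field names, and validation rules are assumptions, and only one set of credentials is modeled.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RepositorySettings:
    # Mandatory parameters
    id: str
    url: str
    # Optional parameters
    username: Optional[str] = None
    password: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None

    def validate(self) -> None:
        """Reject settings that miss a mandatory value or use a bad port."""
        if not self.id.strip():
            raise ValueError("Id is mandatory")
        if not self.url.strip():
            raise ValueError("Url is mandatory")
        if self.port is not None and not 1 <= self.port <= 65535:
            raise ValueError("Port must be between 1 and 65535")

    def to_dict(self) -> dict:
        """Validate, then return only the fields that were set."""
        self.validate()
        return {k: v for k, v in asdict(self).items() if v is not None}
```

A repository with only Id and Url filled in passes validation, and to_dict leaves out the optional fields that were not provided.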

Last modified: 23 February 2024