
Install Datalore Enterprise using Docker or Kubernetes (Helm)

The instructions in this article describe both installation methods: Docker (on Linux only) and Kubernetes.

Prerequisites

Before installation, make sure that you have the following.

For Docker:

  • Docker

  • Docker Compose V2

To check the requirements, execute docker compose version.

Expected result:

    docker compose version
    Docker Compose version v2.2.3

For Kubernetes (Helm):

  • K8s cluster (tested with Kubernetes v1.22)

  • Kubectl on your machine pointed to this cluster

  • Helm (tested with v3.2.4)
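The Docker Compose check can also be scripted. The following is a minimal sketch, not part of Datalore: the helper names and the regular expression are assumptions based only on the sample output shown above.

```python
import re
from typing import Optional


def compose_major_version(output: str) -> Optional[int]:
    """Return the major version found in `docker compose version` output, or None."""
    # Matches strings such as "Docker Compose version v2.2.3".
    match = re.search(r"version v?(\d+)\.(\d+)\.(\d+)", output)
    return int(match.group(1)) if match else None


def is_compose_v2(output: str) -> bool:
    """True when the reported Docker Compose major version is 2 or later."""
    major = compose_major_version(output)
    return major is not None and major >= 2


print(is_compose_v2("Docker Compose version v2.2.3"))  # prints True
```

Pass the text printed by docker compose version to is_compose_v2 to decide whether the Docker prerequisites are met.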

Hardware requirements
  • Datalore server machine: 4GB of RAM (the number of CPUs is irrelevant if the load is not high)

  • For every additional computational machine (pod or container) per notebook: minimum 2GB of RAM required, 4GB of RAM recommended

Basic Datalore installation

Follow the instructions to install Datalore using the selected method.

Install Datalore

  1. Clone or download the content of this repository.

  2. Do the following to set up your database:

    For Docker: open docker-compose.yaml in the [repository_folder]/docker-compose folder in any text editor and replace the values of the DB_PASSWORD and POSTGRES_PASSWORD properties with any random string (both properties must have the same value). This string will be used as your database password. Make sure you keep it secret.

    For Helm: create a datalore.values.yaml file and add your volume configuration and dbRootPassword to it (dbRootPassword will be used as your database password; a random string is advised).

    For example:

    ```
    volumes:
      - name: storage
        emptyDir: { }
      - name: postgresql-data
        emptyDir: { }
    dbRootPassword: "super_secret_password"
    ```
  3. Run the following command and wait for Datalore to start up:

    For Docker:

    docker compose up

    For Helm:

    helm install -f datalore.values.yaml datalore <path_to_datalore-configs>/charts/datalore
  4. Go to http://127.0.0.1:8080/ and create the first user. The first created user will automatically receive admin rights.

  5. Click your avatar in the upper-right corner, select Admin panel and provide your license key.

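The database setup step above asks for a random string as the database password. One way to produce such a string is sketched below using Python's standard secrets module; the function name and the default length are assumptions, and any other source of random strings works as well.

```python
import secrets


def generate_db_password(num_bytes: int = 24) -> str:
    """Return a random URL-safe string that can serve as the database password."""
    # token_urlsafe draws from the operating system's secure random source
    # and only emits letters, digits, "-" and "_".
    return secrets.token_urlsafe(num_bytes)


print(generate_db_password())
```

The URL-safe alphabet contains no $ character, which Docker Compose would otherwise treat as the start of a variable reference inside docker-compose.yaml.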

(Optional) Use Hub for authentication services

JetBrains Hub supports most popular auth modules. Follow the procedures below to run Hub for Datalore.

Run Hub for Docker

  1. Add the following volumes to your docker-compose file:

    ```
    volumes:
      hub-data: { }
      hub-conf: { }
      hub-logs: { }
      hub-backups: { }
    ```
  2. Add the following service to your docker-compose file:

    ```
    services:
      …
      hub:
        image: jetbrains/hub:2021.1.14194
        ports:
          - "8082:8080"
        networks:
          - datalore-backend-network
        volumes:
          - "hub-data:/data/hub/data"
          - "hub-conf:/data/hub/conf"
          - "hub-logs:/data/hub/logs"
          - "hub-backups:/data/hub/backups"
    ```
  3. Run docker compose up hub to run Hub only (we assume that Datalore is not running at the moment).

If you have already installed Hub, go to the Configuration procedure. You can find more details about the Hub installation process here.

Install Hub

  1. Configure persistent storage by setting the volumes or volumeClaimTemplates helm parameter.

  2. Install the Hub Helm chart using the helm install command.

  3. (Optional) It is assumed that you can access Hub at http://localhost:8082. For it to work, forward the port with the following command:

    kubectl port-forward --address 0.0.0.0 service/hub 8082
  4. Check the container output using the kubectl logs service/hub command. It should contain a line like this:

    JetBrains Hub 2021.1 Configuration Wizard will listen inside the container on {0.0.0.0:8080}/ after start and can be accessed by this URL: [http://<put-your-docker-HOST-name-here>:<put-host-port-mapped-to-container-port-8080-here>/?wizard_token=pPXTShp4NXceXqGYzeAq].

    Copy the wizard_token value to the clipboard.

  5. Go to http://localhost:8082/ and insert the token from the previous step into the Token field.

  6. Click the Log in button.

  7. Click the Set Up link.

  8. Generate a URL (referred to as HUB_ROOT_URL later) to access Hub from Datalore. Consider the following:

    • The URL must be accessible from both the cluster pods and the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Hub installation, i.e. http://127.0.0.1:8080/ inside the container where Hub is launched (by default, the hub-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  9. In Base URL, enter HUB_ROOT_URL. Do not change the Application Listen Port setting.

  10. Click the Next button.

  11. Configure the admin account (set the admin password).

  12. Click the Finish button and wait for Hub to start.
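Copying the wizard token out of the Hub log by hand is error-prone. The sketch below pulls the wizard_token value from log text; the function name is hypothetical, and the pattern assumes the token is alphanumeric, as in the sample log line shown in the steps above.

```python
import re
from typing import Optional


def extract_wizard_token(log_text: str) -> Optional[str]:
    """Return the first wizard_token value found in Hub log output, or None."""
    # Assumption: the token consists of letters and digits only.
    match = re.search(r"wizard_token=([A-Za-z0-9]+)", log_text)
    return match.group(1) if match else None


sample = (
    "can be accessed by this URL: "
    "[http://<put-your-docker-HOST-name-here>:"
    "<put-host-port-mapped-to-container-port-8080-here>"
    "/?wizard_token=pPXTShp4NXceXqGYzeAq]."
)
print(extract_wizard_token(sample))  # prints pPXTShp4NXceXqGYzeAq
```

Feed the function the text returned by kubectl logs service/hub and paste the result into the Token field.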

Configure Hub

Go to HUB_ROOT_URL and log in to Hub using the admin account.

Configure the Datalore service

  1. Create one more URL (referred to as DATALORE_ROOT_URL later) to access Datalore. Consider the following:

    • The URL must be accessible from the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Datalore installation, i.e. http://127.0.0.1:8080/ inside the container where Datalore will be launched (by default, the datalore-on-premise-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  2. Go to Services (${HUB_ROOT_URL}/hub/services) and click the New service button. Use the name datalore and enter DATALORE_ROOT_URL in Home URL.

  3. Copy the ID field value and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_ID property).

  4. Click the Change... button next to the Secret label.

  5. Copy the generated secret and save it somewhere: it will be used when configuring Datalore ($HUB_DATALORE_SERVICE_SECRET property).

  6. Click the Change secret button.

  7. Enter DATALORE_ROOT_URL in the Base URLs field.

  8. Enter the line /api/hub/openid/login in the Redirect URIs field.

  9. Click the Trust Service button in the upper right corner.

  10. Click the Save button.

Create a Hub token

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Click your admin username.

  3. Go to the Authentication tab.

  4. Click the New token... button.

  5. Add Hub and Datalore into Scope. You can use any Name. Click the Create button. Copy the token (with the perm: prefix) and save it somewhere. It will be used when configuring Datalore ($HUB_PERM_TOKEN property).

(Optional) Force email verification

Datalore uses user emails from Hub, so we recommend forcing email verification in Hub. Users with unverified emails will not be able to use Datalore.

  1. Configure the SMTP server:

    • Go to SMTP (${HUB_ROOT_URL}/hub/smtp-settings).

    • Click the Configure SMTP server... button.

    • Configure your SMTP server parameters.

    • Click the Save button.

    • Click the Enable notifications button.

    • (Optional) To make sure your configuration is working, click the Send Test message button.

  2. Enable email verification:

    • Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

    • Open the Common settings page.

    • Enable the Email verification option.

    • Click the Save button.

  3. Set and verify an admin user email:

    • Go to Users (${HUB_ROOT_URL}/hub/users).

    • Click your admin username.

    • Set an email in the Email field.

    • Click the Save button.

    • Click the Send verification email link.

    • Find the verification email in your inbox and click the Verify email address button.

(Optional) Ban a guest user

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Select a guest user.

  3. Click the Ban button.

(Optional) Enable auth modules

  1. Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

  2. Add or remove auth modules (for example, Google auth, GitHub auth, LDAP, and so on). Find more details here.

Configure the Datalore service

Edit the docker-compose file (for Docker) or the values under the dataloreEnv key in the datalore.values.yaml file (for Helm).

Define the following environment values:

HUB_PUBLIC_BASE_URL

Base public (accessible via browser) URL of your Hub installation (${HUB_ROOT_URL}/hub from the Install Hub section, for example, https://hub.your.domain/hub).

HUB_DATALORE_SERVICE_ID

ID of the Datalore service in Hub (see Configure the Datalore service).

HUB_DATALORE_SERVICE_SECRET

Secret of the Datalore service in Hub (see Configure the Datalore service).

HUB_PERM_TOKEN

Token for accessing Datalore and Hub scopes (see Create a Hub token).

HUB_FORCE_EMAIL_VERIFICATION

Used to specify whether email verification is required from the Datalore user.

Example (Docker):

```
services:
  datalore:
    environment:
      DB_PASSWORD: "changeme"
      DATABASES_DOCKER_NETWORK: "datalore-agents-network"
      ADMIN_API_AUTH_TOKEN: "changeme"
      HUB_PUBLIC_BASE_URL: "http://127.0.0.1:8082/hub"
      HUB_DATALORE_SERVICE_ID: "9030674b-2679-495a-b606-c554384f42a3"
      HUB_DATALORE_SERVICE_SECRET: "sHCpaPQfPWco"
      HUB_PERM_TOKEN: "perm:YWRtaW4=.NDUtMA==.MBJEauHYuzg9nSXS6d1FkJ93zZcZvT"
      HUB_FORCE_EMAIL_VERIFICATION: "false"
```
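The Hub-related values can be assembled programmatically before they are pasted into the docker-compose file or under dataloreEnv. The helper below is a hypothetical sketch: its name and arguments are not part of Datalore, and it relies only on the rules stated in this article (HUB_PUBLIC_BASE_URL is ${HUB_ROOT_URL}/hub, and the Hub token carries the perm: prefix).

```python
from typing import Dict


def hub_environment(hub_root_url: str, service_id: str, service_secret: str,
                    perm_token: str, force_email_verification: bool = False) -> Dict[str, str]:
    """Build the Hub-related environment values described in this section."""
    if not perm_token.startswith("perm:"):
        raise ValueError("HUB_PERM_TOKEN is expected to start with 'perm:'")
    return {
        # HUB_PUBLIC_BASE_URL is ${HUB_ROOT_URL}/hub, without a doubled slash.
        "HUB_PUBLIC_BASE_URL": hub_root_url.rstrip("/") + "/hub",
        "HUB_DATALORE_SERVICE_ID": service_id,
        "HUB_DATALORE_SERVICE_SECRET": service_secret,
        "HUB_PERM_TOKEN": perm_token,
        # The Docker example passes this flag as a quoted string.
        "HUB_FORCE_EMAIL_VERIFICATION": "true" if force_email_verification else "false",
    }
```

Calling hub_environment("http://127.0.0.1:8082/", ...) yields "http://127.0.0.1:8082/hub" for HUB_PUBLIC_BASE_URL, which matches the Docker example.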

(Optional, Helm only) Run Datalore in a non-default namespace

  1. Specify the namespace when running Datalore:

    helm install -n [non_default_namespace] -f datalore.values.yaml datalore charts/datalore/
  2. Add the namespace under the agentsConfig key as shown in the code below:

    k8s:
      namespace: datalore
      instances: ...
  3. Add DATABASES_K8S_NAMESPACE: "[non_default_namespace]" under the dataloreEnv key.

(Optional, Helm only) Use an external PostgreSQL database

  1. Add two variables under dataloreEnv:

    • DB_USER: "[database_user]" to specify the database user

    • DB_URL: "jdbc:postgresql://[database_host]:[database_port]/[database_name]" to specify the database URL

  2. Set internalDatabase to false.
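The DB_URL value follows a fixed pattern, so it can be generated rather than typed. The sketch below uses a hypothetical helper name; the host, port, and database name passed to it are placeholders for your own values.

```python
from typing import Dict


def external_db_env(user: str, host: str, port: int, database: str) -> Dict[str, str]:
    """Return the DB_USER and DB_URL entries for an external PostgreSQL database."""
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return {
        "DB_USER": user,
        # Same shape as jdbc:postgresql://[database_host]:[database_port]/[database_name]
        "DB_URL": "jdbc:postgresql://{}:{}/{}".format(host, port, database),
    }


print(external_db_env("datalore", "db.example.internal", 5432, "datalore")["DB_URL"])
# prints jdbc:postgresql://db.example.internal:5432/datalore
```

Both resulting entries go under the dataloreEnv key.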

All config files in Helm

Modify the following Helm values in the datalore chart.

dataloreEnv

Editing fields under this key is mandatory to get everything working. The format is as follows:

dataloreEnv:
  KEY_NAME: "key_value"
  ...

Mandatory parameters

DATALORE_PUBLIC_URL

URL by which Datalore is accessed (DATALORE_ROOT_URL). It is used to generate links.

HUB_PUBLIC_BASE_URL

Base public (accessible via browser) URL of your Hub installation (${HUB_ROOT_URL}/hub from the Install Hub section, for example, https://hub.your.domain/hub).

HUB_DATALORE_SERVICE_ID

ID of the Datalore service in Hub (see Configure the Datalore service).

HUB_DATALORE_SERVICE_SECRET

Secret of the Datalore service in Hub (see Configure the Datalore service).

HUB_PERM_TOKEN

Token for accessing Datalore and Hub scopes (see Create a Hub token).

HUB_FORCE_EMAIL_VERIFICATION

Used to specify whether email verification is required from the Datalore user.

DATABASES_BASE_URL

Must always be equal to "http://${SQL_SERVER_HOST}:${SQL_SERVER_PORT}".

SQL_SERVER_HOST

Internal hostname for the datalore service. Must be equal to DATALORE_INTERNAL_HOST.

DATALORE_INTERNAL_HOST

Internal hostname for the datalore service.

DEFAULT_INSTANCE_TYPE_ID

ID of the instance type that will be used by default (for more information, see the agentsConfig description).

DEFAULT_PACKAGE_MANAGER

Default package manager.

DEFAULT_BASE_ENV_NAME

Default environment, matches one of the default package manager environments.

MAIL_ENABLED

If set to true, enables Datalore to send emails (welcome emails, sharing invitations, etc.) and requires the following parameters:

  • MAIL_SENDER_EMAIL: sender's email

  • MAIL_SENDER_NAME: sender's name

  • MAIL_SENDER_USERNAME: username of SMTP user

  • MAIL_SENDER_PASSWORD: password of SMTP user

  • MAIL_SMTP_SERVER: SMTP server host

  • MAIL_SMTP_PORT: SMTP server port

ADMIN_API_AUTH_TOKEN

Environment variable for an API token to set up an admin user.
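Several of the mandatory parameters depend on one another, which makes them easy to get wrong. The sketch below checks a dataloreEnv mapping against the rules stated above; the function name and the messages are hypothetical, and it covers only the three rules spelled out in this list.

```python
from typing import Dict, List

# Mail settings that the list above names as required when MAIL_ENABLED is true.
MAIL_KEYS = [
    "MAIL_SENDER_EMAIL",
    "MAIL_SENDER_NAME",
    "MAIL_SENDER_USERNAME",
    "MAIL_SENDER_PASSWORD",
    "MAIL_SMTP_SERVER",
    "MAIL_SMTP_PORT",
]


def check_datalore_env(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems found in a dataloreEnv mapping."""
    problems = []

    # DATABASES_BASE_URL must equal "http://${SQL_SERVER_HOST}:${SQL_SERVER_PORT}".
    expected = "http://{}:{}".format(env.get("SQL_SERVER_HOST"), env.get("SQL_SERVER_PORT"))
    if env.get("DATABASES_BASE_URL") != expected:
        problems.append("DATABASES_BASE_URL should be " + expected)

    # SQL_SERVER_HOST must be equal to DATALORE_INTERNAL_HOST.
    if env.get("SQL_SERVER_HOST") != env.get("DATALORE_INTERNAL_HOST"):
        problems.append("SQL_SERVER_HOST should equal DATALORE_INTERNAL_HOST")

    # Mail settings are only required when MAIL_ENABLED is "true".
    if str(env.get("MAIL_ENABLED", "false")).lower() == "true":
        for key in MAIL_KEYS:
            if not env.get(key):
                problems.append(key + " is required when MAIL_ENABLED is true")

    return problems
```

An empty result means the mapping satisfies these rules; it does not prove that the values themselves are reachable or correct.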

Optional parameters

HUB_INTERNAL_BASE_URL (default: http://hub:8082/hub)

URL to access Hub from inside the cluster. Used if HUB_PUBLIC_BASE_URL is only available from outside and not inside the cluster.

DATABASES_K8S_NAMESPACE (default: default)

Name of the Kubernetes namespace where Datalore is installed. Used if you plan to install Datalore in a namespace other than default.

dbRootPassword

Used to set up the PostgreSQL password. There is one field to override:

  • ROOT_PASSWORD: root user's password. The database can be accessed on port 5432 with the username postgres and this password.

internalDatabase

Used to specify if you use an external database (for example, AWS RDS). To use an external database, set it to false and specify DB_USER and DB_URL under the dataloreEnv key.

volumes, volumeClaimTemplates

Used to configure persistent storage.

The config describes two Kubernetes volumes:

  • storage: contains workbook data, such as attached files (UID:GID 5000:5000).

  • postgresql-data: contains PostgreSQL database data (UID:GID 999:999).

agentsConfig

Used to define agent types (such as Basic and Large machines in the cloud version of Datalore). It has the following schema:

k8s:
  instances:
    - id: <Unique instance ID>
      label: <Instance name>
      description: <Short description of what the instance is>
      features: <Information to be displayed in the tooltip text when hovering over the instance>
      minAllowed: <Minimum number of instances to be preserved in the pool>
      maxAllowed: <Maximum number of instances to be preserved in the pool>
      numCPUs: <Number of CPUs>
      cpuMemoryText: <CPU memory>
      numGPUs: <Number of GPUs>
      gpuMemoryText: <GPU memory>
      yaml: <Kubernetes config of Pod to be used for the instance>
    - id: <Another type with the same schema as above>
      ...

The minAllowed and maxAllowed fields are used to configure the number of pre-created instances, which will speed up the process of starting up notebooks.
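Because DEFAULT_INSTANCE_TYPE_ID refers to an instance ID from this schema, a quick consistency check can catch typos before deployment. The sketch below is hypothetical and works on the instances list after it has been loaded into Python dictionaries; the rule that minAllowed must not exceed maxAllowed is an assumption, not something this article states.

```python
from typing import Dict, List


def check_instances(instances: List[Dict], default_instance_id: str) -> List[str]:
    """Return problems found in an agentsConfig instances list."""
    problems = []
    ids = [instance.get("id") for instance in instances]

    # Each instance is described as having a unique ID.
    if len(ids) != len(set(ids)):
        problems.append("instance IDs must be unique")

    # DEFAULT_INSTANCE_TYPE_ID should name one of the configured instances.
    if default_instance_id not in ids:
        problems.append("DEFAULT_INSTANCE_TYPE_ID does not match any instance ID")

    for instance in instances:
        low = instance.get("minAllowed", 0)
        high = instance.get("maxAllowed", 0)
        # Assumption: a pool cannot preserve more instances than its maximum.
        if low > high:
            problems.append("{}: minAllowed exceeds maxAllowed".format(instance.get("id")))

    return problems
```

For example, two instances with the IDs basic-agent and large-agent and a default of basic-agent produce an empty list.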

logbackConfig

Used to collect logs from Datalore and agents. We provide the default one, which prints requested information to stdout, but you can configure it any way you like. Find more information on how to configure Logback in the official documentation.

Email, plan, and gift code options

Enable emails

By default, emails are not enabled and email verification for users is disabled. To enable the email service, specify the following parameters in the docker-compose file (for Docker) or in the values under the dataloreEnv key in the datalore.values.yaml file (for Helm):

  • MAIL_SENDER_EMAIL: sender's email

  • MAIL_SENDER_NAME: sender's name

  • MAIL_SENDER_USERNAME: username of SMTP user

  • MAIL_SENDER_PASSWORD: password of SMTP user

  • MAIL_SMTP_SERVER: SMTP server host

  • MAIL_SMTP_PORT: SMTP server port

To disable email verification while keeping emails enabled, explicitly set the FORCE_EMAIL_VERIFICATION property to false.

Example (Docker)

```
services:
  datalore:
    ...
    environment:
      …
      MAIL_SMTP_SERVER: "email-smtp.your_domain.com"
      MAIL_SMTP_PORT: "465"
      MAIL_SENDER_USERNAME: "email_user"
      MAIL_SENDER_PASSWORD: "pa$$w0rd"
      MAIL_SENDER_EMAIL: "no_reply.datalore@your_domain.com"
      MAIL_SENDER_NAME: "Datalore Team"
      FORCE_EMAIL_VERIFICATION: "false"
```

Enable plans

To limit the resources available to users, use the plans feature.

For Docker: mount the plans configuration file into the Datalore container and set the DATALORE_PLANS_CONFIGURATION variable to the path of the mounted file.

```
services:
  datalore:
    ...
    environment:
      …
      DATALORE_PLANS_CONFIGURATION: "/opt/datalore/configs/plans.yaml"
```

For Helm: set ENABLE_PLANS to true under dataloreEnv and configure plans under the plansConfig key.

Each plan has a unique planId. Make sure that one of them is marked as default. When adding or configuring a plan, specify the following parameters:

| Option | Value example | Description |
|---|---|---|
| diskUsageLimit | 10 GB | Disk space in persistent storage allowed per user |
| numRunningInstancesLimit | 10 | Number of agents that a user can run in parallel |
| parallelInstancesQuota | PT100500H | Deprecated |
| instanceDurationQuotaMap | basic-agent: "PT100H", large-agent: "PT10H" | Computation time per month available for each agent type; more than one agent type allowed |

Example (Docker):

```
- planId: "Default plan"
  default: true
  instanceDurationQuotaMap:
    basic-agent: "PT100H"
  diskUsageLimit: "10.00 GB"
  numRunningInstancesLimit: 3
  parallelInstancesQuota: "PT100500H"
- planId: "Extra plan"
  instanceDurationQuotaMap:
    basic-agent: "P2DT3H4M"
  diskUsageLimit: "120.00 GB"
  numRunningInstancesLimit: 999
  parallelInstancesQuota: "PT100500H"
```
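The quota values in this example (PT100H, P2DT3H4M) look like ISO 8601 durations; that reading is an assumption rather than something this article states. Under that assumption, the sketch below converts such values to time spans and checks a list of plans for unique planId values and a single default plan ("exactly one" is an interpretation of "one of them is marked as default"). The function names are hypothetical, and the parser handles only the day, hour, minute, and second designators.

```python
import re
from datetime import timedelta
from typing import Dict, List

# Supports the PnDTnHnMnS subset only; "T" must be followed by a time component.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as "PT100H" or "P2DT3H4M" to a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError("unsupported duration: " + repr(text))
    days, hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def check_plans(plans: List[Dict]) -> List[str]:
    """Return problems found in a list of plan definitions."""
    problems = []
    ids = [plan.get("planId") for plan in plans]
    if len(ids) != len(set(ids)):
        problems.append("planId values must be unique")
    defaults = [plan for plan in plans if plan.get("default")]
    if len(defaults) != 1:
        problems.append("expected exactly one default plan, found {}".format(len(defaults)))
    for plan in plans:
        for agent, quota in plan.get("instanceDurationQuotaMap", {}).items():
            try:
                parse_duration(quota)
            except ValueError:
                problems.append("{}: bad quota for {}".format(plan.get("planId"), agent))
    return problems


print(parse_duration("P2DT3H4M"))  # prints 2 days, 3:04:00
```

With this reading, the "Extra plan" quota of P2DT3H4M amounts to 51 hours and 4 minutes of basic-agent time per month.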

Enable gift codes

Turn on the option by setting GIFT_CODES_ENABLED to true in the docker-compose file (for Docker) or in the values under the dataloreEnv key in the datalore.values.yaml file (for Helm).

Example (Docker):

```
services:
  datalore:
    ...
    environment:
      …
      GIFT_CODES_ENABLED: "true"
```

Refer to Gift codes to find information on how to use gift codes.

Last modified: 20 June 2022