Datalore 2024.2 Help

Install on a Kubernetes cluster using Helm charts

This article describes how to install Datalore Enterprise in a Kubernetes cluster using Helm.

We strongly recommend that you have experience with Kubernetes, and Helm in particular. For a proof of concept (PoC), we suggest the Docker-based installation instead.

Prerequisites

Before installation, make sure that you have the following:

  • A Kubernetes (k8s) cluster

  • kubectl installed on your machine and configured to point to this cluster

  • Helm

This installation was tested with Kubernetes v1.24 and Helm v3.12.3, but other versions may work too.
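Before you start, you can confirm these prerequisites from a shell. The following read-only sketch checks that kubectl and helm are on your PATH and, if both are found, prints the Helm version, the current kubectl context, and whether the cluster is reachable. It uses only standard kubectl and helm subcommands and changes nothing.

```shell
#!/usr/bin/env bash
# Read-only prerequisite check: nothing is installed or modified.
# No "set -e", so a failing check is reported instead of aborting the script.
set -uo pipefail

missing=0
for tool in kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done

if [ "$missing" -eq 0 ]; then
  helm version --short
  # The context name shows which cluster kubectl is currently pointed at.
  kubectl config current-context || echo "no current kubectl context is set"
  # Fails when the API server cannot be reached from this machine.
  kubectl cluster-info || echo "kubectl cannot reach the cluster"
fi
```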

Hardware requirements
  • Datalore server machine: 4 GB of RAM (the number of CPUs is not critical if the load is low)

  • Each concurrently running notebook: at least 4 GB of RAM
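Taken together, these figures give a rough lower bound on the memory your cluster needs. The sketch below uses a hypothetical helper, min_ram_gb, to show the arithmetic for a given number of concurrently running notebooks. It does not account for memory used by the node operating system or Kubernetes system components, so treat the result as a floor, not a sizing recommendation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Lower bound on RAM in GB: 4 GB for the Datalore server
# plus at least 4 GB for each notebook running at the same time.
min_ram_gb() {
  local concurrent_notebooks=$1
  echo $(( 4 + 4 * concurrent_notebooks ))
}

min_ram_gb 10   # prints 44
```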

AWS EKS deployment limitations

Datalore's reactive mode may not work properly on an Amazon EKS cluster with Amazon Linux compute nodes (the default option). We recommend using Ubuntu 20.04 nodes with the corresponding AMIs built specifically for EKS.

Here are our tips for AWS EKS deployments:

  • To find an AMI for manual setup, follow this link and select your option based on the cluster version and region.

  • To configure the cluster deployment using Terraform, you can refer to this sample file.

Basic Datalore installation

Follow these instructions to install Datalore using Helm.

Install Datalore

  1. Add the Datalore Helm repository:

    helm repo add datalore https://jetbrains.github.io/datalore-configs/charts
  2. Create a datalore.values.yaml file.

  3. In datalore.values.yaml, add a databaseSecret parameter to set up your database password. We recommend using a random string.

    databaseSecret:
      password: xxxx
  4. Configure your volumes. In datalore.values.yaml, add the following parameters:

    volumes:
      - name: storage
        ...
      - name: postgresql-data
        ...

    where:

    • storage: contains workbook data, such as attached files (UID:GID 5000:5000).

    • postgresql-data: contains PostgreSQL database data (UID:GID 999:999).

    Below are example procedures for configuring your volumes:

    Configure hostPath volumes

    1. Create directories:

      mkdir -p /data/postgresql
      mkdir -p /data/datalore
      chown 999:999 /data/postgresql
      chown 5000:5000 /data/datalore
    2. Add to datalore.values.yaml:

      volumes:
        - name: postgresql-data
          hostPath:
            path: /data/postgresql
            type: Directory
        - name: storage
          hostPath:
            path: /data/datalore
            type: Directory

    Use volumeClaimTemplates

    If you set up volume auto-provisioning in Kubernetes, you can replace volumes with volumeClaimTemplates.

    volumeClaimTemplates:
      - metadata:
          name: storage
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      - metadata:
          name: postgresql-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
  5. Run the following command and wait for Datalore to start up:

    helm install -f datalore.values.yaml datalore datalore/datalore --version 0.2.17
  6. Go to http://127.0.0.1:8080/ and sign up the first user. The first user to sign up automatically receives admin rights.

  7. To access Datalore via a domain other than 127.0.0.1, set the DATALORE_PUBLIC_URL parameter in the datalore.values.yaml file to a URL with that host.

    For example, if you want to use the https://datalore.yourcompany.com domain, add the following:

    dataloreEnv:
      ...
      DATALORE_PUBLIC_URL: "https://datalore.yourcompany.com"
  8. Click your avatar in the upper right corner, select Admin panel | License and provide your license key.

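Steps 3, 4, and 7 all edit datalore.values.yaml. As a minimal sketch, assuming the hostPath variant from step 4, the script below writes the whole file at once and generates a random 32-character hex database password from /dev/urandom. The domain is the example from step 7 and the paths are those from the hostPath procedure; replace them as needed. Note that the script overwrites any existing datalore.values.yaml and does not create the directories on the node.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Public URL from step 7; replace with your own domain.
PUBLIC_URL="https://datalore.yourcompany.com"

# 16 random bytes rendered as 32 hex characters, used as the database password.
DB_PASSWORD="$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')"

# Unquoted EOF so that ${DB_PASSWORD} and ${PUBLIC_URL} are expanded.
# The hostPath directories must already exist on the node with the ownership
# given above (999:999 for PostgreSQL, 5000:5000 for storage).
cat > datalore.values.yaml <<EOF
databaseSecret:
  password: "${DB_PASSWORD}"
volumes:
  - name: postgresql-data
    hostPath:
      path: /data/postgresql
      type: Directory
  - name: storage
    hostPath:
      path: /data/datalore
      type: Directory
dataloreEnv:
  DATALORE_PUBLIC_URL: "${PUBLIC_URL}"
EOF

echo "Wrote datalore.values.yaml"
```

With the file in place, the helm install command from step 5 picks it up through -f datalore.values.yaml.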

Optional procedures

Run Datalore in a non-default namespace

  1. When installing Datalore, specify the namespace:

    helm install -n <non_default_namespace> -f datalore.values.yaml datalore datalore/datalore --version 0.2.17
  2. (Optional) If you use a custom config, add the namespace under the agentsConfig key as shown in the code below:

    k8s:
      namespace: <non_default_namespace>
      instances:
        ...
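In Helm 3, helm install -n fails if the namespace does not exist yet; Helm 3.2 and later can create it with the --create-namespace flag. The sketch below, with a hypothetical namespace name and a hypothetical run helper, only prints the commands it would execute unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical namespace name; replace with your own.
NAMESPACE="datalore"

# Prints each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Helm 3 does not create a missing namespace unless --create-namespace is passed.
run helm install -n "$NAMESPACE" --create-namespace \
  -f datalore.values.yaml datalore datalore/datalore --version 0.2.17

# Lists the pods in that namespace so you can watch Datalore start up.
run kubectl get pods -n "$NAMESPACE"
```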

Use an external PostgreSQL database

  1. Under dataloreEnv, add two variables: the database user (DB_USER) and the database URL (DB_URL).

    dataloreEnv:
      ...
      DB_USER: "<database_user>"
      DB_URL: "jdbc:postgresql://[database_host]:[database_port]/[database_name]"
  2. Set internalDatabase to false.
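The DB_URL value follows the standard PostgreSQL JDBC format. The sketch below uses a hypothetical jdbc_url helper to assemble it from its parts (rejecting a non-numeric port) and prints a values fragment covering the two steps above. The host, port, database name, and user are placeholders, and the sketch assumes internalDatabase is a top-level key in datalore.values.yaml.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Builds jdbc:postgresql://<host>:<port>/<database>, refusing a non-numeric port.
jdbc_url() {
  local host=$1 port=$2 name=$3
  case $port in
    ''|*[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
  esac
  echo "jdbc:postgresql://${host}:${port}/${name}"
}

# Hypothetical connection details; replace with your own.
DB_URL="$(jdbc_url postgres.example.internal 5432 datalore)"

# Prints a fragment to merge into datalore.values.yaml.
cat <<EOF
internalDatabase: false
dataloreEnv:
  DB_USER: "datalore"
  DB_URL: "${DB_URL}"
EOF
```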

Enable an email whitelist

Enable a whitelist for new user registration so that only users whose emails are on the whitelist can register.

  1. Open the datalore.values.yaml file.

  2. Add the following parameter:

    dataloreEnv:
      ...
      EMAIL_ALLOWLIST_ENABLED: "true"

The corresponding tab then becomes available in the Admin panel.
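Instead of editing the main file, you can keep this setting in a separate values file: Helm accepts several -f files, gives precedence to the later ones, and merges maps such as dataloreEnv. The sketch below writes a hypothetical override file, datalore.allowlist.values.yaml, and prints a helm upgrade command that would apply both files to the existing release. It assumes the chart accepts the flag as the lowercase string "true".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical override file holding only the allowlist flag.
# Quoted 'EOF' prevents any shell expansion inside the YAML.
cat > datalore.allowlist.values.yaml <<'EOF'
dataloreEnv:
  EMAIL_ALLOWLIST_ENABLED: "true"
EOF

# Later -f files take precedence; Helm merges the dataloreEnv maps from both files.
echo "helm upgrade -f datalore.values.yaml -f datalore.allowlist.values.yaml datalore datalore/datalore --version 0.2.17"
```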

Enable user filtration based on Hub group membership

By default, all Hub users can register unless you disable registration in the Admin panel. To grant Datalore access only to members of specific Hub groups, perform the steps below:

  1. Open the datalore.values.yaml file.

  2. Add the following parameter:

    dataloreEnv:
      ...
      HUB_ALLOWLIST_GROUP: "group_name, group_name1"

Configure notebook code import limit

To limit the size of notebook code that can be imported, set your own value in bytes.

  1. Open the datalore.values.yaml file.

  2. Add the following parameter:

    dataloreEnv:
      VFS_MAX_IMPORT_SOURCE_LENGTH: 'integer, prefixes (K-, M-, etc.) not supported'
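Because unit prefixes are not accepted, convert the limit to bytes yourself. For example, a limit of 50 MiB (a hypothetical value) is 50 * 1024 * 1024 = 52428800 bytes, as the sketch below computes before printing the line to place under dataloreEnv.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The variable takes a plain byte count, so convert from MiB first.
limit_mib=50   # hypothetical limit
limit_bytes=$(( limit_mib * 1024 * 1024 ))

# Prints: VFS_MAX_IMPORT_SOURCE_LENGTH: '52428800'
echo "VFS_MAX_IMPORT_SOURCE_LENGTH: '${limit_bytes}'"
```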

Fargate restrictions

While Datalore can operate in Fargate, be aware of the following restrictions:

  • Attached files and reactive mode will not work due to Fargate security policies.

  • Fargate does not support spawning agents in privileged mode, which is how they are set up by default.

  • Fargate does not support EBS volumes, our default volume option. As a workaround, we currently suggest that you set up an Amazon EFS file system, create PersistentVolume and PersistentVolumeClaim objects, and edit the datalore.values.yaml file as shown in the example below:

    volumeClaimTemplates:
      - metadata:
          name: postgresql-data
        spec:
          accessModes:
            - ReadWriteMany
          storageClassName: efs-sc
          resources:
            requests:
              storage: 2Gi
      - metadata:
          name: storage
        spec:
          accessModes:
            - ReadWriteMany
          storageClassName: efs-sc
          resources:
            requests:
              storage: 10Gi
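The EFS workaround relies on statically provisioned PersistentVolumes that the claims generated from the volumeClaimTemplates above can bind to, assuming the chart renders those templates into a StatefulSet, which creates the claims itself. The sketch below, assuming the Amazon EFS CSI driver is available in your cluster, writes a StorageClass named efs-sc plus one PersistentVolume per claim into datalore-efs.yaml, which you could then apply with kubectl apply -f datalore-efs.yaml. The volume handles and PersistentVolume names are hypothetical placeholders. The sketch also assumes one EFS access point per volume, created with the ownership listed earlier (5000:5000 for storage, 999:999 for PostgreSQL).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical EFS volume handles; replace with your own IDs. Each volume gets its
# own access point (form: <file-system-id>::<access-point-id>) so that storage and
# PostgreSQL data do not share a directory.
STORAGE_HANDLE="fs-0123456789abcdef0::fsap-0123456789abcdef0"
POSTGRES_HANDLE="fs-0123456789abcdef0::fsap-0fedcba9876543210"

# Prints one statically provisioned PersistentVolume backed by the EFS CSI driver.
write_pv() {
  local name=$1 size=$2 handle=$3
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: ${size}
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: ${handle}
EOF
}

{
  # StorageClass referenced by storageClassName: efs-sc in the values above.
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
EOF
  # Sizes match the requests in volumeClaimTemplates.
  write_pv datalore-storage 10Gi "$STORAGE_HANDLE"
  write_pv datalore-postgresql-data 2Gi "$POSTGRES_HANDLE"
} > datalore-efs.yaml
```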

Further steps

After the basic installation, continue with the configuration procedures listed below. Some of them are required, because you need to customize Datalore Enterprise to fit your project.

Required:

  • Configure agents: used to change the default agents configuration

  • Set up GPU machines: used to enable GPU machines

  • Configure plans: used to customize plans for your Datalore users

Optional:

  • Customize or update environment: used to create multiple base environments out of custom Docker images

  • Set up JetBrains Hub: used to integrate an authentication service

  • Enable gift codes: used to enable a service that generates and distributes gift codes

  • Enable email service: used to activate email notifications

  • Enable user activity logging: used to set up auditing of your Datalore users

Last modified: 22 April 2024