Install on a Kubernetes cluster using Helm charts
The instructions in this article describe the installation of Datalore On-Premises on a Kubernetes cluster using Helm.
The chapters in this section describe the processes of installing, configuring, and updating Datalore On-Premises in Kubernetes deployment (Helm charts method).
This is what Kubernetes-based setup for Datalore looks like:
You will learn how to do the following:
Basic installation: You complete the basic procedure to get Datalore On-Premises up and running on the infrastructure of your choice.
Required and optional configuration procedures: You customize and configure Datalore On-Premises. Some of these configurations are essential for you to start working on your projects.
Upgrade procedure: You upgrade your version of Datalore On-Premises. We duly notify you of our new releases.
We highly recommend that you have experience with Kubernetes, and particularly with Helm. For proof-of-concept (PoC) purposes, we suggest trying the Docker-based installation instead.
Prerequisites
Before installation, make sure that you have the following:
A Kubernetes (k8s) cluster
kubectl installed on your machine and pointed to this cluster
Helm
This installation was tested with Kubernetes v1.24 and Helm v3.12.3, but other versions may work too.
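The prerequisites above can be checked from the command line before you start. The following is a minimal sketch, not part of the official procedure: `check_tools` is a hypothetical helper name, while `kubectl config current-context` and `helm version --short` are standard subcommands of those tools.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the given commands
# are missing from PATH. Returns 0 when all are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_tools kubectl helm; then
  # Show which cluster kubectl points to and which Helm version is installed.
  kubectl config current-context
  helm version --short
else
  echo "Install the missing tools before continuing." >&2
fi
```

Comparing the reported context and versions against the tested combination (Kubernetes v1.24, Helm v3.12.3) is left to the reader.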
Hardware requirements
Datalore server machine: 4 GB of RAM (the number of CPUs is irrelevant if the load is not high)
For every concurrently running notebook: at least 4 GB of RAM
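As a rough sizing aid, the hardware figures above can be turned into a lower-bound estimate: 4 GB for the server plus 4 GB per concurrently running notebook. This is only a sketch; `required_ram_gb` is a hypothetical helper, and real notebooks may need more than the stated minimum.

```shell
#!/bin/sh
# required_ram_gb (hypothetical helper): minimum total RAM in GB for
# the Datalore server plus N concurrently running notebooks.
required_ram_gb() {
  notebooks=$1
  echo $(( 4 + 4 * notebooks ))
}

# Example: 5 concurrent notebooks need at least 4 + 4*5 = 24 GB.
required_ram_gb 5
```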
AWS EKS deployment limitations
Datalore's Reactive mode may not operate properly on an Amazon EKS cluster whose compute nodes run Amazon Linux (the default option). We recommend that you use Ubuntu 20.04 with the corresponding AMIs specifically designed for EKS.
Here are our tips for AWS EKS deployments:
To find an AMI for manual setup, follow this link and select your option based on the cluster version and region.
To configure the cluster deployment using Terraform, you can refer to this sample file.
Basic Datalore installation
Follow the instructions below to install Datalore using Helm.
Install Datalore
Add the Datalore Helm repository:
    helm repo add datalore https://jetbrains.github.io/datalore-configs/charts
Create a datalore.values.yaml file.
In datalore.values.yaml, add a databaseSecret parameter to set up your database password. A random string is advised.
    databaseSecret:
      password: xxxx
Configure your volumes. In datalore.values.yaml, add the following parameters:
    volumes:
      - name: storage
        ...
      - name: postgresql-data
        ...
where:
storage: contains workbook data, such as attached files (UID:GID 5000:5000).
postgresql-data: contains PostgreSQL database data (UID:GID 999:999).
Below are example procedures for configuring your volumes:
Configure hostPath volumes
Create directories:
    mkdir -p /data/postgresql
    mkdir -p /data/datalore
    chown 999:999 /data/postgresql
    chown 5000:5000 /data/datalore
Add to datalore.values.yaml:
    volumes:
      - name: postgresql-data
        hostPath:
          path: /data/postgresql
          type: Directory
      - name: storage
        hostPath:
          path: /data/datalore
          type: Directory
Use volumeClaimTemplates
If you set up volume auto-provisioning in Kubernetes, you can replace volumes with volumeClaimTemplates.
    volumeClaimTemplates:
      - metadata:
          name: storage
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
      - metadata:
          name: postgresql-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 2Gi
Run the following command and wait for Datalore to start up:
    helm install -f datalore.values.yaml datalore datalore/datalore --version 0.2.22
Go to http://127.0.0.1:8080/ and sign up the first user. The first signed-up user will automatically receive admin rights.
To access Datalore by a domain other than 127.0.0.1, add a URL with this host as the DATALORE_PUBLIC_URL parameter in the datalore.values.yaml file.
For example, if you want to use the https://datalore.yourcompany.com domain, add the following:
    dataloreEnv:
      ...
      DATALORE_PUBLIC_URL: "https://datalore.yourcompany.com"
Click your avatar in the upper right corner, select Admin panel | License and provide your license key.
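The installation steps above advise a random string for the databaseSecret password. One way to produce such a value is sketched below, assuming a system that provides /dev/urandom and the standard od and tr utilities. `gen_db_password` and `print_db_secret_yaml` are hypothetical helper names, not part of Datalore or Helm.

```shell
#!/bin/sh
# gen_db_password (hypothetical helper): read 16 random bytes and
# render them as a 32-character lowercase hex string.
gen_db_password() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# print_db_secret_yaml (hypothetical helper): print a databaseSecret
# fragment to stdout so it can be pasted into datalore.values.yaml.
print_db_secret_yaml() {
  printf 'databaseSecret:\n  password: %s\n' "$(gen_db_password)"
}

print_db_secret_yaml
```

The fragment is printed rather than written to a file so that an existing datalore.values.yaml is never overwritten by accident.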
Optional procedures
Run Datalore in a non-default namespace
To deploy the Datalore server into a non-default namespace, run the following command:
    helm install -n <non_default_namespace> -f datalore.values.yaml datalore datalore/datalore --version 0.2.22
To specify the non-default namespace for your agents configs, define the namespace variable in datalore.values.yaml as shown in the code block below:
    agentsConfig:
      k8s:
        namespace: <non_default_namespace>
        instances:
          ...
Find more details about configuring agents in this topic.
Under dataloreEnv in datalore.values.yaml, you can define the following variables:
Name | Type | Default value | Description |
---|---|---|---|
DATABASES_K8S_NAMESPACE | String | default | K8s namespace where all database connector pods will be spawned. |
GIT_TASK_K8S_NAMESPACE | String | default | K8s namespace where all Git-related task pods will be spawned. |
Find the full list of customized server configuration options in this topic.
Use an external PostgreSQL database
Add two variables under dataloreEnv: database user and database URL.
    dataloreEnv:
      ...
      DB_USER: "<database_user>"
      DB_URL: "jdbc:postgresql://[database_host]:[database_port]/[database_name]"
Set internalDatabase to false.
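The DB_URL value follows the pattern jdbc:postgresql://[database_host]:[database_port]/[database_name]. As a small illustration, the sketch below assembles such a URL from its three parts. `jdbc_postgres_url` is a hypothetical helper, and the host and database names in the example call are placeholders.

```shell
#!/bin/sh
# jdbc_postgres_url (hypothetical helper): build a PostgreSQL JDBC URL
# from host, port, and database name.
jdbc_postgres_url() {
  host=$1
  port=$2
  db=$3
  printf 'jdbc:postgresql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Placeholder values; 5432 is the default PostgreSQL port.
jdbc_postgres_url db.example.internal 5432 datalore
```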
Enable an email whitelist
Enable a whitelist for new user registration. Only users whose email addresses are on the whitelist can register.
Open the values.yaml file.
Add the following parameter:
    dataloreEnv:
      ...
      EMAIL_ALLOWLIST_ENABLED: "true"
The respective tab will be available on the Admin panel.
Enable user filtration based on Hub group membership
By default, all Hub users can get registered unless you disable registration on the Admin panel. If you want to grant Datalore access only to members of specific Hub groups, perform the steps below:
Open the values.yaml file.
Add the following parameter:
    dataloreEnv:
      ...
      HUB_ALLOWLIST_GROUP: 'group_name', 'group_name1'
Configure notebook code import limit
Set your own value in bytes to configure the limit of notebook code import.
Open the values.yaml file.
Add the following parameter:
    dataloreEnv:
      VFS_MAX_IMPORT_SOURCE_LENGTH: 'integer, prefixes (K-, M-, etc.) not supported'
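Because the limit must be a plain integer number of bytes, a size expressed in larger units has to be converted first. The sketch below does this with shell arithmetic, assuming binary units (1 MiB = 1024 * 1024 bytes). `mib_to_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# mib_to_bytes (hypothetical helper): convert a whole number of MiB
# to bytes, since VFS_MAX_IMPORT_SOURCE_LENGTH does not accept prefixes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Example: a 5 MiB limit is 5242880 bytes.
mib_to_bytes 5
```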
Fargate restrictions
While Datalore can operate in Fargate, be aware of the following restrictions:
Attached files and reactive mode will not work due to Fargate security policies.
Spawning agents in privileged mode, as set up by default, is not supported by Fargate.
Fargate does not support EBS volumes, our default volume option. Currently, as a workaround, we suggest that you have an AWS EFS, create PersistentVolume and PersistentVolumeClaim objects, and edit the values.yaml config file as shown in the example below:
    volumeClaimTemplates:
      - metadata:
          name: postgresql-data
        spec:
          accessModes:
            - ReadWriteMany
          storageClassName: efs-sc
          resources:
            requests:
              storage: 2Gi
      - metadata:
          name: storage
        spec:
          accessModes:
            - ReadWriteMany
          storageClassName: efs-sc
          resources:
            requests:
              storage: 10Gi
Further steps
Follow the basic installation with configuration procedures. Some of them are required as you need to customize Datalore On-Premises in accordance with your project.
Procedure | Description |
---|---|
Required | |
Used to change the default agents configuration | |
Used to enable GPU machines | |
Used to customize plans for your Datalore users | |
Optional | |
Used to create multiple base environments out of custom Docker images | |
Used to integrate an authentication service | |
Used to enable a service generating and distributing gift codes | |
Used to activate email notifications | |
Used to set up auditing of your Datalore users |
We also recommend referring to this page for the full list of Datalore server configuration options.