
Install Datalore Enterprise on Kubernetes with Kustomize

Infrastructure and process

The diagram below shows the Datalore on-premises infrastructure using Kubernetes.

Datalore on-premises infrastructure using Kubernetes

To install Datalore on-premises, first install and configure Hub, which provides a single point of entry for user management. The procedures below describe both stages of the process.

Before you begin

Download this repository and make sure that:

  • If you cloned it with git clone, you have checked out the 2021.3.4 branch before continuing the installation.

  • You run all shell commands in the <repository_root>/k8s folder.

Install and configure Hub

If you have already installed Hub, skip to Configure Hub. For more details about the Hub installation process, see the Hub documentation.

Install Hub

  1. Configure Hub persistent volumes: in the volumes section of the ./hub/statefulSet.yaml file, replace the emptyDir values with volumes available in your Kubernetes cluster.

  2. Run Hub using the kubectl apply -k ./hub/ command.

  3. (Optional) The steps below assume that Hub is accessible at http://localhost:8082. To make it accessible there, forward the port with the following command:

    kubectl port-forward --address 0.0.0.0 service/hub 8082
  4. Check the container output using the kubectl logs service/hub command. It should contain a line like this:

    JetBrains Hub 2021.1 Configuration Wizard will listen inside the container on {0.0.0.0:8080}/ after start and can be accessed by this URL: [http://<put-your-docker-HOST-name-here>:<put-host-port-mapped-to-container-port-8080-here>/?wizard_token=pPXTShp4NXceXqGYzeAq].

    Copy the wizard_token value to the clipboard.

  5. Go to http://localhost:8082/ and insert the token from the previous step into the Token field.

  6. Click the Log in button.

  7. Click the Set Up link.

  8. Set up a URL (referred to as HUB_ROOT_URL below) for accessing Hub from Datalore. Consider the following:

    • The URL must be accessible from both the cluster pods and the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Hub installation, that is, to http://127.0.0.1:8080/ inside the container where Hub runs (by default, the hub-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  9. In Base URL, enter HUB_ROOT_URL. Do not change the Application Listen Port setting.

  10. Click the Next button.

  11. Configure the admin account (set the admin password).

  12. Click the Finish button and wait for Hub to start.
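If you prefer not to copy the wizard token by hand in step 4, the following minimal Python sketch pulls the wizard_token query parameter out of a Hub log line like the one shown there. It assumes you have captured the output of kubectl logs service/hub as text and that the token consists of ASCII letters and digits, as in the sample line; extract_wizard_token is a hypothetical helper, not part of Hub.

```python
import re
from typing import Optional

# Matches the wizard_token query parameter in the Hub startup log line.
# Assumption: the token is made of ASCII letters and digits, as in the sample.
_TOKEN_RE = re.compile(r"wizard_token=([A-Za-z0-9]+)")


def extract_wizard_token(log_text: str) -> Optional[str]:
    """Return the first wizard_token value found in the log text, or None."""
    match = _TOKEN_RE.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "JetBrains Hub 2021.1 Configuration Wizard will listen inside the container "
        "on {0.0.0.0:8080}/ after start and can be accessed by this URL: "
        "[http://<put-your-docker-HOST-name-here>:<put-host-port-mapped-to-container-port-8080-here>"
        "/?wizard_token=pPXTShp4NXceXqGYzeAq]."
    )
    print(extract_wizard_token(sample))  # prints pPXTShp4NXceXqGYzeAq
```

To use it against a running cluster, you could pass it the output of subprocess.run(["kubectl", "logs", "service/hub"], capture_output=True, text=True).stdout.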

Configure Hub

Go to HUB_ROOT_URL and log in to Hub with the admin account.

Configure the Datalore service

  1. Set up another URL (referred to as DATALORE_ROOT_URL below) for accessing Datalore. Consider the following:

    • The URL must be accessible from the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Datalore installation, that is, to http://127.0.0.1:8080/ inside the container where Datalore will run (by default, the datalore-on-premise-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  2. Go to Services (${HUB_ROOT_URL}/hub/services) and click the New service button. Use the name datalore and enter DATALORE_ROOT_URL in Home URL.

  3. Copy the ID field value and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_ID property).

  4. Click the Change... button next to the Secret label.

  5. Copy the generated secret and save it somewhere: it will be used when configuring Datalore ($HUB_DATALORE_SERVICE_SECRET property).

  6. Click the Change secret button.

  7. Enter DATALORE_ROOT_URL in the Base URLs field.

  8. Enter /api/hub/openid/login in the Redirect URIs field.

  9. Click the Trust Service button in the upper-right corner.

  10. Click the Save button.

Create a Hub token

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Click your admin username.

  3. Go to the Authentication tab.

  4. Click the New token... button.

  5. Add Hub and Datalore to Scope and enter any name in Name. Click the Create button. Copy the token (including the perm: prefix) and save it somewhere: you will need it when configuring Datalore ($HUB_PERM_TOKEN property).

(Optional) Force email verification

Datalore uses user emails from Hub, so we recommend forcing email verification in Hub. Users with unverified emails will not be able to use Datalore.

  1. Configure the SMTP server:

    • Go to SMTP (${HUB_ROOT_URL}/hub/smtp-settings).

    • Click the Configure SMTP server... button.

    • Configure your SMTP server parameters.

    • Click the Save button.

    • Click the Enable notifications button.

    • (Optional) To make sure your configuration is working, click the Send Test message button.

  2. Enable email verification:

    • Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

    • Open the Common settings page.

    • Enable the Email verification option.

    • Click the Save button.

  3. Set and verify an admin user email:

    • Go to Users (${HUB_ROOT_URL}/hub/users).

    • Click your admin username.

    • Set an email in the Email field.

    • Click the Save button.

    • Click the Send verification email link.

    • Find the verification email in your inbox and click the Verify email address button.

(Optional) Ban a guest user

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Select the guest user.

  3. Click the Ban button.

(Optional) Enable auth modules

  1. Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

  2. Add or remove auth modules (for example, Google auth, GitHub auth, LDAP, and so on).

Install Datalore

Configure Datalore

To simplify the configuration process, the Kubernetes configuration is split into small files that are assembled with Kustomize (the -k flag of kubectl). To configure your Datalore installation, edit the following files in the datalore/configs directory.

user_secret_env.sh

You must edit this file for the installation to work. It contains the following fields:

Mandatory parameters

FRONTEND_URL

URL used to access Datalore (DATALORE_ROOT_URL). Datalore uses it to generate links.

HUB_PUBLIC_BASE_URL

Public base URL of your Hub installation, accessible from the browser: ${HUB_ROOT_URL}/hub, where HUB_ROOT_URL is the URL from the Install Hub section (for example, https://hub.your.domain/hub).

HUB_DATALORE_SERVICE_ID

ID of the Datalore service in Hub (see Configure the Datalore service).

HUB_DATALORE_SERVICE_SECRET

Secret of the Datalore service in Hub (see Configure the Datalore service).

HUB_PERM_TOKEN

Token for accessing Datalore and Hub scopes (see Create a Hub token).

HUB_FORCE_EMAIL_VERIFICATION

Specifies whether Datalore users must verify their email addresses (see (Optional) Force email verification).

DEFAULT_INSTANCE_TYPE_ID

ID of the instance type that will be used by default (for more information, see the agents_config.yaml config file description).

DEFAULT_PACKAGE_MANAGER

Default package manager.

DEFAULT_BASE_ENV_NAME

Default base environment. Must match one of the environments of the default package manager.

MAIL_ENABLED

If set to true, enables Datalore to send emails (welcome emails, sharing invitations, and so on). In this case, the following parameters are also required:

  • MAIL_SENDER_EMAIL: sender's email

  • MAIL_SENDER_NAME: sender's name

  • MAIL_SENDER_USERNAME: username of SMTP user

  • MAIL_SENDER_PASSWORD: password of SMTP user

  • MAIL_SMTP_SERVER: SMTP server host

  • MAIL_SMTP_PORT: SMTP server port

ADMIN_API_AUTH_TOKEN

API token that authorizes requests to the Datalore admin REST API, for example, when setting up an admin user or adding a license.

Optional parameters

HUB_INTERNAL_BASE_URL (default: http://hub:8082/hub)

URL for accessing Hub from inside the cluster. Set it if HUB_PUBLIC_BASE_URL is reachable only from outside the cluster.

DATABASES_K8S_NAMESPACE (default: default)

Name of the Kubernetes namespace where Datalore is installed. Set it if you install Datalore in a namespace other than default.

DATALORE_INTERNAL_HOST (default: datalore-on-premise)

Internal hostname for the datalore-on-premise service.

SQL_SERVER_HOST (default: datalore-on-premise)

Internal hostname for the datalore-on-premise service. Must be equal to DATALORE_INTERNAL_HOST.

SQL_SERVER_PORT (default: 8081)

Internal Datalore API port.

DATABASES_BASE_URL

Must always be equal to "http://${SQL_SERVER_HOST}:${SQL_SERVER_PORT}".
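Several of the parameters above depend on each other. The following Python sketch checks a set of user_secret_env.sh values against the rules listed in this section before you apply the configuration. It assumes the values have already been collected into a dictionary and that the file consists of shell export lines, as in the export DATABASES_K8S_NAMESPACE example in the namespace procedure; check_settings and render_env_file are hypothetical helpers, not part of Datalore.

```python
import shlex
from typing import Dict, List

# Mandatory parameters as listed in this section.
MANDATORY = [
    "FRONTEND_URL", "HUB_PUBLIC_BASE_URL", "HUB_DATALORE_SERVICE_ID",
    "HUB_DATALORE_SERVICE_SECRET", "HUB_PERM_TOKEN", "HUB_FORCE_EMAIL_VERIFICATION",
    "DEFAULT_INSTANCE_TYPE_ID", "DEFAULT_PACKAGE_MANAGER", "DEFAULT_BASE_ENV_NAME",
    "MAIL_ENABLED", "ADMIN_API_AUTH_TOKEN",
]
# Parameters that become required when MAIL_ENABLED is true.
MAIL_PARAMS = [
    "MAIL_SENDER_EMAIL", "MAIL_SENDER_NAME", "MAIL_SENDER_USERNAME",
    "MAIL_SENDER_PASSWORD", "MAIL_SMTP_SERVER", "MAIL_SMTP_PORT",
]
# Documented defaults of the optional internal-address parameters.
DEFAULTS = {
    "DATALORE_INTERNAL_HOST": "datalore-on-premise",
    "SQL_SERVER_HOST": "datalore-on-premise",
    "SQL_SERVER_PORT": "8081",
}


def check_settings(env: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the settings (empty if none)."""
    problems = [f"missing {key}" for key in MANDATORY if not env.get(key)]
    if env.get("MAIL_ENABLED") == "true":
        problems += [f"missing {key}" for key in MAIL_PARAMS if not env.get(key)]
    merged = {**DEFAULTS, **env}
    if merged["SQL_SERVER_HOST"] != merged["DATALORE_INTERNAL_HOST"]:
        problems.append("SQL_SERVER_HOST must equal DATALORE_INTERNAL_HOST")
    # DATABASES_BASE_URL must always be http://${SQL_SERVER_HOST}:${SQL_SERVER_PORT}.
    expected = f"http://{merged['SQL_SERVER_HOST']}:{merged['SQL_SERVER_PORT']}"
    if "DATABASES_BASE_URL" in env and env["DATABASES_BASE_URL"] != expected:
        problems.append(f"DATABASES_BASE_URL must be {expected}")
    return problems


def render_env_file(env: Dict[str, str]) -> str:
    """Render settings as shell export lines, quoting values where needed."""
    return "".join(f"export {key}={shlex.quote(value)}\n" for key, value in env.items())
```

The rendered text uses single-quote shell quoting via shlex.quote, which is equivalent to the double-quoted form for plain values.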

db_secret_env.txt

This config file configures the PostgreSQL connection from Datalore. It has one field to override:

  • ROOT_PASSWORD: root user's password. The database can be accessed on port 5432 with the username postgres and this password.

volumes_config.yaml

This config file describes two Kubernetes volumes:

  • storage: contains workbook data, such as attached files (UID:GID 5000:5000).

  • postgresql-data: contains PostgreSQL database data (UID:GID 999:999).

agents_config.yaml

This config file is used to define agent types (such as Basic and Large machines in the cloud version of Datalore). It has the following schema:

k8s:
  instances:
    - id: <Unique instance ID>
      label: <Instance name>
      description: <Short description of what the instance is>
      minAllowed: <Minimum number of instances to be preserved in the pool>
      maxAllowed: <Maximum number of instances to be preserved in the pool>
      yaml: <Kubernetes config of Pod to be used for the instance>
    - id: <Another type with the same schema as above>
      ...

The minAllowed and maxAllowed fields configure the number of pre-created instances kept in the pool. Pre-created instances speed up notebook startup.
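Before applying the configuration, you may want to sanity-check agents_config.yaml against the schema above. The following Python sketch assumes the file has already been parsed into nested dictionaries and lists; it checks that every instance has the schema fields, that minAllowed does not exceed maxAllowed, that IDs are unique, and that DEFAULT_INSTANCE_TYPE_ID refers to one of them. check_agents_config is a hypothetical helper, not part of Datalore.

```python
from collections import Counter
from typing import Any, Dict, List

# Fields of each instance entry, per the agents_config.yaml schema.
REQUIRED_FIELDS = ("id", "label", "description", "minAllowed", "maxAllowed", "yaml")


def check_agents_config(config: Dict[str, Any], default_instance_type_id: str) -> List[str]:
    """Return a list of problems in an already-parsed agents_config structure."""
    problems: List[str] = []
    instances = config.get("k8s", {}).get("instances", [])
    for index, inst in enumerate(instances):
        missing = [field for field in REQUIRED_FIELDS if field not in inst]
        if missing:
            problems.append(f"instance #{index} is missing {', '.join(missing)}")
        low, high = inst.get("minAllowed"), inst.get("maxAllowed")
        # A pool cannot keep more instances at minimum than at maximum.
        if isinstance(low, int) and isinstance(high, int) and low > high:
            problems.append(f"instance #{index}: minAllowed > maxAllowed")
    counts = Counter(inst["id"] for inst in instances if "id" in inst)
    problems += [f"duplicate id {i!r}" for i, n in counts.items() if n > 1]
    # DEFAULT_INSTANCE_TYPE_ID in user_secret_env.sh should name one of these IDs.
    if default_instance_type_id not in counts:
        problems.append(f"unknown DEFAULT_INSTANCE_TYPE_ID {default_instance_type_id!r}")
    return problems
```

To check a real file, you could load agents_config.yaml with a YAML parser such as PyYAML (yaml.safe_load) and pass the result together with your DEFAULT_INSTANCE_TYPE_ID value.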

images_config.yaml, introspection.yaml, connection_checker.yaml

These config files define the Datalore container images. You will most likely need to change them only when updating your installation to newer versions of the on-premises images.

logback.xml

This is the Logback configuration file used to collect logs from Datalore and the agents. The default configuration prints log messages to stdout, but you can configure it as needed. For details on configuring Logback, see the official Logback documentation.

Run Datalore

Use the following commands:

  • Start: kubectl apply -k ./datalore/

  • Stop: kubectl delete -k ./datalore/

(Optional) Run Datalore in a non-default namespace

  1. Specify the namespace when running Datalore:

    kubectl -n [non_default_namespace] apply -k ./datalore/
  2. Open the agents_config.yaml file and add the namespace as shown in the code below:

    k8s:
      namespace: [non_default_namespace]
      instances:
        ...
  3. Add export DATABASES_K8S_NAMESPACE="[non_default_namespace]" to the user_secret_env.sh file.
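The namespace has to be set consistently in three places: the kubectl -n flag, agents_config.yaml, and user_secret_env.sh. The following Python sketch builds the kubectl command from step 1 as an argument list and reports which of the two config files disagree with it. It assumes both files have already been parsed into dictionaries and that an absent k8s.namespace means the default namespace; apply_command and check_namespace are hypothetical helpers.

```python
from typing import Any, Dict, List


def apply_command(namespace: str) -> List[str]:
    """Return the kubectl invocation from step 1 as an argument list."""
    return ["kubectl", "-n", namespace, "apply", "-k", "./datalore/"]


def check_namespace(namespace: str, agents_config: Dict[str, Any], env: Dict[str, str]) -> List[str]:
    """Report config files whose namespace differs from the one passed to kubectl -n."""
    problems: List[str] = []
    # Assumption: an absent k8s.namespace means the default namespace.
    if agents_config.get("k8s", {}).get("namespace", "default") != namespace:
        problems.append("agents_config.yaml: k8s.namespace does not match")
    # DATABASES_K8S_NAMESPACE defaults to "default", as documented above.
    if env.get("DATABASES_K8S_NAMESPACE", "default") != namespace:
        problems.append("user_secret_env.sh: DATABASES_K8S_NAMESPACE does not match")
    return problems
```

Once check_namespace returns an empty list, you could run the command with subprocess.run(apply_command("your-namespace"), check=True).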

Admin user and licenses

Set up an admin user

To access the admin panel, create a user with admin rights.

  1. Log in to Hub (HUB_ROOT_URL) as the user you want to grant an admin role to. Make sure this user's email is set.

  2. Log in to Datalore (DATALORE_ROOT_URL) as the same user and accept the Terms of Service.

  3. Send a POST request to ${DATALORE_ROOT_URL}/api/admin/user/role?email=<EMAIL_OF_ADMIN_USER>&role=<NEW_USER_ROLE> with the header Authorization: <ADMIN_API_AUTH_TOKEN>.

  4. For <NEW_USER_ROLE>, use one of the following user roles:

    • REGULAR: regular user. Use it to demote a user from an admin role.

    • ADMIN: admin user with access to the admin panel.

    • SUPER_ADMIN: admin user who can also change other users' roles in the admin panel.
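The request from step 3 can be prepared with the Python standard library. This sketch assumes root_url is DATALORE_ROOT_URL including its scheme (for example, https://datalore.example.com) and that the token is sent as the raw Authorization header value, as described above; build_role_request is a hypothetical helper, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Roles accepted by the endpoint, as listed above.
VALID_ROLES = {"REGULAR", "ADMIN", "SUPER_ADMIN"}


def build_role_request(root_url: str, email: str, role: str, admin_token: str) -> Request:
    """Build (but do not send) the POST request that changes a user's role."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    # urlencode percent-encodes the email address, for example '@' becomes %40.
    query = urlencode({"email": email, "role": role})
    url = root_url.rstrip("/") + "/api/admin/user/role?" + query
    # ADMIN_API_AUTH_TOKEN goes into the Authorization header as-is.
    return Request(url, method="POST", headers={"Authorization": admin_token})
```

To actually send it, pass the result to urllib.request.urlopen. Keeping construction separate from sending lets you inspect the URL and header first.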

Add a license

To use Datalore, you need to activate your license (provided in the license.key file). Until a license is activated, you cannot start computations or create more than one user. You have the following options:

  • (Recommended) Use the admin panel:

    1. Set up your admin user.

    2. Open ${DATALORE_ROOT_URL}/admin/license.

    3. Add your license.key in the Add new license field.

    Once the license is submitted and verified, it is activated immediately, with no restart needed. Licenses are stored in the database, so they remain active after a restart.

  • Use the admin REST API as an alternative to the admin panel:

    1. Send a POST request to ${DATALORE_ROOT_URL}/api/admin/license with the header Authorization: <ADMIN_API_AUTH_TOKEN> (the token set in user_secret_env.sh).

    2. Put the contents of your license.key file in the request body.
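The REST API option can be sketched with the Python standard library as well. This sketch assumes root_url is DATALORE_ROOT_URL including its scheme and that the endpoint accepts the raw contents of license.key as the body; build_license_request is a hypothetical helper that only builds the request.

```python
from urllib.request import Request


def build_license_request(root_url: str, license_key: bytes, admin_token: str) -> Request:
    """Build (but do not send) the POST request that uploads a license."""
    url = root_url.rstrip("/") + "/api/admin/license"
    # The license.key contents are sent unchanged as the request body,
    # and ADMIN_API_AUTH_TOKEN is sent as the Authorization header.
    return Request(url, data=license_key, method="POST", headers={"Authorization": admin_token})
```

For example, pass Path("license.key").read_bytes() (from pathlib) as license_key and send the result with urllib.request.urlopen. The sketch sets no Content-Type header because this article does not specify one; urllib then sends application/x-www-form-urlencoded by default, so adjust the headers if your server expects something else.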

Last modified: 18 January 2022