Datalore Help

Install Datalore Enterprise using Kubernetes

Infrastructure and process

The diagram below shows the Datalore on-premises infrastructure using Kubernetes.

Datalore on-premises infrastructure using Kubernetes

To install Datalore on-premises, first install and configure Hub, which provides a single point of entry for user management. The procedures below describe both stages of the process.

Before you begin

Download this repository. Make sure that:

  • If you used git clone, the on-premises-0.3.0 branch is checked out before you continue the installation.

  • You run all shell commands in the <repository_root>/k8s folder.

Install and configure Hub

If you have already installed Hub, go to the Configure Hub procedure. You can find more details about the Hub installation process in the Hub documentation.

Install Hub

  1. Configure Hub persistent volumes: change the emptyDir values in the volumes section of the ./hub/statefulSet.yaml file to the volumes available in your Kubernetes cluster.
  2. Run Hub using the kubectl apply -k ./hub/ command.

  3. (Optional) The steps below assume that you can access Hub at http://localhost:8082. If you cannot, forward the port with the following command:

    kubectl port-forward --address 0.0.0.0 service/hub 8082

  4. Check the container output using the kubectl logs service/hub command. It should contain a line like this:
    JetBrains Hub 2020.1 Configuration Wizard will listen inside the container on {0.0.0.0:8080}/ after start and can be accessed by this URL: [http://<put-your-docker-HOST-name-here>:<put-host-port-mapped-to-container-port-8080-here>/?wizard_token=pPXTShp4NXceXqGYzeAq].

    Copy the wizard_token value to the clipboard.

  5. Go to http://localhost:8082/ and insert the token from the previous step into the Token field.

  6. Click the Log in button.

  7. Click the Set Up link.

  8. Generate a URL (referred to as HUB_ROOT_URL later) to access Hub from Datalore. Consider the following:
    • The URL must be accessible from both the cluster pods and the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Hub installation, i.e. http://127.0.0.1:8080/ inside the container where Hub is launched (by default, the hub-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  9. In Base URL, enter HUB_ROOT_URL. Do not change the Application Listen Port setting.

  10. Click the Next button.

  11. Configure the admin account (set the admin password).

  12. Click the Finish button and wait for the Hub startup.
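
As a convenience for step 4, the wizard token can be pulled out of the Hub log with a small filter instead of being copied by hand. The sketch below is only an illustration: the function name extract_wizard_token is hypothetical, and it assumes the token consists of letters and digits only, as in the sample log line above.

```shell
# Hypothetical helper: read Hub log lines on stdin and print the value
# that follows "wizard_token=" (assumed to be alphanumeric).
extract_wizard_token() {
  sed -n 's/.*wizard_token=\([A-Za-z0-9]*\).*/\1/p'
}

# Usage against a running cluster (not executed here):
#   kubectl logs service/hub | extract_wizard_token
```

If the log contains several matching lines, each token is printed on its own line, and the most recent one is the last.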

Configure Hub

Go to HUB_ROOT_URL and log in to Hub with the admin account.

Configure the Datalore service

  1. Create one more URL (referred to as DATALORE_ROOT_URL later) to access Datalore. Consider the following:
    • The URL must be accessible from the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Datalore installation, i.e. http://127.0.0.1:8080/ inside the container where Datalore will be launched (by default, it is pod datalore-on-premise-0).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  2. Go to Services (${HUB_ROOT_URL}/hub/services) and click the New service button. Use the name datalore and enter DATALORE_ROOT_URL in Home URL.

  3. Copy the ID field value and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_ID property).

  4. Click the Change... button next to the Secret label.

  5. Copy the generated secret and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_SECRET property).

  6. Click the Change secret button.

  7. Enter DATALORE_ROOT_URL in the Base URLs field.

  8. Enter the line /api/hub/openid/login in the Redirect URIs field.

  9. Click the Trust button in the upper-right corner.

  10. Click the Save button.

Create a Hub token

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Click your admin username.

  3. Go to the Authentication tab.

  4. Click the New token... button.

  5. Add Hub and Datalore to Scope. You can use any Name. Click the Create button. Save the token: it is used when configuring Datalore ($HUB_PERM_TOKEN property).

(Optional) Force email verification

Datalore uses user emails from Hub, so it is recommended to force email verification in Hub. Users with unverified emails will not be able to use Datalore.

  1. Configure the SMTP server:
    • Go to SMTP (${HUB_ROOT_URL}/hub/smtp-settings).

    • Click the Configure SMTP server... button.

    • Configure your SMTP server parameters.

    • Click the Save button.

    • Click the Enable notifications button.

    • (Optional) To make sure your configuration is working, click the Send Test message button.

  2. Enable email verification:
    • Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

    • Open the Common settings page.

    • Enable the Email verification option.

    • Click the Save button.

  3. Set and verify an admin user email:
    • Go to Users (${HUB_ROOT_URL}/hub/users).

    • Click your admin username.

    • Set an email in the Email field.

    • Click the Save button.

    • Click the Send verification email link.

    • Find the verification email in your inbox and click the Verify email address button.

(Optional) Ban a guest user

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Select a guest user.

  3. Click the Ban button.

(Optional) Enable auth modules

  1. Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

  2. Add or remove auth modules (e.g. Google auth, GitHub auth, LDAP, etc.).

Install Datalore

To run Datalore, you need Kubernetes. We recommend using version 1.17.6, but other versions may also work.

Configure Datalore

To simplify the configuration process, the Kubernetes config is split into small chunks and assembled with the Kustomize tool (-k flag of kubectl). Edit the following files in the datalore/configs directory to configure your Datalore installation.

user_config.yaml

Editing this file is mandatory to get everything working. The file has the following fields:

  • FRONTEND_URL: URL by which Datalore is accessed (DATALORE_ROOT_URL). It is used to generate links.

  • HUB_PUBLIC_BASE_URL: Base public (accessible via browser) URL of your Hub installation (${HUB_ROOT_URL}/hub from the Install Hub section, for example, https://hub.your.domain/hub).

  • HUB_INTERNAL_BASE_URL: Base internal (accessible from the datalore pod) URL of your Hub installation (in most cases can be equal to ${HUB_PUBLIC_BASE_URL}).

  • HUB_DATALORE_SERVICE_ID: ID of the Datalore service in Hub (see Configure the Datalore service).

  • HUB_DATALORE_SERVICE_SECRET: Token of the Datalore service in Hub (see Configure the Datalore service).

  • HUB_PERM_TOKEN: Token for accessing Datalore and Hub scopes (see Create a Hub token).

  • DEFAULT_INSTANCE_TYPE_ID: ID of the instance type that will be used by default (for more information, see agents_config.yaml).

  • DEFAULT_PACKAGE_MANAGER: Package manager selected by default. Can be set to pip or conda.

  • DEFAULT_BASE_ENV_NAME: Name of the default environment, matching one of the default package manager environments.

  • MAIL_ENABLED: If set to true, enables Datalore to send emails (welcome emails, sharing invitations, etc.) and requires the following parameters:
    • MAIL_SENDER_EMAIL: sender's email

    • MAIL_SENDER_NAME: sender's name

    • MAIL_SENDER_USERNAME: username of SMTP user

    • MAIL_SENDER_PASSWORD: password of SMTP user

    • MAIL_SMTP_SERVER: SMTP server host

    • MAIL_SMTP_PORT: SMTP server port

  • ADMIN_API_AUTH_TOKEN: Environment variable defined to set up an admin user. It is recommended to remove it from the user_config.yaml file after you complete the procedure.

db_config.yaml

This config file is used to configure the PostgreSQL connection from Datalore. There is one field to override:

  • ROOT_PASSWORD: root user's password. The database can be accessed on port 5432 with the username postgres and this password.

volumes_config.yaml

This config file describes two Kubernetes volumes:

  • storage: contains workbook data, such as attached files (UID:GID 5000:5000).

  • postgresql-data: contains PostgreSQL database data (UID:GID 999:999).

agents_config.yaml

This config file is used to define agent types (such as Basic and Large machines in the cloud version of Datalore). It has the following schema:

k8s:
  instances:
    - id: <Unique instance ID>
      label: <Instance name>
      description: <Short description of what the instance is>
      minAllowed: <Minimum number of instances to be preserved in the pool>
      maxAllowed: <Maximum number of instances to be preserved in the pool>
      yaml: <Kubernetes config of Pod to be used for the instance>
    - id: <Another type with the same schema as above>
      ...

The minAllowed and maxAllowed fields are used to configure the number of pre-created instances, which will speed up the process of starting up notebooks.
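
To make the schema more concrete, the sketch below prints a filled-in example with two instance types. All values in it (the basic and large IDs, the labels, the descriptions, and the pool sizes) are hypothetical and chosen only for illustration, and the yaml field is left as a placeholder because the Pod config depends on your cluster. The function name agents_config_example is also an assumption.

```shell
# Print an illustrative agents_config.yaml body. All values are made up;
# replace the yaml placeholders with a real Pod config before use.
agents_config_example() {
  cat <<'EOF'
k8s:
  instances:
    - id: basic
      label: Basic
      description: Small instance for everyday notebooks
      minAllowed: 1
      maxAllowed: 5
      yaml: <Kubernetes config of Pod to be used for the instance>
    - id: large
      label: Large
      description: Bigger instance for heavy computations
      minAllowed: 0
      maxAllowed: 2
      yaml: <Kubernetes config of Pod to be used for the instance>
EOF
}
```

With values like these, DEFAULT_INSTANCE_TYPE_ID in user_config.yaml would be set to one of the IDs, for example basic.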

images_config.yaml

This config file is used to define Datalore and PostgreSQL container images. Most likely, you will need to change this only to update your installation with newer versions of on-premises images.

logback.xml

This is the Logback configuration file that will be used to collect logs from Datalore and agents. We provide the default one, which prints requested information to stdout, but you can configure it any way you like. Find more information on how to configure Logback in the official documentation.

Docker Hub token

Use the code below to create a secret to pull images from a private repository:

kubectl create secret docker-registry regcred --docker-username=datalorecustomer --docker-password=<datalore token>

Run Datalore

Use the following commands:

  • Start: kubectl apply -k ./datalore/

  • Stop: kubectl delete -k ./datalore/
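
Because the config is assembled by Kustomize, it can be useful to render the final manifests before applying them. The sketch below wraps that in two small helper functions. The function names are hypothetical, and the commands are expected to be run from the <repository_root>/k8s folder.

```shell
# Render the assembled Kubernetes config to stdout without applying it.
preview_datalore() {
  kubectl kustomize ./datalore/
}

# Apply the config and, if that succeeds, list the pods in the
# current namespace to see whether they are starting.
start_datalore() {
  kubectl apply -k ./datalore/ && kubectl get pods
}
```

Reviewing the output of preview_datalore is a quick way to confirm that edits in the datalore/configs directory ended up in the generated manifests.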

(Optional) Run Datalore in a non-default namespace

The procedure below uses datalore as an example of a non-default namespace:

  1. Specify the namespace when running Datalore:

    kubectl -n datalore apply -k ./datalore/

  2. Open the agents_config.yaml file and add the namespace as shown in the code below:

    k8s:
      namespace: datalore
      instances:
        ...
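
The namespace procedure can be sketched as a single helper, under two assumptions that the steps above do not state: the namespace does not exist yet, and the regcred image pull secret has to be created in the same namespace because Kubernetes secrets are namespaced. The function name run_in_namespace is hypothetical.

```shell
# $1 = namespace (for example, datalore), $2 = Datalore Docker Hub token.
# Assumes the namespace is new and that pods pull images using regcred.
run_in_namespace() {
  ns="$1"
  kubectl create namespace "$ns" &&
  kubectl -n "$ns" create secret docker-registry regcred \
    --docker-username=datalorecustomer \
    --docker-password="$2" &&
  kubectl -n "$ns" apply -k ./datalore/
}
```

The namespace entry in agents_config.yaml still has to be added by hand before the last command is run.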

Admin user and licenses

Set up an admin user

Create a user with admin rights to access the admin panel feature.

  1. Log into Hub (HUB_ROOT_URL) as the user you want to grant an admin role to. Make sure this user's email is set.
  2. Log into Datalore with the same user (DATALORE_ROOT_URL) and accept Terms of Service.
  3. Send a POST request to http://<DATALORE_ROOT_URL>/api/admin/user/role?email=<EMAIL_OF_ADMIN_USER>&role=<NEW_USER_ROLE> with the header Authorization: <ADMIN_API_AUTH_TOKEN>.

  4. For <NEW_USER_ROLE>, use one of the following user roles:
    • REGULAR: regular user. Can be used to demote the user from the admin role.

    • ADMIN: admin user with access to the admin panel.

    • SUPER_ADMIN: admin user who can also change other users' roles via the admin panel.
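
The role request from step 3 could be sent with curl roughly as follows. This is a sketch under stated assumptions: DATALORE_ROOT_URL is passed with its scheme and without a trailing slash, the email contains no characters that need URL-encoding, and the helper names role_url and set_user_role are hypothetical.

```shell
# Build the request URL.
# $1 = DATALORE_ROOT_URL, $2 = user email, $3 = REGULAR | ADMIN | SUPER_ADMIN
role_url() {
  printf '%s/api/admin/user/role?email=%s&role=%s\n' "$1" "$2" "$3"
}

# Send the POST request. $4 = ADMIN_API_AUTH_TOKEN from user_config.yaml.
set_user_role() {
  curl -X POST -H "Authorization: $4" "$(role_url "$1" "$2" "$3")"
}

# Example call (not executed here):
#   set_user_role https://datalore.your.domain admin@your.domain ADMIN "$ADMIN_API_AUTH_TOKEN"
```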

Add a license

To use Datalore, you need to activate your license (provided in license.key). Only with an activated license can you start computations and create more than one user. You have the following options:

  • (Recommended) Use the admin panel:
    1. Set up your admin user.

    2. Open http://<DATALORE_ROOT_URL>/admin/license.

    3. Add your license.key in the Add new license field.

    Once submitted and verified, the license will be immediately activated (no restart needed). Licenses are persisted in the database, so they will work even after restart.

  • Use the admin REST API as an alternative to the admin panel:
    1. Send a POST request to http://<DATALORE_ROOT_URL>/api/admin/license with the header Authorization: <ADMIN_API_AUTH_TOKEN> (the token from the Set up an admin user procedure).

    2. Place your license.key in the request body.
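
The REST API option might look like the curl call below. It is a sketch, not a confirmed recipe: it assumes that the raw contents of license.key are accepted as the request body, that DATALORE_ROOT_URL is passed with its scheme and without a trailing slash, and the function name add_license is hypothetical.

```shell
# $1 = DATALORE_ROOT_URL, $2 = ADMIN_API_AUTH_TOKEN, $3 = path to license.key
# --data-binary sends the file contents unmodified as the request body.
add_license() {
  curl -X POST -H "Authorization: $2" --data-binary "@$3" "$1/api/admin/license"
}

# Example call (not executed here):
#   add_license https://datalore.your.domain "$ADMIN_API_AUTH_TOKEN" ./license.key
```

If the server rejects the request, check the expected content type, since curl sends application/x-www-form-urlencoded by default when a body is given.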

Last modified: 13 September 2021