Install Datalore Enterprise using Kubernetes

Infrastructure and process

The diagram below shows the Datalore on-premises infrastructure using Kubernetes.

[Figure: Datalore on-premises infrastructure using Kubernetes]

To install Datalore on-premises, first install and configure Hub, which provides a single point of entry for user management. The procedures below describe both stages of the process.

Install and configure Hub

If you have already installed Hub, skip to the Configure Hub procedure. For more details about the Hub installation process, see the Hub documentation.

Install Hub

  1. Configure Hub persistent volumes: in the volumes section of the ./hub/statefulSet.yaml file, replace the emptyDir values with volumes available in your Kubernetes cluster.
  2. Run Hub using the kubectl apply -k ./hub/ command.

  3. (Optional) The steps below assume that you can access Hub at http://localhost:8082. To make this work, forward the port with the following command:

    kubectl port-forward --address 0.0.0.0 service/hub 8082

  4. Check the container output using the kubectl logs service/hub command. It should contain a line like this:
    JetBrains Hub 2020.1 Configuration Wizard will listen inside the container on {0.0.0.0:8080}/ after start and can be accessed by this URL: [http://<put-your-docker-HOST-name-here>:<put-host-port-mapped-to-container-port-8080-here>/?wizard_token=pPXTShp4NXceXqGYzeAq].

    Copy the wizard_token value to the clipboard.

  5. Go to http://localhost:8082/ and insert the token from the previous step into the Token field.

  6. Click the Log in button.

  7. Click the Set Up link.

  8. Generate a URL (referred to as HUB_ROOT_URL later) to access Hub from Datalore. Consider the following:
    • The URL must be accessible from both the cluster pods and the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Hub installation, that is, http://127.0.0.1:8080/ inside the container where Hub is launched (by default, the hub-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  9. In Base URL, enter HUB_ROOT_URL. Do not change the Application Listen Port setting.

  10. Click the Next button.

  11. Configure the admin account (set the admin password).

  12. Click the Finish button and wait for the Hub startup.
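
Step 4 above can be scripted rather than done by eye. The following is a minimal sketch that pulls the wizard_token value out of the Hub log output. The function name extract_wizard_token is hypothetical, and the sketch assumes the token contains only ASCII letters and digits, as in the sample log line.

```shell
# Read Hub log output on stdin and print the value of the last
# wizard_token found in it.
# Assumption: the token consists only of ASCII letters and digits,
# as in the sample log line shown in step 4.
extract_wizard_token() {
  sed -n 's/.*wizard_token=\([A-Za-z0-9]*\).*/\1/p' | tail -n 1
}

# Usage against a running cluster (not executed here):
#   kubectl logs service/hub | extract_wizard_token
```

If the function prints nothing, the log does not yet contain the wizard line. In that case, wait for Hub to finish starting and run the command again.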

Configure Hub

Go to HUB_ROOT_URL and log in to Hub with the admin account.

Configure the Datalore service

  1. Create one more URL (referred to as DATALORE_ROOT_URL later) to access Datalore. Consider the following:
    • The URL must be accessible from the browser (by the end users of your Datalore installation).

    • The URL must point to the / path of your Datalore installation, that is, http://127.0.0.1:8080/ inside the container where Datalore will be launched (by default, the datalore-on-premise-0 pod).

    • How you set up your cluster to serve such a URL depends on the specifics of your cluster configuration.

  2. Go to Services (${HUB_ROOT_URL}/hub/services) and click the New service button. Use the name datalore and enter DATALORE_ROOT_URL in Home URL.

  3. Copy the ID field value and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_ID property).

  4. Click the Change... button next to the Secret label.

  5. Copy the generated secret and save it somewhere: it is used when configuring Datalore ($HUB_DATALORE_SERVICE_SECRET property).

  6. Click the Change secret button.

  7. Enter DATALORE_ROOT_URL in the Base URLs field.

  8. Enter the line /api/hub/openid/login in the Redirect URIs field.

  9. Click the Trust button in the upper-right corner.

  10. Click the Save button.

Create a Hub token

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Click your admin username.

  3. Go to the Authentication tab.

  4. Click the New token... button.

  5. Add Hub and Datalore to Scope. You can use any Name. Click the Create button. Save the token: it is used when configuring Datalore ($HUB_PERM_TOKEN property).

Force email verification

Datalore uses user emails from Hub, so we recommend enforcing email verification in Hub. Once verification is enforced, users with unverified emails cannot use Datalore.

Configure the SMTP server

  1. Go to SMTP (${HUB_ROOT_URL}/hub/smtp-settings).

  2. Click the Configure SMTP server... button.

  3. Configure your SMTP server parameters.

  4. Click the Save button.

  5. Click the Enable notifications button.

  6. (Optional) To make sure your configuration is working, click the Send Test message button.

Enable email verification

  1. Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

  2. Open the Common settings page.

  3. Enable the Email verification option.

  4. Click the Save button.

Set and verify an admin user email

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Click your admin username.

  3. Set an email in the Email field.

  4. Click the Save button.

  5. Click the Send verification email link.

  6. Find the verification email in your inbox and click the Verify email address button.

(Optional) Ban a guest user

  1. Go to Users (${HUB_ROOT_URL}/hub/users).

  2. Select a guest user.

  3. Click the Ban button.

(Optional) Enable auth modules

  1. Go to Auth Modules (${HUB_ROOT_URL}/hub/authmodules).

  2. Add or remove auth modules (for example, Google auth, GitHub auth, or LDAP).

Install Datalore

To run Datalore, you need Kubernetes. We recommend version 1.17.6, but other versions may also work.

Configure Datalore

To simplify the configuration process, the Kubernetes config is split into small chunks and assembled with the Kustomize tool (-k flag of kubectl). Edit the following files in the datalore/configs directory to configure your Datalore installation.

user_config.yaml

Editing this file is mandatory to get everything working. The file has the following fields:

  • Required parameters:
    • FRONTEND_URL: URL by which Datalore is accessed (DATALORE_ROOT_URL). It is used to generate links.

    • HUB_PUBLIC_BASE_URL: public base URL of your Hub installation, which must be accessible from the browser (${HUB_ROOT_URL}/hub from the Install Hub section, for example, https://hub.your.domain/hub).

    • HUB_INTERNAL_BASE_URL: internal base URL of your Hub installation, which must be accessible from the datalore pod (in most cases, it can be equal to ${HUB_PUBLIC_BASE_URL}).

    • HUB_DATALORE_SERVICE_ID: ID of the Datalore service in Hub (see Configure the Datalore service).

    • HUB_DATALORE_SERVICE_SECRET: secret of the Datalore service in Hub (see Configure the Datalore service).

    • HUB_PERM_TOKEN: token for accessing Datalore and Hub scopes (see Create a Hub token).

    • DEFAULT_INSTANCE_TYPE_ID: ID of the instance type that will be used by default (for more information, see agents_config.yaml).

    • PASSWORD_SECRET: additional hash salt used to encrypt user passwords and prevent rainbow table attacks in case of a database leak. Can be any string.

  • Optional parameters:

    • MAIL_ENABLED: set it to true to enable Datalore to send emails (welcome emails, sharing invitations, etc.). When set to true, it requires the following parameters:

      • MAIL_SENDER_EMAIL: sender's email

      • MAIL_SENDER_NAME: sender's name

      • MAIL_SENDER_USERNAME: username of SMTP user

      • MAIL_SENDER_PASSWORD: password of SMTP user

      • MAIL_SMTP_SERVER: SMTP server host

      • MAIL_SMTP_PORT: SMTP server port
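
Because a missing required parameter usually shows up only after deployment, it can help to check the values before copying them into user_config.yaml. The sketch below assumes a purely illustrative workflow in which each value is first held in a shell variable of the same name. The function name missing_params is hypothetical and is not part of Datalore.

```shell
# Print the name of every required parameter that is unset or empty.
# Assumption: the values are held in shell variables named after the
# parameters before being copied into user_config.yaml.
missing_params() {
  required="FRONTEND_URL HUB_PUBLIC_BASE_URL HUB_INTERNAL_BASE_URL \
HUB_DATALORE_SERVICE_ID HUB_DATALORE_SERVICE_SECRET HUB_PERM_TOKEN \
DEFAULT_INSTANCE_TYPE_ID PASSWORD_SECRET"
  # The mail parameters become required only when MAIL_ENABLED is true.
  if [ "${MAIL_ENABLED:-false}" = "true" ]; then
    required="$required MAIL_SENDER_EMAIL MAIL_SENDER_NAME \
MAIL_SENDER_USERNAME MAIL_SENDER_PASSWORD MAIL_SMTP_SERVER MAIL_SMTP_PORT"
  fi
  for name in $required; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Usage (after assigning the variables):
#   missing_params    # prints nothing when every required value is set
```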

db_config.yaml

This config file is used to configure the PostgreSQL connection from Datalore. There is one field to override:

  • ROOT_PASSWORD: root user's password. The database can be accessed on port 5432 with the username postgres and this password.

volumes_config.yaml

This config file is used to mount volumes for persisting Datalore's data between restarts. If you leave the default configuration, you will lose all data after the next Datalore restart. The config has two Kubernetes volumes described:

  • storage: contains workbook data, such as attached files.

  • postgresql-data: contains PostgreSQL database data.

agents_config.yaml

This config file is used to define agent types (such as Basic and Large machines in the cloud version of Datalore). It has the following schema:

    k8s:
      instances:
        - id: <Unique instance ID>
          label: <Instance name>
          description: <Short description of what the instance is>
          minAllowed: <Minimum number of instances to be preserved in the pool>
          maxAllowed: <Maximum number of instances to be preserved in the pool>
          yaml: <Kubernetes config of Pod to be used for the instance>
        - id: <Another type with the same schema as above>
          ...

The minAllowed and maxAllowed fields are used to configure the number of pre-created instances, which will speed up the process of starting up notebooks.

images_config.yaml

This config file is used to define Datalore and PostgreSQL container images. Most likely, you will need to change this only to update your installation with newer versions of on-premises images.

logback.xml

This is the Logback configuration file that will be used to collect logs from Datalore and agents. We provide the default one, which prints requested information to stdout, but you can configure it any way you like. Find more information on how to configure Logback in the official documentation.

Docker Hub token

Use the command below to create a secret for pulling images from a private repository:

    kubectl create secret docker-registry regcred --docker-username=datalorecustomer --docker-password=<datalore token>

Run Datalore

Use the following commands:

  • Start: kubectl apply -k ./datalore/

  • Stop: kubectl delete -k ./datalore/

Admin user and licenses

Set up an admin user

To use the admin panel feature, you need a user with admin rights. Use the admin API token to create the first admin user.

  1. Define the ADMIN_API_AUTH_TOKEN environment variable inside user_config.yaml.

  2. Send a POST request to http://<DATALORE_ROOT_URL>/api/admin/user/role?email=<EMAIL_OF_ADMIN_USER>&role=<NEW_USER_ROLE> with the header Authorization: <ADMIN_API_AUTH_TOKEN>.

  3. Choose one of the following user roles:
    • REGULAR: regular user. Can be used to demote the user from the admin role.

    • ADMIN: admin user with access to the admin panel.

    • SUPER_ADMIN: admin user who can also change other users' roles via the admin panel.
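
The request from step 2 can be assembled in the shell. The sketch below only builds the request URL and rejects roles other than the three listed above. The function name admin_role_url is hypothetical. The sketch assumes that DATALORE_ROOT_URL already includes the scheme, and it does not percent-encode the email, so an address containing characters such as + would need encoding first.

```shell
# Print the URL of the role-change request described in step 2.
#   $1: DATALORE_ROOT_URL (assumed to include the scheme, e.g. http://...)
#   $2: email of the user whose role is changed (not percent-encoded here)
#   $3: new role: REGULAR, ADMIN, or SUPER_ADMIN
admin_role_url() {
  case "$3" in
    REGULAR|ADMIN|SUPER_ADMIN) ;;
    *) echo "unknown role: $3" >&2; return 1 ;;
  esac
  # ${1%/} drops one trailing slash so the path is not doubled.
  printf '%s/api/admin/user/role?email=%s&role=%s\n' "${1%/}" "$2" "$3"
}

# Usage (not executed here):
#   curl -X POST -H "Authorization: $ADMIN_API_AUTH_TOKEN" \
#     "$(admin_role_url "$DATALORE_ROOT_URL" admin@example.com ADMIN)"
```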

Add a license

To use Datalore, you need to activate your license (provided in license.key). Only with an activated license can you start computations and create more than one user. Add a license for your Datalore installation in one of the following ways:

  • (Recommended) Use the admin panel:
    1. Set up your admin user.

    2. Open http://<DATALORE_ROOT_URL>/admin/license.

    3. Add your license.key in the Add new license field.

    Once submitted and verified, the license will be immediately activated (no restart needed). Licenses are persisted in the database, so they will work even after restart.

  • Use the admin REST API as an alternative to the admin panel:
    1. Send a POST request to http://<DATALORE_ROOT_URL>/api/admin/license with the header Authorization: <ADMIN_API_AUTH_TOKEN> (the token defined in the Set up an admin user procedure).

    2. Place your license.key in the request body.

  • Add a license to your Datalore installation as a file. To do so, set the LICENSE_PATHS environment variable to the comma-separated list of paths to the license file or files inside the Datalore container.
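
The REST API option above can be sketched as a small shell function. The function name post_license and the CURL override variable are hypothetical. The sketch assumes that DATALORE_ROOT_URL includes the scheme and that the server accepts the raw file contents as the request body, which the source does not spell out beyond "place your license.key in the request body".

```shell
# Command used to send the request. Override it with a stub for a dry run.
CURL=${CURL:-curl}

# POST the license file to the admin REST API.
#   $1: DATALORE_ROOT_URL (assumed to include the scheme)
#   $2: ADMIN_API_AUTH_TOKEN
#   $3: path to license.key on the machine running the command
post_license() {
  "$CURL" -X POST \
    -H "Authorization: $2" \
    --data-binary "@$3" \
    "${1%/}/api/admin/license"
}

# Usage (not executed here):
#   post_license "$DATALORE_ROOT_URL" "$ADMIN_API_AUTH_TOKEN" ./license.key
```

The --data-binary option sends the file unmodified, whereas curl's plain -d option strips newlines from file input.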
Last modified: 08 June 2021