Security
This section covers the following security-related procedures and advisories:
Permissions - Runtime
The Datalore notebook agent relies on two features that require elevated access to the runtime: CRIU and FUSE mounts within the containers. Both require at least the SYS_ADMIN capability granted to the runtime; otherwise, the reactive mode and attached files won't work properly.
For the same reason, Datalore's operational capacity is limited in environments with a restricted permission scope, such as AWS Fargate.
We are looking into ways to reduce the scope of the required permissions. If Datalore is planned to be operated within shared infrastructure, it is advised to provision a dedicated set of host machines specifically for Datalore compute agents.
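For illustration only, FUSE inside a container generally requires at least the SYS_ADMIN capability and access to the FUSE device. A standalone Docker sketch of such a container (the image and command are placeholders, not Datalore's actual agent invocation) might look like:

```shell
# Hypothetical example: not Datalore's actual agent command.
# FUSE mounts inside a container typically need SYS_ADMIN and /dev/fuse.
docker run --rm \
  --cap-add=SYS_ADMIN \
  --device /dev/fuse \
  ubuntu:22.04 \
  ls -l /dev/fuse
```

Runtimes that cannot grant this capability (such as AWS Fargate) will therefore break reactive mode and attached files.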
Permissions - Database
The PostgreSQL user used to provision Datalore must have the CREATE privilege, so that the subsequent ALTER TABLE/ALTER COLUMN commands derived from Datalore's SQL migrations can be executed properly. The EXECUTE privilege is also required.
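As a sketch, the privileges could be granted via psql as follows (the role, database, and schema names are placeholders; adapt them to your environment):

```shell
psql -U postgres -d datalore <<'SQL'
-- Allow the Datalore user to create and alter objects in the schema
GRANT CREATE ON DATABASE datalore TO datalore_user;
GRANT CREATE, USAGE ON SCHEMA public TO datalore_user;
-- Grant EXECUTE on existing functions in the schema
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO datalore_user;
SQL
```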
Sensitive data flow within Datalore
By its nature, Datalore can come into possession of sensitive data if it has been instructed to connect to such a data source (a database or object storage).
However, only three Datalore components can technically process such data: notebook agents, SQL session runtimes, and notebook outputs. For a more detailed explanation, continue reading this section.
The following diagrams describe two different data flows in which external sensitive data (such as database contents or private keys) is involved.
SQL cell execution data flow
As the notebook is one of the core concepts in Datalore, it can interact with a remote data source when instructed by the user, that is, by an authenticated Datalore user who modifies the source code of a cell (for example, by adding a SQL statement to a SQL cell connected to a pre-configured database connection).
1. An external actor triggers the SQL cell computation event. The event is triggered by a user directly, or on the user's behalf via the Datalore Run API.
2. Datalore spawns a new Notebook Agent container.
3. The Notebook Agent contacts Datalore's REST API, passing over the SQL query string.
4. Datalore requests the database credentials and spawns the SQL Session container, passing over these credentials and the SQL query.
5. The SQL Session compiles the query (performing all variable substitutions according to the selected SQL dialect) and queries the database.
6. Once the full dataset has been fetched, the SQL Session container converts it to JSON and sends it back to Datalore's REST API.
7. The REST API returns the calculated result to the notebook.
8. During task teardown, the Notebook Agent caches the first N and last X values (currently non-configurable, with both set to 100) to the database over Datalore's TCP API. This slice of the dataset is saved to Datalore's PostgreSQL database. Additionally, all rendered blobs (such as generated images or chart cell outputs) are saved to Datalore's persistent storage using the same TCP API.
Database introspection data flow
Apart from the explicitly provided and executed SQL queries within the notebook context, Datalore also performs database introspection as a background task to improve the user experience when working with SQL cells.
1. An external actor triggers the introspection task. This could be either a user via the UI, or the Datalore server itself as part of a routine maintenance or cache update flow.
2. The DB Connection Check container is spawned. At this step, the server issues a one-time token for the container.
3. Once spawned, the container calls Datalore's REST API and authenticates with the token from the previous step. The credentials are either requested from the database (if the connection already exists) or taken from the user input (when the task is invoked from the UI and no connection exists yet).
4. The credentials are passed over to the Connection Check container, which then connects to the database using them.
5. The response is passed back to Datalore's REST API. Once completed, the Connection Check container is shut down.
Configure TLS certificates for Datalore
Datalore does not provide any TLS-related options to its end users. Instead, it relies on third-party load balancers (or reverse proxies) to perform TLS termination. As a consequence, the Datalore app itself is not normally expected to face users directly without an intermediary proxy deployed next to it.
Configure TLS certificates for Docker-based deployment
This procedure describes how to create an additional Nginx container that works as a reverse proxy with TLS termination.
Edit the docker-compose.yaml file as shown in the example below.
```yaml
version: "3.9"
services:
  datalore:
    image: jetbrains/datalore-server:2023.6
    expose: [ "8080", "8081", "5050", "4060" ]
    networks:
      - datalore-agents-network
      - datalore-backend-network
    volumes:
      - "datalore-storage:/opt/data"
      - "/var/run/docker.sock:/var/run/docker.sock"
    environment:
      DATALORE_PUBLIC_URL: "https://datalore.example.com" # change to your domain name
      DB_PASSWORD: "changeme" # change to your password
  postgresql:
    image: jetbrains/datalore-postgres:2023.6
    expose: [ "5432" ]
    networks:
      - datalore-backend-network
    volumes:
      - "postgresql-data:/var/lib/postgresql/data"
    environment:
      POSTGRES_PASSWORD: "changeme" # change to your password
      DATABASES_COMMAND_IMAGE: "jetbrains/datalore-database-command:2023.6"
  nginx:
    image: nginx:1.25
    networks:
      - datalore-backend-network
    volumes:
      - ./nginx-selfsigned.crt:/etc/ssl/certs/nginx-selfsigned.crt # change to your cert
      - ./dhparam.pem:/etc/nginx/dhparam.pem # change to your DH parameters
      - ./nginx-selfsigned.key:/etc/ssl/private/nginx-selfsigned.key # change to your cert key
      - ./ssl.conf:/etc/nginx/conf.d/ssl.conf # change to your nginx config
    ports:
      - 80:80
      - 443:443
volumes:
  postgresql-data: { }
  datalore-storage: { }
networks:
  datalore-agents-network:
    name: datalore-agents-network
  datalore-backend-network:
    name: datalore-backend-network
```

Edit the nginx ssl.conf file as shown in the example below.
```nginx
server {
    listen 443 ssl;
    server_name datalore.example.com;

    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt; # change to your cert
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key; # change to your cert key
    ssl_protocols TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/nginx/dhparam.pem; # change to your DH parameters
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    ssl_ecdh_curve secp384r1;
    ssl_session_timeout 10m;
    ssl_session_cache shared:SSL:10m;
    ssl_session_tickets off;
    # comment out the two stapling parameters below if you have a self-signed certificate
    ssl_stapling on;
    ssl_stapling_verify on;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_pass http://datalore:8080;
    }
}

server {
    listen 80 default_server;
    server_name _;
    return 301 https://$host$request_uri;
}
```
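If you don't yet have a certificate, the self-signed certificate, key, and DH parameters referenced in this configuration can be generated with openssl, for example (the CN and file names match the example configuration; replace them for production use):

```shell
# Self-signed certificate and key (replace the CN with your domain)
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout nginx-selfsigned.key -out nginx-selfsigned.crt \
  -subj "/CN=datalore.example.com"

# Diffie-Hellman parameters for the ssl_dhparam directive
# (-dsaparam speeds up generation considerably)
openssl dhparam -dsaparam -out dhparam.pem 2048
```

A self-signed certificate is only suitable for testing; browsers will warn about it, and the stapling directives must be commented out.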
Configure TLS certificates for Helm-based deployment
We suggest that you use one of the following methods:
Self-acquired certificate and private key
Let's Encrypt
Perform the following steps based on a selected method:
Create a Kubernetes TLS secret, following the official Kubernetes guidance.
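For example, with the certificate and key files on disk, the secret might be created like this (the secret name datalore-tls matches the ingress configuration used in this procedure; the file names are placeholders):

```shell
kubectl create secret tls datalore-tls \
  --cert=nginx-selfsigned.crt \
  --key=nginx-selfsigned.key
```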
Adjust the datalore.values.yaml file as follows, replacing datalore.example.com with the actual FQDN you are going to use with Datalore:

```yaml
ingress:
  enabled: true
  tls:
    - secretName: datalore-tls
      hosts:
        - datalore.example.com
  hosts:
    - host: datalore.example.com
      paths:
        - path: /
          pathType: Prefix
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
    kubernetes.io/ingress.class: "nginx"
```
Install cert-manager into your Kubernetes cluster.
Create a letsencrypt.yaml file with the following content:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [PLACE YOUR EMAIL HERE]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

Apply the manifest:

```shell
kubectl apply -f letsencrypt.yaml
```
Check the issuer status:

```shell
kubectl get issuer
```

Eventually, the output should look as follows:

```
NAME               READY   AGE
letsencrypt-prod   True    14d
```

Adjust the datalore.values.yaml file as follows, replacing datalore.example.com with the actual FQDN you are going to use with Datalore:

```yaml
ingress:
  enabled: true
  tls:
    - secretName: datalore-tls
      hosts:
        - datalore.example.com
  hosts:
    - host: datalore.example.com
      paths:
        - path: /
          pathType: Prefix
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "letsencrypt-prod"
```
Set the DATALORE_PUBLIC_URL parameter in the same datalore.values.yaml file, using the same value you provided to replace "https://datalore.example.com" in the step above:

```yaml
dataloreEnv:
  DATALORE_PUBLIC_URL: "https://datalore.example.com"
```

Apply the configuration and restart Datalore.
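Applying the configuration typically amounts to re-running the Helm upgrade; the chart reference and version below mirror the commands used elsewhere in this document:

```shell
helm upgrade --install -f datalore.values.yaml datalore datalore/datalore --version 0.2.19
```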
Check whether the ingress controller registered the changes:

```shell
kubectl get ingress
```

The expected result is a datalore ingress with port 443 exposed.

Check whether the certificate is issued:

```shell
kubectl get certificates
```

The expected output is similar to the one below:

```
NAME           READY   SECRET         AGE
datalore-tls   True    datalore-tls   8m5s
```
Use Kubernetes native secrets for storing the database password
Modify the databaseSecret block in your datalore.values.yaml as follows:

```yaml
databaseSecret:
  create: false
  name: datalore-db-password
  key: DATALORE_DB_PASSWORD
```

Create a Kubernetes secret, using the value from the name key above as the secret name and the desired password as the secret value.

(If applicable) Remove the password key with its value from the databaseSecret block.

Proceed based on whether this is a fresh deployment or Datalore is already installed.
If this is a fresh deployment, proceed with the installation. No further action is required.

If Datalore is already installed, apply the configuration:

```shell
helm upgrade --install -f datalore.values.yaml datalore datalore/datalore --version 0.2.19
```
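The secret itself can be created with kubectl, for example (the secret name and key match the databaseSecret block above; the password value is a placeholder):

```shell
kubectl create secret generic datalore-db-password \
  --from-literal=DATALORE_DB_PASSWORD='changeme'
```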
Database password rotation
Datalore requires a permanent connection to a PostgreSQL database to operate properly. Once Datalore is deployed, the database password is saved within the environment so that Datalore can re-use it after a restart.
However, you might want to change this password later for various compliance or operational reasons.
For Helm-based deployments:

Locate the values.yaml file being used for the deployment.

Depending on the method used, either replace the password within the databaseSecret block, or update the secret value if a Kubernetes secret is used instead of the plain-text value.

Update the Datalore deployment:

```shell
helm upgrade --install -f datalore.values.yaml datalore datalore/datalore --version 0.2.19
```

Restart Datalore. For further operational guidance, refer to Server lifecycle events.
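If the password is stored in a Kubernetes secret, one common way to update it in place is to regenerate the secret manifest and apply it over the existing one (the secret name, key, and password value are placeholders matching the examples above):

```shell
kubectl create secret generic datalore-db-password \
  --from-literal=DATALORE_DB_PASSWORD='new-password' \
  --dry-run=client -o yaml | kubectl apply -f -
```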
For Docker-based deployments:

Locate the docker-compose.yaml file being used for the deployment.

Update the DB_PASSWORD key in the environment block.

Restart Datalore. For further operational guidance, refer to Server lifecycle events.
Configure TLS between server and agent
To enable TLS between the server and agents:

Click the avatar in the upper right corner and select Admin panel from the menu.
From the Admin panel, select Configuration.
Select the Force agent SSL checkbox.

To reset the secrets used for server-agent TLS:

Click the avatar in the upper right corner and select Admin panel from the menu.
From the Admin panel, select Configuration.
Click the Reset secrets button.