Datalore 2024.3 Help

Healthcheck & monitoring


Use Datalore's built-in HTTP endpoint (accessible at /health) to verify that the instance is online and responsive.

This endpoint returns OK when no issues are detected.

Use the same endpoint as a Kubernetes liveness probe if the default Helm charts are used for the deployment.
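As an illustration of polling the endpoint from a script, here is a minimal Python sketch. It assumes that a healthy instance answers with HTTP status 200; the exact response body is not specified here, so the sketch does not inspect it. The function names and the base URL in the usage note are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the healthcheck URL from the instance base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(status_code: int) -> bool:
    """Assumption: HTTP 200 means the instance reported no issues."""
    return status_code == 200


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers the healthcheck in time."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return is_healthy(response.status)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx reply.
        return False
```

For example, `check_health("https://datalore.example.com")` (a placeholder address) returns False while the instance is still starting and True once it responds.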


Datalore has a built-in metrics exporter, which is disabled by default and accessible at the /metrics path when enabled explicitly.

There are two mutually exclusive environment variables of the Datalore server that can be used to enable metrics:

Monitoring environment variables

| Variable | Default value | Description |
|---|---|---|
| METRICS_AUTH_TOKEN | Not defined | Enables the exporter and defines the authentication token required to collect metrics. Mutually exclusive with ENABLE_UNAUTHORIZED_METRICS. |
| ENABLE_UNAUTHORIZED_METRICS | Not defined | Enables the exporter. No authentication will be required to read metrics. Mutually exclusive with METRICS_AUTH_TOKEN. |
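Because the two variables are mutually exclusive, it can help to validate the server environment before deployment. The following Python sketch is one possible pre-flight check, not part of Datalore itself. It assumes that a variable counts as set when it has a non-empty value, and it treats setting both as an error; how the server itself reacts to that combination is not described here. The function name and the returned mode labels are hypothetical.

```python
from typing import Mapping


def metrics_mode(env: Mapping[str, str]) -> str:
    """Classify the metrics configuration found in an environment mapping.

    Returns "disabled", "token", or "unauthorized".
    Raises ValueError if both mutually exclusive variables are set.
    """
    # Assumption: an empty value is treated the same as "not defined".
    has_token = bool(env.get("METRICS_AUTH_TOKEN"))
    unauthorized = bool(env.get("ENABLE_UNAUTHORIZED_METRICS"))

    if has_token and unauthorized:
        raise ValueError(
            "METRICS_AUTH_TOKEN and ENABLE_UNAUTHORIZED_METRICS are mutually exclusive"
        )
    if has_token:
        return "token"
    if unauthorized:
        return "unauthorized"
    # Neither variable is defined: the exporter stays disabled (the default).
    return "disabled"
```

Calling `metrics_mode(os.environ)` after `import os` checks the current process environment.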


  1. agent_pool_size: shows how many agents the pool currently has.

    • Prometheus query: sum by (instance_name)(agent_pool_size)

  2. agent_waiting_time_bucket: represents how long a user waited for an instance to start.

    • Prometheus query: sum(increase(agent_waiting_time_bucket[10m])) by (le)

  3. agent_in_pool_time_bucket: represents how long an agent was online and idle before being assigned to a specific notebook.

    • Prometheus query: sum(increase(agent_in_pool_time_bucket[10m])) by (le)

  4. agents_started_total: counts started agents; the query below shows how many agents were started per minute.

    • Prometheus query: sum by (instance_name)(rate(agents_started_total[5m])) * 60
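To make the arithmetic behind the queries above more concrete, here is a small Python sketch with hypothetical helper names. Prometheus histogram buckets are cumulative (each `le` bucket includes all smaller ones), so the first function converts them into per-bucket counts. The second function shows the idea behind `rate(...) * 60` in simplified form: it uses only two counter samples and ignores the counter resets and extrapolation that Prometheus handles.

```python
def per_bucket_counts(cumulative: dict[str, float]) -> list[tuple[str, float]]:
    """Turn cumulative histogram buckets keyed by `le` into per-bucket counts."""
    # Sort by the numeric upper bound; float("+Inf") parses to infinity.
    ordered = sorted(cumulative.items(), key=lambda item: float(item[0]))
    counts = []
    previous = 0.0
    for upper_bound, total in ordered:
        counts.append((upper_bound, total - previous))
        previous = total
    return counts


def per_minute_rate(first_value: float, first_ts: float,
                    last_value: float, last_ts: float) -> float:
    """Average per-minute increase of a counter between two samples.

    Timestamps are in seconds. Simplified: no reset handling or extrapolation.
    """
    return (last_value - first_value) * 60 / (last_ts - first_ts)
```

For instance, buckets `{"1": 2, "5": 5, "+Inf": 6}` mean that two waits took at most 1 unit, three more took at most 5, and one took longer.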

Last modified: 16 July 2024