Upsource distributed installation without Docker

Hardware requirements

Refer to Upsource cluster hardware requirements

Setting up Upsource Cluster

The moving parts

An Upsource cluster consists of the services listed below. The services can either be installed on the same server or distributed across multiple servers – physical or virtual – in any combination. For some services only one instance is allowed, while others (frontend, psi-agent, analyzer) can be scaled as necessary. We recommend running scaled services on different servers to improve performance and reliability.

cassandra	Manages Cassandra database included with Upsource.
frontend	Upsource web-based UI.
psi-agent	Provides code model services (code intelligence) based on IntelliJ IDEA.
psi-broker	Manages psi-agent tasks.
analyzer	Imports revisions from VCS to Cassandra database.
opscenter	Provides cluster monitoring facilities.
haproxy	Provides the entry point of the distributed Upsource cluster. Proxies all incoming requests to services.
file-clustering	Provides the backend for certain "smart" features of Upsource (review suggestions, revision suggestions, etc.) by computing file and revision similarity indices.

Additionally, Upsource depends on two external services: JetBrains Hub, which is used for user authentication and permissions management, and Apache Cassandra, which is the database engine used by Upsource. Cassandra can run in both single-node and multi-node configurations; however, administration of Cassandra is beyond the scope of this document.

Prerequisites

Before installing the Upsource services you need to install the external ones: Apache Cassandra 3.10 and JetBrains Hub.

Installing Hub

Follow these instructions to install Hub: https://www.jetbrains.com/help/hub/Install-and-Configure-Hub.html

Installing Cassandra

Please consult the Cassandra documentation for instructions on deploying Cassandra. Note that the following additional requirements are in place:

Cassandra 3.10 should be used.
Additional libraries should be added to the Cassandra installation (typically under <cassandra_home>/libs). The libraries for the specific build of Upsource can be downloaded from:
http://download.jetbrains.com/upsource/cassandra-deploy-libs-{upsource_version}.zip
e.g. http://download.jetbrains.com/upsource/cassandra-deploy-libs-2017.2.2197.zip
The following properties should be set in the cassandra.yaml file:
batch_size_warn_threshold_in_kb: 250
batch_size_fail_threshold_in_kb: 5000
compaction_throughput_mb_per_sec: 32
Here you can find the basic instruction for a single-node Cassandra configuration.
Please note that Apache Cassandra is the only database engine Upsource can use: the nature of the data stored and manipulated by Upsource precludes the use of typical SQL databases.
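
The three cassandra.yaml values listed above can be checked before Cassandra is started. The following is a minimal sketch, assuming each setting appears as an uncommented top-level "key: value" line; the function name check_cassandra_yaml is hypothetical and is not part of Upsource or Cassandra.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verifies that cassandra.yaml carries the values
# required by Upsource. Assumes flat, uncommented "key: value" lines.
check_cassandra_yaml() {
    local yaml_file="$1"
    local status=0
    local key expected actual
    while read -r key expected; do
        # Take the value of the last uncommented occurrence of the key.
        actual="$(sed -n "s/^${key}:[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$yaml_file" | tail -n 1)"
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH ${key}: expected ${expected}, found '${actual}'"
            status=1
        fi
    done <<'EOF'
batch_size_warn_threshold_in_kb 250
batch_size_fail_threshold_in_kb 5000
compaction_throughput_mb_per_sec 32
EOF
    return "$status"
}
```

A call such as check_cassandra_yaml /path/to/cassandra.yaml returns a non-zero status and prints one line per setting that is missing or has a different value.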

Configuring an Upsource cluster

Download and unpack the upsource-cluster-services ZIP archive:

http://download.jetbrains.com/upsource/upsource-cluster-services-{upsource_version}.zip

e.g. http://download.jetbrains.com/upsource/upsource-cluster-services-2017.2.2197.zip

The ZIP distribution contains the seven services described above as well as two files with environment variables that will be described below:

upsource.common.env
service.specific.env

The upsource.common.env file contains the common properties that should be identical for all services:

                ############### upsource.common.env ################
                CASSANDRA_HOSTS=cassandra_hosts
                # Default value: 9042
                CASSANDRA_NATIVE_TRANSPORT_PORT=9042
                # Will depend on the number of nodes
                UPSOURCE_CASSANDRA_REPLICATION_FACTOR=1
                # Set to "false" for multi-node
                UPSOURCE_CASSANDRA_SINGLE_NODE=true
                UPSOURCE_DATABASE=datastax
                HUB_URL=public_hub_url
                # If not provided, HUB_URL will be used
                HUB_URL_INTERNAL=internal_hub_url
                # ID of Upsource service in Hub
                UPSOURCE_SERVICE_ID=service_id
                # Secret of Upsource service in Hub
                UPSOURCE_SERVICE_SECRET=service_secret
                # The URL used by the end users to access Upsource
                UPSOURCE_URL=upsource_url
                # The port the Upsource web server will listen on, 8080 by default
                UPSOURCE_EXPOSED_PROXY_PORT=port
                UPSOURCE_HUB_CHECK_INTERVAL_SECONDS=600
                # Number of threads used for initializing new projects
                UPSOURCE_ANALYZER_THREADS_INIT_CLUSTER=2
                UPSOURCE_MONITORING_LISTEN_PORT=monitoring_port
                UPSOURCE_PSI_BROKER_HOST=psi_broker_host
                UPSOURCE_PSI_BROKER_LISTEN_PORT=psi_port
                # The path where Upsource backups will be stored (on the opscenter host)
                # <upsource_opscenter_home>/backup is used by default
                BUNDLE_BACKUP_LOCATION=backup_location
                # Usage is described here
                HUB_KEYSTORE_PATH=
                # Usage is described here
                HUB_KEYSTORE_PASSWORD=
                # Host where upsource-opscenter is running
                MONITORING_HOST=
                # Location of haproxy.cfg, by default at /usr/local/etc/haproxy
                HAPROXY_CONF_LOCATION=
                # By default at /opt/upsource-haproxy
                HAPROXY_SCRIPTS_LOCATION=
                ############### upsource.common.env ################
            

These properties must be exported as environment variables on every machine that runs Upsource services, for example with the following command:

set -o allexport; source /path/to/file/upsource.common.env; set +o allexport
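
Because the file is read with source, it has to be valid shell syntax: comment lines start with #, and there are no spaces around the = sign. As a rough sketch, the export step can be wrapped in a function and followed by a check that the variables a service needs are really present in the environment. The function names load_common_env and require_vars are hypothetical, and the variables passed to require_vars are only a sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export every variable defined in an env file.
load_common_env() {
    local env_file="$1"
    set -o allexport
    . "$env_file"
    set +o allexport
}

# Hypothetical helper: fail if any named variable is unset, empty,
# or not exported (printenv only sees exported variables).
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, load_common_env /path/to/file/upsource.common.env followed by require_vars CASSANDRA_HOSTS HUB_URL UPSOURCE_URL reports any of the three that did not make it into the environment.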

The service.specific.env file contains individual properties that will be different for each specific service:

                ############### service.specific.env ################
                UPSOURCE_SERVICE_MESSAGING_PORT=
                UPSOURCE_FRONTEND_PORT=
                # Leave it blank for singleton services
                UPSOURCE_SERVICE_INSTANCE_ID=
                # Temporary files. By default at <upsource_service_home>/tmp
                UPSOURCE_TEMP_LOCATION=
                # Application data. By default at <upsource_service_home>/data
                UPSOURCE_DATA_LOCATION=
                ############### service.specific.env ################
            

To run a service with an individual properties file you can use the following syntax:

                (set -a; . /path/to/service.specific.env; set +a;
                /path/to/upsource-<service>/bin/upsource-<service>.sh start)
            

Starting the Upsource cluster

Before starting the Upsource services make sure that Cassandra and JetBrains Hub are running. Having done that, run the upsource-cluster-init service. It will prepare the required keyspaces in Cassandra and exit.

The remaining Upsource services can be launched in any order using the following command:

                (set -a; . /path/to/service.specific.env; set +a;
                /path/to/upsource-<service>/bin/upsource-<service>.sh start)
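
When several services run on one host, the launch command above can be wrapped in a small function. This is a sketch only: the name start_service is hypothetical, and it assumes that each service is unpacked into an upsource-<service> directory under one install root, following the path pattern shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the launch command shown above.
# Arguments: install root, service name, path to that service's env file.
start_service() {
    local root="$1" service="$2" env_file="$3"
    local script="${root}/upsource-${service}/bin/upsource-${service}.sh"
    if [ ! -x "$script" ]; then
        echo "not found or not executable: ${script}" >&2
        return 1
    fi
    # The subshell keeps service-specific variables out of the caller's
    # environment, so the next service starts from a clean state.
    (set -a; . "$env_file"; set +a; "$script" start)
}
```

A call could look like start_service /opt/upsource psi-broker /etc/upsource/psi-broker.env, where both paths are placeholders.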
            

The status of all services can be checked on the monitoring page at <upsource_url>/monitoring.

Scaling Upsource services in the cluster

The following services can be scaled:

frontend
psi-agent
analyzer

Scaling the frontend service

The frontend service should be scaled in installations with a large number of users, as well as to improve availability. We suggest doing that using a combination of haproxy and the Python scripts described below.

Before configuring haproxy and the load balancer, make sure that the following packages are installed on the server:

haproxy 1.6.7
python 2.7
python-pip
jinja2

The load balancer (can be downloaded from here) consists of the following scripts and configs:

run.sh
reloader.sh
loadbalancer.py
/conf/haproxy
- haproxy.cfg.tmpl
- haproxy_503.http.tmpl
- haproxy_504.http.tmpl
- initial.json

Put the haproxy configs from /conf/haproxy to ${HAPROXY_CONF_LOCATION}/conf/haproxy.

Put the scripts to ${HAPROXY_SCRIPTS_LOCATION} (/opt/upsource-haproxy by default).

Launch run.sh.
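
The copy steps above can be sketched as a single function. The name install_load_balancer is hypothetical; the fallback locations are the defaults given in upsource.common.env, and marking the scripts executable is an assumption about how the bundle is unpacked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies an unpacked load balancer bundle into place.
# Argument: directory that contains run.sh, reloader.sh, loadbalancer.py
# and conf/haproxy.
install_load_balancer() {
    local bundle_dir="$1"
    local conf_root="${HAPROXY_CONF_LOCATION:-/usr/local/etc/haproxy}"
    local scripts_dir="${HAPROXY_SCRIPTS_LOCATION:-/opt/upsource-haproxy}"
    mkdir -p "${conf_root}/conf/haproxy" "$scripts_dir" || return 1
    # Templates and initial.json go under the haproxy configuration location.
    cp "${bundle_dir}/conf/haproxy/"* "${conf_root}/conf/haproxy/" || return 1
    # Scripts go to the scripts location.
    cp "${bundle_dir}/run.sh" "${bundle_dir}/reloader.sh" \
       "${bundle_dir}/loadbalancer.py" "${scripts_dir}/" || return 1
    # Assumption: the shell scripts may have lost their executable bit.
    chmod +x "${scripts_dir}/run.sh" "${scripts_dir}/reloader.sh"
}
```

After install_load_balancer /path/to/unpacked/bundle succeeds, run.sh can be launched from the scripts location.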

Scaling the analyzer

When dealing with extremely large and/or active projects (hundreds of thousands of commits overall, thousands of daily commits across all projects), it may be necessary to set up a dedicated analyzer for processing them. No additional steps are required: Upsource will automatically assign projects to active analyzer instances.

Scaling the psi-agent

The subset of projects that a particular psi-agent instance works on is defined by the UPSOURCE_PSI_PROJECTS environment variable. Its value is a comma-separated mask in which the following symbols have a special meaning:

+: stands for “include”

-: stands for “exclude”

.+ stands for “all projects”

For example:

UPSOURCE_PSI_PROJECTS=+:.+ means “process all projects”

UPSOURCE_PSI_PROJECTS=-:.+,+:Project-A means “exclude all projects, include Project-A” (process Project-A only)

UPSOURCE_PSI_PROJECTS=+:.+,-:Project-A means “include all projects, exclude Project-A” (process everything but Project-A)

UPSOURCE_PSI_PROJECTS=-:.+,+:Project-A,+:Project-B means “exclude all projects, include Project-A, include Project-B” (process Project-A and Project-B)
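
To reason about a mask before deploying it, the rules can be evaluated by hand or with a small script. The sketch below is an illustration only, not Upsource code. It assumes that rules are applied left to right, that the last matching rule wins, that each pattern is a regular expression matched against the whole project name, and that a project matched by no rule is not processed. These assumptions are inferred from the examples above; the function name psi_mask_includes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of how a mask could be evaluated.
# ASSUMPTION: rules apply left to right and the last matching rule wins;
# this is inferred from the documented examples, not confirmed behavior.
# Returns 0 if the project would be processed, 1 otherwise.
psi_mask_includes() {
    local mask="$1" project="$2"
    local included=1
    local rest="$mask" rule sign pattern
    while [ -n "$rest" ]; do
        # Split off the next comma-separated rule.
        rule="${rest%%,*}"
        if [ "$rest" = "$rule" ]; then rest=""; else rest="${rest#*,}"; fi
        sign="${rule%%:*}"      # "+" or "-"
        pattern="${rule#*:}"    # e.g. ".+" or "Project-A"
        # grep -x requires the pattern to match the whole project name.
        if printf '%s\n' "$project" | grep -Eqx -- "$pattern"; then
            case "$sign" in
                +) included=0 ;;
                -) included=1 ;;
            esac
        fi
    done
    return "$included"
}
```

For instance, psi_mask_includes '+:.+,-:Project-A' 'Project-B' succeeds, while the same mask with 'Project-A' fails, which agrees with the third example above.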

Last modified: 02 April 2021