Troubleshooting Upsource cluster
Upsource cluster doesn't start or restart, and fails with the following error:
ERROR: for opscenter Cannot create container for service haproxy: Unable to find a node that satisfies the following conditions [port 8080 (Bridge mode)] [available container slots] [container==7f96d284c7d4a2477262a3a020e8c16c2e60a2d9d1a0fc2a42bb0b3d35ff3e56 (soft=false)]
This is a known integration issue between Swarm and docker-compose. The error occurs when the swarm manager tries to recreate a service container that defines a port mapping to a host machine port. See this article for details.
Remove the haproxy container before starting the cluster:
./cluster.sh -H tcp://<swarm.master.host>:<swarm.master.port> rm haproxy
Swarm cluster issues
All nodes are marked as Unknown in the output of docker -H tcp://<swarm.master.host>:<swarm.master.port> info
The problem may be caused by closed ports on some of the machines. Ensure that all ports used by the swarm cluster are open. For instance:
The Docker engine on each swarm cluster node listens on TCP ports (2375 and 2376 by default)
The swarm master listens on a port (4000 by default)
The key-value storage listens on a port (8500 is the default for Consul)
To check if the ports are open, run:
docker -H tcp://<swarm.master.host>:<swarm.master.port> info
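The docker info command above goes through the swarm master, so it does not show which individual port is blocked. As a rough sketch, each port can also be probed directly from the machine that needs to reach it. The helper name check_port is hypothetical, and the script assumes bash and the coreutils timeout command are available; the port numbers are the defaults listed above and may differ in your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Upsource or Docker): reports whether a
# TCP port accepts connections, using bash's built-in /dev/tcp redirection.
# Prints "open" or "closed".
check_port() {
  local host="$1" port="$2"
  # Give up after 3 seconds so that a silently dropped connection
  # (typical for a firewall) does not hang the check.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example usage (the placeholders must be replaced with real hosts):
#   check_port <swarm.node.host> 2375     # Docker engine
#   check_port <swarm.master.host> 4000   # swarm master
#   check_port <consul.host> 8500         # key-value storage
```

A "closed" result means only that the connection could not be established from the machine where the check was run; the cause may be a firewall rule, a stopped service, or a wrong host name.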
Upsource services don't see each other
To check whether services see each other or not, you can issue a GET request to http://<address.of.node.where.opscenter.is.running>:10080/monitoring/frontends. It should return a non-empty list if the cluster is running.
If the link is not available, it could mean that the haproxy/opscenter services have not finished starting yet.
If the link returns an empty list, a connection between the services has most likely not been established. (Note that the list may be empty for a short period of time on startup, while the frontend service is not yet initialized.) To restore the connection, try disabling the firewall on the swarm worker nodes in the cluster. For more details, including which ports should be opened on the swarm cluster nodes in the firewall/access control lists, refer to this description.
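The three outcomes described above (link unavailable, empty list, non-empty list) can be told apart with a small script. This is only a sketch: the function names are hypothetical, curl is assumed to be installed, and the exact response format is an assumption (an empty list is taken to be serialized as `[]`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classifies the response body of /monitoring/frontends.
# Assumption: an empty list is returned as "[]" (whitespace is ignored).
classify_frontends() {
  local body
  body="$(printf '%s' "$1" | tr -d '[:space:]')"
  if [ -z "$body" ]; then
    echo "unavailable"   # no response: haproxy/opscenter may still be starting
  elif [ "$body" = "[]" ]; then
    echo "empty"         # services are up but do not see each other (yet)
  else
    echo "ok"            # at least one frontend is listed
  fi
}

# Hypothetical wrapper: fetches the list from the opscenter node.
# -s: silent, -f: treat HTTP errors as failures, --max-time: do not hang.
check_frontends() {
  local host="$1" body
  body="$(curl -sf --max-time 5 "http://${host}:10080/monitoring/frontends")" || body=""
  classify_frontends "$body"
}

# Example usage:
#   check_frontends <address.of.node.where.opscenter.is.running>
```

Because the list can be empty for a short time on startup, it may be worth repeating the check after a pause before changing any firewall settings.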