Deploy MsPASS with Docker Compose

Prerequisites

Docker Compose is a tool for deploying and coordinating multiple Docker containers. To install Docker Compose on machines where you have root access, please refer to the guide here and follow the instructions for your specific platform. Docker Compose uses YAML files to configure and run multiple containers. Please refer to its documentation for more details on using the tool.
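
To confirm that the tool is available after installation, you can check its version. Depending on how it was installed, the command is either the standalone docker-compose binary or the docker compose plugin; the rest of this section uses the standalone form.

# Standalone binary (the form used throughout this section)
docker-compose --version

# Compose plugin (Compose V2), if that is what your platform provides
docker compose version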

Configure MsPASS Containers

The MsPASS runtime environment is composed of multiple components, each serving a specific function. They are: frontend, scheduler, worker, db, dbmanager, and shard. The MsPASS container can be launched as any one of these roles by specifying the MSPASS_ROLE environment variable (see the example after this list). The options include:

  • frontend: Uses Jupyter Notebook to provide users with an interactive development environment that connects to the other components.

  • scheduler: This is the Dask scheduler or the Spark master that coordinates all its corresponding workers.

  • worker: This is the Dask worker or Spark worker that does the computation in parallel.

  • db: This role runs a standalone MongoDB daemon process that manages data access.

  • dbmanager: This role runs MongoDB’s config server and router server to provide access to a sharded MongoDB cluster.

  • shard: Each shard contains a subset of the sharded data to provide horizontal scaling of the database.

  • all: This role is equivalent to frontend + scheduler + worker + db. Note that this is the default role, and it is how the container was launched in the previous section.
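
As a minimal sketch of how MSPASS_ROLE selects a role (the container name and published port here are assumptions that simply mirror the Compose listings later in this section), a single container could be launched as a standalone Dask scheduler with docker run:

# Hypothetical example: run one container as a Dask scheduler only.
# Image name and port follow the Compose listings below.
docker run -d \
  --name mspass-scheduler \
  -p 8786:8786 \
  -e MSPASS_ROLE=scheduler \
  -e MSPASS_SCHEDULER=dask \
  mspass/mspass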

The following environment variables also need to be set for the different roles to communicate (illustrated in the sketch after this list):

  • MSPASS_SCHEDULER: Users can choose dask or spark to run parallel computations. dask is the default.

  • MSPASS_SCHEDULER_ADDRESS: This is the IP address or hostname of the scheduler. worker and frontend rely on this to communicate with the scheduler.

  • MSPASS_DB_ADDRESS: This is the IP address or hostname of the db or dbmanager. frontend relies on this to access the database.

  • MSPASS_SHARD_LIST: This is a space-delimited string of format $HOSTNAME/$HOSTNAME:$MONGODB_PORT for all the shards. dbmanager relies on this to build the sharded database cluster.

  • MSPASS_SHARD_ID: This is used to assign each shard a unique name such that it can write to its own data_shard_${MSPASS_SHARD_ID} directory under the /db directory (in case the shards run on a shared filesystem).

  • MSPASS_JUPYTER_PWD: In the frontend, users can optionally set a password for Jupyter Notebook access. If set to an empty string, Jupyter can be accessed without a password, which may pose security issues. If unset, the Jupyter Notebook will generate a random token and print it to stdout, which is the default behavior.
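
To illustrate how these variables fit together, the following sketch starts a worker that connects to a scheduler reachable under the hostname mspass-scheduler; the hostname is an assumption and should be replaced with your actual scheduler address.

# Hypothetical example: a Dask worker pointed at an existing scheduler.
docker run -d \
  -e MSPASS_ROLE=worker \
  -e MSPASS_SCHEDULER=dask \
  -e MSPASS_SCHEDULER_ADDRESS=mspass-scheduler \
  mspass/mspass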

Users may change the default ports of all the underlying components by setting the following variables:

  • JUPYTER_PORT: The default is 8888.

  • DASK_SCHEDULER_PORT: The default is 8786.

  • SPARK_MASTER_PORT: The default is 7077.

  • MONGODB_PORT: The default is 27017.

These variables are intended for experienced users only; the deployment can break if mismatched ports are set.
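
For instance, the sketch below (an illustration only, not part of the listings that follow) moves the Jupyter Notebook of the frontend service to port 9999; note that the published host port mapping must be updated to match the container port.

  mspass-frontend:
    image: mspass/mspass
    ports:
      - 9999:9999          # must match JUPYTER_PORT below
    environment:
      MSPASS_ROLE: frontend
      JUPYTER_PORT: 9999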

Deploy MsPASS Containers

Docker Compose can deploy multiple MsPASS containers of different roles, which simulates a distributed environment. Below, we provide two example Docker Compose configurations with a distributed setup for both the computation (with Dask or Spark) and the database (with MongoDB).

Listing 1: Docker Compose example that configures a MongoDB cluster of two shards and a Dask cluster of one worker.
version: '3.7'

services:

  mspass-dbmanager:
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    ports:
      - 27017:27017
    depends_on:
      - mspass-shard-0
      - mspass-shard-1
    environment:
      MSPASS_ROLE: dbmanager
      MSPASS_SHARD_LIST: mspass-shard-0/mspass-shard-0:27017 mspass-shard-1/mspass-shard-1:27017
      MONGODB_PORT: 27017
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      timeout: 60s
      retries: 5
      start_period: 5s

  mspass-shard-0:
    hostname: mspass-shard-0
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    environment:
      MSPASS_ROLE: shard
      MSPASS_SHARD_ID: 0
      MONGODB_PORT: 27017
      SHARD_DB_PATH: scratch
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      timeout: 60s
      retries: 5
      start_period: 5s

  mspass-shard-1:
    hostname: mspass-shard-1
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    environment:
      MSPASS_ROLE: shard
      MSPASS_SHARD_ID: 1
      MONGODB_PORT: 27017
      SHARD_DB_PATH: scratch
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      timeout: 60s
      retries: 5
      start_period: 5s

  mspass-scheduler:
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    ports:
      - 8786:8786
    environment:
      MSPASS_ROLE: scheduler
      MSPASS_SCHEDULER: dask
      DASK_SCHEDULER_PORT: 8786
    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8786
      interval: 10s
      timeout: 60s
      retries: 5
      start_period: 5s

  mspass-worker:
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    depends_on:
      - mspass-scheduler
    environment:
      MSPASS_ROLE: worker
      MSPASS_SCHEDULER: dask
      MSPASS_SCHEDULER_ADDRESS: mspass-scheduler

  mspass-frontend:
    image: mspass/mspass
    volumes:
      - "${PWD}/:/home"
    ports:
      - 8888:8888
    depends_on:
      - mspass-dbmanager
      - mspass-scheduler
    environment:
      MSPASS_ROLE: frontend
      MSPASS_SCHEDULER: dask
      MSPASS_SCHEDULER_ADDRESS: mspass-scheduler
      MSPASS_DB_ADDRESS: mspass-dbmanager
      MSPASS_JUPYTER_PWD: mspass
      JUPYTER_PORT: 8888

Listing 2: Docker Compose example that configures a MongoDB cluster of two shards and a Spark cluster of one worker.
version: '3'

services:

  mspass-dbmanager:
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    command: dockerize -wait tcp://mspass-shard-0:27017 -wait tcp://mspass-shard-1:27017 -timeout 240s /usr/sbin/start-mspass.sh
    ports:
      - 27017:27017
    depends_on:
      - mspass-shard-0
      - mspass-shard-1
    environment:
      MSPASS_ROLE: dbmanager
      MSPASS_SHARD_LIST: mspass-shard-0/mspass-shard-0:27017 mspass-shard-1/mspass-shard-1:27017
      MONGODB_PORT: 27017

  mspass-shard-0:
    hostname: mspass-shard-0
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    environment:
      MSPASS_ROLE: shard
      MSPASS_SHARD_ID: 0
      MONGODB_PORT: 27017

  mspass-shard-1:
    hostname: mspass-shard-1
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    environment:
      MSPASS_ROLE: shard
      MSPASS_SHARD_ID: 1
      MONGODB_PORT: 27017

  mspass-scheduler:
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    ports:
      - 7077:7077
    environment:
      MSPASS_ROLE: scheduler
      MSPASS_SCHEDULER: spark
      SPARK_MASTER_PORT: 7077

  mspass-worker:
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    command: dockerize -wait tcp://mspass-scheduler:7077 -timeout 240s /usr/sbin/start-mspass.sh
    depends_on:
      - mspass-scheduler
    environment:
      MSPASS_ROLE: worker
      MSPASS_SCHEDULER: spark
      MSPASS_SCHEDULER_ADDRESS: mspass-scheduler

  mspass-frontend:
    image: mspass/mspass
    volumes:
      - "${PWD}:/home"
    command: dockerize -wait tcp://mspass-dbmanager:27017 -wait tcp://mspass-scheduler:7077 -timeout 240s /usr/sbin/start-mspass.sh
    ports:
      - 8888:8888
    depends_on:
      - mspass-dbmanager
      - mspass-scheduler
    environment:
      MSPASS_ROLE: frontend
      MSPASS_SCHEDULER: spark
      MSPASS_SCHEDULER_ADDRESS: mspass-scheduler
      MSPASS_DB_ADDRESS: mspass-dbmanager
      MSPASS_JUPYTER_PWD: mspass
      JUPYTER_PORT: 8888

To test out the multi-container setup, we can use the docker-compose command, which will deploy all the components locally. First, save the content of one of the two code blocks above to a file called docker-compose.yml, and make sure that you run the command from a directory where you want to keep the files created by the containers (i.e., the db, logs, and work directories). Then, run the following command to start all the containers:

docker-compose -f docker-compose.yml up -d

This command starts all the containers as services running in the background. If everything starts correctly, you will see output like this:

$ docker-compose -f docker-compose.yml up -d
Creating network "mspass_default" with the default driver
Creating mspass_mspass-shard-1_1   ... done
Creating mspass_mspass-scheduler_1 ... done
Creating mspass_mspass-shard-0_1   ... done
Creating mspass_mspass-worker_1    ... done
Creating mspass_mspass-dbmanager_1 ... done
Creating mspass_mspass-frontend_1  ... done
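
You can also list the services and their current state at any time with:

docker-compose -f docker-compose.yml ps

If you want to experiment with more than one worker in this local setup, Docker Compose can start additional replicas of the worker service with the --scale flag (e.g., docker-compose -f docker-compose.yml up -d --scale mspass-worker=2); this is optional, and the listings above work as-is with a single worker.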

You can then open http://127.0.0.1:8888/ in your browser to access the Jupyter Notebook frontend. Note that it may take a minute for the frontend to be ready. You can check the status of the frontend with this command:

docker-compose -f docker-compose.yml logs mspass-frontend

The notebook will ask for a password; type in mspass, which is the value we set for the MSPASS_JUPYTER_PWD environment variable of the mspass-frontend service in the docker-compose.yml file.

When you are done with MsPASS, you can bring down the containers with:

docker-compose -f docker-compose.yml down

You should see output similar to the following, indicating that all the containers are correctly cleaned up:

$ docker-compose -f docker-compose.yml down
Stopping mspass_mspass-frontend_1  ... done
Stopping mspass_mspass-dbmanager_1 ... done
Stopping mspass_mspass-worker_1    ... done
Stopping mspass_mspass-shard-0_1   ... done
Stopping mspass_mspass-scheduler_1 ... done
Stopping mspass_mspass-shard-1_1   ... done
Removing mspass_mspass-frontend_1  ... done
Removing mspass_mspass-dbmanager_1 ... done
Removing mspass_mspass-worker_1    ... done
Removing mspass_mspass-shard-0_1   ... done
Removing mspass_mspass-scheduler_1 ... done
Removing mspass_mspass-shard-1_1   ... done
Removing network mspass_default
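
If you only want to pause the environment without removing the containers (for example, to resume the same session later), you can stop and later restart the services instead of bringing them down:

docker-compose -f docker-compose.yml stop
docker-compose -f docker-compose.yml start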