Run MsPASS with Docker

Prerequisites

In normal use, Docker is required to run MsPASS on desktop systems. The alternative is a more complicated installation of the components built from source, as described on this wiki page. Docker is the piece of software you will use to run and manage any containers on your desktop system.

Docker is well supported on all current desktop operating systems and has simple install procedures described in detail in the product’s documentation found here. The software can currently be downloaded at no cost, but you must have administrative privileges to install it. The remainder of this page assumes you have successfully installed Docker. For Windows or Apple users it may be convenient to launch “Docker Desktop” as an alternative to the command line tools.

Download MsPASS Container

The MsPASS container image is built and hosted on Docker Hub. It is also available in the GitHub Container Registry. Once you have Docker set up properly, use the following command in a terminal to download the MsPASS image from Docker Hub to your local machine:

docker pull mspass/mspass

Be patient the first time you issue this command on your system, as the download can take a few minutes depending on your internet speed. You can run this command from anywhere: Docker stores the image in a system directory (folder) whose location depends on the host operating system, so you will not see anything appear in the directory where you run the command. Be aware that the MsPASS container will consume on the order of 500 MB of space on your system disk, so be sure you are not pushing the limits of that disk. The recommended way to manage disk usage is through Docker Desktop or the docker command line tools. See Docker’s documentation for information on how to do that.
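As a rough sketch of the command line route, the two docker subcommands below list the downloaded image with its size and summarize Docker's overall disk use. The function name mspass_disk_report is hypothetical and only groups the two calls; it is not part of MsPASS.

```shell
# Hypothetical convenience wrapper (not part of MsPASS): report how much
# disk space the MsPASS image and Docker as a whole are using.
mspass_disk_report() {
    # List the local copy of the image, including its SIZE column.
    docker image ls mspass/mspass
    # Summarize space used by all images, containers, and volumes.
    docker system df
}
```

After defining the function in a terminal, type mspass_disk_report to see both reports. Either docker command can also be run directly.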

It can be confusing to understand where data is stored in a containerized environment because file paths are always mapped from local file path names to container file path names, and the two are usually different. In the discussion below, file names that reside inside a container are set in italics. File names on the physical system are set in a normal font.

Run MsPASS Container in All-in-one Mode

Most MsPASS processing on a desktop begins by running a variant of the following on the command line:

docker run -p 8888:8888 --mount src=/Users/myusername/myproject,target=/home,type=bind mspass/mspass

The -p 8888:8888 argument maps port 8888 on your system to port 8888 in the container. That mapping is needed to allow your local web browser to connect to the Jupyter notebook server running in the container. 8888 is the default port for the Jupyter Notebook frontend. If something else on your system is already using port 8888 (uncommon), change the first number to map a different local port to 8888 in the container. For example, if you use -p 9999:8888, the URL you use to connect to the Jupyter notebook must be altered to use 9999 as the port number.

The lengthy incantation following the --mount argument “maps” a local file system path to a defined mount point in the container. In this example the local system directory “/Users/myusername/myproject” is mapped to the directory called /home in the container. /home is a standard directory on the unix system the container runs. A standard alternative is /mnt, but most people prefer /home as the name makes more sense. That mapping is necessary to save your results to your local system. Without the --mount incantation, any results you produce in a run will disappear when the container exits.

A useful alternative way to launch the container on a Linux or macOS system is to first use the shell cd command to make your project directory the “current directory” of the terminal you are using. Then you can cut-and-paste the following variation of the above into that terminal window, and /home in the container will be mapped to your “current directory”:

docker run -p 8888:8888 --mount src=`pwd`,target=/home,type=bind mspass/mspass
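The two launch variants above can be folded into one small shell function. This is only a sketch: the name run_mspass and its two optional arguments (project directory, host port) are hypothetical conveniences, not something MsPASS provides. It assumes a POSIX shell on Linux or macOS.

```shell
# Hypothetical wrapper (not part of MsPASS).
# Usage: run_mspass [project_dir] [host_port]
run_mspass() {
    # Resolve the project directory to an absolute path; a bind mount
    # needs one. Defaults to the current directory.
    project_dir=$(cd "${1:-.}" >/dev/null 2>&1 && pwd) || {
        echo "project directory not found: ${1:-.}" >&2
        return 1
    }
    # Host port that will be mapped to the container's port 8888.
    host_port=${2:-8888}
    docker run -p "${host_port}:8888" \
        --mount "src=${project_dir},target=/home,type=bind" \
        mspass/mspass
}
```

For example, run_mspass with no arguments behaves like the `pwd` variant, while run_mspass /Users/myusername/myproject 9999 mounts that directory and serves the notebook on local port 9999.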

When the container boots, it prints a block of text to the terminal from which it was launched, announcing the successful launch of the required MsPASS components. The last part of the output will look something like this:

[I 11:02:38.655 NotebookApp] Serving notebooks from local directory: /home
[I 11:02:38.655 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 11:02:38.655 NotebookApp] http://7b408535513f:8888/?token=ced2d40475df024c3544e7bd4aa0ea4676e0c88ae85be7db
[I 11:02:38.656 NotebookApp]  or http://127.0.0.1:8888/?token=ced2d40475df024c3544e7bd4aa0ea4676e0c88ae85be7db
[I 11:02:38.656 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:02:38.673 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/nbserver-57-open.html
    Or copy and paste one of these URLs:
        http://7b408535513f:8888/?token=ced2d40475df024c3544e7bd4aa0ea4676e0c88ae85be7db
     or http://127.0.0.1:8888/?token=ced2d40475df024c3544e7bd4aa0ea4676e0c88ae85be7db

Use the standard cut-and-paste operation to paste the URL beginning with http://127.0.0.1:8888 into your favorite web browser. (If you need to use port mapping, which is not common, change the 8888 to the mapped value, 9999 in the example above.) That URL should resolve and a Jupyter notebook home page should come up in the browser. This page assumes you know where to go from here. If you are not familiar with Jupyter Notebook, refer to the documentation found here.
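If you did remap the port, the URL printed by the container still shows 8888 and has to be edited by hand. A minimal sketch of that edit is shown here; the helper name notebook_url is hypothetical, and it assumes the URL has the same form as the log output above.

```shell
# Hypothetical helper (not part of MsPASS): replace the container port 8888
# in a notebook URL with the host port chosen in -p HOST_PORT:8888.
# Usage: notebook_url URL HOST_PORT
notebook_url() {
    url=$1
    host_port=$2
    # Only the first ":8888/" is rewritten; the token is left untouched.
    printf '%s\n' "$url" | sed "s|:8888/|:${host_port}/|"
}
```

Calling notebook_url with the http://127.0.0.1:8888 URL copied from your terminal and 9999 as the second argument prints the same URL with 9999 as the port, ready to paste into the browser.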

The root directory of the notebook contains three directories, db, logs, and work, that are created in your working directory the first time you launch the mspass container in that directory. db contains MongoDB’s database files. logs contains the logs generated by the database, the scheduler, and the worker. work is a local scratch space used by dask/spark. Other files in your project directory should also show up in the file browser. (Note that if you do not use the --mount option, everything shown on the home page will disappear when the container exits. The default is what it is because the majority of “dockerized” applications are run as background processes, and that approach makes cleanup automatic. That mode is rarely useful for desktop use with MsPASS.) Normal use at this point is to open an existing notebook to be run (double-click the notebook’s file name) or create one with the New button on the notebook home page.
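One quick way to confirm that the mount worked is to look for those three directories in the project directory on the local system after the first launch. The following is a sketch only; the function name check_mspass_dirs is hypothetical.

```shell
# Hypothetical check (not part of MsPASS): report whether the db, logs,
# and work directories exist in a project directory on the local system.
# Usage: check_mspass_dirs [project_dir]
check_mspass_dirs() {
    project_dir=${1:-.}
    missing=0
    for d in db logs work; do
        if [ -d "$project_dir/$d" ]; then
            echo "found: $d"
        else
            echo "missing: $d"
            missing=1
        fi
    done
    # Exit status is 0 only when all three directories are present.
    return "$missing"
}
```

If any of the three is reported missing after the container has started, the most likely cause is that the container was launched without the --mount option or from a different directory.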

A final point worth noting is that it is often useful when working interactively with MsPASS on a desktop to open a “Terminal” in the container. The New button has a Terminal item in addition to the Python 3 button that is used to create a new notebook. If you select Terminal you will get a black web browser window (usually a tab on any newer browser) with the cryptic # prompt of the default Bourne shell. Most users will want to immediately launch a bash shell instead of the more primitive sh; i.e., we recommend you type bash in the new terminal window, as it gives you things like line editing that are not available with the old-school Bourne shell. (Note we do not currently have any other advanced shells in the mspass container.) Be warned that with docker you are running as root in the container. You can thus run sysadmin commands. That can be useful, but it is a sharp knife that can cut you. Be sure you know what you are doing before you alter any files with bash commands in this terminal. A more standard use is to run common monitoring commands like top to monitor memory and cpu usage by the container.

If you are using dask on a desktop, we have found that many algorithms perform badly because of a subtle issue with python and threads. By default dask uses a “thread pool” for workers, with the number of threads equal to the number of cores defined for the docker container. Threading with python is subject to poor performance because of something called the Global Interpreter Lock (GIL), which prevents pure python code in multiple threads from running in parallel. The solution is to tell dask to run each worker task as a “process”, not a thread. (Note pyspark does this by default.) A way to do that with dask is to launch docker with the following variant of the above:

docker run -p 8888:8888 -e MSPASS_WORKER_ARG="--nworkers 4 --nthreads 1" --mount src=`pwd`,target=/home,type=bind mspass/mspass

where the value after --nworkers should be the number of worker processes you want the container to run. Normally that would be the number of cores defined for the container, which by default can be less than the number of cores on the machine running docker.
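To avoid retyping the long command when experimenting with different worker counts, the variant above can be parameterized. This is a sketch under stated assumptions: the function name run_mspass_processes is hypothetical, and it simply fills the worker count into the same MSPASS_WORKER_ARG value shown above.

```shell
# Hypothetical wrapper (not part of MsPASS): launch the container with
# one single-threaded dask worker process per requested worker.
# Usage: run_mspass_processes [nworkers]
run_mspass_processes() {
    nworkers=${1:-4}
    # Reject an empty value, anything containing a non-digit, and zero.
    case $nworkers in
        ''|*[!0-9]*|0)
            echo "nworkers must be a positive integer" >&2
            return 1
            ;;
    esac
    docker run -p 8888:8888 \
        -e MSPASS_WORKER_ARG="--nworkers ${nworkers} --nthreads 1" \
        --mount "src=$(pwd),target=/home,type=bind" \
        mspass/mspass
}
```

For example, run_mspass_processes 2 starts the container with two worker processes. To choose a value, one option is to open a Terminal in a running container and type nproc, which on most Linux systems prints the number of cores available; whether that command is present in the mspass container is an assumption here.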

Finally, to exit, close any notebook windows and the Jupyter notebook home page. You will usually also need to type ctrl-C in the terminal window you used to launch MsPASS via docker.