Conducting MITgcm Verification Experiments#

MITgcm#

MITgcm (Massachusetts Institute of Technology General Circulation Model) is a powerful and versatile numerical modeling tool used to simulate the circulation of the oceans, atmosphere, and climate systems. It serves as the primary numerical model for the Estimating the Circulation and Climate of the Ocean (ECCO) project. The MITgcm user manual and code are publicly accessible at MITgcm User Manual and MITgcm GitHub Repository, respectively.

MITgcm includes a set of verification experiments designed to help new users become familiar with the model while also providing sufficient features for experienced users and developers to test new functionalities, facilitate debugging, and more.

Section 5.5 of the MITgcm user manual provides detailed information about the verification experiments, including the testreport shell script utility, which generates executables, conducts runs, and compares the results with the benchmark results provided in the MITgcm repository. This documentation explains how to conduct these verification experiments on the P-Cluster.

Get the MITgcm Code from GitHub#

Log in to the P-Cluster using the following command: ssh -i /path/to/privatekey -X USERNAME@34.210.1.198

Once logged in, go to your directory on /efs_ecco and clone the MITgcm code from GitHub. The verification experiments reside in the MITgcm/verification directory of the clone:

cd /efs_ecco/USERNAME/
git clone https://github.com/MITgcm/MITgcm.git

Conducting Verification Experiments Using testreport Utility#

As an example, we use the shell script testreport to conduct the verification experiment global_ocean.90x40x15. Note that the names of the verification experiments are the same as the directory names. Additionally, each verification experiment may have multiple sub-experiments that test different MITgcm features.

Request an Interactive Node#

As explained in the P-Cluster introduction section, we first need to request an interactive node in order not to strain the head node, which has very limited resources. To do this, issue the following command:

salloc --ntasks=4 --ntasks-per-node=2 --partition=sealevel-c5xl-demand --time=01:00:00

Wait for the node to become available (when the prompt appears), then proceed to the next step below.

Why --ntasks-per-node=2? Notice that we're asking for 4 tasks (--ntasks=4), which means we need 4 CPUs from somewhere. Each node in the partition "sealevel-c5xl-demand" has two CPUs. When we say --ntasks-per-node=2, we're asking for both of the CPUs in each node. So doing the math, you see we're requesting 2 nodes, each with 2 CPUs, for our 4 tasks. Simple.
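The node arithmetic above can be sketched in a few lines of shell. This is only an illustration: the function name nodes_needed is made up here, and it simply rounds the number of tasks divided by tasks per node up to a whole number of nodes.

```shell
# Nodes needed = ceiling(ntasks / ntasks_per_node), using integer arithmetic.
# nodes_needed is a hypothetical helper, not part of Slurm or MITgcm.
nodes_needed() {
    ntasks=$1
    per_node=$2
    echo $(( (ntasks + per_node - 1) / per_node ))
}

# The request used in this tutorial: 4 tasks, 2 tasks per node.
nodes_needed 4 2
```

For the request in this tutorial the function prints 2, matching the two nodes described above.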

Conducting Verification Experiments on the P-Cluster#

So now that we’ve got 4 cpus, let’s use them to run some super small MITgcm simulations. These small simulations are part of a test suite that the MITgcm maintainers use to make sure code changes don’t break things. Verification experiments are generally pretty small and run only a few time steps, so they run fast.

Let’s run a verification experiment in two ways: (1) using only one CPU, and (2) using two CPUs with MPI.

Conduct a Verification Experiment with one CPU#

cd MITgcm/verification
./testreport -t "global_ocean.90x40x15"

By default, the testreport command will compile the model for one cpu. Easy.

After the experiment completes, testreport generates a summary of the status of the experiment, indicating whether the test passes or fails. A test passes if the results are sufficiently similar to the benchmark results, and it fails if the deviation exceeds an acceptable threshold. Below is a snippet of the summary, which shows that there are three sub-experiments: global_ocean.90x40x15, global_ocean.90x40x15.dwnslp, and global_ocean.90x40x15.idemix. All three sub-experiments have passed the test (indicated by the word pass).

Thu Sep 26 16:34:17 UTC 2024
run: ./testreport -t global_ocean.90x40x15
on : Linux ip-10-20-22-69 5.15.0-1070-aws #76~20.04.1-Ubuntu SMP Mon Sep 2 12:20:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

No "OPTFILE" was specified ; genmake2 found and uses:
                         OPTFILE=../../../tools/build_options/linux_amd64_gfortran

default 10  ----T-----  ----S-----  ----U-----  ----V-----  --PTR 01--  --PTR 02--  --PTR 03--  --PTR 04--  --PTR 05--
G D M    c        m  s        m  s        m  s        m  s        m  s        m  s        m  s        m  s        m  s
e p a R  g  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .
n n k u  2  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d
2 d e n  d  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .

Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15
Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15.dwnslp
Y Y Y Y>12<13 16 16 16 16 16 16 16 14 13 13 16 16 14 13 16 -- -- -- --  .  .  .  . pass  global_ocean.90x40x15.idemix
Start time:  Thu Sep 26 16:34:17 UTC 2024
End time:    Thu Sep 26 16:37:34 UTC 2024
======== End of testreport execution ========
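When many experiments are run, the status column of a summary like the one above can be tallied with a short shell function. The following is a minimal sketch, assuming the summary text has been saved to a file and that each result line ends with the status word followed by the experiment name, as in the snippet above. The function name count_status is made up for illustration, and the demo uses a temporary file holding the three result lines copied from above.

```shell
# Count summary lines whose status word (the second-to-last field) equals $1.
# $1 = status word to look for, $2 = file holding a testreport summary.
# count_status is a hypothetical helper, not part of MITgcm.
count_status() {
    awk -v want="$1" 'NF >= 2 && $(NF-1) == want { n++ } END { print n + 0 }' "$2"
}

# Demo: save the three result lines from the summary above to a temporary file.
summary=$(mktemp)
cat > "$summary" <<'EOF'
Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15
Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15.dwnslp
Y Y Y Y>12<13 16 16 16 16 16 16 16 14 13 13 16 16 14 13 16 -- -- -- --  .  .  .  . pass  global_ocean.90x40x15.idemix
EOF

# Number of sub-experiments that passed.
count_status pass "$summary"
```

On the three lines above this prints 3. The same function can count other status words, but which word your version of testreport prints for a failed test is something to check against your own output.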

Conduct a Verification Experiment with 2 CPUs#

OK, so to run this verification experiment on 2 CPUs on the P-Cluster, we need to use MPI. It seems like every machine has its MPI libraries in a different place, and so in order to compile the model we need to tell the compiler where the MPI libraries are. For the P-Cluster, the locations of the MPI libraries (and all of the other libraries we need) are stored in a fancy text file called an OPTFILE, and this OPTFILE is on GitHub. So, first, let’s download it.

cd /efs_ecco/USERNAME/MITgcm/verification

wget https://raw.githubusercontent.com/ECCO-GROUP/ECCO-v4-Configurations/refs/heads/master/ECCOv4%20Release%205/code/linux_ifort_impi_aws_sysmodule

This is what I see:

ls -l linux_ifort_impi_aws_sysmodule 
-rw-r--r--  1 ifenty  staff  1981 May 17 16:46 linux_ifort_impi_aws_sysmodule
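Before compiling, it can be worth confirming that the download left a usable file behind, since a failed download may leave the file missing or empty. The following is a minimal sketch; the function name check_optfile is made up for illustration and the check only looks at whether the file exists and has a non-zero size, not at its contents.

```shell
# Report whether an OPTFILE exists and is not empty.
# $1 = path to the OPTFILE. check_optfile is a hypothetical helper.
check_optfile() {
    if [ -s "$1" ]; then
        echo "OK: $1 exists and is not empty"
    else
        echo "Problem: $1 is missing or empty" >&2
        return 1
    fi
}
```

For example, running check_optfile linux_ifort_impi_aws_sysmodule in MITgcm/verification should report OK if the wget command above succeeded.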

OK, so now we’re going to run testreport again, but with two extra flags: “-optfile” and “-mpi”

./testreport -t "global_ocean.90x40x15" -optfile linux_ifort_impi_aws_sysmodule -mpi

The “-optfile” flag tells testreport which OPTFILE to use when building the model. Easy. The “-mpi” flag tells testreport to compile and run with MPI; the number of CPUs defaults to 2.

Just like with the one-CPU example, testreport generates a summary of the status of the experiment, indicating whether the test passes or fails. Below is a snippet of the summary for the 2-CPU job.

Notice the “-mpi” flag in the “run:” line and the “OPTFILE=…” line below. If you see a lot of “Y” and “pass”, it worked. Go celebrate with a walk on the dunes.

Sat May 17 22:15:32 UTC 2025
run: ./testreport -t global_ocean.90x40x15 -optfile linux_ifort_impi_aws_sysmodule -mpi
on : Linux sealevel-c5xl-demand-dy-c5xlarge-1 5.15.0-1055-aws #60~20.04.1-Ubuntu SMP Thu Feb 22 15:49:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  OPTFILE=/efs_ecco/ohoundeg/MITgcm/verification/linux_ifort_impi_aws_sysmodule

default 10  ----T-----  ----S-----  ----U-----  ----V-----  --PTR 01--  --PTR 02--  --PTR 03--  --PTR 04--  --PTR 05--
G D M    c        m  s        m  s        m  s        m  s        m  s        m  s        m  s        m  s        m  s
e p a R  g  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .  m  m  e  .
n n k u  2  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d  i  a  a  d
2 d e n  d  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .  n  x  n  .

Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15
Y Y Y Y>16<16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16  .  .  .  . pass  global_ocean.90x40x15.dwnslp
Y Y Y Y>12<13 16 16 16 16 16 16 16 14 13 12 14 13 16 13 13 -- -- -- --  .  .  .  . pass  global_ocean.90x40x15.idemix
Start time:  Sat May 17 22:15:32 UTC 2025
End time:    Sat May 17 22:22:16 UTC 2025
======== End of testreport execution ========

Other Verification Experiments#

The testreport utility also allows one to conduct multiple verification experiments in a single call (each of which may have multiple sub-experiments). The experiment names are given as a space-separated list. For example,

./testreport -t "global_ocean.90x40x15 lab_sea"