Reproducing ECCO Version 4 Release 4 (Forward Simulation)#
ECCO Version 4 Release 4 (V4r4) is ECCO’s latest publicly available central estimate (see its data repository on PO.DAAC) and has been used in numerous studies (e.g., Wu et al., 2020). ECCO V4r4 is a forward simulation with optimized controls that have been adjusted through an iterative adjoint-based optimization process to minimize the model–data misfit. Wang et al. (2023) provides detailed instructions on how to reproduce the ECCO V4r4 estimate. In this tutorial, we follow those instructions, with some modifications tailored for the P-Cluster, to reproduce the ECCO V4r4 estimate.
Log in to P-Cluster#
Users first connect to the P-Cluster and change to their directory on /efs_ecco, as described in the P-Cluster introduction tutorial:
ssh -i /path/to/privatekey -X USERNAME@34.210.1.198
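Optionally, the login command can be shortened with an entry in ~/.ssh/config. The sketch below is a convenience, not part of the official instructions; the host alias pcluster is arbitrary, and the key path and USERNAME are the same placeholders as above:

Host pcluster
    HostName 34.210.1.198
    User USERNAME
    IdentityFile /path/to/privatekey
    ForwardX11 yes

With this entry in place, the connection command becomes simply ssh pcluster.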
The directory /efs_ecco/USERNAME/ (replace USERNAME with the user’s actual username) is where the run should be conducted. Users can change to that directory with the following command:
cd /efs_ecco/USERNAME/
Modules#
Modules on Linux allow users to easily configure their environment for specific software, such as compilers (e.g., GCC, Intel) and MPI libraries (e.g., OpenMPI, MPICH), and to switch between versions without manually setting environment variables. Running ECCO on different machines and platforms often involves a different set of modules tailored to the system’s architecture and operating system. The modules used on the P-Cluster differ from those specified in Section 5 of Wang et al. (2023). They are loaded in the example .bashrc, which should have been downloaded and renamed to /home/USERNAME/.bashrc as described in the P-Cluster introduction tutorial, so that the required modules are loaded automatically. Specifically, the modules loaded by the example .bashrc are listed in the following table:
(Table: module load commands from the example .bashrc.)
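To confirm that the modules were indeed loaded automatically at login, the standard Environment Modules commands can be used (a generic sketch; NAME is a placeholder, and the actual module names come from the example .bashrc):

module list        # show the modules currently loaded in this session
module avail       # list all modules available on the P-Cluster
module load NAME   # load a module manually if one is missing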
With these modules pre-loaded by .bashrc, one can skip the module-loading step in the first box of Section 5.1 of Wang et al. (2023) and proceed directly to the compilation steps in the second box of that section.
Code, Namelists, and Input Files#
For the sake of time, the MITgcm code (checkpoint66g), V4-specific code, and V4 namelist files have already been downloaded to the P-Cluster at /efs_ecco/ECCO/V4/r4/, following Sections 2 and 3 of Wang et al. (2023). Copy the files to the user’s own directory at /efs_ecco/USERNAME/r4/ using the following command (be sure to replace USERNAME with the actual username and maintain the same directory structure as described here):
rsync -av /efs_ecco/ECCO/V4/r4/WORKINGDIR /efs_ecco/USERNAME/r4/
Everyone has a directory at /efs_ecco/USERNAME/. There is no need to manually create the subdirectory /efs_ecco/USERNAME/r4/; the rsync command above will create it automatically.
The input files (such as atmospheric forcing and initial conditions, described in Section 4 of Wang et al. (2023)) are several hundred gigabytes in size. These input files have also been downloaded and stored on the P-Cluster, in /efs_ecco/ECCO/V4/r4/input/. Do not copy them to the user’s own directory. Instead, create a symbolic link in the user’s own directory pointing to the input file directory using the following commands:
cd /efs_ecco/USERNAME/r4/
ln -s /efs_ecco/ECCO/V4/r4/input .
The symbolic link will be used to access the input files in the example run script described below.
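To verify that the link was created correctly, list it with ls -ld (the expected output assumes the commands above succeeded):

ls -ld /efs_ecco/USERNAME/r4/input
# the output should end with: input -> /efs_ecco/ECCO/V4/r4/input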
The directory structure under /efs_ecco/USERNAME/r4/ now looks like the following:
┌── WORKINGDIR
│ ├── ECCO-v4-Configurations
│ ├── ECCOV4
│ │ └── release4
│ │ ├── code
│ │ └── namelist
│ └── MITgcm
└── input
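This layout can be checked with ls, or reproduced with the tree utility if it is installed on the system:

tree -L 3 /efs_ecco/USERNAME/r4   # limit the display to three directory levels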
Compile#
The steps for compiling the code are the same as those described in the second box of Section 5.1 of Wang et al. (2023), except for one important change: specify the optfile as ../code/linux_ifort_impi_aws_sysmodule.
cd WORKINGDIR/ECCOV4/release4
mkdir build
cd build
export ROOTDIR=../../../MITgcm
../../../MITgcm/tools/genmake2 -mods=../code -optfile=../code/linux_ifort_impi_aws_sysmodule -mpi
make depend
make all
cd ..
The optfile linux_ifort_impi_aws_sysmodule has been customized specifically for the P-Cluster. If the compilation is successful, the executable mitgcmuv will be generated in the build directory.
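A quick way to confirm that the build succeeded is to check for the executable from the release4 directory (where the final cd .. above leaves you):

ls -l build/mitgcmuv   # the file should exist and be executable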
Run the Model#
After successfully compiling the code and generating the executable mitgcmuv in the build directory (WORKINGDIR/ECCOV4/release4/build/mitgcmuv), one can proceed with running the model. For this purpose, we provide an example V4r4 run script that integrates the model for three months (see below for how to change the run script to conduct a run over 1992–2017, the entire V4r4 integration period). The example run script is available on the P-Cluster at /efs_ecco/ECCO/V4/r4/scripts/run_script_slurm.bash.
SLURM Directives#
As described in the P-Cluster introduction tutorial, the P-Cluster uses SLURM as the batch system. There are a few SLURM directives at the beginning of the run script that request the necessary resources for conducting the run. These SLURM directives are as follows:
| SBATCH Command | Description |
|---|---|
| #SBATCH -J ECCOv4r4 | Job name is ECCOv4r4. |
| #SBATCH --nodes=3 | Request three nodes. |
| #SBATCH --ntasks-per-node=36 | Each node runs 36 tasks (processes). |
| #SBATCH --time=24:00:00 | Request a wall-clock time of 24 hours. |
| #SBATCH --exclusive | No other jobs will be scheduled on the same nodes while the job is running. |
| #SBATCH --partition=sealevel-c5n18xl-demand | Request on-demand Amazon EC2 C5n instances (see the C5n tab under Product Details); each node has 72 vCPUs and 192 GB of memory. |
| #SBATCH --mem-per-cpu=1GB | Each CPU/process/task is allocated 1 GB of memory. |
| #SBATCH -o ECCOv4r4-%j-out | Batch output log file (%j is replaced by the job ID). |
| #SBATCH -e ECCOv4r4-%j-out | Batch error log file (here the same file as the output log). |
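Put together, the top of the run script looks approximately like the following (reconstructed from the table above; the actual script at /efs_ecco/ECCO/V4/r4/scripts/run_script_slurm.bash is authoritative):

#!/bin/bash
#SBATCH -J ECCOv4r4
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=36
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --partition=sealevel-c5n18xl-demand
#SBATCH --mem-per-cpu=1GB
#SBATCH -o ECCOv4r4-%j-out
#SBATCH -e ECCOv4r4-%j-out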
Submit the Run and Check Job Status#
Change into the /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4 directory and copy the run script from /efs_ecco/ECCO/V4/r4/scripts/run_script_slurm.bash into this directory (replace USERNAME with your actual username, but keep the directory structure the same). Then submit the script using sbatch with the following commands:
cd /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4
cp /efs_ecco/ECCO/V4/r4/scripts/run_script_slurm.bash .
sbatch run_script_slurm.bash
Once the job is submitted, SLURM will assign a job ID and print a message like the following:
Submitted batch job 123
Users can then check the status of the job with the following command:
squeue
Usually, SLURM takes several minutes to configure a job, with the status (ST) showing CF (configuring):
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123 sealevel- ECCOv4r4 USERNAME CF 0:53 3 sealevel-c5n18xl-demand-dy-c5n18xlarge-[1-3]
After a while, squeue will show the status changing to R (running), as in the following:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123 sealevel- ECCOv4r4 USERNAME R 3:30 3 sealevel-c5n18xl-demand-dy-c5n18xlarge-[1-3]
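On a busy cluster, the squeue output can be narrowed to your own jobs, and a job can be cancelled if something goes wrong (both are standard SLURM commands; replace 123 with the actual job ID):

squeue -u USERNAME   # show only your own jobs
scancel 123          # cancel a pending or running job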
The run directory is /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4/run/. The 3-month integration takes less than 20 minutes to complete. NORMAL END inside the batch log file /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4/ECCOv4r4-123-out (replace 123 with the actual job ID) indicates a successfully completed run. Another way to check whether the run ended normally is to examine the last line of the file STDOUT.0000 in the run directory: if that line is PROGRAM MAIN: Execution ended Normally, the run completed successfully.
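Both checks can be performed from the release4 directory with standard commands (again replacing 123 with the actual job ID):

grep "NORMAL END" ECCOv4r4-123-out   # prints the line if the run completed
tail -n 1 run/STDOUT.0000            # should print: PROGRAM MAIN: Execution ended Normally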
The run will output monthly-mean core variables (SSH, OBP, and UVTS) in the subdirectory diags/ of the run directory. These fields can be analyzed using the Jupyter Notebooks presented in some of the ECCO Summer School tutorials.
Note
If you want to output other variables, modify data.diagnostics following the format of WORKINGDIR/ECCOV4/release4/namelist/data.diagnostics.monthly.inddiag (for monthly output) or data.diagnostics.daily.inddiag (for daily output). To help manage disk usage and I/O performance, please output only the variables that are essential for your analysis.
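For illustration only, a diagnostics entry in MITgcm’s data.diagnostics generally has the form sketched below; the field THETA, the list index 10, and the output file name are hypothetical examples, and the .inddiag templates above are the authoritative reference for V4r4. The frequency is the averaging period in seconds (2635200. is a commonly used monthly value):

 &DIAGNOSTICS_LIST
# hypothetical entry: monthly-mean potential temperature
  fields(1,10)  = 'THETA   ',
  fileName(10)  = 'THETA_mon_mean',
  frequency(10) = 2635200.,
 &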
To conduct the entire 26-year (1992–2017) run, comment out the following three lines in the run script. These lines shorten the example run to three months by activating nTimeSteps=2160 and deactivating nTimeSteps=227903 (the full integration length) in the data namelist; commenting them out leaves the full 26-year setting in place.
unlink data
cp -p ../namelist/data .
sed -i '/#nTimeSteps=2160,/ s/^#//; /nTimeSteps=227903,/ s/^/#/' data
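After editing, the corresponding portion of the script should read:

#unlink data
#cp -p ../namelist/data .
#sed -i '/#nTimeSteps=2160,/ s/^#//; /nTimeSteps=227903,/ s/^/#/' data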
References#
Wang, O., & Fenty, I. (2023). Instructions for reproducing ECCO Version 4 Release 4 (1.5). Zenodo. https://doi.org/10.5281/zenodo.10038354
Wu, W., Zhan, Z., Peng, S., Ni, S., & Callies, J. (2020). Seismic ocean thermometry. Science, 369(6510), 1510–1515. https://doi.org/10.1126/science.abb9519