Running ECCO Adjoint and Optimization#
ECCO Ocean State Estimation uses an iterative optimization process to adjust control variables—including ocean surface forcing, mixing parameters, and initial conditions—to minimize the weighted sum of model–data misfits, called the cost function (J), in a least-squares sense. This iterative process involves tens of iterations. Each iteration includes one forward run to compute the model–data misfit (J), one adjoint run to compute the adjoint gradients (i.e., the sensitivity of J to the controls), and an optimization step that uses these gradients to estimate updated control adjustments. The typical steps for conducting multiple iterations, starting from initial control variables (called first-guess controls), are as follows:
1. Execute the forward model using a set of first-guess model control variables to compute the model–data misfits and the initial value of the cost function, J.
2. Run the adjoint model, forced by the model–data misfits from the forward simulation. Upon completion, the adjoint gradients of J with respect to the control variables are available; they will be used in step 3 to compute control adjustments.
3. Compute control adjustments (the optimization step): use the adjoint gradients and J from steps 1 and 2 to compute a set of adjustments to the control variables with the method of steepest descent.
4. Execute the forward model again using the updated control variables—i.e., the sum of the first-guess values (or those from the previous iteration) and the adjustments computed in step 3—and compute the new value of J. This step is equivalent to step 1, except that the control variables are no longer the first guess.
5. Run the adjoint model again (repeating step 2) to compute a new set of adjoint gradients.
6. Update the control adjustments (another optimization step): apply the adjoint gradients, updated control variables, and J from the previous two or more iterations (up to four iterations when producing ECCO V4 estimates, to limit memory usage) to calculate a new set of control adjustments using a quasi-Newton method, such as the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm. Apart from using L-BFGS instead of steepest descent, step 6 is essentially the same as step 3.
7. Repeat from step 4 using the latest control adjustments, and continue this cycle until J is sufficiently minimized. The final control adjustments, combined with the first-guess controls, constitute the optimized controls used in a forward simulation that produces the final ECCO release, such as ECCO Version 4 Release 4.
In practice, a forward run and an adjoint run, such as steps 1 and 2 (or steps 4 and 5), are often executed as a single model run that has both forward and adjoint modes. We have described how to conduct a forward simulation in Reproducing ECCO Version 4 Release 4. In this tutorial, we describe how to run the ECCO adjoint model (steps 2 and 5) and conduct optimization (steps 3 and 6).
Steps 3 and 6 conduct a line search that identifies a direction along which J can be reduced and then calculates a step size for adjusting the controls in order to reduce J. The two line-search methods, sketched in compact notation after this list, are:
- Steepest descent (step 3): a first-derivative method that uses the gradient as the search direction.
- Quasi-Newton method (step 6): a method that uses second-derivative (curvature) information and often achieves a faster reduction of J.
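The sketch below is notational only; the symbols (u for the controls, m_i and d_i for modeled and observed quantities, W_i for the misfit weights, alpha_k for the line-search step size, and B_k for the approximate Hessian at iteration k) are chosen here for illustration and are not taken from the ECCO source code.

% Cost function: weighted least-squares sum of model-data misfits
J(u) = \sum_i \left[ m_i(u) - d_i \right]^{\mathsf{T}} W_i \left[ m_i(u) - d_i \right]
% Steepest descent (step 3): the search direction is the negative gradient
u_{k+1} = u_k - \alpha_k \nabla J(u_k)
% Quasi-Newton, e.g., L-BFGS (step 6): curvature enters through an approximate inverse Hessian
u_{k+1} = u_k - \alpha_k B_k^{-1} \nabla J(u_k)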
Log in to P-Cluster#
Follow the same steps as in Reproducing ECCO Version 4 Release 4 to log in to the P-Cluster.
Modules#
As described in Reproducing ECCO Version 4 Release 4, the required modules should have been loaded automatically if users have updated their /home/USERNAME/.bashrc file with the example .bashrc (you may need to check out the latest version of the example .bashrc).
In addition to the modules listed in Reproducing ECCO Version 4 Release 4, users also need the module intel-oneapi-mkl-2021.2.0-gcc-11.1.0-idxgd2d for the optimization step. This module should also have been loaded automatically by /home/USERNAME/.bashrc if users have updated it with the latest version of the example .bashrc. Alternatively — though not recommended — the module can be loaded manually if it is not loaded automatically:
| Module Load Command |
|---|
| module load intel-oneapi-mkl-2021.2.0-gcc-11.1.0-idxgd2d |
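To verify that the module is available in your environment, a quick optional check (assuming the standard module command on the P-Cluster) is:
# Optional: confirm the MKL module is loaded (module list prints to stderr)
module list 2>&1 | grep -i mkl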
Code, Namelists, Input Files, and Scripts#
Code, Namelists, Input Files#
Note
Obtaining the code and namelist files, as well as creating the symbolic link to the input files, is the same as in Reproducing ECCO Version 4 Release 4. You may skip to Scripts and Optimization Code if you’ve already completed these steps and the files and symbolic link are still intact.
Following the instructions in Reproducing ECCO Version 4 Release 4, copy the code and namelist directories to /efs_ecco/USERNAME/r4/ (be sure to replace USERNAME with your actual username):
rsync -av /efs_ecco/ECCO/V4/r4/WORKINGDIR /efs_ecco/USERNAME/r4/
Then, change into /efs_ecco/USERNAME/r4/ and create a symbolic link input pointing to the input directory /efs_ecco/ECCO/V4/r4/input, as also described in Reproducing ECCO Version 4 Release 4, using the following commands:
cd /efs_ecco/USERNAME/r4/
ln -s /efs_ecco/ECCO/V4/r4/input .
This symbolic link will be used to access the input files in the example run script described below.
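As an optional sanity check, you can confirm that the link points to the shared input directory:
# Optional: verify the symbolic link created above
ls -ld input
# The output should end with: input -> /efs_ecco/ECCO/V4/r4/input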
Scripts and Optimization Code#
In addition, copy the scripts, lsopt, and optim directories from ECCO-v4-Configurations/ECCOv4 Release 4/ (the latter two are located under its optimization/ subdirectory) into ECCOV4/release4/. The scripts directory contains various useful scripts, while lsopt and optim contain the code required for the optimization step.
cd /efs_ecco/$USERNAME/r4/WORKINGDIR/
cp -r "ECCO-v4-Configurations/ECCOv4 Release 4/scripts/" ECCOV4/release4/
cp -r "ECCO-v4-Configurations/ECCOv4 Release 4/optimization/lsopt/" ECCOV4/release4/
cp -r "ECCO-v4-Configurations/ECCOv4 Release 4/optimization/optim/" ECCOV4/release4/
Directory Structure#
The directory structure under /efs_ecco/USERNAME/r4/ now looks like the following:
┌── WORKINGDIR
│   ├── ECCO-v4-Configurations
│   ├── ECCOV4
│   │   └── release4
│   │       ├── code
│   │       ├── namelist
│   │       ├── build
│   │       ├── run
│   │       ├── scripts
│   │       ├── lsopt
│   │       └── optim
│   └── MITgcm
└── input
Compile#
Compile Code for Adjoint Runs#
The commands to compile the code and generate the executable for an adjoint run are shown in the following code block:
cd WORKINGDIR/ECCOV4/release4
mkdir build_ad
cd build_ad
export ROOTDIR=../../../MITgcm
../../../MITgcm/tools/genmake2 -mods=../code -optfile=../code/linux_ifort_impi_aws_sysmodule -mpi
make depend
make adtaf
make adall
cd ..
Note that a new build directory, build_ad, has been created for adjoint runs, distinguishing it from the existing build directory used for forward runs.
The commands are similar to those used for reproducing ECCO V4r4, a forward simulation (see Reproducing ECCO Version 4 Release 4). However, there are some important differences, involving two commands: make adtaf and make adall.
- make adtaf sends the code to the TAF server to generate the adjoint code.
- make adall uses the TAF-generated adjoint code to build the executable used for conducting adjoint runs.
A successful compilation will generate the executable build_ad/mitgcmuv_ad. As stated earlier, in practice, an adjoint run contains both a forward mode and an adjoint mode. The forward mode computes the model–data misfits and their weighted sum, J. During the adjoint mode, the adjoint model—forced by the model–data misfits—computes the gradients of J with respect to the control variables.
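As an optional check that the build completed, confirm that the adjoint executable exists (run from the release4 directory):
# Optional: confirm the adjoint executable was built
ls -l build_ad/mitgcmuv_ad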
Compile Optimization Code#
The code for the optimization steps (steps 3 and 6) is compiled separately to generate its own executable. The commands to compile the optimization code are as follows:
cd WORKINGDIR/ECCOV4/release4
cd lsopt
make clean
make
cd ../optim
make clean
make
cd ..
A successful compilation will generate the executable optim/optim.x.
Conduct Iterations#
This section demonstrates how to run the ECCO adjoint model and optimization over multiple iterations using an automated script. We begin with a high-level overview of the workflow and output, followed by a detailed walkthrough of the script itself.
Overview of the Iteration Workflow#
Here, we use an example run script to describe the detailed steps for conducting adjoint runs and optimization. The script automates the iteration process by performing three iterations (iterations 0 to 2) over a short 3-day model integration, from 12Z on January 1, 1992, to 12Z on January 3, 1992. It uses the ECCO V4r4 configuration but starts from a cold start, in which all control adjustments are set to zero. The run is named v4r4_coldstart, and the total wall clock time for completing the three iterations is under 45 minutes.
The script is available on the P-Cluster at /efs_ecco/owang/r4/WORKINGDIR/ECCOV4/release4/run_script_slurm_autoopt_coldstart_v4r4.bash. Copy it to your working directory at /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4 (replace USERNAME with your actual username, but keep the directory structure the same). Then submit the script using sbatch with the following commands:
cd /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4
cp /efs_ecco/ECCO/V4/r4/scripts/run_script_slurm_autoopt_coldstart_v4r4.bash .
sbatch run_script_slurm_autoopt_coldstart_v4r4.bash
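After submitting the job, you can monitor its progress with standard Slurm commands and by following the log file named by the #SBATCH -o directive in the script (the job ID in the filename will differ from run to run):
# Check the job status in the queue
squeue -u $USER
# Follow the combined stdout/stderr log written by the script
tail -f ECCOv4r4_autoopt-*-out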
Upon completing the three iterations, five new directories will be generated by the script, as shown by the output of the following commands:
cd /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4
ls -1 | grep v4r4_coldstart
The output (order rearranged) shows five directories, along with the script run_script_slurm_autoopt_coldstart_v4r4.bash:
run_script_slurm_autoopt_coldstart_v4r4.bash
v4r4_coldstart.iter0
v4r4_coldstart.iter1
v4r4_coldstart.iter2
ctrlvec.v4r4_coldstart
optim.v4r4_coldstart
The three directories starting with v4r4_coldstart are the run directories for the three iterations. The other two directories, ctrlvec.v4r4_coldstart (hereinafter referred to as the ctrldir directory) and optim.v4r4_coldstart (hereinafter referred to as the optimdir directory), are used in the line search during the optimization step to calculate updated control adjustments for the next iteration.
Each run directory, such as v4r4_coldstart.iter0, contains the following files:
- ecco_ctrl_MIT_CE_000.opt0000: packed control adjustments (hereinafter referred to as the ecco_ctrl file).
- ecco_cost_MIT_CE_000.opt0000: packed adjoint gradients (hereinafter referred to as the ecco_cost file).
- costfunction0000: includes the total cost function J (called fc in the model) in the first line (e.g., fc = 2079042.47585259 0.0000000E+00), as well as individual cost values for different observation types.
The 4-digit number at the end of each filename corresponds to the iteration number.
The total cost function J, or fc, is the quantity that the iterative optimization process seeks to minimize. As shown in the table below, by iteration 2 it has been reduced to 0.618 of its iteration-0 value:
| Iteration Number | Cost (fc) | Cost Ratio w.r.t. Iteration 0 |
|---|---|---|
| 0 | 2079042.47585259 | 1.000 |
| 1 | 2070774.52785874 | 0.996 |
| 2 | 1284971.72606061 | 0.618 |
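The cost values in the table can be extracted directly from the costfunction files in the three run directories, for example:
cd /efs_ecco/USERNAME/r4/WORKINGDIR/ECCOV4/release4
# Print the total cost (fc) reported by each iteration's adjoint run
grep " fc = " v4r4_coldstart.iter*/costfunction*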
Walkthrough of the Example Run Script#
To further help users understand the iterative optimization process, a detailed explanation of the example run script is presented below.
Set Up Slurm Directives#
#!/bin/bash
#SBATCH -J ECCOv4r4_autoopt
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=36
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --partition=sealevel-c5n18xl-demand
#SBATCH --mem-per-cpu=1GB
#SBATCH -o ECCOv4r4_autoopt-%j-out
#SBATCH -e ECCOv4r4_autoopt-%j-out
Configure Shell Environment, Load Modules, and Set Up Environment Variables#
# Initialize and set up the environment
umask 022
ulimit -s unlimited
source /etc/profile
source /shared/spack/share/spack/setup-env.sh
source /usr/share/modules/init/sh
# Load required modules
module purge
module load intel-oneapi-compilers-2021.2.0-gcc-11.1.0-adt4bgf
module load intel-oneapi-mpi-2021.2.0-gcc-11.1.0-ibxno3u
module load netcdf-c-4.8.1-gcc-11.1.0-6so76nc
module load netcdf-fortran-4.5.3-gcc-11.1.0-d35hzyr
module load hdf5-1.10.7-gcc-9.4.0-vif4ht3
module load intel-oneapi-mkl-2021.2.0-gcc-11.1.0-idxgd2d
module list
# Set environment variables
export FORT_BUFFERED=1
export MPI_BUFS_PER_PROC=128
export MPI_DISPLAY_SETTINGS=""
Set Up Run-Specific Variables#
The script then sets up run-specific variables, such as the number of processors.
# Run-specific variables
nprocs=96
basedir="/efs_ecco/$USER/r4/WORKINGDIR/ECCOV4/release4/"
inputdir=../../../../input/
# Specify starting iteration number
whichiter=0
swhichiter=$(printf "%010d" ${whichiter})
# Specify the final iteration number (inclusive)
# For example, setting maxiter=2 will run iterations 0, 1, and 2
maxiter=$((whichiter + 2))
# Offset iteration: starting iteration index (usually 0).
# Do not change unless restarting iterations with a steepest descent line search.
offsetiter=0
runnm='v4r4_coldstart'
It uses 96 processes (nprocs=96) to conduct the runs under the directory /efs_ecco/$USER/r4/WORKINGDIR/ECCOV4/release4/ (basedir). The input files are located in /efs_ecco/$USER/r4/input/ (inputdir); symbolic links to these files will be created by the run script and accessed by the model. The example script starts at iteration 0 (whichiter=0) and ends at iteration 2 (with the final iteration, maxiter, set to 2). The run is named v4r4_coldstart.
The variable offsetiter is the starting iteration number for conducting a complete set of iterations (steps 1 to 7, as described at the beginning of the tutorial), and it is typically set to 0. Occasionally, we may want to restart a new set of iterations from step 1. This can happen, for example, if a new set of observations is added. In such cases, offsetiter should be set to the current iteration number, from which the new iteration cycle begins.
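For instance (illustrative values only, not used in this tutorial's cold-start run), restarting a fresh iteration cycle at iteration 10 after new observations are added might look like the following in the run script:
# Illustrative only: restart a new iteration cycle at iteration 10
whichiter=10
offsetiter=10              # the new cycle starts here, so the first pass skips optimization
maxiter=$((whichiter + 2)) # run iterations 10, 11, and 12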
Set Up Optimization#
Two directories are created for the optimization step. The ctrldir directory stores the control adjustments and adjoint gradients from previous iterations. The optimdir directory is used to run the optimization and generate the next ecco_ctrl file. The new ecco_ctrl file is saved to ctrldir and unpacked by the model to obtain the updated controls:
ctrldir=${basedir}/ctrlvec.${runnm}
optimdir=${basedir}/optim.${runnm}
mkdir -p "${ctrldir}" "${optimdir}"
The script then copies the namelists and the executable optim.x from the optim directory, where the executable was generated, to the optimization directory optimdir, where the control adjustments will be generated. Note that these namelists are specific to the optimization step and differ from those used for conducting forward or adjoint runs.
cp -p ${basedir}/optim/data* "$optimdir"
cp -p ${basedir}/optim/optim.x "$optimdir"
Loop Through Iterations#
The while block loops through the iterations and conducts an adjoint run for each iteration. However, depending on certain switches (see unpack and skip_optim below), the script may skip the optimization step—for example, in iteration 0.
while [ ${whichiter} -le ${maxiter} ]; do
# ... per-iteration optimization and adjoint-run steps (described below) ...
done
Switches for Skipping Optimization#
Inside the while block, the following section sets up two switches (unpack and skip_optim), as well as the previous iteration number (iterm1). If skip_optim is set to true, the optimization step is skipped. This is the case for iteration 0, where there is no need to generate an ecco_ctrl file because all control adjustments are zero.
The unpack switch serves a similar purpose—if unpack is set to 0, it means the control adjustments for individual control variables already exist, so there is no need to unpack an ecco_ctrl file, and the optimization step will be skipped.
# unpack=1: obtain control adjustments by unpacking ecco_ctrl
# unpack=0: no unpacking needed; individual control adjustment fields already exist
unpack=1
if [ ${whichiter} -eq ${offsetiter} ]; then
unpack=0
fi
# Previous iteration number
iterm1=$((whichiter - 1))
yiter=$(printf "%04d" ${whichiter})
yiterm1=$(printf "%04d" ${iterm1})
# Determine whether to skip optimization, i.e., skip generating ecco_ctrl for next iteration
if [ ${unpack} -eq 0 ] || ([ ${whichiter} -eq 0 ] && [ ${unpack} -eq 0 ]); then
skip_optim=true
else
skip_optim=false
fi
# Skip optimization if ecco_ctrl already exists
if [ -f ${ctrldir}/ecco_ctrl_MIT_CE_000.opt${yiter} ]; then
skip_optim=true
fi
Optimization Step#
The following if block is where the optimization is conducted to generate the ecco_ctrl file for the next iteration.
if [ "$skip_optim" = false ]; then
# Perform optimization to compute ecco_ctrl for the next iteration
cd "$optimdir"
# optimcycle
optimcycle=$((whichiter - (offsetiter + 1)))
nextcycle=$((optimcycle + 1))
yoptimcycle=$(printf "%04d" ${optimcycle})
ynextcycle=$(printf "%04d" ${nextcycle})
# Abort if required inputs are missing
if [ ! -f ${ctrldir}/ecco_ctrl_MIT_CE_000.opt${yiterm1} ] || \
[ ! -f ${ctrldir}/ecco_cost_MIT_CE_000.opt${yiterm1} ]; then
echo 'run aborted'
exit 1
fi
# Link previous iteration's ecco_ctrl and ecco_cost files
ln -s ${ctrldir}/ecco_ctrl_MIT_CE_000.opt${yiterm1} ecco_ctrl_MIT_CE_000.opt${yoptimcycle}
ln -s ${ctrldir}/ecco_cost_MIT_CE_000.opt${yiterm1} ecco_cost_MIT_CE_000.opt${yoptimcycle}
# Update data.optim: 1) optimcycle, 2) fmin (set once for steepest descent; fmin is computed automatically below)
sed -i "/optimcycle=/c\\ optimcycle=${optimcycle}," data.optim
if [ ${optimcycle} -eq 0 ]; then
# Steepest descent step.
# Abort if OPWARMI or OPWARMD already exists — remove them and resubmit the script.
# These files are only needed for the Quasi-Newton method.
if [ -f OPWARMI ] || [ -f OPWARMD ]; then
echo "Error: OPWARMI or OPWARMD already exists. Remove them and resubmit the script."
exit 99
fi
sed -i "/fmin=/c\\ fmin=${fmin}," data.optim
fi
cp data.optim data.optim_i${iterm1}
# Generate new ecco_ctrl
./optim.x > op_i${iterm1}
cp -f OPWARMI OPWARMI.${iterm1}
# Move and rename the new ecco_ctrl file to the ctrldir directory
mv ecco_ctrl*.opt${ynextcycle} ${ctrldir}/ecco_ctrl_MIT_CE_000.opt${yiter}
fi
In the code block above, the script first changes into the optimdir directory and sets the current optimization iteration (optimcycle), which is the number of iterations since offsetiter, as well as the next optimization iteration (nextcycle). It then checks that both the ecco_ctrl and ecco_cost files from the previous iteration exist; if either is missing, the script terminates, as both are needed for the line search.
Next, the script creates symbolic links to the ecco_ctrl and ecco_cost files in the ctrldir directory. After that, two important changes are made to the namelist file data.optim using the Linux stream editor sed:
- The optimcycle parameter is updated with the current optimcycle value.
- The fmin parameter is set to a value related to the cost reduction target, which is automatically estimated by the script (see below) based on the cost function f0 from iteration 0 (or offsetiter).
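After these edits, for the first optimization call of this cold-start run (optimcycle=0), the relevant entries of data.optim would look roughly like the sketch below. This is illustrative only: the fmin value shown is 0.998 times the iteration-0 cost listed earlier (about 2074884.39), and the actual data.optim provided in the optim directory may contain additional parameters.
&OPTIM
optimcycle=0,
fmin=2074884.39,
&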
In addition to these two sed commands, there is a safety check for iteration 0 (or offsetiter) that aborts the script if either OPWARMI or OPWARMD already exists. These two files contain information such as gradients from previous iterations and should not exist at iteration 0 (or offsetiter). They may be left over from a previous failed run and must be removed before restarting the iterations.
The cp command cp data.optim data.optim_i${iterm1} saves the current data.optim file for archival purposes. The same applies to the command cp -f OPWARMI OPWARMI.${iterm1}.
The optimization is actually performed by the executable in the following command:
./optim.x > op_i${iterm1}
After that, the generated ecco_ctrl file (named with the next optimization cycle number) is renamed and moved to the ctrldir directory, using the current iteration number in the filename. It will be loaded and unpacked to retrieve the control adjustments during the next iteration.
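To make the file numbering concrete, the comment block below traces iteration 1 of this cold-start run, following the variable definitions in the script above.
# Worked example for whichiter=1 with offsetiter=0:
#   yiter=0001, yiterm1=0000
#   optimcycle=1-(0+1)=0 -> yoptimcycle=0000; nextcycle=1 -> ynextcycle=0001
# optim.x reads the linked ecco_ctrl/ecco_cost *.opt0000 files (from ctrldir)
# and writes an ecco_ctrl*.opt0001 file, which is then moved and renamed to
# ${ctrldir}/ecco_ctrl_MIT_CE_000.opt0001 for use in the iteration-1 adjoint run.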
Conduct Adjoint Run#
The next code block conducts an adjoint run.
mkdir ${basedir}/${runnm}.iter${whichiter}
cd ${basedir}/${runnm}.iter${whichiter}
# Link input files from various sources
ln -s ../namelist/* .
ln -s ${inputdir}/input_init/error_weight/data_error/* .
ln -s ${inputdir}/input_init/* .
ln -s ${inputdir}/data_constraints/data_error/*/* .
ln -s ${inputdir}/data_constraints/*/* .
ln -s ${inputdir}/input_forcing/unadjusted/eccov4r4* .
ln -s ${inputdir}/input_forcing/other/*.bin .
ln -s ${inputdir}/input_forcing/control_weights/* .
ln -s ${inputdir}/input_forcing/control_weights/atm_ctrls/* .
ln -s ${inputdir}/native_grid_files/tile*.mitgrid .
python ../scripts/mkdir_subdir_diags.py
# Namelist setup
rm -f data
cp -p data.iter0.3d data
rm -f data.exf
cp -p data.exf.iter0 data.exf
rm -f data.gmredi
cp -p data.gmredi.iter0 data.gmredi
if [ ${whichiter} -eq 0 ]; then
rm -f data.ctrl
cp -p data.ctrl.iter0.inclatmctrl data.ctrl
elif [ ${whichiter} -eq ${offsetiter} ] && [ ${unpack} -eq 0 ]; then
rm -f data.ctrl
cp -p data.ctrl_itXX.inclatmctrl data.ctrl
else
rm -f data.ctrl
cp -p data.ctrl.unpack.inclatmctrl data.ctrl
# Copy ecco_ctrl file to run directory
cp -f ${ctrldir}/ecco_ctrl*${whichiter} .
fi
# Turn off the profiles package, as there are issues with using netCDF on the P-Cluster
unlink data.pkg
cp -p ../namelist/data.pkg .
sed -i '/useProfiles=.TRUE./ s/^/#/' data.pkg
# Create data.optim
rm -f data.optim
cat > data.optim <<EOF
&OPTIM
optimcycle=${whichiter},
&
EOF
# Run the model
cp -p ../build_ad/mitgcmuv_ad .
mpirun -np "${nprocs}" ./mitgcmuv_ad
The code block first creates a run directory, such as v4r4_coldstart.iter0, and changes into it. The script then creates symbolic links to the input files. The line python ../scripts/mkdir_subdir_diags.py creates the diags directory in the run directory, along with a list of subdirectories under diags, based on the information in the namelist file data.diagnostics. These are used by the model to output diagnostics, such as the model state, at user-specified time intervals.
Next, the script replaces some default ECCO V4r4 namelist files with versions specific to this experiment—a cold start of a 3-day model integration. For the namelist file data.ctrl, three different versions are used:
- One for iteration 0, where the model sets all control adjustments to zero.
- A second for iteration offsetiter, or when unpack=0, where pre-unpacked control adjustment files are used as input instead of unpacking an ecco_ctrl file.
- A third for all other cases, where an ecco_ctrl file is copied from ctrldir to the run directory and unpacked by the model to obtain the control adjustments.
Two additional namelist changes are made: temporarily disabling the use of profile data on the P-Cluster (due to issues with the netCDF modules) and setting the current iteration number in data.optim.
The mpirun command below then launches the executable mitgcmuv_ad as a multi-process (96-process) job:
mpirun -np "${nprocs}" ./mitgcmuv_ad
Post-Run Processing#
The remaining code block performs some post-run processing and prepares for the next iteration:
# Save cost and control outputs to ctrldir for optimization
rsync -av ecco_cost_MIT_CE_000.opt${yiter} ${ctrldir}
rsync -av ecco_ctrl_MIT_CE_000.opt${yiter} ${ctrldir}
# Compute fmin from costfunction0000 output (used in data.optim)
# If the target cost reduction is 0.4% relative to the cost from iteration offsetiter (f0),
# then fmin is set to (1 - 0.5 * 0.004) * f0 = 0.998 * f0
if [ ${whichiter} -eq 0 ] || [ ${whichiter} -eq ${offsetiter} ]; then
f0=$(grep " fc = " costfunction0000 | awk '{print $3}')
echo "To have 0.4% cost reduction, set fmin = 0.998 * f0"
fmin=$(echo "${f0} * 0.998" | bc)
echo "fmin: ${fmin}"
fi
cd ..
whichiter=$((whichiter + 1))
echo ${whichiter}
First, the ecco_ctrl and ecco_cost files are copied to the ctrldir directory, where they will be used by the optimization code to generate updated control adjustments for the next iteration.
Another important post-processing step for iteration 0 is to estimate fmin, a value closely related to the target cost reduction. The script extracts the total cost fc (referred to as f0) from the costfunction0000 file in the run directory for iteration 0 (v4r4_coldstart.iter0 in this case). The target cost reduction is set to 0.4% relative to f0, and accordingly, fmin is computed as 99.8% of f0. fmin is used during the line search in the steepest descent step to estimate a step size, which is expected to produce control adjustments that yield a cost reduction matching the target.
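Using the iteration-0 cost reported earlier in this tutorial, the computation in the script works out as follows:
# Worked example with the iteration-0 cost from this run
f0=2079042.47585259
fmin=$(echo "${f0} * 0.998" | bc)
echo "${fmin}"   # approximately 2074884.39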
A complete iteration is now finished. The iteration number whichiter
is incremented by 1, and the script proceeds to the next iteration.