Changing Run Length
How to set run length for long simulations using the variables $STOP_OPTION, $STOP_N and $RESUBMIT?
1. Number of submissions and run length
Recall that we can use STOP_N and STOP_OPTION to control the run length of each batch job submission.
A typical long model simulation (say you want to run the model for 100 years) consists of many job submissions, because the batch wallclock time for each job submission is limited. For example, on derecho, the regular queue wallclock limit is 12 hours.
To complete the long run, we can specify the number of times to resubmit the job using the $RESUBMIT variable in env_run.xml.
The tutorial version of the CESM model on derecho simulates ~10 model years per wallclock day. The maximum wallclock request is 12 hours.
If you want to run the model for 100 years, what values should be set for STOP_OPTION, STOP_N and RESUBMIT?
Hint!
How to set STOP_N and STOP_OPTION for each submission, given the wallclock limit?
How many times to resubmit the job to reach 100 years?
The total number of submissions = the initial submission + the number of resubmissions.
Click here for the solution
Assume we want to use the full 12 hours for each job submission. The model runs 10 years per wallclock day, so 12 hours gives us 5 model years per job submission.
For a total of 100 years, we will need 100 / 5 = 20 submissions.
STOP_OPTION='nyears', STOP_N=5, RESUBMIT=19
so that the initial run of 5 years + (19 resubmits x 5 years per job) = 100 years.
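The arithmetic in this solution can be sketched in a few lines of Python. This is only an illustration: the function name `plan_submissions` and its parameters are hypothetical and not part of CESM. It assumes, as the solution does, that each job uses the full wallclock limit and that STOP_OPTION='nyears'.

```python
import math


def plan_submissions(total_years, years_per_wallclock_day, wallclock_limit_hours):
    """Return (STOP_N, RESUBMIT) for STOP_OPTION='nyears'.

    Hypothetical helper for illustration only; it is not a CESM tool.
    """
    # Whole model years that fit into one job at the given throughput.
    stop_n = math.floor(years_per_wallclock_day * wallclock_limit_hours / 24)
    if stop_n < 1:
        raise ValueError("Less than one model year fits in a single job.")

    # Total submissions needed, rounded up so the target length is reached.
    submissions = math.ceil(total_years / stop_n)

    # Total submissions = the initial submission + the resubmissions.
    resubmit = submissions - 1
    return stop_n, resubmit


stop_n, resubmit = plan_submissions(
    total_years=100, years_per_wallclock_day=10, wallclock_limit_hours=12
)
print(f"STOP_OPTION='nyears', STOP_N={stop_n}, RESUBMIT={resubmit}")
```

With the numbers from this exercise, the sketch gives STOP_N=5 and RESUBMIT=19. In practice you may prefer a slightly smaller STOP_N so that a job does not run into the wallclock limit.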
2. RESUBMIT and CONTINUE_RUN
In the exercise above, the first submission is the initial run, where CONTINUE_RUN is set to FALSE by default. When you want to continue the run after the first 5 years, you need to tell the model to continue by setting CONTINUE_RUN=TRUE.
If you have set RESUBMIT>0, your script will automatically set CONTINUE_RUN=TRUE once the first submission completes, and it stays that way for all subsequent submissions to the queue.
In the previous exercise, if we set RESUBMIT=19 before the initial run, what are the values of CONTINUE_RUN and RESUBMIT at the time of:
the initial submission of 5 years
the next submission of 5 years
the 3rd run (2nd resubmission) of 5 years?
Click here for the solution
the initial submission of 5 years:
CONTINUE_RUN=FALSE, RESUBMIT=19
because CONTINUE_RUN=FALSE when the run is first initialized.
the next submission of 5 years:
CONTINUE_RUN=TRUE, RESUBMIT=18
because RESUBMIT>0, so CONTINUE_RUN automatically switches to TRUE after the initial run completes, and RESUBMIT decreases by 1.
the 3rd run (2nd resubmission) of 5 years:
CONTINUE_RUN=TRUE, RESUBMIT=17
because CONTINUE_RUN stays TRUE and RESUBMIT decreases by 1 again.
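The bookkeeping described in this solution can be traced with a small toy model in Python. This is a sketch of the behavior as the tutorial describes it, not the actual case scripts: the function `submission_states` is hypothetical, and it assumes RESUBMIT drops by 1 and CONTINUE_RUN becomes TRUE each time a job completes and resubmits.

```python
def submission_states(resubmit):
    """List (submission number, CONTINUE_RUN, RESUBMIT) as seen by each job.

    Toy model for illustration only; it does not call any CESM script.
    """
    continue_run = False  # the initial run starts with CONTINUE_RUN=FALSE
    submission = 1
    states = []
    while True:
        states.append((submission, continue_run, resubmit))
        if resubmit == 0:
            break  # nothing left to resubmit, the long run is complete
        # After a job completes with RESUBMIT>0, the next job is queued:
        resubmit -= 1
        continue_run = True
        submission += 1
    return states


states = submission_states(19)
for number, continue_run, remaining in states[:3]:
    flag = "TRUE" if continue_run else "FALSE"
    print(f"submission {number}: CONTINUE_RUN={flag}, RESUBMIT={remaining}")
print(f"total submissions: {len(states)}")
```

Starting from RESUBMIT=19, the trace contains 20 submissions in total, and the last one sees RESUBMIT=0.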