Exercise 2

Exercise: Working out the optimum segment length

Consider a simulation with a Model Throughput of 11 simulated years per day.

If your goal is to run a 30-year simulation in which each segment covers an integer number of years, and your machine’s queue has a 12-hour wall clock limit, what values of STOP_OPTION, STOP_N and RESUBMIT would you select to minimize the number of resubmissions?

Note: For simplicity, we aim for each run to last a whole number of years. You should also leave enough margin that a segment does not risk being cut off when the 12-hour limit is reached.

Hints
  • Use the Model Throughput to determine STOP_N and STOP_OPTION for each submission, taking into account the wall clock limit.

  • Next, calculate how many times you need to resubmit the job to reach the 30-year goal.

  • Then, set RESUBMIT using the relation: total number of submissions = the initial submission + the number of resubmissions.

Solution

Suppose that you are aiming for a 30-year simulation and you find in the timing files that your model throughput is 11 simulated years per day. If your wall clock limit is 12 hours (half a day), you can run roughly 5.5 years per submission.

However, it’s advisable to run a little less than this estimate suggests, because the exact throughput can vary from run to run. Additionally, it’s best to run an integer number of years if possible.

You can probably safely run 5 years within a 12-hour wall clock limit. So, you can set:

STOP_OPTION=nyears
STOP_N=5
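The choice of 5 years can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of CESM; the variable names are illustrative, and the numbers are the ones given in this exercise.

```python
import math

throughput = 11.0        # simulated years per day, as read from the timing file
wallclock_limit = 12.0   # hours allowed per job by the queue

# Simulated years that would fit in the limit at exactly the nominal throughput.
max_years = throughput * wallclock_limit / 24.0

# Round down to a whole number of years to leave some slack.
stop_n = math.floor(max_years)

# Estimated wall clock time for one segment, and the margin that remains.
hours_needed = stop_n / throughput * 24.0
margin_hours = wallclock_limit - hours_needed

print(f"max_years={max_years}, STOP_N={stop_n}")
print(f"estimated hours per segment={hours_needed:.1f}, margin={margin_hours:.1f}")
```

Here rounding down from 5.5 to 5 years leaves roughly an hour of margin. If the nominal estimate were already a whole number, rounding down would leave no slack, and choosing one year fewer would be the cautious option.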

Since you want to run 30 years, that means you need to submit the run 6 times (5 years x 6 = 30 years).

Since “the total number of submissions = the initial submission + the number of resubmissions”, you need 5 resubmissions after the first submission. So you would set:

RESUBMIT=5
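The same count can be reproduced programmatically. This is a minimal sketch under the assumptions of this exercise (a 30-year target and 5-year segments); the variable names are illustrative and the script only prints a command, it does not run it.

```python
import math

total_years = 30   # length of the full simulation
stop_n = 5         # years per segment, chosen to fit the wall clock limit

# Number of jobs needed to cover the full simulation.
# If total_years were not a multiple of stop_n, rounding up would
# make the run overshoot the target length.
submissions = math.ceil(total_years / stop_n)

# RESUBMIT counts only the jobs that follow the initial submission.
resubmit = submissions - 1

print(f"./xmlchange STOP_OPTION=nyears,STOP_N={stop_n},RESUBMIT={resubmit}")
```

With these values, `submissions` is 6 and `resubmit` is 5, matching the settings above.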

This is illustrated in Figure 1.

Figure 1: Number of submissions

These settings can be applied from the case directory with:

./xmlchange STOP_OPTION=nyears,STOP_N=5,RESUBMIT=5