Exercise
Exercise: Deliberately setting up a run that crashes and then finding out what went wrong
Create another new case called b1850_high_freq_bugfixing in your cases directory using the B1850-tutorial compset and f19_g17 resolution.
Use the default 5 day run length.
Now, in addition to the default monthly output, add the following:
an h1 file containing daily averages of T2M
set your namelist so that there is one file per day for this daily averaged output
Set up, build and submit your case.
See if it runs successfully and, if not, try to find out what went wrong.
Click here for hints
Setting up the case
To create this case, run create_newcase from within $CESMROOT/cime/scripts/ to set up a case located in ~/cases/b1850_high_freq_bugfixing with the B1850-tutorial compset and resolution f19_g17, and then go into the case directory and run case.setup.
Now the namelist files should have appeared in your case directory, and you can add the requested output to user_nl_cam with the following entry:
fincl2='T2M:A'
nhtfrq=0,-24
mfilt=1,1
Then build the case and submit.
qcmd -- ./case.build
./case.submit
Wait until you no longer see it running in the queue.
Checking if your run has finished successfully
To check if your run has finished successfully, you can see if there’s an archive directory for the run and determine whether it contains the output files. You should find that it doesn’t in this case.
You can also look in the run directory and see whether there's any evidence of restart files or rpointer files having been created for this run. You should find that none have been created.
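As a quick way to check for restart evidence, a small shell helper can report whether any rpointer files exist in a run directory. This is only a sketch: the function name `has_rpointer` is hypothetical (it is not part of CESM), and the path in the usage comment assumes the ~/scratch layout used elsewhere in this tutorial.

```shell
# Hypothetical helper (not part of CESM): succeed if the given directory
# (default: current directory) contains at least one rpointer.* file.
has_rpointer() {
    # ls exits with a non-zero status when the pattern matches no files
    ls "${1:-.}"/rpointer.* >/dev/null 2>&1
}

# Example use (the path is an assumption based on this tutorial's layout):
# has_rpointer ~/scratch/b1850_high_freq_bugfixing/run || echo "no rpointer files found"
```

For the crashed run in this exercise, the helper should report that no rpointer files were found.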
You can also go into your case directory and examine the contents of the file CaseStatus to see whether the run has finished successfully. Figure 1 shows an example of the kind of information you can find in the CaseStatus file. It makes it clear that there is an error.
Figure 1: Example output in the CaseStatus file when a simulation has crashed
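Instead of reading CaseStatus in full, you could filter it for lines that mention an error. The following is a minimal sketch; the helper name `case_errors` is hypothetical, and it assumes only that failure lines contain the word "error" in some capitalization.

```shell
# Hypothetical helper (not part of CESM): print the lines of a CaseStatus
# file (default: ./CaseStatus) that mention an error, ignoring case.
case_errors() {
    grep -i 'error' "${1:-CaseStatus}"
}

# Example use from inside the case directory:
# case_errors || echo "no errors recorded in CaseStatus"
```

Because grep returns a non-zero status when nothing matches, the same helper can also be used as a simple pass/fail test in a script.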
Troubleshooting
Now that you have determined the run has crashed, you need to examine the log files to see what went wrong. If you go into the run directory and search for the log files, you should see that there are three: one for the coupler (cpl.log.*), one for CAM (atm.log.*), and one for CESM (cesm.log.*).
We can search for the first occurrence of the word ERROR in the cesm log file. You can do this by opening the file in your text editor or by searching for the word ERROR using grep, i.e.,
grep 'ERROR' cesm.log.*
You should find this produces the following
ERROR: FLDLST: 1 errors found, see log
This tells us that something has gone wrong with the list of output variables that we have asked for. More information can then be found in the CAM log file (atm.log.*). Looking at the very end of that file you should see:
FLDLST: T2M in fincl(1,2) not found
ERROR: FLDLST: 1 errors found, see log
This tells us that T2M is not a valid history variable for CAM. The correct variable for near-surface temperature is TREFHT, as we used in the previous example. Because T2M is not a CAM history field, CESM crashed on initialization.
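Given that diagnosis, one way to correct the namelist is to swap the field name in user_nl_cam and resubmit. The following is a minimal sketch, not the tutorial's prescribed fix: the helper name `fix_t2m` is hypothetical, and it assumes T2M appears in user_nl_cam only in the fincl2 entry shown above.

```shell
# Hypothetical helper (not part of CESM): replace the invalid field name T2M
# with TREFHT in a namelist file (default: ./user_nl_cam).
fix_t2m() {
    nl_file="${1:-user_nl_cam}"
    # Write to a temporary file first so the original is replaced only on success
    sed 's/T2M/TREFHT/g' "$nl_file" > "$nl_file.tmp" && mv "$nl_file.tmp" "$nl_file"
}

# Example use from inside the case directory, followed by a resubmit:
# fix_t2m && ./case.submit
```

A change to user_nl_cam alone should not normally require rebuilding the case, but rebuild before resubmitting if you are unsure.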
Click here for the solution
cd $CESMROOT/cime/scripts
./create_newcase --case ~/cases/b1850_high_freq_bugfixing --compset B1850-tutorial --res f19_g17
cd ~/cases/b1850_high_freq_bugfixing
./case.setup
Now add the following into the file user_nl_cam
fincl2='T2M:A'
nhtfrq=0,-24
mfilt=1,1
Then build and submit your case
qcmd -- ./case.build
./case.submit
Now when it crashes, go into your run directory and look in the logs
cd ~/scratch/b1850_high_freq_bugfixing/run
You can see the logs and when they were created using
ls -ltr
Search for ERROR in the cesm log
grep 'ERROR' cesm.log.*
Now look at the end of the atmosphere log file
tail atm.log.*
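Rather than reading the tail of the log by eye, you could pull out only the field-list messages. This is a minimal sketch, assuming the relevant messages carry the FLDLST prefix as in the output shown in the hints; the helper name `fldlst_messages` is hypothetical.

```shell
# Hypothetical helper (not part of CESM): print every FLDLST message, with
# line numbers, from the given log files.
fldlst_messages() {
    grep -n 'FLDLST' "$@"
}

# Example use from inside the run directory:
# fldlst_messages atm.log.*
```

The line numbers make it easier to jump to the surrounding context in a text editor if the message alone is not enough.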