Debugging CAM
Create a case called b1850_high_freq_bugfixing using the compset B1850 at f19_g17 resolution.
Set the run length to 1 month.
Now, in addition to the default monthly output, add an h1 file containing daily averages of T2M, and set your namelist so that there is one file per day for this daily averaged output.
Set up, build and submit your case.
Your goal is to make the model crash and then to troubleshoot why it crashed.
Hints
Tip to add an h1 file
For more information about how to add an h1 file, see the section on namelist modifications.
If you don't have time to check that section now, here is how to add an h1 file with daily averages of T2M and create one file per day for this daily averaged output. Add the following lines to user_nl_cam:
fincl2 = 'T2M:A'
nhtfrq = 0,-24
mfilt = 1,1
Tip for troubleshooting
Check the Derecho queue and wait until your run no longer appears in it.
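If you would rather not re-check the queue by hand, the loop below is a minimal sketch that polls PBS until a given job has left the queue. It assumes that `qstat <jobid>` exits with a non-zero status once the job has finished (the usual PBS Pro behaviour when `-x` is not given); the function name `wait_for_job` and the 60-second default interval are illustrative choices, not part of CESM or CIME.

```shell
# Hypothetical helper (not part of CESM/CIME): block until a PBS job has left the queue.
# Assumption: `qstat JOBID` succeeds while the job is queued or running and
# returns a non-zero status once the job has finished.
wait_for_job() {
  local jobid="$1"
  local interval="${2:-60}"   # seconds between checks
  while qstat "$jobid" > /dev/null 2>&1; do
    sleep "$interval"
  done
  echo "Job $jobid is no longer in the queue"
}
```

Call it as `wait_for_job <jobid>`, using the job ID reported when you submitted the case or listed by `qstat -u $USER`.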
When your run is not in the queue anymore:
Go to the archive directory: can you see the history files there? The answer should be no. Why?
Go to the run directory: is there any evidence that the run created history files or restart files? The answer, again, should be no. This is because we have tricked you with a bug.
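To answer these two questions without scrolling through long directory listings, you could use a small helper like the one below. It is a sketch, not a CESM tool: the name `check_outputs` is hypothetical, and the file-name patterns assume the usual CESM naming, in which CAM history files contain `.cam.h0`, `.cam.h1`, ... and restart files contain `.r.`.

```shell
# Hypothetical helper: count CAM history files and restart files in one directory.
check_outputs() {
  local dir="$1" hist rest
  # An unmatched glob is passed to ls literally; its error is discarded, so the count is 0.
  hist=$(ls -d "$dir"/*.cam.h[0-9]* 2>/dev/null | wc -l)
  rest=$(ls -d "$dir"/*.r.* 2>/dev/null | wc -l)
  echo "$dir: $hist history file(s), $rest restart file(s)"
}
```

From the case directory, `./xmlquery --value RUNDIR` and `./xmlquery --value DOUT_S_ROOT` print the run and archive directories, so you could run `check_outputs "$(./xmlquery --value RUNDIR)"` and `check_outputs "$(./xmlquery --value DOUT_S_ROOT)/atm/hist"`.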
Look at the log files in the RUNDIR to try to understand why the run crashed.
Solution
# Create a new case
Create a new case b1850_high_freq_bugfixing with the command:
cd /glade/u/home/$USER/code/my_cesm_code/cime/scripts/
./create_newcase --case ~/cases/b1850_high_freq_bugfixing --compset B1850 --res f19_g17
# Setup
Invoke case.setup with the command:
cd ~/cases/b1850_high_freq_bugfixing
./case.setup
# Customize namelists
Add the daily output of T2M by editing the file user_nl_cam and adding the lines:
fincl2 = 'T2M:A'
nhtfrq = 0,-24
mfilt = 1,1
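If you prefer to make this edit from the command line instead of in a text editor, a here-document can append the same three lines. This is a sketch: run it from the case directory, and note that running it twice would append the lines twice. The comments restate the standard meaning of these CAM history namelist variables.

```shell
# Run from the case directory, e.g. ~/cases/b1850_high_freq_bugfixing
#   fincl2 : fields added to the second history stream (h1); ':A' requests averages
#   nhtfrq : output frequency per stream; 0 = monthly (h0), -24 = every 24 hours (h1)
#   mfilt  : maximum number of time samples per file; 1 gives one file per output time
cat >> user_nl_cam << 'EOF'
 fincl2 = 'T2M:A'
 nhtfrq = 0,-24
 mfilt = 1,1
EOF
```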
# Set run length
Set the run length to 1 month:
./xmlchange STOP_N=1,STOP_OPTION=nmonths
# Change the job queue and account number
If needed, change the job queue and account number.
For instance, to run in the queue tutorial with the project number UESM0013 (you should use the project number given for this tutorial), use the command:
./xmlchange JOB_QUEUE=tutorial,PROJECT=UESM0013 --force
# Build and submit
Build the model and submit your job:
qcmd -- ./case.build
./case.submit
# Look at what happened
Your run should crash! This is normal: the goal of the exercise is to troubleshoot it.
In your run directory you should find three log files: one for the coupler (cpl.log.*), one for CAM (atm.log.*), and one for CESM (cesm.log.*).
Somewhere in these log files is information about what went wrong, but it is often not easy to find.
The bottom of a log file frequently contains errors that are unrelated to your problem; they only show that individual processes are exiting.
The relevant error often lies above these messages and can sometimes be found by searching for the first occurrence of ERROR, ABORT, or cesm.exe.
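The search described above can be scripted with `grep`. The function below is a minimal sketch (the name `first_error` is hypothetical): for each log file it prints only the first line matching ERROR, ABORT, or cesm.exe, prefixed with the file name and line number, so you know where to start reading.

```shell
# Hypothetical helper: print the first line matching ERROR, ABORT or cesm.exe in each log.
first_error() {
  local log
  for log in "$@"; do
    # -H: prefix the file name, -n: prefix the line number, -m 1: stop at the first match
    grep -H -n -m 1 -E 'ERROR|ABORT|cesm\.exe' "$log"
  done
}
```

From the run directory, for example: `first_error cesm.log.* atm.log.* cpl.log.*`. Files with no match produce no output.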
In this case, searching for the first occurrence of ERROR in cesm.log.* gives us some relevant information. We find:
ERROR: FLDLST: 1 errors found, see log
This tells us that something has gone wrong with the list of output variables that we asked for.
More information can be found in the CAM log file atm.log.*.
Look at the very end of that file, and you should see:
FLDLST: T2M in fincl(1, 2) not found
ERROR: FLDLST: 1 errors found, see log
This tells us that T2M is not a valid history variable for CAM: CAM's history field for near-surface air temperature is called TREFHT, not T2M. Requesting a field that does not exist has caused CESM to crash.
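The fix is therefore to request TREFHT instead of T2M. One way to make the change is a `sed` substitution on user_nl_cam; the sketch below wraps it in a hypothetical `fix_t2m` function that defaults to ./user_nl_cam in the current directory and assumes GNU sed (for in-place editing with `-i`).

```shell
# Hypothetical helper: replace the invalid field T2M with CAM's TREFHT in user_nl_cam.
# Assumes GNU sed, whose -i option edits the file in place.
fix_t2m() {
  local nl="${1:-user_nl_cam}"
  sed -i "s/'T2M:A'/'TREFHT:A'/" "$nl"
}
```

After running `fix_t2m` in the case directory, resubmit with `./case.submit`. Because only the namelist changed, the model should not need to be rebuilt.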