Debugging CAM

Exercise: Add an additional output variable

Create a case called b1850_high_freq_bugfixing using the compset B1850 at f19_g17 resolution. Set the run length to 1 month.

Now in addition to the default monthly output, add the following output:

  • an h1 file containing daily averages of T2M, with the namelist set so that this daily-averaged output is written as one file per day.

Set up, build and submit your case.

Your goal is to make the model crash and then to troubleshoot why it crashed.

Click here for hints

Tip for adding an h1 file

For more information about how to add an h1 file, check the section about namelist modifications.

If you don’t have time to check the section immediately, the way to add an h1 file with daily averages of T2M and create one file per day for this daily averaged output is:

Add the following lines to user_nl_cam:

 fincl2 = 'T2M:A'
 nhtfrq = 0,-24
 mfilt = 1,1

Tip for troubleshooting

Check the Derecho queue and wait until your run no longer appears in it.

When your run is not in the queue anymore:

  • Go to the archive directory: can you see the history files in the archive directory? The answer should be no. Why?

  • Go to the run directory: is there any evidence of history files or restart files being created by the run? The answer, again, should be no. This is because we have tricked you with a bug.
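The two checks above can be sketched with a small helper that counts the files in a directory matching a pattern. This is only an illustration: `count_files` is a hypothetical name, and the `*.cam.h*.nc` pattern assumes the usual CAM history file naming.

```shell
# Hypothetical helper: count regular files in a directory whose names match a glob pattern.
count_files() {
    dir=$1
    pattern=$2
    # -maxdepth 1 restricts the search to the directory itself, not its subdirectories.
    find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
}
```

For example, from inside the run directory, `count_files . '*.cam.h*.nc'` should print 0 if no CAM history files were written before the crash.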

Look at the log files in the RUNDIR to try to understand why the run crashed.

Click here for the solution

# Create a new case

Create a new case b1850_high_freq_bugfixing with the command:

 cd /glade/u/home/$USER/code/my_cesm_code/cime/scripts/
 ./create_newcase --case ~/cases/b1850_high_freq_bugfixing --compset B1850 --res f19_g17 

# Setup

Invoke case.setup with the command:

 cd ~/cases/b1850_high_freq_bugfixing
 ./case.setup

# Customize namelists

Add the daily output of T2M by editing the file user_nl_cam and adding the lines:

 fincl2 = 'T2M:A'
 nhtfrq = 0,-24
 mfilt = 1,1
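If you prefer not to open an editor, the same lines can be appended from the command line. This is a minimal sketch that assumes the current directory is the case directory. In each setting the first entry applies to the default monthly h0 file and the second to the new h1 file: `:A` requests averages, `-24` means output every 24 hours, and `1` means one time sample per file.

```shell
# Append the history settings from the exercise to user_nl_cam in the current directory.
# Note: T2M is the field name the exercise asks for; it is kept here on purpose.
cat >> user_nl_cam << 'EOF'
 fincl2 = 'T2M:A'
 nhtfrq = 0,-24
 mfilt = 1,1
EOF
```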

# Set run length

Set the run length to 1 month:

./xmlchange STOP_N=1,STOP_OPTION=nmonths

# Change the job queue and account number

If needed, change job queue and account number.
For instance, to run in the tutorial queue under project number UESM0013 (use the project number you were given for this tutorial), use the command:

./xmlchange JOB_QUEUE=tutorial,PROJECT=UESM0013 --force

# Build and submit

Build the model and submit your job:

qcmd -- ./case.build
./case.submit

# Look at what happened

Your run should crash. This is expected: the goal of the exercise is to practice troubleshooting.

You should find three log files in your run directory:

  • one for the coupler, cpl.log.*,

  • one for CAM, atm.log.*,

  • and one for CESM, cesm.log.*.

Somewhere in these log files is information about what has gone wrong, but it is often not entirely straightforward to find.

  • Often the errors at the bottom of a log file are not relevant to your problem; they only show that individual processes are exiting.

  • Often the relevant error lies above this and can sometimes be found by searching for the first occurrence of ERROR or ABORT or cesm.exe.
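The search described above can be sketched with grep. The helper name `first_error` is hypothetical; it prints, for each log file given, the first line containing ERROR or ABORT together with its line number.

```shell
# Hypothetical helper: show the first ERROR or ABORT line in each log file.
first_error() {
    for log in "$@"; do
        printf '%s: ' "$log"
        # -n prints the line number, -m 1 stops after the first match.
        grep -n -m 1 -E 'ERROR|ABORT' "$log" || echo "no ERROR or ABORT found"
    done
}
```

From the run directory, it could be called as `first_error cesm.log.* atm.log.* cpl.log.*`. The reported line number tells you where to start reading in the full log.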

In this case, searching for the first occurrence of ERROR in cesm.log.* gives us some relevant information. We find

ERROR: FLDLST: 1 errors found, see log

This tells us that something has gone wrong with the list of output variables that we have asked for.

More information can then be found in the CAM log file atm.log.*. Look at the very end of that file and you should see

FLDLST: T2M in fincl(1, 2) not found
ERROR: FLDLST: 1 errors found, see log

This tells us that T2M is not a valid CAM history field, and requesting it in fincl2 is what caused CESM to crash. The correct variable for near-surface temperature is TREFHT, so the fix is to replace T2M with TREFHT in user_nl_cam.
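Swapping T2M for TREFHT in user_nl_cam could be scripted roughly as follows. This is a sketch, not part of the tutorial: `fix_t2m` is a hypothetical helper, and it assumes the file contains the fincl2 line exactly as written in the exercise.

```shell
# Hypothetical helper: replace the invalid field name T2M with TREFHT in a namelist file.
fix_t2m() {
    nl_file=$1
    # Write to a temporary file first so the original is only replaced if sed succeeds.
    sed "s/'T2M:/'TREFHT:/" "$nl_file" > "$nl_file.tmp" && mv "$nl_file.tmp" "$nl_file"
}
```

Running `fix_t2m user_nl_cam` in the case directory and then resubmitting with `./case.submit` should let the run proceed past the FLDLST check.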