Exercise

Exercise: Deliberately setting up a run that crashes and then finding out what went wrong

Create another new case called b1850_high_freq_bugfixing in your cases directory using the B1850-tutorial compset and f19_g17 resolution.

Use the default 5 day run length.

Now, in addition to the default monthly output, add the following:

  • an h1 file containing daily averages of T2M

  • set your namelist so that there is one file per day for this daily averaged output

Set up, build and submit your case.

See if it runs successfully and, if not, try to find out what went wrong.

Hints

Setting up the case

To create this case, run create_newcase from within $CESMROOT/cime/scripts/ to set up a case in ~/cases/b1850_high_freq_bugfixing with the B1850-tutorial compset and f19_g17 resolution. Then go into the case directory and run case.setup.

The namelist files should now have appeared in your case directory, and you can add the requested output to user_nl_cam with the following entries:

fincl2='T2M:A'
nhtfrq=0,-24
mfilt=1,1
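For reference, here is an annotated sketch of the same three entries, written as a small shell helper that appends them to a namelist file. The function name add_daily_output is made up for illustration and is not part of CESM. The comment lines use the `!` comment style found in user_nl_cam, and they describe the usual meaning of these CAM history settings.

```shell
# Append the exercise's daily-output settings to a CAM namelist file.
# add_daily_output is a hypothetical helper name, not a CESM tool.
# Usage: add_daily_output user_nl_cam
add_daily_output() {
    cat >> "$1" <<'EOF'
! second history stream (h1): field T2M, where ':A' requests a time average
fincl2='T2M:A'
! output frequency per stream: 0 = monthly (h0), -24 = every 24 hours (h1)
nhtfrq=0,-24
! time samples per file: 1 for h0, 1 for h1 (so one day per h1 file)
mfilt=1,1
EOF
}
```

Run it once from the case directory, for example `add_daily_output user_nl_cam`, or simply type the three entries into the file by hand.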

Then build the case and submit.

qcmd -- ./case.build

./case.submit

Wait until you no longer see it running in the queue.

Checking if your run has finished successfully

To check if your run has finished successfully, you can see whether there is an archive directory for the run and whether it contains the output files. In this case, you should find that it does not.

You can also look in the run directory and see whether there is any evidence of restart files or rpointer files having been created for this run. You should find that this hasn’t happened.

You can also go into your case directory and examine the contents of the file CaseStatus to see whether the run has finished successfully. Figure 1 shows an example of the kind of information you can find in the CaseStatus file. It makes it clear that there is an error.


Figure 1: Example output in the CaseStatus file when a simulation has crashed
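The run-directory and CaseStatus checks described above can be sketched as two small shell helpers. The function names are made up for illustration. The sketch assumes that rpointer files are named rpointer.* and that a failed run leaves a line containing the word "error" in CaseStatus, so adjust the patterns if your files look different.

```shell
# Succeed (exit status 0) if the given run directory contains rpointer files.
# has_rpointer_files is a hypothetical helper name.
has_rpointer_files() {
    # ls exits non-zero when the pattern matches nothing
    ls "$1"/rpointer.* >/dev/null 2>&1
}

# Print every line of a case's CaseStatus file that mentions an error.
# case_status_errors is a hypothetical helper name.
case_status_errors() {
    # -i makes the match case-insensitive (error, Error, ERROR)
    grep -i 'error' "$1/CaseStatus"
}
```

For this exercise you could call `has_rpointer_files ~/scratch/b1850_high_freq_bugfixing/run` and `case_status_errors ~/cases/b1850_high_freq_bugfixing`. After the crash, the first should fail and the second should print at least one line.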

Troubleshooting

Now that you have determined the run has crashed, you need to examine the log files to see what went wrong. If you go into the run directory and search for the log files, you should see that there are three: one for the coupler (cpl.log.*), one for CAM (atm.log.*), and one for CESM (cesm.log.*).

We can search for the first occurrence of the word ERROR in the cesm log file. You can do this by opening the file in your text editor or by searching for the word ERROR using grep:

grep 'ERROR' cesm.log.*

You should find this produces the following

ERROR: FLDLST: 1 errors found, see log
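The grep command above prints every matching line. If the log contains many errors and you only want the first one, a minimal sketch is shown below. It assumes your grep supports the -m option (GNU and BSD grep both do), and first_error is a made-up helper name.

```shell
# Print the first line containing ERROR in a log file, with its line number.
# first_error is a hypothetical helper name.
first_error() {
    # -n prefixes the line number; -m 1 stops after the first matching line
    grep -n -m 1 'ERROR' "$1"
}
```

The line number makes it easy to jump to that point in a text editor and read the messages around it.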

This tells us that something has gone wrong with the list of output variables that we have asked for. More information can then be found in the CAM log file (atm.log.*). Looking at the very end of that file you should see:

FLDLST: T2M in fincl(1,2) not found
ERROR: FLDLST: 1 errors found, see log

This tells us that T2M is not a valid history field for CAM, which has caused CESM to crash on initialization. The correct variable for near-surface temperature is TREFHT, as we used in the previous example.
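To repair the case, the field name in user_nl_cam needs to be TREFHT, giving fincl2='TREFHT:A'. You can edit the file by hand, or use something like the sketch below. The helper name fix_fincl is made up for illustration, and the sketch assumes T2M appears nowhere else in the file.

```shell
# Replace the invalid field name T2M with TREFHT in a namelist file.
# fix_fincl is a hypothetical helper name.
# Usage: fix_fincl user_nl_cam
fix_fincl() {
    nl_file="$1"
    tmp_file="$nl_file.tmp"
    # Write to a temporary file, then move it into place;
    # this avoids relying on sed's non-portable in-place option
    sed 's/T2M/TREFHT/g' "$nl_file" > "$tmp_file" && mv "$tmp_file" "$nl_file"
}
```

After correcting the entry, the case can be resubmitted with ./case.submit from the case directory.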

Solution

cd $CESMROOT/cime/scripts
./create_newcase --case ~/cases/b1850_high_freq_bugfixing --compset B1850-tutorial --res f19_g17 

cd ~/cases/b1850_high_freq_bugfixing 
./case.setup

Now add the following into the file user_nl_cam

fincl2='T2M:A' 
nhtfrq=0,-24 
mfilt=1,1

Then build and submit your case

qcmd -- ./case.build

./case.submit

Now when it crashes, go into your run directory and look in the logs

cd ~/scratch/b1850_high_freq_bugfixing/run

You can see the logs and when they were created using

ls -ltr

Search for ERROR in the cesm log

grep 'ERROR' cesm.log.*

Now look at the end of the atmosphere log file

tail atm.log.*