CUPiD#

The CESM Unified Postprocessing and Diagnostics (CUPiD) package is a new Python-based system for running post-processing routines and diagnostics across all CESM components with a common user and developer interface. Official documentation for CUPiD is available online.

This notebook is a chance to try out CUPiD on a CESM3 simulation, the version of the model that CUPiD targets. Please note for future reference that while we are using the JupyterHub interface in this tutorial, CUPiD can be run in a terminal just like a standard Python script, and it can be included as an automatic post-processing step for CESM3 simulations.

BEFORE BEGINNING THIS EXERCISE - Check that your kernel (upper right corner, above) is Bash. This should be the default kernel, but if it is not, click on that button and select Bash.

CUPiD is currently a command-line tool. This means that instead of running Python code directly, this notebook will run the Unix commands that CUPiD provides in order to generate the relevant diagnostics. To start, we need to clone CUPiD from GitHub:

# Delete old CUPiD directory if one exists:
if [ -d "CUPiD" ]; then
  rm -rf CUPiD
fi

#Clone CUPiD source code from GitHub repo:
git clone --recurse-submodules https://github.com/NCAR/CUPiD.git
cd CUPiD  # Need to enter CUPiD directory for remaining commands

This downloads the core CUPiD software as well as two additional diagnostics packages that CUPiD will use: the AMWG Diagnostics Framework (ADF), a command-line tool that can be used to generate CAM (and soon CLM) diagnostics, and mom6-tools, a Python package that can be used to analyze MOM6, the ocean model that will be used in CESM3 (which we’ll ignore for this tutorial).
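Because the two packages come in as git submodules, you can optionally confirm that they were fetched by listing the submodule paths the repository records. This is just a quick sketch; the exact paths are whatever CUPiD's .gitmodules file defines:

```shell
#Optionally list the submodule paths recorded by the clone; ADF and
#mom6-tools should appear among them (run from inside the CUPiD directory):
if [ -f .gitmodules ]; then
  git config --file .gitmodules --get-regexp path
fi
```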

Next we need to set up the proper Python environment using conda/mamba and activate the cupid-infrastructure environment. NOTE: You may see a red ": 1" appear in your notebook, but this can be safely ignored. Also note that if this is the first time you are running this cell, it could take a few minutes to install the conda environments.

#Load conda to your environment:
module load conda

#Install 'cupid-infrastructure' environment if it doesn't already exist:
if ! { conda env list | grep 'cupid-infrastructure'; } >/dev/null 2>&1; then
  mamba env create -f environments/cupid-infrastructure.yml
fi

#Install 'cupid-analysis' environment if it doesn't already exist:
if ! { conda env list | grep 'cupid-analysis'; } >/dev/null 2>&1; then
  mamba env create -f environments/cupid-analysis.yml
fi

#Activate CUPiD conda environment:
conda activate cupid-infrastructure
#NOTE: You may see a red ": 1" message below, but it can be ignored.

#Check that cupid-diagnostics can be accessed appropriately:
if ! which cupid-diagnostics >/dev/null 2>&1; then
  #If not then use pip to install:
  pip install -e .
fi

CUPiD is controlled via a config YAML file. Here we create a new directory and write the relevant config file for our tutorial simulation. Please note that if your tutorial simulations didn’t finish, you can use the provided simulations instead:

cd examples         #Go to the examples directory
if ! [ -d "cesm_tutorial" ]; then #Check if CESM tutorial directory already exists.
  mkdir cesm_tutorial #If not, then make a new directory to hold our config file
fi 
cd cesm_tutorial    #Go to newly made CESM tutorial example directory
cat << EOF > config.yml
################## SETUP ##################

################
# Data Sources #
################
data_sources:
    # run_dir is the path to the folder you want
    ### all the files associated with this configuration
    ### to be created in
    run_dir: .

    # nb_path_root is the path to the folder that cupid will
    ### look for your template notebooks in. It doesn't have to
    ### be inside run_dir, or be specific to this project, as
    ### long as the notebooks are there
    nb_path_root: ../../nblibrary

######################
# Computation Config #
######################

computation_config:

    # default_kernel_name is the name of the environment that
    ### the notebooks in this configuration will be run in by default.
    ### It must already be installed on your machine. You can also
    ### specify a different environment than the default for any
    ### notebook in NOTEBOOK CONFIG
    default_kernel_name: cupid-analysis

    # log level sets the level of how verbose logging will be.
    # options include: debug, info, warning, error
    log_level: 'info'

############# NOTEBOOK CONFIG #############

############################
# Notebooks and Parameters #
############################

# All parameters under global_params get passed to all the notebooks

global_params:
  case_name: 'b.e30_beta02.BLT1850.ne30_t232.104'
  base_case_name: 'b.e23_alpha17f.BLT1850.ne30_t232.092'
  CESM_output_dir: /glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing
  start_date: '0001-01-01'
  end_date: '0101-01-01'
  base_start_date: '0001-01-01'
  base_end_date: '0101-01-01'
  obs_data_dir: '/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_obs_data'
  lc_kwargs:
    threads_per_worker: 1

timeseries:
  num_procs: 8
  ts_done: [False, False]
  overwrite_ts: [False, False]
  case_name: ['b.e30_beta02.BLT1850.ne30_t232.104', 'b.e23_alpha17f.BLT1850.ne30_t232.092']

  atm:
    vars: ['PSL']
    derive_vars: []
    hist_str: 'cam.h0a'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

  lnd:
    vars: ['SOILWATER_10CM','FSH_TO_COUPLER']
    derive_vars: []
    hist_str: 'h0'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

  ocn:
    vars: []
    derive_vars: []
    hist_str: 'h.z'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

  ice:
    vars: ['aice','hi','hs']
    derive_vars: []
    hist_str: 'cice.h'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

  glc:
    vars: []
    derive_vars: []
    hist_str: 'initial_hist'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

  rof:
    vars: []
    derive_vars: []
    hist_str: 'h0'
    start_years: [91,91]
    end_years: [100,100]
    level: 'lev'

compute_notebooks:

  # This is where all the notebooks you want run and their
  # parameters are specified. Several examples of different
  # types of notebooks are provided.

  # The first key (here infrastructure) is the name of the
  # notebook from nb_path_root, minus the .ipynb

    infrastructure:
      index:
        parameter_groups:
          none: {}

    atm:
      Global_PSL_NMSE_compare_obs_lens:
        parameter_groups:
          none:
            regridded_output: False # it looks like output is already on f09 grid, didn't need to regrid time-series file
            base_regridded_output: True
            validation_path: 'atm/analysis_datasets/fv0.9x1.25/seasonal_climatology/nmse_validation/PSL/'
      link_to_ADF:
        kernel_name: cupid-infrastructure
        parameter_groups:
          none:
            adf_root: ../../examples/key_metrics/ADF_output/
            key_plots: ["Surface_Wind_Stress_ANN_LatLon_Vector_Mean.png", "PRECT_ANN_LatLon_Mean.png", "PS_DJF_SHPolar_Mean.png", "TaylorDiag_ANN_Special_Mean.png"]
        external_tool:
          tool_name: 'ADF'
          vars: ['SST', 'TS', 'SWCF', 'LWCF', 'PRECT', 'TAUX', 'TAUY',  'TGCLDLWP']
          plotting_scripts: ["global_latlon_map", "global_latlon_vect_map"]
          analysis_scripts: ["amwg_table"]
          base_regridded_output: True

    glc:
      Greenland_SMB_visual_compare_obs:
        parameter_groups:
          none:
            obs_path: 'glc/analysis_datasets/multi_grid/annual_avg/SMB_data'
            obs_name: 'GrIS_MARv3.12_climo_1960_1999.nc'
            climo_nyears: 40

    rof:
      global_discharge_gauge_compare_obs:
        parameter_groups:
          none:
            analysis_name: ""
            grid_name: 'f09_f09_mosart' # ROF grid name
            climo_nyears: 10
            figureSave: False
      global_discharge_ocean_compare_obs:
        parameter_groups:
          none:
            analysis_name: ""

            grid_name: 'f09_f09_mosart' # ROF grid name
            climo_nyears: 10
            figureSave: False

    ice:
      Hemis_seaice_visual_compare_obs_lens:
        parameter_groups:
          none:
            climo_nyears: 35
            grid_file: '/glade/campaign/cesm/community/omwg/grids/tx2_3v2_grid.nc'
            path_model: '/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_model_data/ice/'


    lnd:
      Global_TerrestrialCouplingIndex_VisualCompareObs:
        parameter_groups:
          none:
            clmFile_h: '.h0.'
            fluxnet_comparison: True
            obsDir: 'lnd/analysis_datasets/ungridded/timeseries/FLUXNET2015/'

########### JUPYTER BOOK CONFIG ###########

##################################
# Jupyter Book Table of Contents #
##################################
book_toc:

  # See https://jupyterbook.org/en/stable/structure/configure.html for
  # complete documentation of Jupyter book construction options

  format: jb-book

  # All filenames are notebook filename without the .ipynb, similar to above

  root: infrastructure/index # root is the notebook that will be the homepage for the book
  parts:

    # Parts group notebooks into different sections in the Jupyter book
    # table of contents, so you can organize different parts of your project.
    # Each chapter is the name of one of the notebooks that you executed
    # in compute_notebooks above, also without .ipynb

    - caption: Atmosphere
      chapters:
        - file: atm/Global_PSL_NMSE_compare_obs_lens
        - file: atm/link_to_ADF

    # - caption: Ocean
    #   chapters:
    #       - file: ocn/ocean_surface

    - caption: Land
      chapters:
        - file: lnd/Global_TerrestrialCouplingIndex_VisualCompareObs

    - caption: Sea Ice
      chapters:
        - file: ice/Hemis_seaice_visual_compare_obs_lens

    - caption: Land Ice
      chapters:
        - file: glc/Greenland_SMB_visual_compare_obs

    - caption: River Runoff
      chapters:
        - file: rof/global_discharge_gauge_compare_obs
        - file: rof/global_discharge_ocean_compare_obs

#####################################
# Keys for Jupyter Book _config.yml #
#####################################
book_config_keys:

  title: CESM Key Metrics   # Title of your jupyter book

  # Other keys can be added here, see https://jupyterbook.org/en/stable/customize/config.html
  ### for many more options

EOF
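As an optional sanity check, you can grep the file you just wrote to confirm the heredoc expanded correctly:

```shell
#Confirm the config file was written and contains the case names:
if [ -f config.yml ]; then
  grep 'case_name' config.yml
fi
```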

Now we are ready to run CUPiD!

Running CUPiD diagnostics#

CUPiD is designed to run a collection of notebooks on a single set of model simulation data. The notebooks themselves can be found in CUPiD/nblibrary/<comp>, where <comp> is atm, lnd, ice, etc. You can run the notebooks with the following command: NOTE: This can take several minutes to run. Don’t worry if you see some parameter warnings under "DAG render with warnings".

cupid-diagnostics
# Sometimes users report that the conda environment was not found. If this happens, run the following lines and then continue:
conda activate cupid-analysis
python -m ipykernel install --user --name=cupid-analysis
conda activate cupid-infrastructure

Looking at Diagnostics#

The now-processed notebooks can be found in CUPiD/examples/cesm_tutorial/computed_notebooks/<comp>. Try opening them here in JupyterHub and see if you can determine which diagnostics they are calculating!
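For example, from the examples/cesm_tutorial directory you can list what was generated (a sketch; the component subdirectories you see depend on which notebooks ran):

```shell
#List the processed notebooks by component subdirectory:
if [ -d computed_notebooks ]; then
  ls computed_notebooks
fi
```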

Alternatively, CUPiD can combine the notebooks into a single website that can then be viewed in a browser. To generate the website, you’ll need to run the following command:

cupid-webpage

When this command finishes running, you will have a new directory: examples/cesm_tutorial/computed_notebooks/_build/html/.

The best way to view these files is to download them to your own machine and then view them in your browser. To download the directory, open a local terminal and run the following command:

scp -r <username>@casper.hpc.ucar.edu:~/CESM-Tutorial/notebooks/diagnostics/CUPiD/examples/cesm_tutorial/computed_notebooks/_build/html cupid_website

where <username> is your casper/derecho username. You will first be asked for your password and a Duo push; if successful, the command will download the entire html directory under the name cupid_website. Then just open the cupid_website/index.html file in your browser to see the generated website!
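If you have Python 3 on your local machine, an alternative to opening the file directly (a hypothetical convenience, not part of CUPiD) is to serve the downloaded directory with Python’s built-in web server and then browse to http://localhost:8000:

```shell
#Serve the downloaded directory locally (press Ctrl-C to stop the server):
if [ -d cupid_website ]; then
  python3 -m http.server 8000 --directory cupid_website
fi
```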