CONUS 404 diagnostic plots

Data Access

  • This notebook illustrates how to make diagnostic plots using the CONUS 404 dataset hosted on NCAR’s Geoscience Data Exchange (GDEX).

  • https://gdex.ucar.edu/datasets/d559000/

  • These data are open access and can be reached through three access methods:

    1. POSIX (if you have access to NCAR’s HPC systems: Casper or Derecho)

    2. HTTPS

    3. OSDF using intake-ESM catalogs.

  • Learn about intake-ESM catalogs: https://intake-esm.readthedocs.io/en/stable/

# Imports 
import intake
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
import os
import dask 
from dask_jobqueue import PBSCluster
from dask.distributed import Client
# Catalog URLs
cat_url     = 'https://data.gdex.ucar.edu/d559000/catalogs/d559000-posix.json' # POSIX access
# cat_url     = 'https://osdf-director.osg-htc.org/ncar/gdex/d559000/catalogs/d559000-osdf.json'
print(cat_url)
https://data.gdex.ucar.edu/d559000/catalogs/d559000-posix.json
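Rather than commenting one URL in and out, the choice between the two catalogs can be made at run time. The sketch below is only an illustration: `choose_catalog_url` is a hypothetical helper, and treating the presence of a `/glade` directory as a sign that the notebook runs on NCAR's HPC systems is an assumption, not something the catalogs themselves require.

```python
import os

# The two catalog URLs that appear in this notebook.
CATALOG_URLS = {
    "posix": "https://data.gdex.ucar.edu/d559000/catalogs/d559000-posix.json",
    "osdf": "https://osdf-director.osg-htc.org/ncar/gdex/d559000/catalogs/d559000-osdf.json",
}


def choose_catalog_url(glade_root: str = "/glade") -> str:
    """Hypothetical helper: POSIX catalog if GLADE looks mounted, else OSDF.

    Assumption: a directory at `glade_root` means we are on Casper or Derecho.
    """
    access = "posix" if os.path.isdir(glade_root) else "osdf"
    return CATALOG_URLS[access]


cat_url = choose_catalog_url()
print(cat_url)
```

On a laptop or cloud machine this would fall back to the OSDF catalog, while on Casper or Derecho it would pick the POSIX one.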
# Set up your scratch folder path
username       = os.environ["USER"]
glade_scratch  = "/glade/derecho/scratch/" + username
print(glade_scratch)
/glade/derecho/scratch/harshah

Create a PBS cluster

# Create a PBS cluster object
cluster = PBSCluster(
    job_name = 'dask-wk25-hpc',
    cores = 1,
    memory = '10GiB',
    processes = 1,
    local_directory = glade_scratch+'/dask/spill/',
    log_directory = glade_scratch + '/dask/logs/',
    resource_spec = 'select=1:ncpus=1:mem=10GB',
    queue = 'casper',
    walltime = '5:00:00',
    #interface = 'ib0'
    interface = 'ext'
)
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/distributed/node.py:188: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34393 instead
  warnings.warn(
2026-02-23 16:11:58,638 - tornado.application - ERROR - Uncaught exception GET /status/ws (127.0.0.1)
HTTPServerRequest(protocol='http', host='jupyterhub.hpc.ucar.edu', method='GET', uri='/status/ws', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/tornado/websocket.py", line 965, in _accept_connection
    open_result = handler.open(*handler.open_args, **handler.open_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/tornado/web.py", line 3388, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/bokeh/server/views/ws.py", line 149, in open
    raise ProtocolError("Token is expired. Configure the app with a larger value for --session-token-expiration if necessary")
bokeh.protocol.exceptions.ProtocolError: Token is expired. Configure the app with a larger value for --session-token-expiration if necessary
# Scale the cluster and display cluster dashboard URL
n_workers = 4
client = Client(cluster)
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)
cluster
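The cluster definition above states the per-worker cores and memory twice: once for Dask (`cores`, `memory`) and once for PBS (`resource_spec`). A minimal sketch of keeping the two in step is shown below; `pbs_resource_spec` and `cluster_kwargs` are hypothetical names introduced for illustration, and the string format simply mirrors the `select=1:ncpus=1:mem=10GB` value used in this notebook.

```python
def pbs_resource_spec(cores: int, memory_gb: int, nodes: int = 1) -> str:
    """Hypothetical helper: build the PBS select string from the worker size."""
    return f"select={nodes}:ncpus={cores}:mem={memory_gb}GB"


cores, memory_gb = 1, 10

# Keyword arguments that could be passed on as PBSCluster(**cluster_kwargs, ...).
cluster_kwargs = {
    "cores": cores,
    "memory": f"{memory_gb}GiB",
    "resource_spec": pbs_resource_spec(cores, memory_gb),
}
print(cluster_kwargs)
```

Changing `cores` or `memory_gb` in one place then updates both the Dask and the PBS side of the request.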

Load CONUS 404 data from GDEX using an intake catalog

col = intake.open_esm_datastore(cat_url)
col
  • col.df returns the catalog's underlying pandas DataFrame. It is an attribute of the catalog object, so no conversion takes place.

col.df

Select data and plot

What if you don’t know the variable names?

  • Use pandas column selection to list each variable’s short name (the variable column) alongside its long_name

col.df[['variable','long_name']]
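Because the catalog has one row per variable and water year, the two-column view repeats each variable many times. The sketch below shows one way to reduce it to a unique list and search it by keyword. It runs on a toy DataFrame standing in for `col.df`: the rows and long names are invented for illustration, and only the column names `variable`, `long_name` and `start_time` come from this notebook.

```python
import pandas as pd

# Toy stand-in for col.df; the rows and long names are made up.
df = pd.DataFrame(
    {
        "variable": ["T2", "T2", "Q2", "XYZ"],
        "long_name": ["2 m temperature", "2 m temperature", "2 m humidity", None],
        "start_time": ["1979-10-01", "1980-10-01", "1979-10-01", "1979-10-01"],
    }
)

# One row per variable instead of one row per variable and water year.
variables = df[["variable", "long_name"]].drop_duplicates().reset_index(drop=True)

# Case-insensitive keyword search; na=False skips rows without a long name.
matches = variables[variables["long_name"].str.contains("temperature", case=False, na=False)]
print(matches)
```

The same two lines should work on the real catalog by replacing `df` with `col.df`.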

Temperature

  • Plot 2 m temperature (T2) for a chosen date

cat_temp = col.search(variable='T2')
cat_temp.df.head()
cat_temp.df.head().values
array([['/glade/campaign/collections/rda/data/d559000/kerchunk/wy1980.2d.json', 'T2', 'reference', 'T2', <NA>, 'K', '1979-10-01', '1980-09-30 23:00:00', <NA>, <NA>, '0 days 01:00:00'], ['/glade/campaign/collections/rda/data/d559000/kerchunk/wy1981.2d.json', 'T2', 'reference', 'T2', <NA>, 'K', '1980-10-01', '1981-09-30 23:00:00', <NA>, <NA>, '0 days 01:00:00'], ['/glade/campaign/collections/rda/data/d559000/kerchunk/wy1982.2d.json', 'T2', 'reference', 'T2', <NA>, 'K', '1981-10-01', '1982-09-30 23:00:00', <NA>, <NA>, '0 days 01:00:00'], ['/glade/campaign/collections/rda/data/d559000/kerchunk/wy1983.2d.json', 'T2', 'reference', 'T2', <NA>, 'K', '1982-10-01', '1983-09-30 23:00:00', <NA>, <NA>, '0 days 01:00:00'], ['/glade/campaign/collections/rda/data/d559000/kerchunk/wy1984.2d.json', 'T2', 'reference', 'T2', <NA>, 'K', '1983-10-01', '1984-09-30 23:00:00', <NA>, <NA>, '0 days 01:00:00']], dtype=object)
%%time
test = xr.open_dataset('/gdex/data/d559000/kerchunk/wy1980.2d.json')
test
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
CPU times: user 48.7 s, sys: 5.18 s, total: 53.9 s
Wall time: 54.2 s
date = "1980-09-30"
test.T2.sel(Time=date,method='nearest').values
array([[297.47467, 297.46545, 297.45624, ..., 300.5632 , 300.6054 , 300.63654], [297.47638, 297.51083, 297.50455, ..., 300.54544, 300.5817 , 300.59372], [297.48605, 297.51703, 297.5169 , ..., 300.52386, 300.55124, 300.5514 ], ..., [286.27594, 286.27704, 286.28522, ..., 272.12216, 272.30072, 271.19397], [286.26907, 286.2781 , 286.27634, ..., 271.8276 , 271.85138, 270.7123 ], [286.26056, 286.263 , 286.26395, ..., 270.1154 , 270.14557, 270.44562]], shape=(1015, 1367), dtype=float32)
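The `method='nearest'` argument used above tells xarray to return the time step closest to the requested label instead of requiring an exact match. xarray's label lookup is built on pandas indexes, so the behaviour can be illustrated with a plain pandas index. This is a sketch on synthetic hourly timestamps, not on the CONUS404 data.

```python
import pandas as pd

# Synthetic hourly time axis: 1980-09-29 00:00 through 1980-09-30 23:00.
times = pd.date_range("1980-09-29", periods=48, freq=pd.Timedelta(hours=1))

# A requested time that falls between two hourly steps.
target = pd.Timestamp("1980-09-30 10:20")

# Position of the closest time step, which is what a nearest lookup resolves to.
pos = times.get_indexer([target], method="nearest")[0]
print(pos, times[pos])
```

Here the request for 10:20 resolves to the 10:00 step. A bare date such as "1980-09-30" is interpreted as midnight of that day.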
  • The data are organized as (virtual) Zarr stores, with one water year of data per reference file.

  • Select a water year by searching on its start time (Oct 1 of the preceding calendar year) or its end time (Sep 30 of the water year itself). For example, water year 1980 runs from 1979-10-01 to 1980-09-30 23:00.

  • To get data for any other day, say Jan 1 of year YYYY, first load the water year that contains it (here, water year YYYY) and then select that day from the loaded dataset. This example is discussed below.
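The mapping from an arbitrary date to the water year that contains it, and to that water year's start date, can be sketched as follows. `water_year` and `water_year_start` are hypothetical helpers written for this illustration. They assume the Oct 1 to Sep 30 convention visible in the catalog rows above, and that the catalog stores start times as `YYYY-10-01` strings.

```python
import pandas as pd


def water_year(date) -> int:
    """Water year containing `date` (Oct 1 of the previous year to Sep 30)."""
    ts = pd.Timestamp(date)
    return ts.year + 1 if ts.month >= 10 else ts.year


def water_year_start(date) -> str:
    """Start date of that water year, formatted like the catalog's start_time."""
    return f"{water_year(date) - 1}-10-01"


desired_time = "2021-01-01T00"
print(water_year(desired_time), water_year_start(desired_time))
```

For Jan 1, 2021 this gives water year 2021 with a start date of 2020-10-01, which is a value that could be passed to `cat_temp.search(start_time=...)`.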

date = "2020-10-01"
# year = "2021"
cat_temp_subset = cat_temp.search(start_time = date)
cat_temp_subset

Load data into xarray

%%time
# Load catalog entries for subset into a dictionary of xarray datasets, and open the first one.
dsets = cat_temp_subset.to_dataset_dict(xarray_open_kwargs={'engine':'kerchunk',"chunks": {}})
#
print(f"\nDataset dictionary keys:\n {dsets.keys()}")

--> The keys in the returned dictionary of datasets are constructed as follows:
	'variable.short_name'
# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
# store_name = dataset_key + ".zarr"
print(dsets.keys())
ds = dsets[dataset_key]
ds = ds.T2  # keep only the T2 DataArray
ds
%%time
desired_time = "2021-01-01T00"
ds.sel(Time=desired_time,method='nearest').plot(cmap='inferno')
# Release the PBS workers once the analysis is done
cluster.close()