Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Annual Maximum JJA Heat Index at Boulder, CO and Chicago, IL (1990–2005)

ERA5 and JRA-3Q Reanalyses from NCAR GDEX

Scientific motivation

This notebook computes the annual maximum June–July–August (JJA) heat index at two mid-latitude United States cities — Boulder, Colorado (semi-arid, elevation 1655 m) and Chicago, Illinois (humid continental, Lake Michigan influence) — using two independent global reanalyses spanning 1990–2005.

The analysis follows the methodology of Romps (2024), who demonstrated using ERA5 that the annual maximum summer heat index in Texas has increased at a rate several times larger than the contemporaneous increase in dry-bulb temperature. The physical basis for this amplification lies in the nonlinearity of the heat index with respect to temperature and humidity: at near-saturated conditions, the Clausius–Clapeyron relation implies that a given increment of warming produces a disproportionately large increase in physiological heat stress. Chicago, situated in a persistently humid air mass, is expected to exhibit stronger amplification than semi-arid Boulder, where the heat index remains close to the dry-bulb temperature throughout the summer.

Datasets

ProductGDEX identifierTemporal resolutionQuantity archivedAccess format
ERA5 surface analysisd633000Hourly instantaneousVAR_2T, VAR_2DNative Zarr
JRA-3Q surface analysis (anl_surf125)d6400006-hourly instantaneousTMP_GDS0_HTGL, RH_GDS0_HTGLkerchunk reference

Because ERA5 provides 24 candidate values per day versus 4 for JRA-3Q, computing the JJA season maximum from the full hourly ERA5 stream would systematically inflate ERA5 relative to JRA-3Q through a sample-size effect alone. To remove this artifact, ERA5 is subsampled to one randomly selected observation per 6-hour window prior to computing the annual maximum, matching the JRA-3Q cadence. A random draw, rather than a fixed synoptic hour, is used to avoid introducing a systematic diurnal-phase bias into the sampled ERA5 distribution. We re-sample instead of taking a mean over a 6-hour window because the average might miss an extreme heat index value in this window.

Heat index formulation

The heat index is computed using the heatindex Python package of Lu & Romps (2025), which implements the extended formulation of Lu & Romps (2022) in python. The function takes air temperature in Kelvin and relative humidity on the interval [0,1][0,\,1] and returns the heat index in Kelvin:

hi.heatindex(T_K, rh)  →  HI_K

ERA5 does not archive near-surface relative humidity directly. It is derived from the 2-m temperature (VAR_2T) and 2-m dew-point temperature (VAR_2D) via the August–Roche–Magnus approximation, consistent with the ECMWF IFS documentation. JRA-3Q provides relative humidity in percent as RH_GDS0_HTGL in the surface analysis product.

Required Packages

Please make sure to install the packages before moving forward

  • intake

  • intake-esm >= 2025.12.12

  • xarray

  • dask

  • zarr

  • kerchunk

  • numpy

  • pandas

  • matplotlib

  • heatindex >= 0.0.2

!pip install heatindex
Requirement already satisfied: heatindex in /glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages (0.0.2)
Requirement already satisfied: numpy in /glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages (from heatindex) (2.3.5)
import matplotlib.pyplot as plt
import numpy as np
import os
import xarray as xr
import intake
import intake_esm
import pandas as pd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import dask 
from dask_jobqueue import PBSCluster
from dask.distributed import Client
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/pyproj/network.py:59: UserWarning: pyproj unable to set PROJ database path.
  _set_context_ca_bundle_path(ca_bundle_path)

Locate the Dataset

On the NCAR GDEX portal, go to the Data Access tab for the ERA5/ JRA-3Q dataset to find the intake-ESM catalogs needed to access data. In this notebook we will use GDEX OSDF catalog.

# Set up your scratch folder path - Ignore these lines if you are not on NCAR's cluster
username       = os.environ["USER"]
glade_scratch  = "/glade/derecho/scratch/" + username
print(glade_scratch)
/glade/derecho/scratch/harshah
# OSDF based data access 
# era5_catalog  = 'https://data.gdex.ucar.edu/d633000/catalogs/d633000-osdf.json'

# jra3q_catalog = 'https://data.gdex.ucar.edu/d640000/catalogs/d640000-osdf.json' 
era5_catalog  = 'https://data.gdex.ucar.edu/d633000/catalogs/d633000-https.json'
jra3q_catalog = 'https://data.gdex.ucar.edu/d640000/catalogs/d640000-https.json'

Step 2 - Set up cluster

  • Please set these two flags to False if you are not on NCAR’s HPC cluster or not using a dask gateway.

  • Setting these flags to False immediately selects a local cluster which can run on your personal device

USE_PBS_SCHEDULER = True # Use NCAR's HPC cluster 
USE_DASK_GATEWAY  = False 
# Create a PBS cluster object
def get_pbs_cluster():
    """ Create cluster through dask_jobqueue.   
    """
    from dask_jobqueue import PBSCluster
    cluster = PBSCluster(
        job_name = 'dask-osdf-24',
        cores = 1,
        memory = '8GiB',
        processes = 1,
        local_directory = glade_scratch + '/dask/spill',
        log_directory = glade_scratch + '/dask/logs/',
        resource_spec = 'select=1:ncpus=1:mem=8GB',
        queue = 'casper',
        walltime = '3:00:00',
        #interface = 'ib0'
        interface = 'ext'
    )
    return cluster

def get_gateway_cluster():
    """ Create cluster through dask_gateway
    """
    from dask_gateway import Gateway

    gateway = Gateway()
    cluster = gateway.new_cluster()
    cluster.adapt(minimum=2, maximum=4)
    return cluster

def get_local_cluster():
    """ Create cluster using the Jupyter server's resources
    """
    from distributed import LocalCluster, performance_report
    cluster = LocalCluster()    

    cluster.scale(5)
    return cluster
# Obtain dask cluster in one of three ways
if USE_PBS_SCHEDULER:
    cluster = get_pbs_cluster()
elif USE_DASK_GATEWAY:
    cluster = get_gateway_cluster()
else:
    cluster = get_local_cluster()

# Connect to cluster
from distributed import Client
client = Client(cluster)
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/distributed/node.py:188: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36673 instead
  warnings.warn(
# Scale the cluster and display cluster dashboard URL
n_workers =5
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)
cluster
Loading...

Open the catalog, find and load the variable of interest

%%time
era5_cat = intake.open_esm_datastore(era5_catalog)
jra3q_cat = intake.open_esm_datastore(jra3q_catalog)
era5_cat
CPU times: user 104 ms, sys: 10.8 ms, total: 115 ms
Wall time: 1.61 s
Loading...
# ── Time range ────────────────────────────────────────────────────────────────
YEAR_START_ERA5  = 1990
YEAR_START_JRA3Q = 1990
YEAR_END         = 2005

JJA_MONTHS = [6, 7, 8]

# ── Site coordinates (degrees north, degrees east 0–360) ──────────────────────
SITES = {
    "Boulder_CO" : {"lat":  40.01, "lon": 254.73},
    "Chicago_IL" : {"lat":  41.88, "lon": 272.37},
}

# ── Reproducibility ───────────────────────────────────────────────────────────
RANDOM_SEED = 42
  • Search for humidity and temperature variables in the catalog using the long_name and/or short_name columns. We only show the JRA-3Q example in the code below

jra3q_cat.df[['variable', 'short_name', 'long_name']].drop_duplicates().loc[
    jra3q_cat.df['long_name'].str.contains('2m temp|humid', case=False, na=False)]
Loading...
# ERA5: 2-m temperature and 2-m dew-point (GDEX convention)
ERA5_T_VAR  = "VAR_2T"
ERA5_TD_VAR = "VAR_2D"

era5_search = era5_cat.search(variable=[ERA5_T_VAR, ERA5_TD_VAR])
print(f"ERA5 matched: {len(era5_search.df):,} entries")

# JRA-3Q: 2-m temperature and 2-m relative humidity
# Variable names confirmed from the discovery step in the previous cell:
#   tmp2m-hgt-an-ll125  →  2-metre temperature, height level, analysis
#   rh2m-hgt-an-ll125   →  2-metre relative humidity, height level, analysis
JRA3Q_T_VAR  = "tmp2m-hgt-an-ll125"
JRA3Q_RH_VAR = "rh2m-hgt-an-ll125"

jra3q_search = jra3q_cat.search(variable=[JRA3Q_T_VAR, JRA3Q_RH_VAR])
print(f"JRA-3Q matched: {len(jra3q_search.df):,} entries")
ERA5 matched: 2 entries
JRA-3Q matched: 2 entries
#Use paths to the load the dataset
era5_t_path  = era5_search.df.loc[era5_search.df['variable'] == ERA5_T_VAR,  'path'].item()
era5_td_path = era5_search.df.loc[era5_search.df['variable'] == ERA5_TD_VAR, 'path'].item()

jra3q_t_path  = jra3q_search.df.loc[jra3q_search.df['variable'] == JRA3Q_T_VAR,  'path'].item()
jra3q_rh_path = jra3q_search.df.loc[jra3q_search.df['variable'] == JRA3Q_RH_VAR, 'path'].item()

print(era5_t_path)
print(era5_td_path)
print(jra3q_t_path)
print(jra3q_rh_path)
https://data.gdex.ucar.edu/d633000/e5.oper.an.sfc.zarr/e5.oper.an.sfc.2t.zarr
https://data.gdex.ucar.edu/d633000/e5.oper.an.sfc.zarr/e5.oper.an.sfc.2d.zarr
https://data.gdex.ucar.edu/d640000/kerchunk/anl_surf125-remote-https.parq
https://data.gdex.ucar.edu/d640000/kerchunk/anl_surf125-remote-https.parq
# Open using kerhcunk file paths
test_jra = xr.open_dataset('https://data.gdex.ucar.edu/d640000/kerchunk/anl_surf125-remote-osdf.parq',engine='kerchunk')
jra_temp = test_jra[JRA3Q_T_VAR]
jra_temp
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Loading...
test_jra = xr.open_dataset( 'https://data.gdex.ucar.edu/d640000/kerchunk/anl_surf125-remote-osdf.parq',engine='kerchunk')

jra_pt = test_jra[JRA3Q_T_VAR].sel(lat=SITES['Boulder_CO']['lat'],lon=SITES['Boulder_CO']['lon'],\
         method='nearest').sel(time=slice(str(YEAR_START_JRA3Q), str(YEAR_END)))
jra_pt
Loading...
# Load time coordinate into memory before boolean indexing
jra_pt = jra_pt.load()
jra_pt = jra_pt.sel(time=jra_pt.time.dt.month.isin(JJA_MONTHS))
print(jra_pt)
<xarray.DataArray 'tmp2m-hgt-an-ll125' (time: 5888)> Size: 24kB
dask.array<getitem, shape=(5888,), dtype=float32, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 47kB 1990-06-01 ... 2005-08-31T18:00:00
    lat      float64 8B 40.0
    lon      float64 8B 255.0
Attributes: (12/75)
    group:                                               anl_surf125
    jma_group_description:                               Surface analysis fie...
    rda_group_description:                               JRA-3Q 1.25 degree s...
    rda_group_number:                                    25
    short_name:                                          tmp2m-hgt-an-ll125
    jma_short_name:                                      tmp
    ...                                                  ...
    gas_constant_for_dry_air:                            287.04 J K-1 kg-1
    specific_heat_of_dry_air_at_constant_pressure:       1004.6 J K-1 kg-1
    latent_heat_of_vaporization:                         2.507e+6 J kg-1
    solar_constant:                                      1365 W m-2
    jma_saturated_vapor_pressure:                        The saturated vapor ...
    jma_local_issues_with_sea_ice_parameters:            JRA-3Q contains erro...
jra_pt.plot()
/glade/u/home/harshah/.conda/envs/osdf/lib/python3.11/site-packages/distributed/client.py:3375: UserWarning: Sending large graph of size 10.70 MiB.
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.
  warnings.warn(
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
Cell In[17], line 1
----> 1 jra_pt.plot()

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/plot/accessor.py:48, in DataArrayPlotAccessor.__call__(self, **kwargs)
     46 @functools.wraps(dataarray_plot.plot, assigned=("__doc__", "__annotations__"))
     47 def __call__(self, **kwargs) -> Any:
---> 48     return dataarray_plot.plot(self._da, **kwargs)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:277, in plot(darray, row, col, col_wrap, ax, hue, subplot_kws, **kwargs)
    226 def plot(
    227     darray: DataArray,
    228     *,
   (...)    235     **kwargs: Any,
    236 ) -> Any:
    237     """
    238     Default plot of DataArray using :py:mod:`matplotlib:matplotlib.pyplot`.
    239 
   (...)    273     xarray.DataArray.squeeze
    274     """
    275     darray = darray.squeeze(
    276         d for d, s in darray.sizes.items() if s == 1 and d not in (row, col, hue)
--> 277     ).compute()
    279     plot_dims = set(darray.dims)
    280     plot_dims.discard(row)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/dataarray.py:1242, in DataArray.compute(self, **kwargs)
   1212 """Trigger loading data into memory and return a new dataarray.
   1213 
   1214 Data will be computed and/or loaded from disk or a remote source.
   (...)   1239 Variable.compute
   1240 """
   1241 new = self.copy(deep=False)
-> 1242 return new.load(**kwargs)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/dataarray.py:1168, in DataArray.load(self, **kwargs)
   1138 def load(self, **kwargs) -> Self:
   1139     """Trigger loading data into memory and return this dataarray.
   1140 
   1141     Data will be computed and/or loaded from disk or a remote source.
   (...)   1166     Variable.load
   1167     """
-> 1168     ds = self._to_temp_dataset().load(**kwargs)
   1169     new = self._from_temp_dataset(ds)
   1170     self._variable = new._variable

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/dataset.py:558, in Dataset.load(self, **kwargs)
    555 chunkmanager = get_chunked_array_type(*chunked_data.values())
    557 # evaluate all the chunked arrays simultaneously
--> 558 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    559     *chunked_data.values(), **kwargs
    560 )
    562 for k, data in zip(chunked_data, evaluated_data, strict=False):
    563     self.variables[k].data = data

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py:85, in DaskManager.compute(self, *data, **kwargs)
     80 def compute(
     81     self, *data: Any, **kwargs: Any
     82 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     83     from dask.array import compute
---> 85     return compute(*data, **kwargs)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/indexing.py:670, in __array__()
    666 def __array__(
    667     self, dtype: DTypeLike | None = None, /, *, copy: bool | None = None
    668 ) -> np.ndarray:
    669     if Version(np.__version__) >= Version("2.0.0"):
--> 670         return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
    671     else:
    672         return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/indexing.py:675, in get_duck_array()
    674 def get_duck_array(self):
--> 675     return self.array.get_duck_array()

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/indexing.py:908, in get_duck_array()
    907 def get_duck_array(self):
--> 908     return self.array.get_duck_array()

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/coding/common.py:80, in get_duck_array()
     79 def get_duck_array(self):
---> 80     return self.func(self.array.get_duck_array())

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/indexing.py:748, in get_duck_array()
    745 from xarray.backends.common import BackendArray
    747 if isinstance(self.array, BackendArray):
--> 748     array = self.array[self.key]
    749 else:
    750     array = apply_indexer(self.array, self.key)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/backends/zarr.py:262, in __getitem__()
    260 elif isinstance(key, indexing.OuterIndexer):
    261     method = self._oindex
--> 262 return indexing.explicit_indexing_adapter(
    263     key, array.shape, indexing.IndexingSupport.VECTORIZED, method
    264 )

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/core/indexing.py:1140, in explicit_indexing_adapter()
   1118 """Support explicit indexing by delegating to a raw indexing method.
   1119 
   1120 Outer and/or vectorized indexers are supported by indexing a second time
   (...)   1137 Indexing result, in the form of a duck numpy-array.
   1138 """
   1139 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1140 result = raw_indexing_method(raw_key.tuple)
   1141 if numpy_indices.tuple:
   1142     # index the loaded duck array
   1143     indexable = as_indexable(result)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/xarray/backends/zarr.py:225, in _getitem()
    224 def _getitem(self, key):
--> 225     return self._array[key]

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/array.py:2868, in __getitem__()
   2866     return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
   2867 elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
-> 2868     return self.get_orthogonal_selection(pure_selection, fields=fields)
   2869 else:
   2870     return self.get_basic_selection(cast("BasicSelection", pure_selection), fields=fields)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/array.py:3339, in get_orthogonal_selection()
   3337     prototype = default_buffer_prototype()
   3338 indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
-> 3339 return sync(
   3340     self.async_array._get_selection(
   3341         indexer=indexer, out=out, fields=fields, prototype=prototype
   3342     )
   3343 )

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/sync.py:159, in sync()
    156 return_result = next(iter(finished)).result()
    158 if isinstance(return_result, BaseException):
--> 159     raise return_result
    160 else:
    161     return return_result

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/sync.py:119, in _runner()
    114 """
    115 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
    116 exception, the exception will be returned.
    117 """
    118 try:
--> 119     return await coro
    120 except Exception as ex:
    121     return ex

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/array.py:1565, in _get_selection()
   1562         _config = replace(_config, order=self.order)
   1564     # reading chunks and decoding them
-> 1565     await self.codec_pipeline.read(
   1566         [
   1567             (
   1568                 self.store_path / self.metadata.encode_chunk_key(chunk_coords),
   1569                 self.metadata.get_chunk_spec(chunk_coords, _config, prototype=prototype),
   1570                 chunk_selection,
   1571                 out_selection,
   1572                 is_complete_chunk,
   1573             )
   1574             for chunk_coords, chunk_selection, out_selection, is_complete_chunk in indexer
   1575         ],
   1576         out_buffer,
   1577         drop_axes=indexer.drop_axes,
   1578     )
   1579 if isinstance(indexer, BasicIndexer) and indexer.shape == ():
   1580     return out_buffer.as_scalar()

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:473, in read()
    467 async def read(
    468     self,
    469     batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
    470     out: NDBuffer,
    471     drop_axes: tuple[int, ...] = (),
    472 ) -> None:
--> 473     await concurrent_map(
    474         [
    475             (single_batch_info, out, drop_axes)
    476             for single_batch_info in batched(batch_info, self.batch_size)
    477         ],
    478         self.read_batch,
    479         config.get("async.concurrency"),
    480     )

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/common.py:116, in concurrent_map()
    113     async with sem:
    114         return await func(*item)
--> 116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/common.py:114, in run()
    112 async def run(item: tuple[Any]) -> V:
    113     async with sem:
--> 114         return await func(*item)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:275, in read_batch()
    269 else:
    270     chunk_bytes_batch = await concurrent_map(
    271         [(byte_getter, array_spec.prototype) for byte_getter, array_spec, *_ in batch_info],
    272         lambda byte_getter, prototype: byte_getter.get(prototype),
    273         config.get("async.concurrency"),
    274     )
--> 275     chunk_array_batch = await self.decode_batch(
    276         [
    277             (chunk_bytes, chunk_spec)
    278             for chunk_bytes, (_, chunk_spec, *_) in zip(
    279                 chunk_bytes_batch, batch_info, strict=False
    280             )
    281         ],
    282     )
    283     for chunk_array, (_, chunk_spec, chunk_selection, out_selection, _) in zip(
    284         chunk_array_batch, batch_info, strict=False
    285     ):
    286         if chunk_array is not None:

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/codec_pipeline.py:195, in decode_batch()
    190     chunk_bytes_batch = await bb_codec.decode(
    191         zip(chunk_bytes_batch, chunk_spec_batch, strict=False)
    192     )
    194 ab_codec, chunk_spec_batch = ab_codec_with_spec
--> 195 chunk_array_batch = await ab_codec.decode(
    196     zip(chunk_bytes_batch, chunk_spec_batch, strict=False)
    197 )
    199 for aa_codec, chunk_spec_batch in aa_codecs_with_spec[::-1]:
    200     chunk_array_batch = await aa_codec.decode(
    201         zip(chunk_array_batch, chunk_spec_batch, strict=False)
    202     )

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/abc/codec.py:159, in decode()
    143 async def decode(
    144     self,
    145     chunks_and_specs: Iterable[tuple[CodecOutput | None, ArraySpec]],
    146 ) -> Iterable[CodecInput | None]:
    147     """Decodes a batch of chunks.
    148     Chunks can be None in which case they are ignored by the codec.
    149 
   (...)    157     Iterable[CodecInput | None]
    158     """
--> 159     return await _batching_helper(self._decode_single, chunks_and_specs)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/abc/codec.py:467, in _batching_helper()
    463 async def _batching_helper(
    464     func: Callable[[CodecInput, ArraySpec], Awaitable[CodecOutput | None]],
    465     batch_info: Iterable[tuple[CodecInput | None, ArraySpec]],
    466 ) -> list[CodecOutput | None]:
--> 467     return await concurrent_map(
    468         list(batch_info),
    469         _noop_for_none(func),
    470         config.get("async.concurrency"),
    471     )

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/common.py:116, in concurrent_map()
    113     async with sem:
    114         return await func(*item)
--> 116 return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/core/common.py:114, in run()
    112 async def run(item: tuple[Any]) -> V:
    113     async with sem:
--> 114         return await func(*item)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/abc/codec.py:480, in wrap()
    478 if chunk is None:
    479     return None
--> 480 return await func(chunk, chunk_spec)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/zarr/codecs/_v2.py:41, in _decode_single()
     39 if self.filters:
     40     for f in reversed(self.filters):
---> 41         chunk = await asyncio.to_thread(f.decode, chunk)
     43 # view as numpy array with correct dtype
     44 chunk = ensure_ndarray_like(chunk)

File ~/.conda/envs/osdf/lib/python3.11/asyncio/threads.py:25, in to_thread()
     23 ctx = contextvars.copy_context()
     24 func_call = functools.partial(ctx.run, func, *args, **kwargs)
---> 25 return await loop.run_in_executor(None, func_call)

File ~/.conda/envs/osdf/lib/python3.11/site-packages/numcodecs/zlib.py:37, in decode()
     34     out = ensure_contiguous_ndarray(out)
     36 # do decompression
---> 37 dec = _zlib.decompress(buf)
     39 # handle destination - Python standard library zlib module does not
     40 # support direct decompression into buffer, so we have to copy into
     41 # out if given
     42 return ndarray_copy(dec, out)

error: Error -3 while decompressing data: incorrect header check
%%time
jra_pt = jra_temp.sel(time=slice(str(YEAR_START_JRA3Q), str(YEAR_END))).sel(time=jra_temp.time.dt.month.isin(JJA_MONTHS))\
.sel(lat=lat,lon=lon,method="nearest")
#
jra_pt
jra3q_dsets = jra3q_search.to_dataset_dict(xarray_open_kwargs={'engine':'kerchunk',"chunks": {}})
# Select only required variables from JRA-3Q immediately
jra3q_dsets = {k: v[[JRA3Q_T_VAR, JRA3Q_RH_VAR]] for k, v in jra3q_dsets.items()}
jra3q_dsets
%%time
era5_dsets = era5_search.to_dataset_dict()
#Inspect keys 
print(era5_dsets.keys())
jra3q_dsets.keys()
jra3q_dsets['tmp2m-hgt-an-ll125.tmp2m-hgt-an-ll125']

Extract data

  • Resample ERA5 data to 6hr frequency

  • Subset the data in space (Colorado and Chicago) and time

# Subset to analysis period and JJA
era5_points  = {}
jra3q_points = {}

for site, coords in SITES.items():
    lat, lon = coords["lat"], coords["lon"]

    # ERA5: merge variables, subset time and JJA
    era5_pt = xr.merge(
        [era5_dsets['VAR_2T.2t'][[ERA5_T_VAR]],
         era5_dsets['VAR_2D.2d'][[ERA5_TD_VAR]]]
    ).sel(time=slice(str(YEAR_START_ERA5), str(YEAR_END)))
    era5_pt = era5_pt.sel(time=era5_pt.time.dt.month.isin(JJA_MONTHS))

    # JRA-3Q: select variables, subset time and JJA
    jra3q_pt = jra3q_dsets[list(jra3q_dsets.keys())[0]][[JRA3Q_T_VAR, JRA3Q_RH_VAR]]
    jra3q_pt = jra3q_pt.sel(time=slice(str(YEAR_START_JRA3Q), str(YEAR_END)))
    jra3q_pt = jra3q_pt.sel(time=jra3q_pt.time.dt.month.isin(JJA_MONTHS))

    # Extract nearest grid cell for each site
    era5_pt   = era5_pt.sel(latitude=lat,  longitude=lon, method="nearest")
    jra3q_pt  = jra3q_pt.sel(lat=lat,      lon=lon,       method="nearest")

    # Load JRA-3Q into memory immediately — the full global field has
    # 114,572 single-timestep chunks; once reduced to a single grid cell
    # the data is only ~100 kB and loads instantly
    print(f"Loading JRA-3Q / {site} into memory ...")
    jra3q_pt = jra3q_pt.load()

    print(f"ERA5   / {site}: requested ({lat:.2f}°N, {lon:.2f}°E) → "
          f"nearest ({float(era5_pt.latitude):.3f}°N, {float(era5_pt.longitude):.3f}°E)")
    print(f"JRA-3Q / {site}: requested ({lat:.2f}°N, {lon:.2f}°E) → "
          f"nearest ({float(jra3q_pt.lat):.3f}°N, {float(jra3q_pt.lon):.3f}°E)")

    era5_points[site]  = era5_pt
    jra3q_points[site] = jra3q_pt
# Randomly subsample ERA5 to one observation per 6-hour window
# to match the JRA-3Q temporal cadence prior to computing the annual maximum

def subsample_6h_random(ds, seed=RANDOM_SEED):
    """
    Retain one randomly selected observation per 6-hour window.
    The selection mask is built from the time coordinate only,
    leaving data variables Dask-lazy.
    """
    times        = pd.DatetimeIndex(ds["time"].values)
    window_start = times.floor("6h")
    unique_wins  = np.unique(window_start)

    rng           = np.random.default_rng(seed)
    offsets_h     = rng.integers(0, 6, size=len(unique_wins)).astype(int)
    win_to_offset = dict(zip(unique_wins, offsets_h))

    pos_h = ((times - window_start).total_seconds() / 3600).astype(int)
    keep  = np.fromiter(
        (pos_h[i] == win_to_offset[window_start[i]] for i in range(len(times))),
        dtype=bool,
        count=len(times),
    )

    print(f"  {ds.sizes['time']:,} → {keep.sum():,} time steps retained "
          f"({keep.sum() / len(times) * 100:.1f}%, expected ≈16.7%)")
    return ds.isel(time=keep)


for site in SITES:
    era5_points[site] = subsample_6h_random(era5_points[site])
    print(f"ERA5 / {site}: subsampling complete")
# Derive relative humidity from ERA5 2-m temperature and dew-point temperature.
# Following Romps (2024, supplementary section 1.5), relative humidity is computed
# as the ratio of the saturation vapor pressure evaluated at the dew-point temperature
# to that evaluated at the air temperature. The saturation vapor pressure is
# approximated using the August-Roche-Magnus formula (Alduchov and Eskridge, 1996).
#
# Alduchov, O.A. and Eskridge, R.E., 1996. Improved Magnus form approximation of
# saturation vapor pressure. Journal of Applied Meteorology and Climatology, 35(4),
# pp.601-609. DOI: 10.1175/1520-0450(1996)035<0601:IMFAOS>2.0.CO;2
#
# JRA-3Q provides relative humidity directly in %; convert to [0, 1].

def rh_from_T_Td(T_K, Td_K):
    a, b = 17.625, 243.04
    T_C  = T_K  - 273.15
    Td_C = Td_K - 273.15
    rh   = np.exp(a * Td_C / (b + Td_C)) / np.exp(a * T_C  / (b + T_C))
    return rh.clip(0.0, 1.0)


for site in SITES:
    era5_points[site] = era5_points[site].assign(
        rh=rh_from_T_Td(era5_points[site][ERA5_T_VAR],era5_points[site][ERA5_TD_VAR]))
    
    jra3q_points[site] = jra3q_points[site].chunk({"time": -1})
    #jra3q_points[site] = jra3q_points[site].compute()
    
    jra3q_points[site] = jra3q_points[site].assign(rh=(jra3q_points[site][JRA3Q_RH_VAR] / 100.0).clip(0.0, 1.0))

print("Relative humidity prepared for all sites.")

Data Analysis

%%time
from heatindex import heatindex

def compute_heatindex_xr(T_K, rh):
    hi_K = xr.apply_ufunc(
        heatindex, T_K, rh,
        dask="parallelized",
        output_dtypes=[float],
    )
    hi_K.attrs = {"units": "K", "long_name": "Heat Index (Lu and Romps, 2022)"}
    return hi_K

results = {}

# ── ERA5 first — stable Zarr-backed Dask computation ─────────────────────────
for site in SITES:
    results[site] = {}
    ds   = era5_points[site]
    T_K  = ds[ERA5_T_VAR]
    rh   = ds["rh"]
    hi_K = compute_heatindex_xr(T_K, rh)
    ann_max = hi_K.groupby("time.year").max(dim="time")
    print(f"Computing ERA5 / {site} ...")
    results[site]["ERA5"] = (ann_max - 273.15).compute()
    print(f"  Done. Years: {int(results[site]['ERA5'].year[0])}–"
          f"{int(results[site]['ERA5'].year[-1])}")

print("ERA5 complete. Starting JRA-3Q ...")

# ── JRA-3Q — load point data into memory and process locally ─────────────────
for site in SITES:
    ds  = jra3q_points[site].load()
    T_K = ds[JRA3Q_T_VAR]
    rh  = ds["rh"]

    hi_K    = xr.apply_ufunc(heatindex, T_K, rh, output_dtypes=[float])
    ann_max = hi_K.groupby("time.year").max(dim="time")

    print(f"Computing JRA-3Q / {site} ...")
    results[site]["JRA-3Q"] = (ann_max - 273.15)
    print(f"  Done. Years: {int(results[site]['JRA-3Q'].year[0])}–"
          f"{int(results[site]['JRA-3Q'].year[-1])}")
# %%time
# from heatindex import heatindex

# def compute_heatindex_xr(T_K, rh):
#     """
#     Compute the extended heat index (Lu and Romps, 2022) using the
#     Python implementation of Romps (2024).
    
#     Parameters
#     ----------
#     T_K : xr.DataArray  — 2-m temperature in Kelvin
#     rh  : xr.DataArray  — relative humidity in [0, 1]

#     Returns
#     -------
#     xr.DataArray — heat index in Kelvin
#     """
#     hi_K = xr.apply_ufunc(
#         heatindex,
#         T_K,
#         rh,
#         dask="parallelized",
#         output_dtypes=[float],
#     )
#     hi_K.attrs = {"units": "K", "long_name": "Heat Index (Lu and Romps, 2022)"}
#     return hi_K


# # Compute heat index and annual maximum JJA value for each site
# results = {}

# for site in SITES:
#     results[site] = {}
#     for label, points in [("ERA5", era5_points), ("JRA-3Q", jra3q_points)]:
#         ds   = points[site]
#         T_K  = ds[ERA5_T_VAR]  if label == "ERA5" else ds[JRA3Q_T_VAR]
#         rh   = ds["rh"]

#         hi_K     = compute_heatindex_xr(T_K, rh)
#         ann_max  = hi_K.groupby("time.year").max(dim="time")

#         print(f"Computing {label} / {site} ...")
#         results[site][label] = (ann_max - 273.15).compute()
#         print(f"  Done. Years: {int(results[site][label].year[0])}–"
#               f"{int(results[site][label].year[-1])}")
results
%%time
fig, ax = plt.subplots(figsize=(10, 5))

colors = {"Boulder_CO": "tab:blue", "Chicago_IL": "tab:red"}
ls     = {"ERA5": "-", "JRA-3Q": "--"}

for site in SITES:
    for label, da in results[site].items():
        ax.plot(
            da.year,
            da.values,
            color     = colors[site],
            linestyle = ls[label],
            linewidth = 1.5,
            label     = f"{site.replace('_', ', ')} — {label}",
        )

ax.set_xlabel("Year")
ax.set_ylabel("Annual maximum JJA heat index (°C)")
ax.set_title("Annual Maximum JJA Heat Index\nBoulder, CO and Chicago, IL (1990–2005)")
ax.legend(framealpha=0.9)
ax.grid(linestyle=":", alpha=0.5)
plt.tight_layout()
plt.show()
  • The plot confirms our intuition: Chicago, a more humid place, has a higher JJA maximum heat index than Boulder

  • We see the infamous 1995 Chicago Heat Wave in the plot, but it is not captured by JRA-3Q!

  • For both Chicago and Boulder, the predictions from both the ERA5 and JRA-3Q reanalyses seem to broadly agree!

# Close the cluster
cluster.close()

References

  • Romps, D.M., 2024. Heat index extremes increasing several times faster than the air temperature, ERL, 2024 Environmental Research Letters

  • Lu, Y.-C. and Romps, D.M., 2022. Extending the heat index. Journal of Applied Meteorology and Climatology, 61(10), 1367–1383. DOI: Lu & Romps (2022)

  • Lu, Y.-C. and Romps, D.M., 2025. heatindex: Tools for Calculating Heat Stress, version 0.0.2. https://heatindex.org

  • Hersbach, H., et al., 2020. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730), 1999–2049. DOI: Hersbach et al. (2020)

  • Kosaka, Y., et al., 2024. The JRA-3Q Reanalysis. Journal of the Meteorological Society of Japan, 102, 49–109. DOI: KOSAKA et al. (2024)

References
  1. Lu, Y.-C., & Romps, D. M. (2022). Extending the Heat Index. Journal of Applied Meteorology and Climatology, 61(10), 1367–1383. 10.1175/jamc-d-22-0021.1
  2. Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., … Thépaut, J. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730), 1999–2049. 10.1002/qj.3803
  3. KOSAKA, Y., KOBAYASHI, S., HARADA, Y., KOBAYASHI, C., NAOE, H., YOSHIMOTO, K., HARADA, M., GOTO, N., CHIBA, J., MIYAOKA, K., SEKIGUCHI, R., DEUSHI, M., KAMAHORI, H., NAKAEGAWA, T., TANAKA, T. Y., TOKUHIRO, T., SATO, Y., MATSUSHITA, Y., & ONOGI, K. (2024). The JRA-3Q Reanalysis. Journal of the Meteorological Society of Japan. Ser. II, 102(1), 49–109. 10.2151/jmsj.2024-004