API Reference#

Core Features#

x4c.core.load_dataset(path, shift_time=False, comp=None, hstr=None, grid=None, vn=None, **kws)#

Load a netCDF file and form an xarray.Dataset

Parameters:
  • path (str) – path to the netCDF file

  • shift_time (bool) – shift the time of the xarray.Dataset (the CESM1 output has a time shift)

  • comp (str) – the tag for CESM component, including “atm”, “ocn”, “lnd”, “ice”, and “rof”

  • grid (str) – the grid tag for the CESM output (e.g., ne16, g16)

  • vn (str) – variable name

x4c.core.open_mfdataset(paths, shift_time=False, comp=None, hstr=None, grid=None, vn=None, **kws)#

Open multiple netCDF files and form an xarray.Dataset in lazy-loading mode

Parameters:
  • paths (str or list of str) – paths to the netCDF files

  • shift_time (bool) – shift the time of the xarray.Dataset (the default CESM output has a time shift)

  • comp (str) – the tag for general CESM components, including “atm”, “ocn”, “lnd”, “ice”, and “rof”

  • grid (str) – the grid tag for the CESM output (e.g., ne16, g16)

  • vn (str) – variable name

class x4c.core.XDataset(ds=None)#
annualize(months=None, days_weighted=False, time2year=False)#

Annualize/seasonalize an xarray.Dataset

Parameters:

months (list of int) – a list of integers to represent month combinations, e.g., None means calendar year annualization, [6,7,8] means JJA annualization, and [-12,1,2] means DJF annualization
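To make the months convention concrete, here is a minimal pure-Python sketch. It is not x4c's implementation: annualize_sketch and the (year, month, value) record layout are hypothetical, and it assumes that a negative entry such as -12 refers to that month in the preceding calendar year and that incomplete seasons are dropped.

```python
from collections import defaultdict


def annualize_sketch(records, months=None):
    """Average monthly values into one value per year.

    records: iterable of (year, month, value) tuples.
    months:  list of ints; a negative entry -m is ASSUMED to mean month m
             of the previous calendar year. None means the calendar year.
    """
    if months is None:
        months = list(range(1, 13))
    buckets = defaultdict(list)
    for year, month, value in records:
        if month in months:
            buckets[year].append(value)      # counted in its own year
        if -month in months:
            buckets[year + 1].append(value)  # counted toward the next year
    # ASSUMPTION: keep only years in which every requested month is present.
    return {
        year: sum(vals) / len(vals)
        for year, vals in sorted(buckets.items())
        if len(vals) == len(months)
    }


# Two years of synthetic monthly data where each value equals its month number.
records = [(y, m, float(m)) for y in (2000, 2001) for m in range(1, 13)]
jja = annualize_sketch(records, months=[6, 7, 8])
djf = annualize_sketch(records, months=[-12, 1, 2])
```

With two years of input, the DJF call yields a single value (for 2001), because 2000 has no preceding December and December 2001 has no following January and February.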

property anom#

Compute monthly anomalies relative to the climatology.

This property subtracts the monthly climatology (from XDataset.climo) from the dataset to produce anomalies for each time step. The climatology is aligned by month before subtraction so that, e.g., all Januaries are compared against the January climatology.

Returns:

dataset of anomalies with the same coordinates as the original dataset.

Return type:

xarray.Dataset

property climo#

Compute the climatology (monthly mean) of the dataset.

This property groups the dataset by calendar month and computes the mean over the time dimension for each month. It also records the climo_period as a tuple (start_year, end_year) in the returned dataset’s attributes and preserves comp/grid attributes when present. If the grouping result uses a month coordinate it is renamed to time to keep downstream interfaces consistent.

Returns:

monthly climatology where the time coordinate indexes months (1-12). ds.attrs['climo_period'] documents the original temporal coverage used to compute the climatology.

Return type:

xarray.Dataset
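The month-wise grouping behind climo, and the month-aligned subtraction described for anom, can be illustrated with a small pure-Python sketch. The function names and the (year, month, value) layout are hypothetical and are not part of x4c; the real properties operate on xarray objects.

```python
from collections import defaultdict


def monthly_climo(series):
    """Return {month: mean value} for a list of (year, month, value)."""
    groups = defaultdict(list)
    for _, month, value in series:
        groups[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(groups.items())}


def monthly_anom(series):
    """Subtract each month's climatological mean from every time step."""
    clim = monthly_climo(series)
    return [(y, m, v - clim[m]) for y, m, v in series]


series = [(2000, 1, 10.0), (2000, 7, 20.0), (2001, 1, 12.0), (2001, 7, 26.0)]
climo = monthly_climo(series)
anom = monthly_anom(series)
# Analogous to the climo_period attribute: (start_year, end_year).
climo_period = (min(y for y, _, _ in series), max(y for y, _, _ in series))
```

Each January is compared only against the January mean, so the anomalies of any given month sum to zero over the climatology period.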

property da#

Return the xarray.DataArray version of this object

get_plev(ps, vn=None, lev_mode='hybrid', **kws)#

Interpolate a hybrid-level field to pressure levels and return a Dataset.

This method converts a 3D atmospheric variable that is on hybrid model levels (also known as k-levels) into pressure levels using the provided surface pressure ps (either an xarray.DataArray or an xarray.Dataset that contains a variable named “PS”). It wraps geocat.comp.interpolation.interp_hybrid_to_pressure and returns a copy of the original Dataset with the requested variable replaced by its pressure-level version.

Parameters:
  • ps (xarray.DataArray or xarray.Dataset) – surface pressure. If a Dataset is passed the method will look for the variable named “PS”. Dimensions must align with the variable being interpolated.

  • vn (str, optional) – variable name in self.ds to interpolate. If not provided, the method uses the dataset attribute ds.attrs['vn'] and self.da.

  • lev_mode (str, optional) – currently only supports “hybrid”. (Reserved for future expansion.)

  • **kws – additional keyword arguments forwarded to geocat.comp.interpolation.interp_hybrid_to_pressure. By default lev_dim is set to 'lev'. If the dataset contains hyam/hybm arrays, they are passed automatically.

Returns:

a copy of self.ds with vn replaced by the pressure-level DataArray produced by the interpolation.

Return type:

xarray.Dataset

Notes

  • Requires geocat.comp to be available and the dataset to include the hybrid coefficients (hyam, hybm) when using hybrid vertical coordinates.

  • The returned dataset preserves the original dataset attributes and coordinate structure except that the specified variable is now on pressure levels.
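For readers unfamiliar with hybrid coordinates, the sketch below shows the idea for a single column: the pressure at each hybrid level is hyam * P0 + hybm * ps, and the field is then interpolated to a target pressure. This is an illustration only, not the geocat.comp routine; the helper names are hypothetical, the reference pressure P0 = 100000 Pa and linear interpolation in pressure are assumptions, and the coefficient values are made up.

```python
def hybrid_pressure(hyam, hybm, ps, p0=100000.0):
    """Mid-level pressure (Pa) of one column on hybrid levels."""
    return [a * p0 + b * ps for a, b in zip(hyam, hybm)]


def interp_to_plev(values, p_mid, p_target):
    """Linearly interpolate a column to p_target; None if out of range.

    ASSUMPTION: p_mid increases monotonically from model top to surface.
    """
    for k in range(len(p_mid) - 1):
        p_lo, p_hi = p_mid[k], p_mid[k + 1]
        if p_lo <= p_target <= p_hi:
            w = (p_target - p_lo) / (p_hi - p_lo)
            return values[k] + w * (values[k + 1] - values[k])
    return None


# Made-up three-level column, ordered from model top to surface.
hyam = [0.1, 0.05, 0.0]
hybm = [0.0, 0.5, 1.0]
ps = 100000.0                          # surface pressure in Pa
temperature = [220.0, 260.0, 290.0]    # K on the hybrid levels

p_mid = hybrid_pressure(hyam, hybm, ps)
t_775 = interp_to_plev(temperature, p_mid, 77500.0)
```

Because the level pressures depend on ps, the same hybrid level sits at a different pressure in every column, which is why ps must align with the variable being interpolated.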

regrid(dlon=1, dlat=1, weight_file=None, gs='T', method='bilinear', periodic=True)#

Regrid the CESM output to a normal lat/lon grid

Supported atmosphere regridding: ne16np4, ne16pg3, ne30np4, ne30pg3, ne120np4, ne120pg4 TO 1x1d / 2x2d. Supported ocean regridding: any grid similar to g16 TO 1x1d / 2x2d. For any other regridding, weight_file must be provided by the user.

For atmosphere grids, the default method is area-weighted, while for ocean grids the default is bilinear.

Parameters:
  • dlon (float) – longitude spacing

  • dlat (float) – latitude spacing

  • weight_file (str) – the path to an ESMF-generated weighting file for regridding

  • gs (str) – grid style in ‘T’ or ‘U’ for the ocean grid

  • method (str) – regridding method for the ocean grid

  • periodic (bool) – whether to treat the data as periodic when regridding

zavg(depth_top, depth_bot, vn=None)#

Vertically average an ocean/column field between two depths and return a Dataset.

The method selects the vertical range along the z_t coordinate from depth_top to depth_bot, applies the layer-thickness weights provided by the dataset variable dz, computes the weighted mean over the vertical dimension, and returns a copy of the original Dataset with the specified variable replaced by its vertically averaged version.

Parameters:
  • depth_top (float) – upper bound of the vertical slice (same units as z_t).

  • depth_bot (float) – lower bound of the vertical slice (same units as z_t).

  • vn (str, optional) – variable name in self.ds to average. If not provided, the method uses the dataset attribute ds.attrs['vn'] and self.da.

Returns:

a copy of self.ds with vn replaced by the vertically averaged DataArray.

Return type:

xarray.Dataset

Notes

  • This method expects a vertical coordinate named z_t and a thickness/weight variable named dz in the dataset. The weighting is dz (e.g., layer thickness) and the mean is taken over the z_t dimension.
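The dz-weighted average described above can be sketched for a single column in plain Python. The function name zavg_sketch and the sample values are hypothetical, and treating both depth bounds as inclusive is an assumption rather than documented behavior.

```python
def zavg_sketch(values, z_t, dz, depth_top, depth_bot):
    """Thickness-weighted mean of one column between two depths."""
    num = 0.0
    den = 0.0
    for v, z, w in zip(values, z_t, dz):
        if depth_top <= z <= depth_bot:  # ASSUMPTION: inclusive bounds
            num += v * w
            den += w
    if den == 0.0:
        raise ValueError("no layers fall between depth_top and depth_bot")
    return num / den


# Made-up column: layer mid-depths, layer thicknesses, and a temperature profile.
z_t = [5.0, 15.0, 30.0, 60.0]
dz = [10.0, 10.0, 20.0, 40.0]
temp = [20.0, 18.0, 15.0, 10.0]

mean_0_40 = zavg_sketch(temp, z_t, dz, 0.0, 40.0)
```

Thicker layers contribute proportionally more, so the result differs from the unweighted mean of the selected layers whenever dz is not uniform.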

class x4c.core.XDataArray(da=None)#
annualize(months=None, days_weighted=False)#

Annualize/seasonalize an xarray.DataArray

Parameters:

months (list of int) – a list of integers to represent month combinations, e.g., [6,7,8] means JJA annualization, and [-12,1,2] means DJF annualization

property ds#

Return the xarray.Dataset version of this object

geo_mean(ind=None, latlon_range=(-90, 90, 0, 360), **kws)#

The lat-weighted mean given a lat/lon range or a climate index name

Parameters:
  • latlon_range (tuple or list) – the lat/lon range for lat-weighted average in format of (lat_min, lat_max, lon_min, lon_max)

  • ind (str) –

    a climate index name; supported names include:

    • 'nino3.4'

    • 'nino1+2'

    • 'nino3'

    • 'nino4'

    • 'tpi'

    • 'wp'

    • 'dmi'

    • 'iobw'
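A latitude-weighted mean over a (lat_min, lat_max, lon_min, lon_max) box can be sketched as follows. This is an illustration, not x4c's code: the function name is hypothetical, cos(latitude) weighting on a regular grid is assumed, and the Niño 3.4 box shown is the conventional 5°S-5°N, 170°W-120°W definition, which may differ from the bounds x4c stores for 'nino3.4'.

```python
import math


def lat_weighted_mean(field, lats, lons, latlon_range=(-90, 90, 0, 360)):
    """cos(lat)-weighted mean of field[i][j] inside a lat/lon box."""
    lat_min, lat_max, lon_min, lon_max = latlon_range
    num = 0.0
    den = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        w = math.cos(math.radians(lat))  # ASSUMPTION: cos(lat) weights
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                num += w * field[i][j]
                den += w
    return num / den


# Conventional Nino 3.4 box in 0-360 longitudes (assumed, see lead-in).
NINO34_RANGE = (-5, 5, 190, 240)

lats = [-60.0, 0.0, 60.0]
lons = [0.0, 200.0]
field = [[1.0, 1.0], [4.0, 4.0], [1.0, 1.0]]

global_mean = lat_weighted_mean(field, lats, lons)
nino34_mean = lat_weighted_mean(field, lats, lons, NINO34_RANGE)
```

The equatorial row carries twice the weight of each 60° row, so the global mean is pulled toward the tropical value.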

get_plev(**kws)#

See: https://geocat-comp.readthedocs.io/en/v2024.04.0/user_api/generated/geocat.comp.interpolation.interp_hybrid_to_pressure.html

property gm#

the global area-weighted mean

property gs#

the global area-weighted sum

nearest2d(lat=None, lon=None, lat_coord='lat', lon_coord='lon', lat_dim='lat', lon_dim='lon')#

Select the nearest non-NaN grid point(s) for the given lat/lon targets.

Given one or more target lat/lon pairs, this method finds the nearest valid (non-NaN across non-spatial dims) grid cell in the DataArray and returns a concatenated DataArray with a new dimension site indexing the selected points.

Parameters:
  • lat (float or array-like) – target latitude(s).

  • lon (float or array-like) – target longitude(s).

  • lat_coord (str) – name of latitude coordinate in the DataArray.

  • lon_coord (str) – name of longitude coordinate in the DataArray.

  • lat_dim (str) – latitude dimension name.

  • lon_dim (str) – longitude dimension name.

Returns:

concatenated selections at nearest grid points with a new site coordinate.

Return type:

xarray.DataArray
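The selection logic can be illustrated on a plain 2D grid: skip NaN cells, then pick the closest remaining one. The sketch below is not x4c's implementation; nearest_valid is a hypothetical helper, and the distance metric (squared differences in degrees, with longitude wrapped to [-180, 180)) is an assumption.

```python
import math


def nearest_valid(field, lats, lons, lat, lon):
    """Return (i, j) of the nearest non-NaN cell to the target lat/lon."""
    best = None
    for i, la in enumerate(lats):
        for j, lo in enumerate(lons):
            if math.isnan(field[i][j]):
                continue  # e.g. a land cell in an ocean field
            dlon = (lo - lon + 180.0) % 360.0 - 180.0  # wrap longitude
            dist2 = (la - lat) ** 2 + dlon ** 2        # ASSUMED metric
            if best is None or dist2 < best[0]:
                best = (dist2, i, j)
    if best is None:
        raise ValueError("field contains no valid grid points")
    return best[1], best[2]


lats = [0.0, 10.0]
lons = [0.0, 10.0]
field = [[float("nan"), 1.0], [2.0, 3.0]]

# The geometrically closest cell (0, 0) is NaN, so the next closest is chosen.
i, j = nearest_valid(field, lats, lons, lat=1.0, lon=2.0)
value = field[i][j]
```

For several targets, the same search would be repeated per lat/lon pair and the results stacked along a new site dimension, as the method description states.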

property nhm#

the NH area-weighted mean

property nhs#

the NH area-weighted sum

plot(title=None, figsize=None, ax=None, latlon_range=None, add_clabels=False, clevels=None, clabel_kwargs=None, projection='Robinson', transform='PlateCarree', central_longitude=180, proj_args=None, bad_color='dimgray', add_gridlines=False, gridline_labels=True, gridline_style='--', ssv=None, log=False, vmin=None, vmax=None, coastline_zorder=99, coastline_width=1, site_markersizes=100, df_sites=None, colname_dict=None, gs='T', ux=False, site_marker_dict=None, site_color_dict=None, count_site_num=False, lgd_kws=None, legend=True, return_im=False, **kws)#

The plotting functionality

Parameters:
  • title (str) – figure title

  • figsize (tuple or list) – figure size in format of (w, h)

  • ax (matplotlib.axes.Axes) – the matplotlib Axes to draw on

  • latlon_range (tuple or list) – lat/lon range in format of (lat_min, lat_max, lon_min, lon_max)

  • projection (str) – a projection name supported by Cartopy

  • transform (str) – a projection name supported by Cartopy

  • central_longitude (float) – the central longitude of the map to plot

  • proj_args (dict) – other keyword arguments for projection

  • add_gridlines (bool) – if True, add gridlines to the map

  • gridline_labels (bool) – if True, the lat/lon ticklabels will appear

  • gridline_style (str) – the gridline style, e.g., '-', '--'

  • ssv (xarray.DataArray) – a sea surface variable used for plotting the coastlines

  • gs (str) – grid style in ‘T’ or ‘U’ for the ocean grid

  • coastline_zorder (int) – the layer order for the coastlines

  • coastline_width (float) – the width of the coastlines

  • df_sites (pandas.DataFrame) – a pandas.DataFrame that stores the information of a collection of sites

  • colname_dict (dict) – a dictionary of column names for df_sites in the “key:value” format “assumed name:real name”

regrid(**kws)#

Regrid this DataArray by delegating to the parent Dataset regrid.

This wraps XDataset.regrid by converting the DataArray to a temporary Dataset, calling the dataset-level regrid helper, then extracting and returning the regridded DataArray. Any dataset-level lat/lon attributes added during the transformation are removed from the returned DataArray attributes for cleanliness.

Forwarded kwargs are the same as XDataset.regrid (e.g., dlon, dlat, weight_file, gs, method, periodic).

property shm#

the SH area-weighted mean

property shs#

the SH area-weighted sum

property somin#

the Southern Ocean min

property zm#

the zonal mean

CESM Postprocessing#

class x4c.case.History(root_dir, comps=['atm', 'ocn', 'lnd', 'ice', 'rof'], comps_info=None, casename=None, path_pattern='comp/hist/casename.hstr.date.nc', avoid_list=None)#

Handle CESM history files for a single case.

Provides utilities to discover history file paths, list time-series variables, split (isolate) variables into separate files, and re-merge them across time ranges. Designed to work with NCO tools and MPI for parallel operations.

bigbang(comp, hstr, output_dirpath, timespan=None, overwrite=True, nproc=1, vns=None)#

Split history files into per-variable files in parallel using MPI.

Each MPI rank handles a subset of (file,variable) tasks.
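One simple way to divide (file, variable) tasks among ranks is a round-robin split, sketched below without MPI. Whether bigbang uses this exact scheme is not stated here, so treat it as an assumption; tasks_for_rank and the file names are hypothetical.

```python
from itertools import product


def tasks_for_rank(files, vns, rank, nproc):
    """Return the (file, variable) tasks assigned to one rank."""
    tasks = list(product(files, vns))
    return tasks[rank::nproc]  # ASSUMPTION: round-robin assignment


files = ["mycase.cam.h0.0001-01.nc", "mycase.cam.h0.0001-02.nc"]
vns = ["TS", "PRECT", "PSL"]
nproc = 4

per_rank = [tasks_for_rank(files, vns, rank, nproc) for rank in range(nproc)]
```

Every task is assigned to exactly one rank, and the per-rank workloads differ by at most one task.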

bigcrunch(comp, hstr, input_dirpath, output_dirpath, timespan=None, overwrite=True, nproc=1, compression=1, vns=None)#

Merge per-variable files back into timeseries files in parallel.

Coordinates work across MPI ranks similar to bigbang.

gen_ts(output_dirpath, staging_dirpath=None, comps=['atm', 'ocn', 'lnd', 'ice', 'rof'], timespan=None, timestep=None, timestep_unit='year', dir_structure='comp/proc/tseries/hstr', overwrite=True, nproc=1, compression=1)#

Generate timeseries files for selected components and timespans.

This orchestrates splitting (bigbang) and merging (bigcrunch) stages and moves results from staging to final output directories.

get_hstr_based_on_vn(vn)#

Return the first hstr that contains variable vn.

This searches across all components and hstrs and returns the matching hstr string or None if not found.

get_paths(comp, hstr, timespan=None)#

Return history file paths for a component/hstr optionally filtered by a timespan.

timespan may be provided in a variety of formats accepted by utils.parse_timespan.
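As an illustration of timespan filtering, the sketch below keeps files whose date field falls within (start_year, end_year). It is not the library's logic: filter_by_timespan is hypothetical, only the tuple form of timespan is handled, and the YYYY-MM date field at the end of the file name is an assumption based on the default path_pattern.

```python
import re


def filter_by_timespan(paths, timespan=None):
    """Keep paths whose file-name year lies within (start_year, end_year)."""
    if timespan is None:
        return sorted(paths)
    start, end = timespan
    kept = []
    for path in sorted(paths):
        # ASSUMPTION: file names end with ".YYYY-MM.nc".
        match = re.search(r"\.(\d{4})-\d{2}\.nc$", path)
        if match and start <= int(match.group(1)) <= end:
            kept.append(path)
    return kept


paths = [
    "atm/hist/mycase.cam.h0.0003-01.nc",
    "atm/hist/mycase.cam.h0.0001-12.nc",
    "atm/hist/mycase.cam.h0.0002-06.nc",
]
selected = filter_by_timespan(paths, timespan=(2, 3))
```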

get_ts_vns(comp, hstr, exclude_vars=['time', 'time_bnds', 'time_bounds', 'time_bound', 'time_written', 'date', 'datesec', 'date_written'])#

Return list of time-varying variable names for a given component and hstr by inspecting the first history file.

isolate_vn(vn, comp, hstr, in_path, output_dirpath, overwrite=True)#

Create a new netCDF file containing only variable vn from the input history file in_path.

Uses ncks to drop other variables and writes result to output_dirpath with a standardized filename.
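A command of this kind could be assembled as in the sketch below, which only builds the argument list and does not run NCO. The -v (extract variable) and -O (overwrite output) switches are standard ncks options; the helper name and the output file naming are hypothetical and may not match what isolate_vn actually writes.

```python
import os


def ncks_isolate_cmd(vn, in_path, output_dirpath, overwrite=True):
    """Build an ncks command that extracts variable vn from in_path."""
    stem, ext = os.path.splitext(os.path.basename(in_path))
    # HYPOTHETICAL naming: insert the variable name before the extension.
    out_path = os.path.join(output_dirpath, f"{stem}.{vn}{ext}")
    cmd = ["ncks"]
    if overwrite:
        cmd.append("-O")  # overwrite an existing output file
    cmd += ["-v", vn, in_path, out_path]
    return cmd


cmd = ncks_isolate_cmd("TS", "atm/hist/mycase.cam.h0.0001-01.nc", "staging")
# The list can be passed to subprocess.run(cmd, check=True) where NCO is installed.
```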

merge_vn(hstr, vn, input_dirpath, output_dirpath, timespan=None, overwrite=True, compression=1)#

Concatenate per-variable files across time into a single file.

Uses ncrcat with optional compression level to produce an aggregated timeseries file for vn and hstr.
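The concatenation step could be expressed as the command builder below, which does not invoke NCO itself. The -4 (netCDF4 output) and -L (deflate level) switches are standard ncrcat options; the helper name, the file names, and the decision to sort inputs by name are assumptions, not a description of merge_vn's internals.

```python
def ncrcat_merge_cmd(in_paths, out_path, compression=1, overwrite=True):
    """Build an ncrcat command that concatenates files along the record dim."""
    cmd = ["ncrcat"]
    if overwrite:
        cmd.append("-O")  # overwrite an existing output file
    if compression:
        cmd += ["-4", "-L", str(compression)]  # netCDF4 with deflation
    # ASSUMPTION: file names sort chronologically.
    cmd += sorted(in_paths) + [out_path]
    return cmd


merge_cmd = ncrcat_merge_cmd(
    ["mycase.cam.h0.0001-02.TS.nc", "mycase.cam.h0.0001-01.TS.nc"],
    "mycase.cam.h0.TS.nc",
)
```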

rm_timespan(timespan, comps=['atm', 'ice', 'ocn', 'rof', 'lnd'], nworkers=None, rehearsal=True)#

Rename the archive files within a timespan

Parameters:

timespan (tuple or list) – [start_year, end_year] with elements being integers

CESM Diagnostics#

class x4c.case.Timeseries(root_dir, grid_dict=None, casename=None, cesm_ver=3)#

CESM Timeseries case helper.

Manages discovery and loading of preprocessed CESM timeseries files produced by CESM postprocessing. Provides convenience methods to locate paths, load raw or derived diagnostics, compute spells, and create plots and seasonal means.

calc(spell: str, comp=None, timespan=None, load_idx=-1, recalculate=False, verbose=True, **kws)#

Compute a diagnostic spell and cache the result.

The spell string controls regridding, slicing, spatial/vertical averaging and other modifiers parsed by Spell. The final xarray DataArray is stored in self.diags[spell].

clear_ds(vn=None)#

Clear the existing .ds property

copy()#

Return a deep copy of this Timeseries instance.

get_comp_hstr(vn)#

Find all (component, hstr) pairs where vn is present.

get_paths(comp, hstr, vn, timespan=None)#

Return list of timeseries file paths for vn under comp/hstr.

If timespan is provided it filters the returned paths to those fully covering the requested interval.

get_ts(vn, comp, timespan=None, slicing=False, regrid=False, dlat=1, dlon=1)#

Open and return a Dataset for vn on comp.

Applies optional slicing and regridding before returning the Dataset; does not cache the result.

load(vn, vtype=None, comp=None, hstr=None, timespan=None, load_idx=-1, verbose=True, reload=False, **kws)#

Load a variable or derived diagnostic into self.ds.

Automatically detects whether vn is a raw timeseries or a derived diagnostic and loads or computes it. Results are stored in self.ds[vn].

plot(spell, t_idx=None, regrid=False, gs='T', ssv='SSH', recalculate_ssv=False, timespan=None, **kws)#

Plot a computed diagnostic spell.

Detects plot type (map, ts, zm, yz) from the DataArray and dispatches to the plotting helpers in diags/visual.

quickview(timespan=None, nrow=None, ncol=None, wspace=0.3, hspace=0.5, ax_loc=None, figsize=None, stat_period=-50, roll_int=50, ylim_dict=None, spells=None, recalculate=False)#

Create a multi-panel overview figure for a selection of spells.

Returns (fig, ax) where ax is a dict of axes keyed by spell keys.

save_means(vn, comp, output_dirpath, timespan, slicing=False, regrid=False, dlat=1, dlon=1, overwrite=False)#

Save seasonal and annual mean files for vn into output_dirpath.

Writes files for ANN, DJF, MAM, JJA and SON for the given timespan and optionally regrids results.