Regridding using xESMF and an existing weights file#
A fairly common request is to use an existing ESMF weights file to regrid a Xarray Dataset (1, 2). Applying weights in general should be easy: read weights then apply them using dot or tensordot on the input dataset.
In the Xarray/Dask/Pangeo ecosystem, xESMF provides an interface to ESMF for convenient regridding, includiing parallelization with Dask. Here we demonstrate how to use an existing ESMF weights file with xESMF specifically for CAM-SE.
CAM-SE is the Community Atmosphere Model with the spectral element dynamical core (Dennis et al, 2011)
The spectral element dynamical core, CAM-SE, is an unstructured grid that supports uniform resolutions based on the equiangular gnomonic cubed-sphere grid as well as a mesh refinement capability with local increases in resolution through conformal mesh refinement. CAM-SE is the default resolution for CESM2 high resolution capabilities … (Lauritzen et al. 2018).
The main challenge is the input dataset has one spatial dimension (ncol), while xESMF is hardcoded to expect two spatial dimensions (lat, lon). We solve that by adding a dummy dimension. At the end, we’ll make this plot of the vertically integrated water vapour transport (IVT)
%load_ext watermark
import hvplot.xarray
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import xesmf
%watermark -iv
sys : 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
numpy : 1.23.5
matplotlib: 3.6.2
xesmf : 0.6.3
json : 2.0.9
xarray : 2022.12.0
hvplot : 0.8.2
Read data#
First some file paths
data_file = "/glade/campaign/collections/cmip/CMIP6/iHESP/BHIST/HR/b.e13.BHISTC5.ne120_t12.cesm-ihesp-hires1.0.30-1920-2100.002/atm/proc/tseries/hour_6I/b.e13.BHISTC5.ne120_t12.cesm-ihesp-hires1.0.30-1920-2100.002.cam.h2.IVT.192001-192912.nc"
weight_file = "/glade/work/shields/SE_grid/map_ne120_to_0.23x0.31_bilinear.nc"
We read in the input data
data_in = xr.open_dataset(data_file, chunks={"time": 50})
data_in
<xarray.Dataset>
Dimensions: (lev: 30, ilev: 31, cosp_prs: 7, nbnd: 2, cosp_tau: 7,
cosp_scol: 50, cosp_ht: 40, cosp_sr: 15, cosp_sza: 5,
ncol: 777602, time: 14600)
Coordinates:
* lev (lev) float64 3.643 7.595 14.36 24.61 ... 957.5 976.3 992.6
* ilev (ilev) float64 2.255 5.032 10.16 18.56 ... 967.5 985.1 1e+03
* cosp_prs (cosp_prs) float64 900.0 740.0 620.0 500.0 375.0 245.0 90.0
* cosp_tau (cosp_tau) float64 0.15 0.8 2.45 6.5 16.2 41.5 219.5
* cosp_scol (cosp_scol) int32 1 2 3 4 5 6 7 8 ... 43 44 45 46 47 48 49 50
* cosp_ht (cosp_ht) float64 240.0 720.0 1.2e+03 ... 1.848e+04 1.896e+04
* cosp_sr (cosp_sr) float64 0.605 2.1 4.0 6.0 ... 70.0 539.5 1.004e+03
* cosp_sza (cosp_sza) float64 0.0 15.0 30.0 45.0 60.0
* time (time) object 1920-01-01 00:00:00 ... 1929-12-31 18:00:00
Dimensions without coordinates: nbnd, ncol
Data variables: (12/35)
hyam (lev) float64 dask.array<chunksize=(30,), meta=np.ndarray>
hybm (lev) float64 dask.array<chunksize=(30,), meta=np.ndarray>
P0 float64 ...
hyai (ilev) float64 dask.array<chunksize=(31,), meta=np.ndarray>
hybi (ilev) float64 dask.array<chunksize=(31,), meta=np.ndarray>
cosp_prs_bnds (cosp_prs, nbnd) float64 dask.array<chunksize=(7, 2), meta=np.ndarray>
... ...
n2ovmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
f11vmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
f12vmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
sol_tsi (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
nsteph (time) int32 dask.array<chunksize=(50,), meta=np.ndarray>
IVT (time, ncol) float32 dask.array<chunksize=(50, 777602), meta=np.ndarray>
Attributes:
np: 4
ne: 120
Conventions: CF-1.0
source: CAM
case: b.e13.BHISTC5.ne120_t12.cesm-ihesp-hires1.0.30-1920-210...
title: UNSET
logname: nanr
host: login1.frontera.
Version: $Name$
revision_Id: $Id$
initial_file: B.E.13.BHISTC5.ne120_t12.sehires38.003.sunway.cam.i.192...
topography_file: /work/02503/edwardsj/CESM/inputdata//atm/cam/topo/USGS-...Here’s the primary data variable IVT for integrated vapour transport with one spatial dimension: ncol
data_in.IVT
<xarray.DataArray 'IVT' (time: 14600, ncol: 777602)>
dask.array<open_dataset-124517cf72a30883d5a3c70220985aeeIVT, shape=(14600, 777602), dtype=float32, chunksize=(50, 777602), chunktype=numpy.ndarray>
Coordinates:
* time (time) object 1920-01-01 00:00:00 ... 1929-12-31 18:00:00
Dimensions without coordinates: ncol
Attributes:
units: kg/m/s
long_name: Total (vertically integrated) vapor transportAnd here’s what an ESMF weights file looks like:
weights = xr.open_dataset(weight_file)
weights
<xarray.Dataset>
Dimensions: (src_grid_rank: 1, dst_grid_rank: 2, n_a: 777602,
n_b: 884736, nv_a: 3, nv_b: 4, n_s: 2654208)
Dimensions without coordinates: src_grid_rank, dst_grid_rank, n_a, n_b, nv_a,
nv_b, n_s
Data variables: (12/19)
src_grid_dims (src_grid_rank) int32 ...
dst_grid_dims (dst_grid_rank) int32 ...
yc_a (n_a) float64 ...
yc_b (n_b) float64 ...
xc_a (n_a) float64 ...
xc_b (n_b) float64 ...
... ...
area_b (n_b) float64 ...
frac_a (n_a) float64 ...
frac_b (n_b) float64 ...
col (n_s) int32 ...
row (n_s) int32 ...
S (n_s) float64 ...
Attributes:
title: ESMF Offline Regridding Weight Generator
normalization: destarea
map_method: Bilinear remapping
ESMF_regrid_method: Bilinear
conventions: NCAR-CSM
domain_a: /glade/scratch/shields/regridded/mapfiles/ne120.nc
domain_b: /glade/scratch/shields/regridded/mapfiles/0.23x0.31.nc
grid_file_src: /glade/scratch/shields/regridded/mapfiles/ne120.nc
grid_file_dst: /glade/scratch/shields/regridded/mapfiles/0.23x0.31.nc
CVS_revision: 7.1.0rRegridding with xESMF#
Th primary xESMF interface is the Regridder class.
regridder = xesmf.Regridder(
ds_in,
ds_out,
weights=weight_file,
method="bilinear",
reuse_weights=True,
periodic=True,
)
It requires input grid information in ds_in and output grid information in ds_out. Both files need to have variables lat, lon or variables that can be identified as “latitude” and “longitude” using CF metadata.
We can construct this information from the weights file:
Read the weights file#
weights = xr.open_dataset(weight_file)
# input variable shape
in_shape = weights.src_grid_dims.load().data
# Since xESMF expects 2D vars, we'll insert a dummy dimension of size-1
if len(in_shape) == 1:
in_shape = [1, in_shape.item()]
# output variable shape
out_shape = weights.dst_grid_dims.load().data.tolist()[::-1]
in_shape, out_shape
([1, 777602], [768, 1152])
Construct a Regridder.#
First we make dummy input dataset with lat of size 1 and lon of same size as ncol, and an output dataset with the right lat, lon. For the latter, we use xc_b and yc_b as locations of the centers.
Note
We are assuming a rectilinear output grid, but this could be modified for a curvilinear grid. (Question: Is there a way to identify this from the weights file?)
As a reminder, we use lat, lon dimension names because this is hardcoded in to xESMF.
dummy_in = xr.Dataset(
{
"lat": ("lat", np.empty((in_shape[0],))),
"lon": ("lon", np.empty((in_shape[1],))),
}
)
dummy_out = xr.Dataset(
{
"lat": ("lat", weights.yc_b.data.reshape(out_shape)[:, 0]),
"lon": ("lon", weights.xc_b.data.reshape(out_shape)[0, :]),
}
)
regridder = xesmf.Regridder(
dummy_in,
dummy_out,
weights=weight_file,
method="bilinear",
reuse_weights=True,
periodic=True,
)
regridder
xESMF Regridder
Regridding algorithm: bilinear
Weight filename: bilinear_1x777602_768x1152_peri.nc
Reuse pre-computed weights? True
Input grid shape: (1, 777602)
Output grid shape: (768, 1152)
Periodic in longitude? True
This works but note that a lot of metadata here is wrong like “Weight filename”
Apply the Regridder#
Next to apply the regridder, we’ll use DataArray.expand_dims insert a new "dummy" dimension before the ncol dimension using axis=-2. To be safe, we force ncol to be the last dimension using transpose. We do this only for variables that already have the ncol dimension.
Here I’m using the name dummy so that we are clear that it is fake.
vars_with_ncol = [name for name in data_in.variables if "ncol" in data_in[name].dims]
updated = data_in.copy().update(
data_in[vars_with_ncol].transpose(..., "ncol").expand_dims("dummy", axis=-2)
)
updated
<xarray.Dataset>
Dimensions: (lev: 30, ilev: 31, cosp_prs: 7, nbnd: 2, cosp_tau: 7,
cosp_scol: 50, cosp_ht: 40, cosp_sr: 15, cosp_sza: 5,
dummy: 1, ncol: 777602, time: 14600)
Coordinates:
* lev (lev) float64 3.643 7.595 14.36 24.61 ... 957.5 976.3 992.6
* ilev (ilev) float64 2.255 5.032 10.16 18.56 ... 967.5 985.1 1e+03
* cosp_prs (cosp_prs) float64 900.0 740.0 620.0 500.0 375.0 245.0 90.0
* cosp_tau (cosp_tau) float64 0.15 0.8 2.45 6.5 16.2 41.5 219.5
* cosp_scol (cosp_scol) int32 1 2 3 4 5 6 7 8 ... 43 44 45 46 47 48 49 50
* cosp_ht (cosp_ht) float64 240.0 720.0 1.2e+03 ... 1.848e+04 1.896e+04
* cosp_sr (cosp_sr) float64 0.605 2.1 4.0 6.0 ... 70.0 539.5 1.004e+03
* cosp_sza (cosp_sza) float64 0.0 15.0 30.0 45.0 60.0
* time (time) object 1920-01-01 00:00:00 ... 1929-12-31 18:00:00
Dimensions without coordinates: nbnd, dummy, ncol
Data variables: (12/35)
hyam (lev) float64 dask.array<chunksize=(30,), meta=np.ndarray>
hybm (lev) float64 dask.array<chunksize=(30,), meta=np.ndarray>
P0 float64 ...
hyai (ilev) float64 dask.array<chunksize=(31,), meta=np.ndarray>
hybi (ilev) float64 dask.array<chunksize=(31,), meta=np.ndarray>
cosp_prs_bnds (cosp_prs, nbnd) float64 dask.array<chunksize=(7, 2), meta=np.ndarray>
... ...
n2ovmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
f11vmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
f12vmr (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
sol_tsi (time) float64 dask.array<chunksize=(50,), meta=np.ndarray>
nsteph (time) int32 dask.array<chunksize=(50,), meta=np.ndarray>
IVT (time, dummy, ncol) float32 dask.array<chunksize=(50, 1, 777602), meta=np.ndarray>
Attributes:
np: 4
ne: 120
Conventions: CF-1.0
source: CAM
case: b.e13.BHISTC5.ne120_t12.cesm-ihesp-hires1.0.30-1920-210...
title: UNSET
logname: nanr
host: login1.frontera.
Version: $Name$
revision_Id: $Id$
initial_file: B.E.13.BHISTC5.ne120_t12.sehires38.003.sunway.cam.i.192...
topography_file: /work/02503/edwardsj/CESM/inputdata//atm/cam/topo/USGS-...Now to apply the regridder on updated we rename dummy to lat (both are size-1 in updated and dummy_in), and ncol to lon (both are the same size in updated and dummy_in)
regridded = regridder(updated.rename({"dummy": "lat", "ncol": "lon"}))
regridded
<xarray.Dataset>
Dimensions: (lat: 768, lon: 1152, time: 14600, lev: 30, ilev: 31,
cosp_prs: 7, cosp_tau: 7, cosp_scol: 50, cosp_ht: 40,
cosp_sr: 15, cosp_sza: 5)
Coordinates:
* lat (lat) float64 -90.0 -89.77 -89.53 -89.3 ... 89.3 89.53 89.77 90.0
* lon (lon) float64 0.0 0.3125 0.625 0.9375 ... 358.8 359.1 359.4 359.7
* lev (lev) float64 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6
* ilev (ilev) float64 2.255 5.032 10.16 18.56 ... 967.5 985.1 1e+03
* cosp_prs (cosp_prs) float64 900.0 740.0 620.0 500.0 375.0 245.0 90.0
* cosp_tau (cosp_tau) float64 0.15 0.8 2.45 6.5 16.2 41.5 219.5
* cosp_scol (cosp_scol) int32 1 2 3 4 5 6 7 8 9 ... 43 44 45 46 47 48 49 50
* cosp_ht (cosp_ht) float64 240.0 720.0 1.2e+03 ... 1.848e+04 1.896e+04
* cosp_sr (cosp_sr) float64 0.605 2.1 4.0 6.0 ... 55.0 70.0 539.5 1.004e+03
* cosp_sza (cosp_sza) float64 0.0 15.0 30.0 45.0 60.0
* time (time) object 1920-01-01 00:00:00 ... 1929-12-31 18:00:00
Data variables:
area (lat, lon) float64 dask.array<chunksize=(768, 1152), meta=np.ndarray>
IVT (time, lat, lon) float32 dask.array<chunksize=(50, 768, 1152), meta=np.ndarray>
Attributes:
regrid_method: bilinearVisualize#
Here we’ll visualize a single timestep but note that regridded.IVT.mean("time") will nicely parallelize with dask.
figure = regridded.IVT.isel(time=100).hvplot(
cmap="twilight", clim=(0, 1000), frame_width=500, geo=True, coastline=True
)
figure