I am attempting to load data from two netCDF files using xr.open_mfdataset. The call looks like this:
ncData = xr.open_mfdataset(urls, chunks=None, combine='by_coords', preprocess=preprocess, decode_times=False, compat='override')
The preprocess() function looks like this:
# Function to preprocess the datasets prior to concatenation
def preprocess(ds):
    print("PREPROCESS")
    # Drop all the variables we want to ignore, and remove single dimensions via squeeze()
    for dv in list(ds.data_vars):
        if dv in p.opt['ignore_list']:
            ds = ds.drop_vars([dv])
    ds = ds.squeeze(drop=True)
    if 'TMP' in ds.data_vars:
        # Drop the top z0 level from the dataset with TMP because it has 1 extra level not in the other dataset
        ds = ds.isel(z0=slice(0, len(ds.z0) - 1))
    print(ds)
    return ds
In the preprocess function, I print each dataset out that's being loaded and I see this:
# DS 1:
<xarray.Dataset>
Dimensions: (x0: 1799, y0: 1059, z0: 39)
Coordinates:
* x0 (x0) float32 -2.698e+03 -2.695e+03 ... 2.693e+03 2.696e+03
* y0 (y0) float32 -1.587e+03 -1.584e+03 ... 1.584e+03 1.587e+03
lat0 (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
lon0 (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
* z0 (z0) float32 1e+03 975.0 950.0 925.0 ... 100.0 75.0 50.0
# DS 2:
<xarray.Dataset>
Dimensions: (x0: 1799, y0: 1059, z0: 39)
Coordinates:
* x0 (x0) float32 -2.698e+03 -2.695e+03 ... 2.696e+03
* y0 (y0) float32 -1.587e+03 -1.584e+03 ... 1.587e+03
lat0 (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
lon0 (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
* z0 (z0) float32 1e+03 975.0 950.0 ... 100.0 75.0 50.0
When the merging and/or combining of the two datasets occurs within open_mfdataset(), I don't get any errors about combining, but I do get warnings about large chunks. The reason, which I cannot figure out, is that xarray is doubling the size of the x0 and y0 dimensions. The resulting Dataset looks like this:
<xarray.Dataset>
Dimensions: (x0: 3598, y0: 2118, z0: 39)
Coordinates:
* x0 (x0) float64 -2.698e+03 -2.698e+03 ... 2.696e+03
* y0 (y0) float64 -1.587e+03 -1.587e+03 ... 1.587e+03
lat0 (y0, x0) float32 dask.array<chunksize=(2118, 3598), meta=np.ndarray>
lon0 (y0, x0) float32 dask.array<chunksize=(2118, 3598), meta=np.ndarray>
* z0 (z0) float32 1e+03 975.0 950.0 ... 100.0 75.0 50.0
Does anyone have any tips on where to start looking, or on why xarray might be doubling the x0/y0 dimensions on me?
As I somewhat expected, given my past success with this code, it's a garbage-in, garbage-out issue. It turns out my x0 and y0 coordinate variables are ever so slightly different between the two files. Is there a way to tell open_mfdataset() to ignore this and take the coordinates from the first or last dataset in the list, or from a particular dataset? I thought compat='override' would do that, but maybe not?
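The doubling can be reproduced with a toy example (a sketch under assumptions: two small 1-D datasets with disjoint variable names standing in for the real files, and a tiny float offset in the shared coordinate). Because the files carry different variables, open_mfdataset merges them, and the default join="outer" unions the nearly-equal coordinate values instead of matching them:

```python
import numpy as np
import xarray as xr

# Two datasets with different variables but nearly identical x0 coordinates
# (the variable names TMP/UGRD are illustrative)
ds1 = xr.Dataset({"TMP": ("x0", np.zeros(4))}, coords={"x0": np.arange(4.0)})
ds2 = xr.Dataset({"UGRD": ("x0", np.ones(4))}, coords={"x0": np.arange(4.0) + 1e-6})

# The default join="outer" takes the union of the two x0 indexes; since no
# value matches exactly, the dimension doubles from 4 to 8
merged = xr.combine_by_coords([ds1, ds2], compat="override")
print(merged.sizes["x0"])  # 8
```

This matches the symptom above: every coordinate value appears twice, off by a tiny amount, so x0 goes from 1799 to 3598 and y0 from 1059 to 2118.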
Use join="override", because x0 and y0 are dimension coordinate variables. compat controls the checking of non-dimension coordinate variables.
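A sketch of the fix on toy data (two small 1-D datasets with a tiny float offset in the shared coordinate, standing in for the real files; the variable names are illustrative). join="override" makes xarray take the dimension coordinate from the first object instead of unioning the nearly-equal values; the same keyword can be passed straight to open_mfdataset():

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset({"TMP": ("x0", np.zeros(4))}, coords={"x0": np.arange(4.0)})
ds2 = xr.Dataset({"UGRD": ("x0", np.ones(4))}, coords={"x0": np.arange(4.0) + 1e-6})

# join="override" keeps the x0 index from the first dataset, so the tiny
# coordinate differences are ignored rather than unioned
fixed = xr.combine_by_coords([ds1, ds2], compat="override", join="override")
print(fixed.sizes["x0"])  # 4
```

Note that join="override" only checks that the dimension sizes match; it silently assumes the coordinates really do describe the same grid, so it is only safe when the differences are known to be float noise.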
Last updated: May 16 2025 at 17:14 UTC