Stream: xarray

Topic: open_mfdataset doubles size of y0/x0 dims


Daniel Adriaansen (Dec 14 2021 at 20:37):

I am attempting to load data from two netCDF files using xr.open_mfdataset. The call looks like this:

ncData = xr.open_mfdataset(urls, chunks=None, combine='by_coords', preprocess=preprocess, decode_times=False, compat='override')

The preprocess() function looks like this:

# Function to preprocess the datasets prior to concatenation
def preprocess(ds):

  print("PREPROCESS")
  # Drop all the variables we want to ignore, and remove singleton dimensions via squeeze()
  ds = ds.drop_vars([dv for dv in ds.data_vars if dv in p.opt['ignore_list']])
  ds = ds.squeeze(drop=True)
  if 'TMP' in ds.data_vars:
    # Trim the z0 dimension in the dataset with TMP because it has 1 extra level not in the other dataset
    ds = ds.isel(z0=slice(0, len(ds.z0) - 1))
  print(ds)
  return ds
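[Editor's note: the dropping and trimming logic above can be exercised on a toy dataset. This is a sketch: `ignore_list` stands in for `p.opt['ignore_list']`, and the `TMP`/`JUNK` variables and sizes are illustrative, not the real files.]

```python
import numpy as np
import xarray as xr

# Stand-in for p.opt['ignore_list'] (hypothetical)
ignore_list = ["JUNK"]

def preprocess(ds):
    # Drop ignored variables, then remove singleton dimensions
    ds = ds.drop_vars([dv for dv in ds.data_vars if dv in ignore_list])
    ds = ds.squeeze(drop=True)
    if "TMP" in ds.data_vars:
        # Trim the one extra z0 level not present in the other dataset
        ds = ds.isel(z0=slice(0, len(ds.z0) - 1))
    return ds

ds = xr.Dataset(
    {
        "TMP": (("z0", "y0"), np.zeros((4, 2))),
        "JUNK": ("y0", np.zeros(2)),
    },
    coords={"z0": [1000.0, 975.0, 950.0, 925.0], "y0": [0.0, 1.0]},
)
out = preprocess(ds)
print(list(out.data_vars))  # ['TMP']
print(out.sizes["z0"])      # 3
```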

In the preprocess function, I print each dataset out that's being loaded and I see this:

# DS 1:
<xarray.Dataset>
Dimensions:         (x0: 1799, y0: 1059, z0: 39)
Coordinates:
  * x0              (x0) float32 -2.698e+03 -2.695e+03 ... 2.693e+03 2.696e+03
  * y0              (y0) float32 -1.587e+03 -1.584e+03 ... 1.584e+03 1.587e+03
    lat0            (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    lon0            (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
  * z0              (z0) float32 1e+03 975.0 950.0 925.0 ... 100.0 75.0 50.0

# DS 2:
<xarray.Dataset>
Dimensions:               (x0: 1799, y0: 1059, z0: 39)
Coordinates:
  * x0                    (x0) float32 -2.698e+03 -2.695e+03 ... 2.696e+03
  * y0                    (y0) float32 -1.587e+03 -1.584e+03 ... 1.587e+03
    lat0                  (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    lon0                  (y0, x0) float32 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
  * z0                    (z0) float32 1e+03 975.0 950.0 ... 100.0 75.0 50.0

When the merging and/or combining of the two datasets occurs within open_mfdataset(), I don't get any errors about combining, but I do get errors about large chunks. This is because, for some reason I cannot figure out, xarray is doubling the size of the x0 and y0 dimensions. The resulting Dataset looks like this:

<xarray.Dataset>
Dimensions:               (x0: 3598, y0: 2118, z0: 39)
Coordinates:
  * x0                    (x0) float64 -2.698e+03 -2.698e+03 ... 2.696e+03
  * y0                    (y0) float64 -1.587e+03 -1.587e+03 ... 1.587e+03
    lat0                  (y0, x0) float32 dask.array<chunksize=(2118, 3598), meta=np.ndarray>
    lon0                  (y0, x0) float32 dask.array<chunksize=(2118, 3598), meta=np.ndarray>
  * z0                    (z0) float32 1e+03 975.0 950.0 ... 100.0 75.0 50.0

Does anyone have any tips on where to start looking, or on why xarray might be doubling the x0/y0 dimensions on me?

Daniel Adriaansen (Dec 14 2021 at 21:49):

As I somewhat expected, given my past success with this code, it's a garbage-in, garbage-out issue. It appears that my x0 and y0 coordinate variables are ever so slightly different between the two files. Is there a way to tell open_mfdataset() to ignore this and choose the coordinates from the first or last dataset in the list, or from a particular specification of coordinates? I thought compat='override' would do that, but maybe not?
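[Editor's note: the doubling described above can be reproduced on toy datasets. This is a sketch, assuming two coordinate vectors that differ only by a tiny floating-point error, as diagnosed in the post.]

```python
import xarray as xr

# Two tiny datasets whose x0 labels differ by a tiny floating-point error
ds1 = xr.Dataset({"a": ("x0", [1.0, 2.0])}, coords={"x0": [0.0, 1.0]})
ds2 = xr.Dataset({"b": ("x0", [3.0, 4.0])}, coords={"x0": [1e-9, 1.0 + 1e-9]})

# The default outer join treats the near-equal labels as distinct,
# so the combined x0 dimension doubles from 2 to 4
merged = xr.merge([ds1, ds2])
print(merged.sizes["x0"])  # 4
```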

Deepak Cherian (Dec 15 2021 at 16:24):

Use join="override" because x0 and y0 are dimension coordinate variables. compat controls the checking of non-dimension coordinate variables.
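[Editor's note: the suggested fix can be demonstrated on the same kind of near-mismatched toy datasets. This sketch uses xr.merge for brevity; open_mfdataset accepts the same join keyword.]

```python
import xarray as xr

# Two datasets whose x0 labels differ by a tiny floating-point error
ds1 = xr.Dataset({"a": ("x0", [1.0, 2.0])}, coords={"x0": [0.0, 1.0]})
ds2 = xr.Dataset({"b": ("x0", [3.0, 4.0])}, coords={"x0": [1e-9, 1.0 + 1e-9]})

# join="override" skips label alignment and reuses the first object's
# index, so x0 keeps its original size instead of doubling
fixed = xr.merge([ds1, ds2], join="override")
print(fixed.sizes["x0"])   # 2
print(float(fixed.x0[0]))  # 0.0 (labels taken from ds1)
```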


Last updated: May 16 2025 at 17:14 UTC