Stream: xarray

Topic: stack, unstack, and dropping nans


Michael Levy (Jan 11 2023 at 18:37):

I've got a 2D array with some nans in it, and I want to pass it as an argument to a function that expects a 1D array without nans. So I stack my two dimensions, drop the nans, do the calculation, and then unstack the dimensions. But some of the 2D arrays have entire rows or columns of nans, and when I unstack, those rows and columns are missing.

Here's a simple example to illustrate:

>>> import xarray as xr
>>> import numpy as np
>>> nparray = np.full((6,6), np.nan)
>>> nparray[1,0:3] = 1
>>> nparray[2,2:5] = 2
>>> nparray[3,1:4] = 8
>>> nparray[5,:5] = 4
>>> nparray # note that the first row, second-to-last row, and last column are entirely nans
array([[nan, nan, nan, nan, nan, nan],
       [ 1.,  1.,  1., nan, nan, nan],
       [nan, nan,  2.,  2.,  2., nan],
       [nan,  8.,  8.,  8., nan, nan],
       [nan, nan, nan, nan, nan, nan],
       [ 4.,  4.,  4.,  4.,  4., nan]])
>>> da = xr.DataArray(nparray, dims=['nlat', 'nlon'], name='my_array')
>>> ds = da.to_dataset()
>>> ds['nlat'] = np.arange(6)
>>> ds['nlon'] = np.arange(2,8)
>>> ds # my original dataset has dimensions nlat=6, nlon=6
<xarray.Dataset>
Dimensions:   (nlat: 6, nlon: 6)
Coordinates:
  * nlat      (nlat) int64 0 1 2 3 4 5
  * nlon      (nlon) int64 2 3 4 5 6 7
Data variables:
    my_array  (nlat, nlon) float64 nan nan nan nan nan ... 4.0 4.0 4.0 4.0 nan
>>> ds = ds.stack(X=['nlat', 'nlon'])
>>> ds = ds.where(np.isfinite(ds['my_array']), drop=True)
>>> ds = ds.unstack()
>>> ds # after the stack, drop, unstack sequence my dataset has dimensions nlat=4, nlon=5
<xarray.Dataset>
Dimensions:   (nlat: 4, nlon: 5)
Coordinates:
  * nlat      (nlat) int64 1 2 3 5
  * nlon      (nlon) int64 2 3 4 5 6
Data variables:
    my_array  (nlat, nlon) float64 1.0 1.0 1.0 nan nan ... 4.0 4.0 4.0 4.0 4.0
>>> ds['my_array'].data # looking at the underlying data, the rows and column that were all nan are now missing
array([[ 1.,  1.,  1., nan, nan],
       [nan, nan,  2.,  2.,  2.],
       [nan,  8.,  8.,  8., nan],
       [ 4.,  4.,  4.,  4.,  4.]])
>>>

Notice that the nlat coordinate no longer has a 0 or 4 value and nlon is missing 7, so what started out as a 6x6 matrix is now 4x5. In practice, I'm doing this stacking / unstacking inside a call to xr.map_blocks(). In the example above, if the template expects each chunk to return a 6x6 portion of the bigger matrix, then map_blocks() would throw an exception for any chunk that has lost an all-nan row or column.

If I save the original nlat and nlon coordinates, is there a way to unstack() into a dataset with the original dimensions?
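
One possible approach, sketched here rather than taken from the thread, is to reindex the unstacked result against the saved coordinates. Dataset.reindex() fills labels that are absent from the current index with nan by default, so the all-nan rows and columns should come back. The names original, stacked, and restored are illustrative only, and the 1D calculation is left as a placeholder comment.

```python
import numpy as np
import xarray as xr

# Rebuild the 6x6 example from the post: rows 0 and 4 and the last column are all nan.
nparray = np.full((6, 6), np.nan)
nparray[1, 0:3] = 1
nparray[2, 2:5] = 2
nparray[3, 1:4] = 8
nparray[5, :5] = 4

original = xr.Dataset(
    {"my_array": (("nlat", "nlon"), nparray)},
    coords={"nlat": np.arange(6), "nlon": np.arange(2, 8)},
)

# Save the full coordinate values before anything is dropped.
orig_nlat = original["nlat"].values
orig_nlon = original["nlon"].values

# Stack to 1D and drop the nans, as in the post.
stacked = original.stack(X=["nlat", "nlon"])
stacked = stacked.where(np.isfinite(stacked["my_array"]), drop=True)

# ... the 1D calculation on the nan-free data would go here ...

# Unstack, then reindex onto the saved coordinates; labels that were
# dropped are reinserted and filled with nan (the default fill value).
restored = stacked.unstack("X").reindex(nlat=orig_nlat, nlon=orig_nlon)

print(dict(restored.sizes))
```

Inside map_blocks(), the same reindex step would be applied to each block before returning it, using that block's own nlat and nlon values, so the returned shape matches what the template expects.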


Last updated: May 16 2025 at 17:14 UTC