I've got a 2D array with some NaNs in it, and I want to pass it to a function that expects a 1D array without NaNs. So I stack my two dimensions, drop the NaNs, do the calculation, and then unstack the dimensions. But some of the 2D arrays have entire rows or columns of NaNs, and after unstacking, those rows and columns are missing.
Here's a simple example to illustrate:
>>> import xarray as xr
>>> import numpy as np
>>> nparray = np.full((6,6), np.nan)
>>> nparray[1,0:3] = 1
>>> nparray[2,2:5] = 2
>>> nparray[3,1:4] = 8
>>> nparray[5,:5] = 4
>>> nparray # note that the first row, second-to-last row, and last column are entirely nans
array([[nan, nan, nan, nan, nan, nan],
       [ 1.,  1.,  1., nan, nan, nan],
       [nan, nan,  2.,  2.,  2., nan],
       [nan,  8.,  8.,  8., nan, nan],
       [nan, nan, nan, nan, nan, nan],
       [ 4.,  4.,  4.,  4.,  4., nan]])
>>> da = xr.DataArray(nparray, dims=['nlat', 'nlon'], name='my_array')
>>> ds = da.to_dataset()
>>> ds['nlat'] = np.arange(6)
>>> ds['nlon'] = np.arange(2,8)
>>> ds # my original dataset has dimensions nlat=6, nlon=6
<xarray.Dataset>
Dimensions:   (nlat: 6, nlon: 6)
Coordinates:
  * nlat      (nlat) int64 0 1 2 3 4 5
  * nlon      (nlon) int64 2 3 4 5 6 7
Data variables:
    my_array  (nlat, nlon) float64 nan nan nan nan nan ... 4.0 4.0 4.0 4.0 nan
>>> ds = ds.stack(X=['nlat', 'nlon'])
>>> ds = ds.where(np.isfinite(ds['my_array']), drop=True)
>>> ds = ds.unstack()
>>> ds # after the stack, drop, unstack sequence my dataset has dimensions nlat=4, nlon=5
<xarray.Dataset>
Dimensions:   (nlat: 4, nlon: 5)
Coordinates:
  * nlat      (nlat) int64 1 2 3 5
  * nlon      (nlon) int64 2 3 4 5 6
Data variables:
    my_array  (nlat, nlon) float64 1.0 1.0 1.0 nan nan ... 4.0 4.0 4.0 4.0 4.0
>>> ds['my_array'].data # looking at the underlying data, the rows and column that were all nan are now missing
array([[ 1.,  1.,  1., nan, nan],
       [nan, nan,  2.,  2.,  2.],
       [nan,  8.,  8.,  8., nan],
       [ 4.,  4.,  4.,  4.,  4.]])
Notice that the nlat coordinate no longer has a 0 or 4 value, and nlon is missing 7, so what started out as a 6x6 matrix is now 4x5. In practice, I'm doing this stacking / unstacking inside a call to xr.map_blocks(); in the example above, if the template expects each chunk to return a 6x6 portion of the bigger matrix, then map_blocks() would throw an exception for the chunks that have lost all-NaN rows or columns.
If I save the original nlat and nlon coordinates, is there a way to unstack() into a dataset with the original dimensions?
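To make the question concrete, here is a minimal sketch of one approach that might do this: save the coordinate labels before stacking, then call reindex() against them after unstack(), so the dropped labels come back filled with NaN. The names orig_nlat, orig_nlon, stacked, and restored are only illustrative, and the sketch assumes the calculation in the middle leaves the stacked coordinate X untouched.

```python
import numpy as np
import xarray as xr

# Rebuild the 6x6 example: row 0, row 4 and the last column are all NaN.
nparray = np.full((6, 6), np.nan)
nparray[1, 0:3] = 1
nparray[2, 2:5] = 2
nparray[3, 1:4] = 8
nparray[5, :5] = 4
ds = xr.Dataset(
    {"my_array": (("nlat", "nlon"), nparray)},
    coords={"nlat": np.arange(6), "nlon": np.arange(2, 8)},
)

# Save the original coordinate labels before stacking (illustrative names).
orig_nlat = ds["nlat"].values
orig_nlon = ds["nlon"].values

# Same stack / drop sequence as in the example above.
stacked = ds.stack(X=["nlat", "nlon"])
stacked = stacked.where(np.isfinite(stacked["my_array"]), drop=True)
# ... the 1D calculation on the NaN-free data would go here ...

# unstack() only sees the labels that survived the drop, so reindex against
# the saved labels; labels that are missing are re-created and filled with NaN.
restored = stacked.unstack("X").reindex(nlat=orig_nlat, nlon=orig_nlon)

print(dict(restored.sizes))
```

Whether this is cheap enough inside xr.map_blocks() is an open question; reindex() allocates a new array per chunk, so it may be worth timing on realistic chunk sizes.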
Last updated: May 16 2025 at 17:14 UTC