I'm trying to use the eofs python package (https://ajdawson.github.io/eofs/latest/api/eofs.xarray.html) with a dask DataArray, which is apparently supported (see https://github.com/ajdawson/eofs/pull/109). The following shows the sequence of commands:
[Screenshot of the commands: Screen-Shot-2022-02-15-at-3.24.45-PM.png]
This returns the error:
ValueError: operands could not be broadcast together with shapes (20080, 20080) (1, nan)
It's not clear where the nan dimension is coming from. The error goes away if the dask DataArray is first loaded into an xarray DataArray. Has anyone had success using eofs with dask?
@Stephen Yeager, I believe this is a bug or an incompatibility between eofs and dask. eofs is slicing the data here, using a dask array as the index:
ipdb> self
<eofs.standard.Eof object at 0x15a7a31c0>
ipdb> self._data
dask.array<reshape, shape=(36, 56375), dtype=float64, chunksize=(18, 56375), chunktype=numpy.ndarray>
ipdb> nonMissingIndex
dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>
ipdb> self._data[:, nonMissingIndex]
dask.array<slice_with_int_dask_array_aggregate, shape=(36, nan), dtype=float64, chunksize=(18, nan), chunktype=numpy.ndarray>
Notice how, after slicing the data, the shape changed from (36, 56375) -> (36, nan). I am not sure how this used to work when it was first introduced in eofs, but it appears that if you eagerly evaluate the slice, things work as expected:
ipdb> self._data[:, nonMissingIndex.compute()]
dask.array<getitem, shape=(36, 25996), dtype=float64, chunksize=(18, 25996), chunktype=numpy.ndarray>
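For anyone who wants to see the unknown-shape behaviour outside of eofs, here is a minimal sketch using only dask and numpy. The array sizes, the NaN columns, and the variable names are made up for illustration, and the way the index is built is a stand-in, not necessarily what eofs does internally:

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the (time, space) matrix; three all-NaN columns
# play the role of missing grid points.
values = np.random.default_rng(0).standard_normal((36, 10))
values[:, [2, 5, 7]] = np.nan
data = da.from_array(values, chunks=(18, 10))

# Index of columns without missing values, built lazily. Dask cannot know
# how many columns survive until the mask is computed, so the length is nan.
non_missing = da.nonzero(~da.isnan(data).any(axis=0))[0]
print(non_missing.shape)

# Computing the index first turns it into a NumPy array of known length,
# so the sliced dask array has a concrete shape again.
eager_subset = data[:, non_missing.compute()]
print(eager_subset.shape)
```

The trade-off is that `.compute()` forces the mask to be evaluated right away, which is cheap here but does mean a pass over the data before the main computation starts.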
I recommend opening an issue on the eofs issue tracker. If you are looking for a reproducible example, here is one:
In [13]: import eofs, xarray as xr
In [14]: ds = xr.tutorial.open_dataset("rasm", chunks={"time": 18})
In [15]: solver = eofs.xarray.Eof(ds.Tair)
Ooh, never mind... It's probably not a bug :smile:... If you avoid chunking along the time dimension, it seems to work... You may want to chunk along the lat and lon dimensions instead.
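A rough sketch of what that rechunking could look like with xarray. The array below is synthetic, and the dimension names ("time", "y", "x") and chunk sizes are assumptions chosen for illustration, so adjust them to match your data:

```python
import numpy as np
import xarray as xr

# Synthetic (time, y, x) field standing in for the real variable.
air = xr.DataArray(
    np.random.default_rng(0).standard_normal((36, 20, 30)),
    dims=("time", "y", "x"),
    name="Tair",
)

# Chunked along time, as in the reproducer above.
chunked_in_time = air.chunk({"time": 18})

# Rechunk so that time is one single chunk (-1) and the spatial
# dimensions are split instead.
chunked_in_space = chunked_in_time.chunk({"time": -1, "y": 10, "x": 15})

print(chunked_in_time.chunks)
print(chunked_in_space.chunks)
```

The rechunked array is what you would then pass to eofs.xarray.Eof in place of the time-chunked one.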
Good suggestion! Thanks.
I'm not able to get eofs.xarray.Eof() to work with a 2GB dask DataArray, even if I avoid chunking in time. If you want to have a look, here's a notebook that isolates the problem: /glade/u/home/yeager/analysis/python/toshare/dask_eof.ipynb
Last updated: May 16 2025 at 17:14 UTC