Stream: dask

Topic: eofs with dask


view this post on Zulip Stephen Yeager (Feb 15 2022 at 22:36):

I'm trying to use the eofs python package (https://ajdawson.github.io/eofs/latest/api/eofs.xarray.html) with a dask DataArray, which is apparently supported (see https://github.com/ajdawson/eofs/pull/109). The following shows the sequence of commands:
[Screenshot: Screen-Shot-2022-02-15-at-3.24.45-PM.png]
This returns the error:
ValueError: operands could not be broadcast together with shapes (20080, 20080) (1, nan)
It's not clear where the nan dimension is coming from. The error goes away if the dask DataArray is first loaded into an xarray DataArray. Has anyone had success using eofs with dask?

view this post on Zulip Anderson Banihirwe (Feb 15 2022 at 23:19):

@Stephen Yeager, I believe this is a bug or an incompatibility between eofs and dask. eofs is indexing the data with a dask array instead of a concrete index:

ipdb> self
<eofs.standard.Eof object at 0x15a7a31c0>
ipdb> self._data
dask.array<reshape, shape=(36, 56375), dtype=float64, chunksize=(18, 56375), chunktype=numpy.ndarray>
ipdb> nonMissingIndex
dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,), chunktype=numpy.ndarray>
ipdb> self._data[:, nonMissingIndex]
dask.array<slice_with_int_dask_array_aggregate, shape=(36, nan), dtype=float64, chunksize=(18, nan), chunktype=numpy.ndarray>

Notice how slicing changed the shape of the data from (36, 56375) to (36, nan). I am not sure how this worked when dask support was first introduced in eofs, but if you eagerly evaluate the index, things seem to work as expected:

ipdb> self._data[:, nonMissingIndex.compute()]
dask.array<getitem, shape=(36, 25996), dtype=float64, chunksize=(18, 25996), chunktype=numpy.ndarray>
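
The behaviour in the debugger session above can be reproduced with dask alone. The following is a minimal sketch, not the eofs code itself: the array sizes, the pattern of missing columns, and the variable names are all made up for illustration. It shows that indexing with a lazily computed dask index leaves the result with an unknown (nan) dimension, while computing the index first gives a concrete shape.

```python
import numpy as np
import dask.array as da

# Hypothetical data: 36 time steps by 100 grid points, every 4th column missing.
values = np.random.default_rng(0).normal(size=(36, 100))
values[:, ::4] = np.nan
data = da.from_array(values, chunks=(18, 100))  # chunked along the first (time) axis

# Lazily find the columns that contain no missing values.
missing = da.isnan(data).any(axis=0)
column_ids = da.arange(100, chunks=100)
index = column_ids[~missing]  # boolean masking: length is unknown until computed

lazy = data[:, index]             # second dimension is unknown (nan)
eager = data[:, index.compute()]  # index is a NumPy array, so the shape is known

print(lazy.shape)
print(eager.shape)
```

Computing the index pulls only the small index array into memory; the selected data itself remains a lazy dask array.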

view this post on Zulip Anderson Banihirwe (Feb 15 2022 at 23:22):

I recommend opening an issue on the eofs issue tracker. If you are looking for a reproducible example, here is one:

In [13]: import eofs, xarray as xr

In [14]: ds = xr.tutorial.open_dataset("rasm", chunks={"time": 18})

In [15]: solver = eofs.xarray.Eof(ds.Tair)

view this post on Zulip Anderson Banihirwe (Feb 15 2022 at 23:27):

Ooh. never mind... It's probably not a bug :smile:... If you avoid chunking along the time dimension, it seems to work...

view this post on Zulip Anderson Banihirwe (Feb 15 2022 at 23:28):

You may want to chunk along the lat and lon dimensions instead
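
A rough sketch of that suggestion, using a synthetic DataArray rather than the real data: the dimension names ("time", "lat", "lon"), the array sizes, and the chunk sizes here are assumptions chosen for illustration. Passing -1 for a dimension keeps it in a single chunk, so the time axis stays whole while the spatial dimensions are split.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real field; dimension names and sizes are hypothetical.
field = xr.DataArray(
    np.random.default_rng(0).normal(size=(36, 40, 60)),
    dims=("time", "lat", "lon"),
    name="field",
)

# -1 means "one chunk spanning the whole dimension"; split lat and lon instead.
rechunked = field.chunk({"time": -1, "lat": 20, "lon": 30})

# One tuple of chunk sizes per dimension, in the order (time, lat, lon).
print(rechunked.chunks)
```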

view this post on Zulip Stephen Yeager (Feb 15 2022 at 23:29):

Good suggestion! Thanks.

view this post on Zulip Stephen Yeager (Feb 16 2022 at 18:21):

I'm not able to get eofs.xarray.Eof() to work with a 2GB dask DataArray, even if I avoid chunking in time. If you want to have a look, here's a notebook that isolates the problem: /glade/u/home/yeager/analysis/python/toshare/dask_eof.ipynb


Last updated: May 16 2025 at 17:14 UTC