In looking at Isla's notebook (how do I link to a thread?), I noticed that the regridding step is slow. xESMF seems to use a dask.array.map_blocks(lambda x: x.dot(weights)) approach, i.e. mapping numpy's dot over each block. I wonder if x.dot(weights), i.e. letting dask.array.dot deal with the blocks, would work better.
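To make the difference between the two strategies concrete, here is a numpy-only sketch (it does not call dask at all) of how the work gets split. The function names, the toy sizes, and the (n_in, n_out) weight layout are assumptions for illustration, not xESMF's internals.

```python
import numpy as np


def map_blocks_style(x, weights, n_chunks):
    """Mimic map_blocks(lambda b: b.dot(weights)) with numpy.

    x is split only along axis 0 (e.g. time), so every block still holds
    the whole contracted axis and can be multiplied by the full weight
    matrix on its own; the per-block results are stacked back together.
    """
    blocks = np.array_split(x, n_chunks, axis=0)
    return np.concatenate([b.dot(weights) for b in blocks], axis=0)


def blocked_dot_style(x, weights, n_chunks):
    """Mimic a blocked matmul in which the contracted axis is also chunked.

    Axis 1 of x and axis 0 of weights are split at the same edges; each
    matching pair of chunks gives a partial product, and the partial
    products are summed to form the result.
    """
    edges = np.linspace(0, x.shape[1], n_chunks + 1).astype(int)
    out = np.zeros((x.shape[0], weights.shape[1]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        out += x[:, lo:hi].dot(weights[lo:hi, :])
    return out


# Hypothetical toy sizes: 6 time steps, 10 input points, 4 output points.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 10))
weights = rng.standard_normal((10, 4))

reference = x.dot(weights)
print(np.allclose(map_blocks_style(x, weights, 3), reference))   # True
print(np.allclose(blocked_dot_style(x, weights, 3), reference))  # True
```

The first pattern only works when the input-grid axis sits in a single chunk, since a block missing some input points cannot be dotted with the full weight matrix. The second tolerates chunking along that axis at the cost of summing partial results, which is roughly what a blocked dot has to do.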
From https://github.com/JiaweiZhuang/xESMF/issues/3 and https://nbviewer.jupyter.org/github/JiaweiZhuang/sparse_dot/blob/master/sparse_dot_benchmark.ipynb, the initial benchmarking was done before dask could wrap sparse arrays, so it should be worth the effort to update that notebook.
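I'm thinking of something like:

    import dask.array
    import sparse
    import xarray as xr

    weights = ...  # get weights from Regridder object
    sparse_weights = xr.DataArray(
        dask.array.from_array(
            sparse.COO.from_scipy_sparse(weights),
            chunks=...,
        ),
        dims=...,
    )
    # client.scatter(sparse_weights)  # necessary? does this help?
    xr.dot(dataset, sparse_weights)  # xr.dot expects DataArrays, so apply per data variable

vs.

    regridder(dataset)  # applying an existing xesmf.Regridder instance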
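The proposal stores the weights as a sparse COO array. As a hedged, numpy-only illustration of what that representation computes (it does not use the sparse or dask packages), the sketch below applies weights given as (row, col, value) triplets, then repeats the calculation with the input-grid axis split into chunks. The (n_out, n_in) layout, the helper names, and the toy data are all hypothetical.

```python
import numpy as np


def coo_apply(rows, cols, vals, x, n_out):
    """Apply COO weights (n_out x n_in) to x of shape (time, n_in).

    Each stored entry (r, c, v) adds v * x[:, c] to output point r.
    np.add.at accumulates repeated row indices instead of overwriting.
    """
    out = np.zeros((n_out, x.shape[0]))
    np.add.at(out, rows, vals[:, None] * x[:, cols].T)
    return out.T


def coo_apply_chunked(rows, cols, vals, x, n_out, n_chunks):
    """Same result, with the input-grid axis of x split into chunks.

    Only the weight entries whose column falls in a chunk touch that
    chunk; the partial outputs from all chunks are summed.
    """
    edges = np.linspace(0, x.shape[1], n_chunks + 1).astype(int)
    out = np.zeros((x.shape[0], n_out))
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_chunk = (cols >= lo) & (cols < hi)
        out += coo_apply(
            rows[in_chunk], cols[in_chunk] - lo, vals[in_chunk], x[:, lo:hi], n_out
        )
    return out


# Hypothetical toy problem: roughly 30% of the 4 x 10 weights are nonzero.
rng = np.random.default_rng(42)
n_in, n_out, n_time = 10, 4, 5
dense_w = rng.random((n_out, n_in)) * (rng.random((n_out, n_in)) < 0.3)
rows, cols = np.nonzero(dense_w)
vals = dense_w[rows, cols]
x = rng.standard_normal((n_time, n_in))

expected = x.dot(dense_w.T)
print(np.allclose(coo_apply(rows, cols, vals, x, n_out), expected))             # True
print(np.allclose(coo_apply_chunked(rows, cols, vals, x, n_out, 3), expected))  # True
```

The chunked version is the sparse analogue of summing partial products over the contracted axis; how efficiently dask does this with sparse.COO chunks is what an updated benchmark would need to measure.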
cc @xdev: a good "team time" project?
original convo here: https://zulip2.cloud.ucar.edu/#narrow/stream/27-dask/topic/optimizing.20workers.20and.20memory/near/25977
May not be a definite win:
Last updated: Jan 27 2025 at 22:16 UTC