xesmf performance benchmarking · dask

Deepak Cherian (Mar 02 2021 at 16:19):

While looking at Isla's notebook (how do I link to a thread?), I noticed that the regridding step is slow. xesmf seems to use a dask.array.map_blocks(lambda x: x.dot(weights)) approach, i.e. mapping numpy's dot over each block. I wonder if calling x.dot(weights) on the dask array itself, i.e. using dask.array.dot to deal with the blocks, would work better.
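To make the two options concrete, here is a minimal sketch that runs both on the same data. It assumes a small dense random matrix as a stand-in for the regridding weights (real xesmf weights are sparse), and the sizes, chunking, and variable names are made up for illustration. It only checks that the two routes agree; it says nothing about which is faster.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
n_in, n_out = 12, 5  # hypothetical source/destination grid sizes

# Dense stand-in for the weights; real regridding weights would be sparse.
weights = rng.random((n_in, n_out))

# Data chunked only along the leading (e.g. time) axis, so every block
# holds the full n_in axis that gets contracted.
data = da.from_array(rng.random((8, n_in)), chunks=(2, n_in))

# Option 1: apply numpy's dot to each block independently.
mapped = data.map_blocks(
    lambda x: x.dot(weights),
    chunks=(data.chunks[0], (n_out,)),  # output blocks have n_out columns
    dtype=data.dtype,
)

# Option 2: let dask handle the blocks through its own dot.
dotted = data.dot(da.from_array(weights, chunks=weights.shape))

print(np.allclose(mapped.compute(), dotted.compute()))
```

With this chunking the two task graphs do essentially the same work per block, so any difference would more likely show up when the contracted axis is itself chunked or when the weights are large.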

From https://github.com/JiaweiZhuang/xESMF/issues/3 and https://nbviewer.jupyter.org/github/JiaweiZhuang/sparse_dot/blob/master/sparse_dot_benchmark.ipynb, the initial benchmarking was done before dask could wrap sparse arrays, so it should be worth the effort to update that notebook.

I'm thinking something like

import dask.array
import sparse
import xarray as xr

weights = ...  # get weights from Regridder object (a scipy.sparse matrix)
sparse_weights = xr.DataArray(
    dask.array.from_array(
        sparse.COO.from_scipy_sparse(weights),
        chunks=...
    ),
    dims=...
)
# client.scatter(sparse_weights)  # necessary? does this help?
xr.dot(dataset, sparse_weights)

Regridder(dataset)

cc @xdev good "team time" project?

Deepak Cherian (Mar 02 2021 at 16:30):

original convo here: https://zulip2.cloud.ucar.edu/#narrow/stream/27-dask/topic/optimizing.20workers.20and.20memory/near/25977

Deepak Cherian (Mar 05 2021 at 15:42):

May not be a definite win:

Last updated: Jan 27 2025 at 22:16 UTC