Stream: dask

Topic: xesmf performance benchmarking


Deepak Cherian (Mar 02 2021 at 16:19):

In looking at Isla's notebook (how do I link to a thread?), I noticed that the regridding step is slow. xesmf seems to use a dask.array.map_blocks(lambda x: x.dot(weights)) approach, i.e. mapping numpy's dot over each block. I wonder if x.dot(weights), i.e. using dask.array.dot to deal with the blocks, would work better.
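For reference, the two approaches can be sketched side by side on synthetic data. This is a minimal sketch, assuming random weights in place of a real xesmf Regridder and input already flattened to (time, cell); the sizes and the `_apply_weights` helper are hypothetical and only approximate what xesmf does per block. For simplicity the dask.array.dot variant uses dense weights; the idea in this thread would wrap pydata/sparse COO weights instead.

```python
import dask.array as da
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
n_time, n_in, n_out = 6, 50, 20  # hypothetical sizes

# Stand-in for regridding weights: a sparse (n_out, n_in) matrix, ~10% nonzeros.
dense_w = rng.random((n_out, n_in))
dense_w[dense_w < 0.9] = 0.0
weights = scipy.sparse.csr_matrix(dense_w)

# Input with spatial cells flattened, chunked along time only.
x = da.from_array(rng.standard_normal((n_time, n_in)), chunks=(2, n_in))


# Approach 1: map_blocks, applying a scipy.sparse product to each block.
def _apply_weights(block):
    # (n_out, n_in) @ (n_in, t) -> (n_out, t); transpose back to (t, n_out)
    return weights.dot(block.T).T


out_map_blocks = x.map_blocks(
    _apply_weights, chunks=(x.chunks[0], (n_out,)), dtype=x.dtype
)

# Approach 2: wrap the weights as a dask array and let dask.array.dot
# handle the blocks (dense weights here purely for simplicity).
w_dask = da.from_array(weights.T.toarray(), chunks=(n_in, n_out))
out_dot = da.dot(x, w_dask)
```

With map_blocks the input cell dimension has to be a single chunk, because each block is multiplied independently. dask.array.dot can also handle a chunked contracted dimension and combines the per-chunk partial products itself, at the cost of extra tasks in the graph.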

From https://github.com/JiaweiZhuang/xESMF/issues/3 and https://nbviewer.jupyter.org/github/JiaweiZhuang/sparse_dot/blob/master/sparse_dot_benchmark.ipynb, the initial benchmarking was done before dask could wrap sparse arrays, so it should be worth the effort to update that notebook.
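One rough way to redo the timing comparison is a best-of-N harness. The names `best_compute_time` and `compare` are hypothetical, not from xesmf or the linked notebook, and the toy arrays use dense stand-in weights. It times graph construction plus `.compute()` on dask's default local scheduler, so numbers on a distributed cluster could differ.

```python
import time

import dask.array as da
import numpy as np


def best_compute_time(build, repeat=3):
    """Best-of-`repeat` wall-clock seconds to build a dask array and compute it.

    `build` is a zero-argument callable returning a dask array, so graph
    construction is timed together with execution.
    """
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        build().compute()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(candidates, repeat=3):
    """Time each named candidate; returns {name: seconds}, fastest first."""
    results = {name: best_compute_time(build, repeat) for name, build in candidates.items()}
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Toy usage with dense stand-in weights (hypothetical sizes).
x = da.ones((1000, 200), chunks=(100, 200))
w = np.ones((200, 50))
timings = compare(
    {
        "map_blocks": lambda: x.map_blocks(
            lambda block: block @ w, chunks=(x.chunks[0], (50,)), dtype=x.dtype
        ),
        "dask.array.dot": lambda: da.dot(x, da.from_array(w, chunks=(200, 50))),
    }
)
print(timings)
```

Plugging in real xesmf weights and the sparse-wrapped variant would turn this into the updated comparison; repeating it at a few chunk sizes is likely worthwhile, since both approaches depend on how the data is chunked.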

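I'm thinking of something like:

```python
import dask.array
import sparse
import xarray as xr

weights = ...  # weights from the Regridder object, as a scipy.sparse matrix
sparse_weights = xr.DataArray(
    dask.array.from_array(
        sparse.COO.from_scipy_sparse(weights),  # pydata/sparse COO wrapper
        chunks=...,
    ),
    dims=...,  # input dims must match the data's spatial dims so xr.dot contracts them
)
# client.scatter(sparse_weights)  # necessary? does this help?
data_array = ...  # a DataArray from `dataset`; xr.dot takes DataArrays, not Datasets
xr.dot(data_array, sparse_weights)
```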

vs

```python
regridder(dataset)  # applying an existing xesmf.Regridder instance
```
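To make the `dims=...` question concrete: by default `xr.dot` sums over the dimensions its inputs share, so the weights need input dims named like the data's spatial dims and output dims with different names. The sketch below (hypothetical grid sizes, random dense weights) does the equivalent contraction directly with `dask.array.tensordot` on weights reshaped to (y_out, x_out, y_in, x_in), assuming a flat (n_out, n_in) weight layout. Dense 4-D weights are only workable at toy sizes; real regridding weights are sparse, and a pydata/sparse COO array can be reshaped the same way.

```python
import dask.array as da
import numpy as np

rng = np.random.default_rng(1)
n_time = 4
ny_in, nx_in = 6, 8  # hypothetical source grid
ny_out, nx_out = 3, 5  # hypothetical destination grid

# Flat (n_out, n_in) weights, the layout assumed in this sketch (random here).
flat_weights = rng.random((ny_out * nx_out, ny_in * nx_in))

# Reshape to (y_out, x_out, y_in, x_in) so the input axes line up with the data.
weights_4d = da.from_array(
    flat_weights.reshape(ny_out, nx_out, ny_in, nx_in),
    chunks=(ny_out, nx_out, ny_in, nx_in),
)

# Data chunked along time only: (time, y_in, x_in).
data = da.from_array(
    rng.standard_normal((n_time, ny_in, nx_in)), chunks=(2, ny_in, nx_in)
)

# Contract both spatial input axes, as xr.dot would over the shared dims;
# the result has dims (time, y_out, x_out).
regridded = da.tensordot(data, weights_4d, axes=((1, 2), (2, 3)))
```

This gives the same numbers as flattening the spatial dims, multiplying by the flat weights, and unflattening onto the output grid; naming the dims is what lets the contraction skip the explicit reshape of the data.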

cc @xdev: good "team time" project?

Deepak Cherian (Mar 02 2021 at 16:30):

original convo here: https://zulip2.cloud.ucar.edu/#narrow/stream/27-dask/topic/optimizing.20workers.20and.20memory/near/25977

Deepak Cherian (Mar 05 2021 at 15:42):

May not be a definite win:

  1. https://github.com/SciTools-incubator/iris-esmf-regrid/pull/23
  2. https://github.com/ravwojdyla/dask/blob/rav/matmul_blockwise_perf/matmul_perf.ipynb
  3. https://github.com/pangeo-data/pangeo/issues/756
  4. https://github.com/dask/dask/issues/6916
  5. https://github.com/dask/dask/issues/2225
  6. https://github.com/dask/dask/issues/3587

Last updated: Jan 30 2022 at 12:01 UTC