Stream: dask
Topic: xesmf performance benchmarking
Deepak Cherian (Mar 02 2021 at 16:19):
In looking at Isla's notebook (how do I link to a thread?), I noticed that the regridding step is slow. xesmf seems to use a dask.array.map_blocks(lambda x: x.dot(weights)) approach, i.e. mapping numpy's dot on each block. I wonder if x.dot(weights), i.e. using dask.array.dot to deal with the blocks, would work better.
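A minimal sketch of the per-block idea described above, written with plain NumPy and SciPy rather than dask or xesmf. The grid sizes, the random weights matrix, and the helper name regrid_block are all hypothetical and only for illustration; a plain Python loop over chunks stands in for what map_blocks would do. It only checks that applying the sparse weights block by block along the leading (time) dimension gives the same result as applying them to the whole array at once. It says nothing about which approach is faster.

```python
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)

# Hypothetical grid sizes, chosen only to keep the example small.
n_time, n_lat_in, n_lon_in = 8, 6, 10
n_lat_out, n_lon_out = 3, 5
n_in = n_lat_in * n_lon_in
n_out = n_lat_out * n_lon_out

# Stand-in for regridding weights: a sparse (n_out, n_in) matrix.
# Real weights would come from the regridding library, not from random numbers.
weights = sps.random(n_out, n_in, density=0.1, format="csr", random_state=0)

data = rng.standard_normal((n_time, n_lat_in, n_lon_in))


def regrid_block(block, weights, shape_out):
    """Apply sparse weights to one block shaped (time, lat, lon)."""
    flat = block.reshape(block.shape[0], -1)   # (time, n_in)
    out = (weights @ flat.T).T                 # (time, n_out), dense result
    return out.reshape(block.shape[0], *shape_out)


shape_out = (n_lat_out, n_lon_out)

# Whole array in one call.
whole = regrid_block(data, weights, shape_out)

# Same operation applied chunk by chunk along time, then concatenated.
blocks = np.array_split(data, 4, axis=0)
blockwise = np.concatenate(
    [regrid_block(b, weights, shape_out) for b in blocks], axis=0
)

print(whole.shape, np.allclose(whole, blockwise))
```

Because the contraction is only over the flattened spatial axis, chunks along time are independent. That is why either a per-block map or a blocked dot product can express the same computation.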
From https://github.com/JiaweiZhuang/xESMF/issues/3 and https://nbviewer.jupyter.org/github/JiaweiZhuang/sparse_dot/blob/master/sparse_dot_benchmark.ipynb, the initial benchmarking was done before dask could wrap sparse arrays, so it should be worth the effort to update that notebook.
I'm thinking something like
    weights = ...  # get weights from Regridder object
    sparse_weights = xr.DataArray(
        dask.array.from_array(
            sparse.COO.from_scipy_sparse(weights),
            chunks=...
        ),
        dims=...
    )
    # client.scatter(sparse_weights)  # necessary? does this help?
    xr.dot(dataset, sparse_weights)
vs
Regridder(dataset)
cc @xdev good "team time" project?
Deepak Cherian (Mar 02 2021 at 16:30):
original convo here: https://zulip2.cloud.ucar.edu/#narrow/stream/27-dask/topic/optimizing.20workers.20and.20memory/near/25977
Deepak Cherian (Mar 05 2021 at 15:42):
May not be a definite win:
- https://github.com/SciTools-incubator/iris-esmf-regrid/pull/23
- https://github.com/ravwojdyla/dask/blob/rav/matmul_blockwise_perf/matmul_perf.ipynb
- https://github.com/pangeo-data/pangeo/issues/756
- https://github.com/dask/dask/issues/6916
- https://github.com/dask/dask/issues/2225
- https://github.com/dask/dask/issues/3587
Last updated: Jan 30 2022 at 12:01 UTC