Cool stuff from the Met Office and ECMWF: https://blog.dask.org/2022/07/19/dask-multi-cloud
We have devised a technique for creating a Dask cluster where worker nodes are hosted in different data centres, connected by a mesh VPN that allows the scheduler and workers to communicate and exchange results.
A novel (ab)use of Dask resources allows us to run data processing tasks on the workers in the cluster closest to the source data, so that communication between data centres is minimised. If combined with zarr to give access to huge hyper-cube datasets in object storage, we believe that the technique could realise the potential of data-proximate distributed computing in the Cloud.
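For anyone wondering what pinning work with Dask resources might look like, here is a minimal local sketch, not the code from the blog post. It assumes two in-process workers standing in for two data centres. The worker names (`dc-a`, `dc-b`), the resource labels (`pool-a`, `pool-b`) and the helper functions are all made up for illustration. Each worker advertises one resource, and a task submitted with `resources=` is only scheduled on a worker that offers it.

```python
from dask.distributed import Client, Scheduler, SpecCluster, Worker, get_worker


def where_am_i():
    # Runs inside a task and reports the name of the worker executing it.
    return get_worker().name


def run_demo():
    # Hypothetical setup: one worker per "data centre", each advertising its
    # own resource label. In a real deployment the workers would live on
    # different machines; here they are local threads.
    workers = {
        "dc-a": {"cls": Worker, "options": {"nthreads": 1, "resources": {"pool-a": 1}}},
        "dc-b": {"cls": Worker, "options": {"nthreads": 1, "resources": {"pool-b": 1}}},
    }
    scheduler = {"cls": Scheduler, "options": {"dashboard_address": ":0"}}

    with SpecCluster(scheduler=scheduler, workers=workers) as cluster:
        with Client(cluster) as client:
            # pure=False gives each submission its own key, so the two calls
            # are not collapsed into a single task.
            near_a = client.submit(where_am_i, resources={"pool-a": 1}, pure=False)
            near_b = client.submit(where_am_i, resources={"pool-b": 1}, pure=False)
            return {"pool-a": near_a.result(), "pool-b": near_b.result()}


if __name__ == "__main__":
    print(run_demo())
```

The resource label works as a placement tag here, not as a count of anything real. That is presumably the "(ab)use" the post mentions.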
that is indeed cool!
Last updated: May 16 2025 at 17:14 UTC