Share your experiences with worker-saturation config to reduce memory usage

distributed>=2022.9.2 introduced a new feature for controlling memory saturation. Folks who have been experiencing KilledWorker / memory-saturation issues with Dask for embarrassingly parallel computations may find this useful.
Tl;dr
import dask
import distributed
import dask_jobqueue
# Set this before creating the cluster; the scheduler reads it at startup.
dask.config.set({"distributed.scheduler.worker-saturation": 1.0})
cluster = dask_jobqueue.PBSCluster(...)
client = distributed.Client(cluster)
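Because the config has to be in place before the scheduler starts, you can also supply it through Dask's standard environment-variable config mechanism when the scheduler is launched from a batch script rather than from Python. A minimal sketch (the 1.0 value just mirrors the Tl;dr above):

# In the job script, before the scheduler process starts:
#   export DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION=1.0

import dask

# Verify which value the current process sees:
print(dask.config.get("distributed.scheduler.worker-saturation"))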
@Isla Simpson This is your TEM diags calculation (only 1 year, random data as input):
[image.png: memory usage of the TEM diags run]
Ridiculous improvements! You probably don't have to for-loop over the files one at a time anymore!
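For anyone curious what dropping the file loop looks like, here is a minimal sketch; process_file and the paths list are hypothetical stand-ins, not code from this thread:

import dask

@dask.delayed
def process_file(path):
    # Stand-in for the real per-file computation (e.g. the TEM diags step).
    return len(path)

paths = [f"file_{i}.nc" for i in range(100)]  # hypothetical input files

# Build every task up front instead of iterating serially. With
# worker-saturation set, the scheduler queues the root tasks and feeds
# workers only as they have capacity, instead of loading all inputs at once.
results = dask.compute(*[process_file(p) for p in paths])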
Excellent! So it's just that one line that needs to be added? That's great!
Looks like it'll be the default in the next release
This setting is now on by default in distributed v2022.11.0 (worker-saturation defaults to 1.1 there): https://www.coiled.io/blog/reducing-dask-memory-usage
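If you're not sure whether your environment already has the new default, a quick check (my addition, not from the thread):

import dask
import distributed

print(distributed.__version__)  # queuing is on by default from 2022.11.0
print(dask.config.get("distributed.scheduler.worker-saturation"))  # prints 1.1 on recent releases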