Stream: dask

Topic: controlling-dask-workers-memory-saturation


Anderson Banihirwe (Oct 07 2022 at 19:57):

distributed>=2022.9.2 introduced a new feature for controlling memory saturation. Folks who have been experiencing KilledWorker / memory saturation issues with Dask for embarrassingly parallel computations may find this useful.

Share your experiences with worker-saturation config to reduce memory usage

Tl;dr

import dask
import distributed
import dask_jobqueue

# Send at most ceil(nthreads * 1.0) root tasks to each worker at a time
dask.config.set({"distributed.scheduler.worker-saturation": 1.0})

cluster = dask_jobqueue.PBSCluster(...)  # your usual cluster options
client = distributed.Client(cluster)
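
The same knob is reachable through Dask's standard config mechanisms; for example (a sketch, assuming the usual DASK_ environment-variable mapping rather than anything specific to this thread):

import os

# Double underscores in the variable name map to dots in the config key;
# set this before dask is imported / the scheduler starts:
os.environ["DASK_DISTRIBUTED__SCHEDULER__WORKER_SATURATION"] = "1.0"

import dask

# Confirm the value the scheduler will pick up
print(dask.config.get("distributed.scheduler.worker-saturation"))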

Deepak Cherian (Oct 12 2022 at 19:38):

@Isla Simpson This is your TEM diags calculation (only 1 year, random data as input):
[image: TEM diags calculation results]

ridiculous improvements! You probably don't have to for-loop-over-files any more!
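
For concreteness, a minimal sketch of what that can look like (hypothetical file pattern and reduction; assumes xarray reading netCDF files):

import xarray as xr

# Instead of looping over files one at a time to stay under the memory
# limit, open them all as one lazy dataset and let the (now throttled)
# scheduler manage memory:
ds = xr.open_mfdataset("tem_diags_*.nc", parallel=True)  # hypothetical pattern
result = ds.mean("time").compute()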

Isla Simpson (Oct 12 2022 at 21:22):

Excellent! So it's just that one line needs to be added in? That's great!

Deepak Cherian (Oct 13 2022 at 18:55):

Looks like it'll be the default in the next release

Deepak Cherian (Nov 16 2022 at 19:02):

This setting is now the default in v2022.11.0: https://www.coiled.io/blog/reducing-dask-memory-usage
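
A quick way to check whether your installed versions already queue root tasks by default (a sketch; the exact default value may differ across releases):

import dask
import distributed

print(distributed.__version__)  # queuing is on by default from 2022.11.0
# A finite value here (rather than "inf") means queuing is enabled
print(dask.config.get("distributed.scheduler.worker-saturation"))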

