Stream: dask

Topic: Clean exit for Dask workers and cluster


Daniel Howard (Apr 16 2025 at 18:49):

A ticket was created about Dask workers that finished their work early but kept running until they hit walltime. Because of this, they abort non-gracefully and then generate excessive email warnings.

Is there an agreed-upon way to terminate Dask jobs cleanly, both to ensure efficient use of compute resources and to avoid those emails? See this Dask Discourse thread for related discussion and a possible solution.

I know using cluster = PBSCluster(..., job_extra_directives=['-m n']) should fix the email problem, but that's not ideal.
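For reference, the workaround above passes the PBS directive -m n (send no job-status mail) through dask_jobqueue. A minimal configuration sketch (the cores/memory values here are placeholders, not from the ticket); note this only silences the emails and does not make the workers exit cleanly:

```python
from dask_jobqueue import PBSCluster

# '-m n' tells PBS to suppress all job-status mail for the worker jobs.
# This hides the walltime-abort warnings but does not fix the underlying
# non-graceful exit.
cluster = PBSCluster(
    cores=1,                           # placeholder resources
    memory="4GB",
    job_extra_directives=["-m n"],     # extra lines added to the PBS job script
)
```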

Katie Dagon (Apr 17 2025 at 22:02):

I usually run:

client.close()
cluster.close()

at the bottom of a notebook when I'm done working to shut down my Dask workers. But I guess this requires that extra step of manually closing. It would be interesting to explore an automated solution (that doesn't involve letting the wallclock run out).
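One way to avoid the manual step is Python's context-manager protocol: Dask's Client and cluster objects support with blocks, which call close() automatically when the block exits, even on an exception. A minimal sketch of the pattern using a stand-in class (FakeCluster is hypothetical, just to keep the example self-contained):

```python
import contextlib

class FakeCluster:
    """Hypothetical stand-in for a Dask cluster object."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

# contextlib.closing guarantees .close() runs when the block exits,
# mirroring the manual client.close()/cluster.close() calls above.
cluster = FakeCluster()
with contextlib.closing(cluster):
    pass  # ... submit and gather work here ...

print(cluster.closed)  # True
```

With real Dask objects the same pattern is just `with PBSCluster(...) as cluster, Client(cluster) as client:`, so cleanup happens even if a cell in the middle of the workflow raises.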

Daniel Howard (Apr 18 2025 at 21:53):

Thanks. Per some of the threads online, those commands didn't always appear to give an "error-free" exit, even though they do close each Dask task. I see there is also a worker.close() command, so it might be worth trying that too, placed at the end of a set of work assigned to a worker. https://distributed.dask.org/en/stable/worker.html

I'll share with Fred C and maybe update here if they recommend differently.


Last updated: May 16 2025 at 17:14 UTC