Stream: dask

Topic: distributed.nanny worker restart


Matthew Hayman (Dec 01 2022 at 15:40):

I'm using dask to parallelize some independent function calls on Cheyenne. I call client.submit(fnc, *args) in a for loop. When the length of the loop, and therefore the number of submissions, exceeds the number of workers, I get several errors stating:
distributed.nanny - WARNING - Restarting worker
and most (though not all) of the jobs fail.

I don't think I have this issue on my local machine. I'm just using dask to leverage the multiple cores of a single Cheyenne node (not working across multiple PBS job submissions or nodes). Is there something that needs to be set up on the client or scheduler so I can have a queue for the workers? Does a dask distributed client not have the ability to queue jobs?
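For context, the queueing behavior being asked about is built into the distributed scheduler: submitting more tasks than there are workers is fine, and excess tasks simply wait for a free worker. A minimal local sketch of the pattern described above (the function fnc and the worker/task counts here are illustrative, not from the original post):

```python
from dask.distributed import Client, LocalCluster

def fnc(x):
    # Stand-in for an independent function call
    return x * x

# Thread-based local cluster standing in for the cores of a single node
# (processes=False avoids needing a __main__ guard for this sketch)
cluster = LocalCluster(n_workers=4, processes=False)
client = Client(cluster)

# More submissions than workers: the scheduler queues the excess
# tasks and dispatches them as workers become idle
futures = [client.submit(fnc, i) for i in range(100)]
results = client.gather(futures)

client.close()
cluster.close()
```

If this pattern restarts workers only on the cluster and not locally, the usual suspect is per-task memory: the nanny restarts a worker that exceeds its memory limit, so checking the worker logs and the per-worker memory limit is a reasonable first step.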


Last updated: May 16 2025 at 17:14 UTC