Stream: dask

Topic: worker timeout errors


Stephen Yeager (Oct 04 2021 at 16:42):

A dask notebook that was working last week is failing today with connection errors:
asyncio.exceptions.TimeoutError

The error log files are here:
/glade/scratch/yeager/*.casper-pbs.ER

Max Grover (Oct 04 2021 at 16:43):

I think there is something up with the PBS scheduler on Casper this morning. I had similar issues yesterday; I submitted a help ticket with CISL, and I believe @Kristen Krumhardt is planning on doing so as well.

Matt Long (Oct 04 2021 at 16:44):

cc @Jared Baker

Max Grover (Oct 04 2021 at 17:16):

The CISL Resource Status page indicates:

The Casper job scheduling system was not responding as of 11:11 10/04/2021.
If the problem persists, users will receive email updates through our Notifier service.

Last updated: Jan 30 2022 at 12:01 UTC