Stream: dask
Topic: worker timeout errors
Stephen Yeager (Oct 04 2021 at 16:42):
A Dask notebook that was working last week is failing today with connection errors:
asyncio.exceptions.TimeoutError
The error log files are here:
/glade/scratch/yeager/*.casper-pbs.ER
Max Grover (Oct 04 2021 at 16:43):
I think there is something up with the PBS scheduler on Casper this morning. I had similar issues yesterday; I submitted a help ticket with CISL, and I believe @Kristen Krumhardt is planning on doing so as well.
Matt Long (Oct 04 2021 at 16:44):
cc @Jared Baker
Max Grover (Oct 04 2021 at 17:16):
It seems that the CISL Resource Status page indicates that:
The Casper job scheduling system was not responding as of 11:11 10/04/2021.
If the problem persists, users will receive email updates through our Notifier service.
Last updated: Jan 30 2022 at 12:01 UTC