Stream: jupyter

Topic: jupyterhub issue


view this post on Zulip Kristen Krumhardt (May 27 2021 at 18:54):

Hello, I was using jupyterhub on casper today and I got kicked off, and now it won't let me start a new session. It's stuck on the "Your server is starting up" page, which eventually times out. I've got a new jupyterhub session going on Cheyenne, but I will need to be back on casper to access certain files. @Jared Baker, would you be able to help me sort this out?

view this post on Zulip Isla Simpson (May 27 2021 at 18:57):

I'm also having issues getting onto casper with jupyterhub. Stuck on "Your server is starting up" with a red bar.

view this post on Zulip Maria Molina (May 27 2021 at 19:01):

Others say their running jobs crashed.

view this post on Zulip Maria Molina (May 27 2021 at 19:04):

Might not be Jupyter-specific; it may be PBS on casper, because no jobs show up for me with qstat @casper.

view this post on Zulip Stephen Yeager (May 27 2021 at 19:13):

CISL resource status page (https://www2.cisl.ucar.edu/user-support/cisl-resource-status) shows Casper job scheduling system is down.

view this post on Zulip Maria Molina (May 27 2021 at 19:14):

Thanks @Stephen Yeager !

view this post on Zulip Jared Baker (May 27 2021 at 19:55):

Yes, it's something more related to PBS, as noted. We're working on the issue.

view this post on Zulip Keith Lindsay (Jun 03 2021 at 17:17):

I've had multiple occurrences of "Server unavailable or unreachable" errors this morning with jupyterhub on casper batch nodes.
Are others experiencing this as well?

An aspect of these errors that is confusing to me is that the underlying casper batch job is still running.

view this post on Zulip Kristen Krumhardt (Jun 03 2021 at 17:20):

Me too. I've gotten kicked off 3 times now... yep, like @Keith Lindsay, I just looked and my jobs are still running on casper too.

view this post on Zulip Isla Simpson (Jun 03 2021 at 17:24):

me three

view this post on Zulip Keith Lindsay (Jun 03 2021 at 17:28):

When attempting to restart, I sometimes get a popup with the following message

Spawn failed: The Jupyter batch job has disappeared while pending in the queue or died immediately after starting.

view this post on Zulip Max Grover (Jun 03 2021 at 17:29):

@Jared Baker any idea of what is going on there?

view this post on Zulip Yassir Eddebbar (Jun 03 2021 at 17:30):

Same here; it started on and off yesterday afternoon for me...

view this post on Zulip Jared Baker (Jun 03 2021 at 17:31):

Do you have approximate times for these this morning?

view this post on Zulip Jared Baker (Jun 03 2021 at 17:32):

Maybe around 11:30?

view this post on Zulip Jared Baker (Jun 03 2021 at 17:32):

There were some PBS woes earlier this morning, but I can go digging more.

view this post on Zulip Kristen Krumhardt (Jun 03 2021 at 17:36):

Mine have all happened between 9 and 11am

view this post on Zulip Isla Simpson (Jun 03 2021 at 17:42):

Around 11:20 for me.

view this post on Zulip Yassir Eddebbar (Jun 03 2021 at 17:42):

It was working well until ~10:30am; it has been failing from then to the present.

view this post on Zulip Keith Lindsay (Jun 03 2021 at 18:03):

One of the occurrences for me was right before I posted, so about 11:15.
I'm not sure about the times before that.

view this post on Zulip Max Grover (Jun 03 2021 at 18:45):

@Jared Baker it is working for me now

view this post on Zulip Jared Baker (Jun 03 2021 at 18:46):

Okay, I'm still skeptical on a number of things right now. But I'll post more once I feel a bit more confident.

view this post on Zulip Stephen Yeager (Jun 03 2021 at 18:53):

I filed a CISL ticket yesterday at ~4pm reporting Jupyterhub instability on casper batch nodes. This seems to be a frequently recurring issue.

view this post on Zulip Jared Baker (Jun 03 2021 at 23:33):

Okay, I think I've got things back to a reasonable state over the course of the afternoon and tried to address some issues when the PBS server fails to communicate back to the JupyterHub server. Hopefully this helps with stability.

view this post on Zulip Yassir Eddebbar (Jun 03 2021 at 23:41):

Thanks @Jared Baker! Quite stable this afternoon on my end!

view this post on Zulip Rosie Fisher (Jun 23 2021 at 09:11):

I'm getting the same error message this morning that @Keith Lindsay reported above... :/


Last updated: Jan 30 2022 at 12:01 UTC