Stream: jupyterlab-hub

Topic: Trouble running on Casper


Jean-Francois Lamarque (Aug 30 2021 at 17:41):

Hi, I am having trouble running on Cheyenne this morning. Getting weird error messages (like "error: can't start new thread"). Also, I can't seem to launch a server on Casper. Any thoughts?

Jean-Francois Lamarque (Aug 30 2021 at 17:53):

Never mind. Working now

Daniel Marsh (Aug 30 2021 at 20:22):

Hmmm, trying to launch a jupyterhub session on casper - it has been pending in the queue for a while now.

edit: timed out so I launched a server on cheyenne instead. Is casper really this over-subscribed?

Maria Molina (Aug 30 2021 at 22:24):

I submitted a CISL ticket about this and apparently the issue for Casper login nodes is that people spawn sessions and then leave them idle, so others get timeout errors.

Katie Dagon (Oct 28 2021 at 18:44):

I'm having this problem today (timeout, can't get on jupyterhub via casper login node). I submitted a CISL ticket as well. I wonder if there needs to be a time limit so that idle jupyterhub sessions don't take up spots in the jhublogin queue. I can't switch to cheyenne because my data is on campaign. Kind of frustrating...

Matt Long (Oct 28 2021 at 22:57):

@Katie Dagon, I was helping @Zephyr Sylvester today and discovered that if your project number is oversubscribed, the behavior of the spawner on the login node can be a little mysterious. I didn't fully test the behavior, but I think the spawner may just hang rather than giving a clear message regarding an overspent account.

In this case, the solution was to ensure that DAV_PROJECT was specified in the dot files and set to another project number that was not oversubscribed. So in this case, we modified Zephyr's .profile file to include

export DAV_PROJECT=NCGD0011

Not sure if this is your problem or not.
cc @hpcd
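
A minimal sketch of that .profile change, assuming a Bourne-style login shell that reads ~/.profile. The project code is the one quoted above; the `:=` guard, which keeps any value already set elsewhere, is an addition here and not part of the original fix.

```shell
# Lines for ~/.profile (sketch). NCGD0011 is the project code from the post above;
# replace it with a project that still has allocation.
# Assumption: keep any DAV_PROJECT that is already set in the environment.
: "${DAV_PROJECT:=NCGD0011}"
export DAV_PROJECT

# After logging in again, confirm the variable is set.
echo "DAV_PROJECT=${DAV_PROJECT}"
```

If DAV_PROJECT is already set to the overspent project somewhere else, the guard would keep that value; in that case a plain `export DAV_PROJECT=...` line, as in the post above, forces the new one.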

Katie Dagon (Oct 28 2021 at 23:39):

Thanks @Matt Long. I think in this case it was that the jupyterhub was oversubscribed and there are a limited number of sessions that can run simultaneously (using qstat -Q it looks like that number might be 108?). I did chat briefly with Rory Kelly about this and the time limit is long (7 days) since it's the same as a standard terminal session on a login node. So it's probably good if people remember to close out their jupyterhub sessions so others can get on during busy times :smile:

The DAV_PROJECT info is still helpful, since I should probably set that environment variable anyway. Taking a closer look at SAM, it appears that in the absence of this setting, the spawner on the login node selects from any of the project codes that I have access to, seemingly at random.
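
The queue check mentioned above can be sketched as follows. This assumes the login sessions run in a PBS queue named jhublogin (the name used earlier in this thread) and that qstat is on the PATH; the attribute names in the grep are typical PBS ones and may differ on this system.

```shell
# Assumed queue name, taken from the earlier post in this thread.
queue="jhublogin"

if command -v qstat >/dev/null 2>&1; then
    # One-line summary: the limit on running jobs and the current job counts.
    qstat -Q "$queue"
    # Full attribute listing, filtered to the limit and the per-state counts.
    qstat -Qf "$queue" | grep -i -E 'max_run|state_count|total_jobs' \
        || echo "no matching attributes found for $queue"
else
    echo "qstat not found; run this on a Casper or Cheyenne login node"
fi
```

Comparing the running count with the limit gives a rough idea of whether the queue is full before trying to spawn a session.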

Brian Vanderwende (Oct 29 2021 at 00:03):

Hi Katie. Yes, Matt's advice is definitely good to heed regarding DAV_PROJECT. You are correct though that there was a cap hit on the number of Casper login sessions. This is something we need to find a robust solution to in the long term, but I'll go through the current sessions tonight and see if any can be cleaned up to free some slots for tomorrow.

Katie Dagon (Oct 29 2021 at 00:37):

Thanks @Brian Vanderwende, much appreciated!

Danica Lombardozzi (Nov 02 2021 at 14:55):

Katie Dagon said:

So it's probably good if people remember to close out their jupyterhub sessions so others can get on during busy times :smile:

I'm not sure I know how to properly close out my sessions! What's the best way to ensure that we're quitting properly and not hogging sessions?

Jared Baker (Nov 02 2021 at 14:58):

@Danica Lombardozzi File > Hub Control Panel

That will take you to a page where you can stop your servers.


Last updated: May 16 2025 at 17:14 UTC