Hi, I am having trouble running on Cheyenne this morning. Getting weird error messages (like "error: can't start new thread"). Also, I can't seem to launch a server on Casper. Any thoughts?
Never mind. Working now
Hmmm, trying to launch a jupyterhub session on casper - it's been pending in the queue for a while now.
edit: timed out so I launched a server on cheyenne instead. Is casper really this over-subscribed?
I submitted a CISL ticket about this and apparently the issue for Casper login nodes is that people spawn them and then leave them idle, then others get timeout errors.
I'm having this problem today (timeout, can't get on jupyterhub via casper login node). I submitted a CISL ticket as well. I wonder if there needs to be a time limit so that idle jupyterhub sessions don't take up spots in the jhublogin queue. I can't switch to cheyenne because my data is on campaign. Kind of frustrating...
@Katie Dagon, I was helping @Zephyr Sylvester today and discovered that if your project number is oversubscribed, the behavior of the spawner on the login node can be a little mysterious. I didn't fully test the behavior, but I think the spawner may just hang rather than giving a clear message regarding an overspent account.
In this case, the solution was to ensure that DAV_PROJECT was specified in the dot files and set to another project number that was not oversubscribed. So we modified Zephyr's .profile file to include
export DAV_PROJECT=NCGD0011
Not sure if this is your problem or not.
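If it helps, here is a minimal sketch of adding that line to a profile file without duplicating it on repeated runs. The function name set_dav_project is made up for illustration, and the project code is just the one from this thread; substitute one of your own that is not overspent.

```shell
# Sketch only: append the DAV_PROJECT export to a profile file once.
# set_dav_project is a hypothetical helper, not part of any NCAR tooling.
set_dav_project() {
    profile_file="$1"   # e.g. "$HOME/.profile"
    project_code="$2"   # e.g. NCGD0011 (the code used in this thread)
    # Skip the append if an export line is already present.
    if grep -q '^export DAV_PROJECT=' "$profile_file" 2>/dev/null; then
        echo "DAV_PROJECT already set in $profile_file"
    else
        echo "export DAV_PROJECT=$project_code" >> "$profile_file"
        echo "Added DAV_PROJECT=$project_code to $profile_file"
    fi
}
```

Usage would be something like `set_dav_project "$HOME/.profile" NCGD0011`, followed by a fresh login so the variable is picked up. Note that it does not change an existing DAV_PROJECT line; it only reports that one is there.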
cc @hpcd
Thanks @Matt Long. I think in this case it was that the jupyterhub was oversubscribed and there are a limited number of sessions that can run simultaneously (using qstat -Q it looks like that number might be 108?). I did chat briefly with Rory Kelly about this and the time limit is long (7 days) since it's the same as a standard terminal session on a login node. So it's probably good if people remember to close out their jupyterhub sessions so others can get on during busy times :smile:
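For anyone who wants to check the queue themselves, a rough sketch of the command is below. It assumes the jhublogin queue name mentioned earlier in this thread, and the wrapper function show_jhub_queue is hypothetical; the exact columns in the output depend on the scheduler version.

```shell
# Sketch only: print the scheduler's summary for the JupyterHub login queue.
# show_jhub_queue is a hypothetical wrapper; "jhublogin" is the queue name
# mentioned in this thread and may differ on your system.
show_jhub_queue() {
    queue="${1:-jhublogin}"
    if command -v qstat >/dev/null 2>&1; then
        # -Q asks qstat for queue-level status rather than individual jobs.
        qstat -Q "$queue"
    else
        echo "qstat not found; run this on a login node" >&2
        return 1
    fi
}
```

Comparing the queue's limit with its running count in that output should show whether the session cap has been hit.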
The DAV_PROJECT info is still helpful, since I should probably set that environment variable anyway. Taking a closer look at SAM, it appears that in the absence of this setting, the spawner on the login node selects from any of the project codes that I have access to, seemingly at random.
Hi Katie. Yes, Matt's advice is definitely good to heed regarding DAV_PROJECT. You are correct though that there was a cap hit on the number of Casper login sessions. This is something we need to find a robust solution to in the long term, but I'll go through the current sessions tonight and see if any can be cleaned up to free some slots for tomorrow.
Thanks @Brian Vanderwende, much appreciated!
Katie Dagon said:
So it's probably good if people remember to close out their jupyterhub sessions so others can get on during busy times :smile:
I'm not sure I know how to properly close out my sessions! What's the best way to ensure that we're quitting properly and not hogging sessions?
@Danica Lombardozzi File > Hub Control Panel
That will take you to a page where you can stop your servers.
Last updated: May 16 2025 at 17:14 UTC