Pending spawn issues? · jupyterlab-hub

I have been unable to use JupyterHub today. When I click on a server it does launch another window, but then it just sits there for a long time saying "Pending in queue". If I try to stop the server, I get an error: "API request failed (400): afoster is pending spawn, please wait". I'm not sure how to go about troubleshooting this. Any help would be much appreciated!

Michael Levy (Nov 14 2023 at 18:14):

@Adrianna Foster I had this issue yesterday -- I was trying to get on a Casper login node, but was sitting in the queue instead. (I think there's a limit of 180 jupyterhub instances on the Casper login nodes, and I was lucky number 181.) Which machine / node were you trying to access? The Casper login nodes don't look quite as busy today, but if you can't get on a login node I'd recommend trying a compute node instead.

More detail than you probably want, but if you ssh to Casper you can run $ qstat | grep jhublog to see how busy the hub login nodes are (compute node requests go through the htc queue). Right now it looks like there are 144 active jobs, five people waiting, and your job is running:

$ qstat | grep jhublog | wc -l
149
$ qstat | grep jhublog | tail
8990219.casper-p* cr-login-stable  kaylaw            00:07:44 R jhublogin
8990331.casper-p* cr-login-stable  juliob            00:09:11 R jhublogin
8990757.casper-p* cr-login-stable  kaitlins          00:02:52 R jhublogin
8991913.casper-p* cr-login-stable  sdhavale          00:00:45 R jhublogin
8991953.casper-p* cr-login-stable  afoster           00:00:48 R jhublogin
8992077.casper-p* cr-login-stable  wwieder                  0 Q jhublogin
8992078.casper-p* cr-login-stable  morenor                  0 Q jhublogin
8992089.casper-p* cr-login-stable  vanderwb                 0 Q jhublogin
8992176.casper-p* cr-login-stable  ksampson                 0 Q jhublogin
8992217.casper-p* cr-login-stable  shivr                    0 Q jhublogin

edit to add: I'm certain there were 180 running and 1 queued yesterday, but it looks like maybe the limit is 144 today?
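For anyone who wants the running/queued split without counting lines by eye, here is a minimal sketch. It assumes the default qstat layout shown in the listing above (state in the fifth column, queue name in the sixth); the function name count_jhub_states is made up for illustration.

```shell
# Summarize hub login jobs by PBS state (R = running, Q = queued).
# Reads default-format qstat output on stdin.
# Assumption: column 5 is the job state and column 6 is the queue name,
# as in the listing above.
count_jhub_states() {
  awk '$6 ~ /^jhublog/ { n[$5]++ }
       END { printf "running=%d queued=%d\n", n["R"], n["Q"] }'
}

# On Casper this would be used as:
#   qstat | count_jhub_states
```

Since the function only reads stdin, it can be tried on a saved copy of the qstat output before running it on Casper.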

Danielle Touma (Nov 14 2023 at 18:17):

Hi Adrianna, this happens every now and then for me as well when trying to get a Casper node on JupyterHub and there is a queue. The only way I've been able to get past the stuck server is to add a new server. I've also been successful by ssh-ing into Casper from the Mac terminal and then using qdel to cancel the job that is stuck in the queue.
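The qdel route can be sketched roughly as follows. This assumes the default qstat layout from the listing earlier in the thread (user in column 3, state in column 5, queue in column 6) and that qdel accepts the bare numeric job ID on Casper; the function name queued_jhub_ids is hypothetical.

```shell
# Print the numeric IDs of one user's queued (state Q) hub login jobs.
# Reads default-format qstat output on stdin; the username is the first argument.
# Assumption: columns are job ID, name, user, time, state, queue.
queued_jhub_ids() {
  awk -v user="$1" '
    $3 == user && $5 == "Q" && $6 ~ /^jhublog/ {
      split($1, id, ".")   # "8992077.casper-p*" -> "8992077"
      print id[1]
    }'
}

# On Casper (check the printed IDs before deleting anything):
#   qstat | queued_jhub_ids "$USER"
#   qdel <job_id>
```

The function only prints IDs and never calls qdel itself, so nothing gets cancelled by accident.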

Adrianna Foster (Nov 14 2023 at 18:40):

okay @Michael Levy and @Danielle Touma thanks! I was able to finally get in but this was super helpful!

Negin Sobhani (Nov 28 2023 at 18:37):

@Adrianna Foster and @Danielle Touma As Mike mentioned, this happens when there are too many users using casper login instances on JH.

In these situations I would suggest using a Casper PBS batch node instead of login to access JHub. Basically, in JupyterHub under "Resource Selection", choose PBS batch node instead of login. It will direct the job to another queue instead of jhublog.

Negin Sobhani (Nov 28 2023 at 18:39):

And also one piece of general advice for all users, to avoid causing this issue for others:
To prevent access issues on Casper, we strongly encourage you to terminate your JHub login sessions as soon as you have completed your work. We have often noticed that idle JHub sessions prevent other users from accessing login sessions on Casper.

Keith Lindsay (Nov 28 2023 at 22:51):

@Negin Sobhani, a drawback of stopping a jhub session is that if you restart it you don't get back the tabs that you previously had open (at least I don't). This is my primary reason for using casper login for sessions; they are long-running and I don't have to reopen multiple tabs that I've set up.

At the end of a previous zulip thread, @Jared Baker explained that this loss of session state is a consequence of cleanup that now occurs in the CISL configuration of jhub.

If there were a way besides using long-lived sessions to preserve which tabs are open, I would happily close my sessions. If someone here knows how to preserve this session state, I'd be very interested in learning about that.

Jared Baker (Nov 29 2023 at 01:05):

Hmm, that’s an interesting workflow. Might be able to cook something up to fix that. I’ll think about it over the next day or so. There was a good set of reasons we decided to make the state dirs ephemeral, but we might be able to reduce that scope a bit.
