Hi everyone -- I'm trying to run some code to use CESM2, but am having issues getting workers... I'm working with @Will Wieder and I'm trying to run through a notebook he wrote using the same conda environment he uses, but with different results.
I tried running a cell with the following:
cluster, client = get_ClusterClient(nmem='20GB')
cluster.scale(10)
cluster
On Will's machine, a window pops up after running the cell and workers start appearing in the dask worker window. I am getting neither of these outputs. No error messages right now, just no workers.
Any thoughts?
you may want to make sure you have both ipywidgets and dask-labextension installed in the environment you are using within the notebook
mamba install -c conda-forge ipywidgets dask-labextension
or
conda install -c conda-forge ipywidgets dask-labextension
restart the notebook/kernel after the installation
Still running into the same issues... Is there a way to double check that I have both of them installed?
from the notebook, do you get any output when you run
import ipywidgets
print(ipywidgets.__version__)
I get
7.6.5
okay... looks good.
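The check above only covers ipywidgets. Here is a minimal sketch for checking both packages from inside the notebook kernel. It assumes Python 3.8+ (for importlib.metadata) and the distribution names used in the install command; installed_versions is a hypothetical helper name. This only inspects the kernel's environment. The JupyterLab side of dask-labextension has to be present in the environment running the Jupyter server, which "jupyter labextension list" reports.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the environment this kernel is running from.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["ipywidgets", "dask-labextension"]).items():
        print(f"{name}: {ver or 'NOT installed in this kernel'}")
```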
regarding the missing workers, you may want to check if you already have some pending jobs in the queue
from the command line
qstat -u $USER
or within a notebook cell
!qstat -u $USER
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
2259179.casper* eschlerm jhublog* cr-login-* 97095 1 1 4gb 720:0 R 454:4
^ This is the output from the command line
it appears you don't have any pending dask-worker jobs
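Reading the qstat table by eye works, but the same check can be scripted. This is a sketch that assumes the default "qstat -u $USER" layout shown above: 11 columns per job row, the job name in column 4, and the state (S) in column 10. PBS truncates long job names and appends "*", so it matches on the prefix "dask-work". The dask-worker row in the usage example below is hypothetical.

```python
def dask_worker_jobs(qstat_text):
    """Return (job_id, state) pairs for rows whose job name looks like dask-worker.

    Assumes the default `qstat -u $USER` column layout; header and separator
    lines are skipped because they do not start with a numeric job id.
    """
    jobs = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows have exactly 11 whitespace-separated columns and a numeric id.
        if len(fields) != 11 or not fields[0][:1].isdigit():
            continue
        name, state = fields[3], fields[9]
        if name.startswith("dask-work"):
            jobs.append((fields[0], state))
    return jobs
```

To use it, pass it the text from subprocess.run(["qstat", "-u", os.environ["USER"]], capture_output=True, text=True).stdout. An empty list means no dask-worker jobs exist. State Q means the jobs are waiting in the queue, and R means they are running but the workers may still be failing to connect.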
what's the output of
print(cluster.job_script())
#!/usr/bin/env bash
#PBS -N dask-worker
#PBS -q casper
#PBS -A P93300041
#PBS -l select=1:ncpus=1:mem=20GB
#PBS -l walltime=2:00:00
/glade/work/eschlerm/opt/miniconda/envs/lens-py/bin/python -m distributed.cli.dask_worker tcp://10.12.1.3:43534 --nthreads 1 --memory-limit 18.63GiB --name dummy-name --nanny --death-timeout 60 --interface ib0 --protocol tcp://
Also I'm now getting the missing qsub error again
[Errno 2] No such file or directory: 'qsub': 'qsub'
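This error means that the notebook kernel, which is the process dask-jobqueue uses to submit worker jobs, could not find qsub on its own PATH. A login shell might still find it. Here is a quick standard-library check to run from inside the kernel. The helper names are hypothetical.

```python
import os
import shutil


def locate(command):
    """Return the full path this process would run for `command`, or None."""
    return shutil.which(command)


def path_entries(path=None):
    """Split PATH (this process's by default) into its non-empty directories, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in path.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("qsub ->", locate("qsub"))
    for entry in path_entries():
        print("  ", entry)
```

If qsub resolves to None here but "which qsub" works in a terminal, the Jupyter server was started with a different environment than your login shell.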
this issue appears to be related to https://zulip.ucar.edu/#narrow/stream/16-jupyterlab-hub/topic/qsub.20missing.20from.20.24PATH.20when.20using.20JupyterHub. @Jared Baker, do you happen to have a hint about why @Else Schlerman doesn't have qsub on their PATH?
I see it on the path for the base jupyter server on crhtc45. It's very much in the PATH variable, at the end.
When launching the submit job, is that where the error with qsub is coming from? If submitting, I wouldn't guarantee that variables are in the environment without -V, either on qsub or in the script.
@Else Schlerman, are you using the jupyterhub (https://jupyterhub.hpc.ucar.edu/) or launching the jupyter server yourself (via jupyter-forward)?
I'm launching the jupyter server via jupyter-forward
@Jared Baker I'm not quite sure what you're asking, but here is the git repository of the code with the error message, if that is helpful
https://github.com/eschlerm/permafrost/blob/master/.ipynb_checkpoints/LocalChange-ARC-checkpoint.ipynb
I'm noticing that the qsub error occurs when I add print(cluster.job_script()) to the cell and run it, but I'm not currently getting the error otherwise. However, I am still not getting any workers.
The only output I get is
Tab(children=(HTML(value='\n <div class="jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-Ou…
can you just add #PBS -V to the dask script?
in jupyter notebooks?
@Else Schlerman, you will need to modify the code in the get_ClusterClient() function, which I assume contains the code responsible for instantiating the dask cluster, and pass job_extra=["-V"]:
cluster = PBSCluster(..., job_extra=["-V"])
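For context, dask-jobqueue writes each job_extra entry into the generated job script as its own "#PBS" directive. So job_extra=["-V"] produces a "#PBS -V" line, which tells PBS to export the submitting environment (including PATH) to the worker job. In dask-jobqueue 0.8.0 and later, this keyword was renamed job_extra_directives and the old name is deprecated. The helper below is a hypothetical, simplified illustration of the resulting header, not dask-jobqueue's own code. The values are copied from the job_script() output above.

```python
def render_pbs_header(job_name, queue, account, resource_spec, walltime, extra=()):
    """Build '#PBS' directive lines in the order the job_script() output above shows.

    Hypothetical illustration only: each item in `extra` becomes its own
    '#PBS <item>' line, which is how job_extra=["-V"] turns into '#PBS -V'.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#PBS -N {job_name}",
        f"#PBS -q {queue}",
        f"#PBS -A {account}",
        f"#PBS -l {resource_spec}",
        f"#PBS -l walltime={walltime}",
    ]
    lines.extend(f"#PBS {item}" for item in extra)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pbs_header("dask-worker", "casper", "P93300041",
                            "select=1:ncpus=1:mem=20GB", "2:00:00", extra=["-V"]))
```

After the change, print(cluster.job_script()) should show the extra "#PBS -V" line just before the worker launch command.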
Thank you @Anderson Banihirwe
I now have:
def get_ClusterClient(ncores=1, nmem='25GB'):
    import dask
    from dask_jobqueue import PBSCluster
    from dask.distributed import Client
    cluster = PBSCluster(
        cores=ncores,  # The number of cores you want
        memory=nmem,  # Amount of memory
        processes=ncores,  # How many processes
        queue='casper',  # The type of queue to utilize (/glade/u/apps/dav/opt/usr/bin/execcasper)
        resource_spec='select=1:ncpus=' + str(ncores) + ':mem=' + nmem,  # Specify resources
        project='P93300041',  # Input your project ID here
        walltime='2:00:00',  # Amount of wall time
        interface='ib0',  # Interface to use
        job_extra=["-V"],
    )
    dask.config.set({
        'distributed.dashboard.link':
        'https://jupyterhub.hpc.ucar.edu/stable/user/{USER}/proxy/{port}/status'
    })
    client = Client(cluster)
    return cluster, client
This did seem to fix the qsub error when I use the print(cluster.job_script()) command, but I'm still not getting any workers.
@Else Schlerman It's possible you're missing a cluster.scale(x) command (where x = number of workers) after the cluster = PBSCluster() call. I think that is the call that actually requests the workers.
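cluster.scale() only asks for the worker jobs. It returns right away whether or not any worker ever connects, which is why the failure in this thread was silent. Here is a standard-library sketch that polls with a timeout, so a stuck cluster raises an error instead of quietly showing nothing. wait_for_count is a hypothetical helper. In a real session, the counter could be lambda: len(client.scheduler_info()["workers"]). Recent versions of distributed also provide Client.wait_for_workers(n_workers, timeout=...) for the same purpose.

```python
import time


def wait_for_count(get_count, target, timeout=300.0, interval=5.0):
    """Poll get_count() until it reaches target; raise TimeoutError if it never does.

    Returns the count observed when the target was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = get_count()
        if count >= target:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {count} of {target} workers connected after {timeout} s; "
                "check `qstat -u $USER` for queued or failed dask-worker jobs"
            )
        time.sleep(interval)
```

Usage, after cluster.scale(10): wait_for_count(lambda: len(client.scheduler_info()["workers"]), 10).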
Thanks @Katie Dagon, I do have that command in the next cell, not copied above. However, I went to the xdev office hours last night, and it seems like the issue was coming from my jupyter-forward configuration. Things are now working as expected!
To add to this, Else cloned my conda environment with a .yml file created from the environment that's working for me. She's running the identical notebook, but is unable to get any workers to show up. Is there something else we're potentially missing here?
Ah, I posted before reading this last note. I wondered if it was a jupyter-forward issue. Thanks for digging in, @Else Schlerman!
Yeah, it turned out that jupyter-forward was having trouble with the TMPDIR environment variable. printenv TMPDIR showed the variable pointing to her scratch space, but when jupyter-forward checked whether $TMPDIR was writable, the value reverted to an empty string, so it tried to create files in /. Once we explicitly defined TMPDIR in her .bashrc file, everything worked as expected... though it's still unclear to me why we needed to do that. (I should let @Anderson Banihirwe know about this :smile: )
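The symptom described above, where TMPDIR is set in the shell but treated as empty when its writability is checked, can be probed from Python. This is a minimal standard-library sketch. check_tmpdir is a hypothetical helper, and it does not reproduce jupyter-forward's own check. Python's tempfile.gettempdir() quietly falls back to other candidates (TEMP, TMP, /tmp, and so on) when TMPDIR is unset or unusable, so printing both values shows whether the variable is really being used.

```python
import os
import tempfile


def check_tmpdir(environ=None):
    """Report whether TMPDIR is set, is an existing directory, and is writable."""
    env = os.environ if environ is None else environ
    path = env.get("TMPDIR", "")
    return {
        "TMPDIR": path,
        "is_set": bool(path),
        "is_dir": bool(path) and os.path.isdir(path),
        "writable": bool(path) and os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print(check_tmpdir())
    print("tempfile.gettempdir() ->", tempfile.gettempdir())
```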
Can I ask a general question about when we would want to use jupyter-forward over the jupyterhub? Is it to avoid hub stability issues? I used to do a lot of port forwarding to launch jupyter lab, but since the hub stability has improved I find myself just logging on to the hub. Curious about when jupyter-forward might be preferable, though.
At this point, I'm really only using jupyter-forward when the Hub is down. It's proving to be a useful tool for systems that don't have JupyterHub installed. I haven't really done any analysis on andre, but I suspect jupyter-forward would be the best tool for launching a notebook on that machine.
I agree.
Last updated: May 16 2025 at 17:14 UTC