I'm not sure if this is a dask problem, a JupyterHub problem, or a PBS issue... @Jared Baker, I'm tagging you in case it's a Hub or PBS problem, since I don't know if you check out this channel regularly. I was chatting with @Holly Olivarez, who ran into this issue first, but I was able to reproduce it on my own:
from dask_jobqueue import PBSCluster
from dask.distributed import Client
cluster = PBSCluster(
    cores=36,
    memory='300 GB',
    processes=9,
    resource_spec='select=1:ncpus=36:mem=300GB',
)
cluster.scale(1)
Works fine, but client = Client(cluster) fails with:
RuntimeError: Command exited with non-zero exit code.
Exit code: 1
Command:
qsub /glade/scratch/mlevy/tmpdir/tmpaxepzplf.sh
stdout:
There was a problem selecting the proper resource. Please open a research computing ticket.
stderr:
This is dask 2022.01.0; not sure if it would be useful to have version numbers from anything else. I was using the Hub to run from a Casper PBS node, and Holly was on the Casper login node.
Has anyone seen this before? As mentioned earlier in this stream (in a different topic), this PBSCluster() command was working fine just a few days ago... I believe from the same conda environment I'm using to reproduce Holly's error.
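For anyone hitting something similar: one way to see exactly what dask_jobqueue hands to qsub is to print the generated job script before anything is submitted. A minimal sketch, using the same settings as above:

from dask_jobqueue import PBSCluster

# Same settings as the failing example; no queue is specified here,
# so the submission relies on whatever PBS/dask default is in place.
cluster = PBSCluster(
    cores=36,
    memory='300 GB',
    processes=9,
    resource_spec='select=1:ncpus=36:mem=300GB',
)

# job_script() returns the shell script that scale()/Client() would submit
# via qsub, so the #PBS directives (queue, resource request) can be inspected.
print(cluster.job_script())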
Where are you running this? JupyterHub Stable?
yup, JupyterHub Stable
okay. I know where that message comes from. Give me just a few seconds.
What about now?
Looks like it's working, thanks! (I needed to add the queue argument, but that's probably an issue in my configuration :)
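For reference, here is a sketch of the call that works for me now (the 'casper' queue name is an assumption; use whatever queue your site expects):

from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(
    cores=36,
    memory='300 GB',
    processes=9,
    resource_spec='select=1:ncpus=36:mem=300GB',
    queue='casper',  # assumed queue name; adjust for your system
)
cluster.scale(1)
client = Client(cluster)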
Yeah, it looks to me like the queuing system was not getting sufficient information (i.e., the queue!).
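If it helps, the queue can also be set once through dask's config system instead of passing queue= to every PBSCluster call; a sketch, with 'casper' again being an assumed queue name:

import dask

# Set the default queue for PBSCluster before any cluster is created;
# this mirrors setting jobqueue: pbs: queue: casper in ~/.config/dask/jobqueue.yaml
dask.config.set({'jobqueue.pbs.queue': 'casper'})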