@all Here is a post detailing how to get started using Dask with PBSCluster on Casper through the new JupyterHub, which launches tomorrow: https://ncar.github.io/esds/posts/casper_pbs_dask/
Has anybody got this working? I'm trying to do it currently but am unsure of the order of things. Any help much appreciated!
Currently I'm starting a JupyterHub session (e.g. Casper batch, 2 nodes, 16 CPUs per node, 100 GiB per node) and then running a version of the example code (from the GitHub page) in the resulting notebook. But it seems strange to me to be providing all the session information again rather than just, say, a job ID from which this could be grabbed. Additionally, the PBSCluster documentation suggests that it is setting up a new job from scratch (e.g. passing arguments to #PBS), rather than using the one I already have. Am I doing things the right way around, or asking for a whole new job after already starting one? And (apologies for two questions in one post!) what is the significance of cluster.scale(2)?
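For reference, here's roughly the sequence I'm running in the notebook (a minimal sketch modelled on the linked post; the queue, account, and resource values are placeholders, not my real settings):

```python
# Hypothetical minimal PBSCluster sequence; all values below are illustrative.
from dask_jobqueue import PBSCluster
from dask.distributed import Client

cluster = PBSCluster(
    queue="casper",                              # PBS queue for the worker jobs
    account="PROJECT0001",                       # placeholder project code
    cores=1,                                     # cores per worker
    memory="10GiB",                              # memory per worker
    walltime="01:00:00",                         # walltime for each worker job
    resource_spec="select=1:ncpus=1:mem=10GB",   # raw PBS resource request
)

cluster.scale(2)          # submits PBS jobs for the workers
client = Client(cluster)  # connects this notebook to the scheduler
```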
Hi Dafydd,
I am using that function to get dask workers for my notebooks.
I think of those dask clusters as associated with a specific notebook rather than with a JupyterHub session. So I don't ask for computational resources when logging into JupyterHub; I just adjust the wallclock. Then I ask for the required number of CPUs within a given notebook with that PBSCluster sequence.
The cluster.scale() function goes and gets your extra computational resources: cluster.scale(2) asks for 2 workers, each sized according to the cores/memory you passed to PBSCluster.
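For example (a sketch, assuming a cluster object built as in the sequence above; adapt() is the alternative if you'd rather let dask grow and shrink the pool itself):

```python
# Assuming `cluster` is a PBSCluster as in the earlier sketch.
cluster.scale(2)                      # fixed pool: submit jobs for 2 workers

# Alternatively, let dask manage the pool size within bounds:
cluster.adapt(minimum=0, maximum=4)   # workers come and go with the workload
```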
But I'm interested to hear if there are other approaches out there...
Hi Daniel,
This makes sense, thanks! I've managed to get this behaving more as expected by launching the notebook on the login node and then requesting the dask workers on the compute nodes from within the notebook. I suppose a related question would then be how to get dask to use an existing set of resources (say for running a python script to process output at the end of a model run which is already using many nodes/cpus), but that's probably one for another stream.
@Dafydd Stephenson,
I suppose a related question would then be how to get dask to use an existing set of resources (say for running a python script to process output at the end of a model run which is already using many nodes/cpus), but that's probably one for another stream.
You may find dask-mpi to be useful for this kind of setup: https://mpi.dask.org/en/latest/
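The basic pattern looks something like this (an untested sketch; the script name and process count are illustrative):

```python
# process_output.py -- hypothetical post-processing script, launched inside
# an existing PBS job with e.g.:  mpirun -np 4 python process_output.py
from dask_mpi import initialize
from dask.distributed import Client

initialize()       # rank 0 becomes the scheduler, rank 1 runs this client
                   # code, and the remaining ranks become workers
client = Client()  # connects to the scheduler started by initialize()

# ... dask computations using the existing job's resources go here ...
```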
This looks exactly like what I'm looking for and seems very straightforward to set up. Thanks!