I'm unable to get the Dask dashboard working on Cheyenne.
Is anyone able to post an example of importing and running Dask with the dashboard using JupyterHub on Cheyenne? I would be very grateful, as Casper resources are very hard to get right now. I am using Dask through a conda environment that I built on Casper, so I'm a little worried that it might not be configured properly to work on Cheyenne.
Are you getting a 404 error or some other error?
I don't know what the dashboard URL is supposed to be. I'm not seeing an HTTP error (yet).
The template of the dashboard link should be "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"
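To illustrate how that template expands, here is a small sketch; the username is made up and 8787 is simply Dask's default dashboard port, so substitute your own values:

```python
# Illustrative expansion of the dashboard URL template.
# "myuser" and 8787 are placeholder values, not real settings.
template = "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"
url = template.format(USER="myuser", port=8787)
print(url)
```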
Are you using ncar-jobqueue or dask-jobqueue to instantiate the cluster?
Thank you, I will try that. Do you also happen to know what parameters to pass when creating the cluster, so that CPU and memory resources are reasonable for the Cheyenne architecture?
Try

```
cores = 20
processes = 10
memory = '109GB'
```

If you find yourself getting KilledWorker errors, try reducing the number of processes.
Thank you. I am assuming I have to use ncar-jobqueue on Cheyenne. I will post the code I am trying shortly.
When using ncar-jobqueue in conjunction with the hub, the dashboard link is set for you, i.e. it should work out of the box without you needing to set it manually.
I was worried that by using my own Dask configuration, the dashboard link would not be set by default, but I will try it as you suggest.
Does this code look reasonable?
```python
import dask

machine = 'cheyenne'  # 'casper'

if machine == 'cheyenne':
    # The following is supposedly set when using NCARCluster
    #dask.config.set({'distributed.dashboard.link': "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"})
    from ncar_jobqueue import NCARCluster
    cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
    cluster.scale(jobs=20)
else:
    # Assume machine is Casper.
    dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})
    from dask_jobqueue import SLURMCluster
    cluster = SLURMCluster(cores=8, memory='200GB', project='STDD0003')
    cluster.scale(jobs=8)

from distributed import Client
client = Client(cluster)
cluster
```
The primary objective of ncar-jobqueue is to abstract the if statements...
My recommendation is to just use

```python
from ncar_jobqueue import NCARCluster
cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
```

If you want to have separate configurations (such as memory, cores, etc.) for Casper and/or Cheyenne, you could just put them in the ~/.config/dask/jobqueue.yaml file
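As a rough sketch of what such a file might contain — the key names below follow the dask-jobqueue schema, and the actual section names expected by ncar-jobqueue may differ, so treat this as illustrative only:

```yaml
# Illustrative jobqueue.yaml sketch; verify section names against the
# ncar-jobqueue / dask-jobqueue documentation before using.
jobqueue:
  pbs:             # Cheyenne's scheduler
    cores: 20
    processes: 10
    memory: 109GB
    project: STDD0003
  slurm:           # Casper's scheduler (at the time of this thread)
    cores: 8
    memory: 200GB
    project: STDD0003
```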
OK, that is good information, I will give it a try on both systems and report back if there are problems. Thank you again!
OK, the dashboard works on Cheyenne now, but Dask freezes with errors. It might be related to the operations I am trying, but I still wonder if I have the right package versions for Dask.
I am using these package versions:
```
bokeh          1.4.0     py38h32f6830_1  conda-forge
dask           2.14.0    py_0            conda-forge
dask-core      2.14.0    py_0            conda-forge
dask-jobqueue  0.7.1     py_0            conda-forge
distributed    2.14.0    py38h32f6830_0  conda-forge
ncar-jobqueue  2020.3.4  pypi_0          pypi
```
```
tornado.application - ERROR - Exception in callback <bound method BokehTornado._keep_alive of <bokeh.server.tornado.BokehTornado object at 0x2b3a03b7a790>>
Traceback (most recent call last):
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/tornado.py", line 579, in _keep_alive
    c.send_ping()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/connection.py", line 80, in send_ping
    self._socket.ping(codecs.encode(str(self._ping_count), "utf-8"))
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/websocket.py", line 447, in ping
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
```
I am also getting back warnings from intake-esm's to_dataset_dict() like this:
```
/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/dask/array/core.py:3911: PerformanceWarning: Increasing number of chunks by factor of 65
  result = blockwise(
```
I don't think I am specifying too few chunks...
UPDATE: I was specifying too few chunks after all, when trying to create a Zarr store. After increasing the number of chunks and being patient, I was able to get past the original errors. The error messages still appear, but do not cause the overall computation to halt.
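For reference, the number of chunks in a Dask array can be increased with rechunk before writing a Zarr store; this is a generic sketch, not the original workflow's code, and the array shape and chunk sizes are made up:

```python
import dask.array as da

# Start from a single large chunk (illustrative shape), then split it
# into smaller chunks so more tasks can run in parallel.
arr = da.ones((4000, 4000), chunks=(4000, 4000))
arr = arr.rechunk((500, 500))
print(arr.numblocks)  # (8, 8) -> 64 chunks in total
```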
Users should note that the Dask dashboard can produce 500: Internal Server Error messages and go "dead" (all buttons turn grey) for minutes at a time, before "coming back alive" (all buttons turn orange). It is possible that the computation is still active while these errors are occurring.
I believe I understand better now why the Dask dashboard is not working for me on Cheyenne, while it is working for me on Casper.
On Casper, I was ssh-tunneling to my own installed version of Jupyter Lab, which is more recent, whereas on Cheyenne I am logging in via JupyterHub and using an older version of Jupyter. Apparently the Jupyter instance behind JupyterHub is either misconfigured or out of date, because the Dask dashboard there constantly "goes dead".
If anyone knows how to run their own JupyterLab instance on Cheyenne, I would really appreciate some tips. Thanks!
you can use the same tunnelling approach on Cheyenne...
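A common pattern for this kind of tunnelling looks like the sketch below; the port number and hostname are illustrative, not official NCAR instructions, so adapt them to your setup:

```shell
# On Cheyenne: start JupyterLab without opening a browser
# (the port number 8888 is arbitrary; pick any free port)
jupyter lab --no-browser --port=8888

# On your local machine: forward that port through ssh
ssh -N -L 8888:localhost:8888 username@cheyenne.ucar.edu
# ...then browse to http://localhost:8888
```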
Ah, thanks I didn't know that!
Last updated: May 16 2025 at 17:14 UTC