Stream: jupyter
Topic: Dask on cheyenne Example?
Brian Bonnlander (Sep 09 2020 at 21:57):
I'm unable to get the Dask dashboard working on cheyenne.
Is anyone able to post an example of importing and running Dask with the dashboard using JupyterHub on Cheyenne? I would be very grateful, as Casper resources are very hard to get right now. I am using Dask through a conda environment that I built on Casper, so I'm a little worried that it might not be configured properly to work on cheyenne.
Anderson Banihirwe (Sep 09 2020 at 22:04):
> I'm unable to get the Dask dashboard working on cheyenne.
Are you getting a 404 error or some other error?
Brian Bonnlander (Sep 09 2020 at 22:05):
I don't know what the dashboard URL is supposed to be. I'm not seeing an HTTP error (yet).
Anderson Banihirwe (Sep 09 2020 at 22:10):
> I don't know what the dashboard URL is supposed to be.
The template of the dashboard link should be `https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status`
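If the link has to be set manually (e.g. with a plain `dask-jobqueue` cluster), a minimal sketch of doing so via `dask.config`; the `{USER}` and `{port}` fields are template placeholders that `dask.distributed` expands when rendering the link:

```python
import dask

# Point the dashboard link at the JupyterHub proxy route.
# {USER} and {port} are template fields filled in by dask.distributed.
dask.config.set({
    'distributed.dashboard.link':
        'https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status'
})

print(dask.config.get('distributed.dashboard.link'))
```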
Anderson Banihirwe (Sep 09 2020 at 22:10):
Are you using `ncar-jobqueue` or `dask-jobqueue` to instantiate the cluster?
Brian Bonnlander (Sep 09 2020 at 22:12):
Thank you, I will try that. Do you also happen to know what parameters to pass when creating the cluster, so that CPU and memory resources are reasonable for the cheyenne architecture?
Anderson Banihirwe (Sep 09 2020 at 22:16):
Try:

```python
cores = 20
processes = 10
memory = '109GB'
```
Anderson Banihirwe (Sep 09 2020 at 22:18):
If you find yourself getting `KilledWorker` errors, try reducing the number of processes.
Brian Bonnlander (Sep 09 2020 at 22:29):
Thank you, I am assuming I have to use `ncar-jobqueue` on cheyenne. I will post the code I am trying shortly.
Anderson Banihirwe (Sep 09 2020 at 22:32):
> I am assuming I have to use ncar-jobqueue on cheyenne

When using `ncar-jobqueue` in conjunction with the hub, the dashboard link is set for you, i.e. it should work out of the box without you needing to set it manually.
Brian Bonnlander (Sep 09 2020 at 22:35):
I was worried that by using my own Dask configuration, the dashboard link would not be set by default, but I will try it as you suggest.
Brian Bonnlander (Sep 09 2020 at 22:39):
Does this code look reasonable?
```python
import dask

machine = 'cheyenne'  # or 'casper'

if machine == 'cheyenne':
    # The following is supposedly set when using NCARCluster
    # dask.config.set({'distributed.dashboard.link': "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"})
    from ncar_jobqueue import NCARCluster
    cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
    cluster.scale(jobs=20)
else:
    # Assume machine is Casper.
    dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})
    from dask_jobqueue import SLURMCluster
    cluster = SLURMCluster(cores=8, memory='200GB', project='STDD0003')
    cluster.scale(jobs=8)

from distributed import Client
client = Client(cluster)
cluster
```
Anderson Banihirwe (Sep 09 2020 at 22:45):
The primary objective of `ncar-jobqueue` is to abstract away the `if` statements...

My recommendation is to just use:

```python
from ncar_jobqueue import NCARCluster

cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
```
If you want to have separate configurations (such as memory, cores, etc.) for casper and/or cheyenne, you could just put them in the `~/.config/dask/jobqueue.yaml` file.
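One way to lay that out, a sketch following the `dask-jobqueue` config schema (the values are the ones discussed above; the queue/resource keys should be checked against your site's setup):

```yaml
# ~/.config/dask/jobqueue.yaml -- per-scheduler defaults (values illustrative)
jobqueue:
  pbs:               # Cheyenne uses PBS
    cores: 20
    processes: 10
    memory: 109GB
    project: STDD0003
  slurm:             # Casper (at the time) used Slurm
    cores: 8
    memory: 200GB
    project: STDD0003
```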
Brian Bonnlander (Sep 09 2020 at 22:55):
OK, that is good information, I will give it a try on both systems and report back if there are problems. Thank you again!
Brian Bonnlander (Sep 09 2020 at 23:25):
OK, the dashboard works on cheyenne now, but Dask freezes with errors. It might be related to the operations I am trying, but I still wonder whether I have the right package versions for Dask.
I am using these package versions:
```
bokeh          1.4.0     py38h32f6830_1  conda-forge
dask           2.14.0    py_0            conda-forge
dask-core      2.14.0    py_0            conda-forge
dask-jobqueue  0.7.1     py_0            conda-forge
distributed    2.14.0    py38h32f6830_0  conda-forge
ncar-jobqueue  2020.3.4  pypi_0          pypi
```
```
tornado.application - ERROR - Exception in callback <bound method BokehTornado._keep_alive of <bokeh.server.tornado.BokehTornado object at 0x2b3a03b7a790>>
Traceback (most recent call last):
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/tornado.py", line 579, in _keep_alive
    c.send_ping()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/connection.py", line 80, in send_ping
    self._socket.ping(codecs.encode(str(self._ping_count), "utf-8"))
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/websocket.py", line 447, in ping
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
```
I am also getting back warnings from `intake-esm`'s `to_dataset_dict()` like this:

```
/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/dask/array/core.py:3911: PerformanceWarning: Increasing number of chunks by factor of 65
  result = blockwise(
```
I don't think I am specifying too few chunks...
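For reference, the chunk count that warning refers to can be inspected and adjusted with `rechunk` before writing a Zarr store; a minimal sketch with an illustrative array shape (not the actual dataset):

```python
import dask.array as da

# Illustrative 4D array (time, lev, lat, lon) held as a single chunk.
arr = da.zeros((1200, 30, 192, 288), chunks=(1200, 30, 192, 288))
print(arr.npartitions)        # one chunk: too few for a parallel Zarr write

# Split the time dimension into 100-step chunks before writing.
rechunked = arr.rechunk((100, 30, 192, 288))
print(rechunked.npartitions)
```

The arrays are lazy, so inspecting `npartitions` and `numblocks` costs nothing; only a `.compute()` or `to_zarr()` call would materialize data.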
Brian Bonnlander (Sep 10 2020 at 20:31):
UPDATE: I was specifying too few chunks after all, when trying to create a Zarr store. After increasing the number of chunks and being patient, I was able to get past the original errors. The error messages still appear, but do not cause the overall computation to halt.
Users should note that the Dask dashboard can produce "500: Internal Server Error" messages and go "dead" (all buttons turn grey) for minutes at a time before "coming back alive" (all buttons turn orange). It is possible that the computation is still active while these errors are occurring.
Brian Bonnlander (Sep 11 2020 at 16:41):
I believe I understand better now why the Dask dashboard is not working for me on Cheyenne, while it is working for me on Casper.
On Casper, I was ssh-tunneling to my own installed version of JupyterLab, which is more recent, whereas on Cheyenne I am logging in via JupyterHub and using an older version of Jupyter. Apparently the Jupyter instance behind JupyterHub is not properly configured, or is out of date, because the Dask dashboard there is constantly "going dead".
If anyone knows how to run their own JupyterLab instance on Cheyenne, I would really appreciate some tips. Thanks!
Deepak Cherian (Sep 11 2020 at 17:28):
you can use the same tunnelling approach on cheyenne...
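For the record, a sketch of that tunnelling setup (hostname, port, and environment name are illustrative, not a confirmed recipe):

```shell
# On Cheyenne: start your own JupyterLab from your conda environment.
conda activate my-env
jupyter lab --no-browser --port=8888

# On your local machine: forward the port through the login node.
ssh -N -L 8888:localhost:8888 $USER@cheyenne.ucar.edu

# Then open http://localhost:8888 in a local browser.
```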
Brian Bonnlander (Sep 11 2020 at 18:26):
Ah, thanks, I didn't know that!
Last updated: Jan 30 2022 at 12:01 UTC