I'm unable to get the Dask dashboard working on Cheyenne.
Is anyone able to post an example of importing and running Dask with the dashboard using JupyterHub on Cheyenne? I would be very grateful, as Casper resources are very hard to get right now. I am using Dask through a conda environment that I built on Casper, so I'm a little worried that it might not be configured properly to work on Cheyenne.
Are you getting a 404 error or some other error?
I don't know what the dashboard URL is supposed to be. I'm not seeing an HTTP error (yet).
The template of the dashboard link should be "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"
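To illustrate how that template expands, here is a small sketch; the username is made up and 8787 is simply Dask's default dashboard port, so substitute your own values:

```python
# Illustrative expansion of the dashboard URL template.
# "myuser" and 8787 are placeholder values, not real settings.
template = "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"
url = template.format(USER="myuser", port=8787)
print(url)
```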
Are you using ncar-jobqueue or dask-jobqueue to instantiate the cluster?
Thank you, I will try that. Do you also happen to know what parameters to pass when creating the cluster, so that CPU and memory resources are reasonable for the Cheyenne architecture?
Try

```
cores = 20
processes = 10
memory = '109GB'
```

If you find yourself getting KilledWorker errors, try reducing the number of processes.
Thank you. I am assuming I have to use ncar-jobqueue on Cheyenne. I will post the code I am trying shortly.
When using ncar-jobqueue in conjunction with the hub, the dashboard link is set for you, i.e. it should work out of the box without you needing to set it manually.
I was worried that by using my own Dask configuration, the dashboard link would not be set by default, but I will try it as you suggest.
Does this code look reasonable?
```python
import dask

machine = 'cheyenne'  # 'casper'

if machine == 'cheyenne':
    # The following is supposedly set when using NCARCluster
    #dask.config.set({'distributed.dashboard.link': "https://jupyterhub.ucar.edu/ch/user/{USER}/proxy/{port}/status"})
    from ncar_jobqueue import NCARCluster
    cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
    cluster.scale(jobs=20)
else:
    # Assume machine is Casper.
    dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})
    from dask_jobqueue import SLURMCluster
    cluster = SLURMCluster(cores=8, memory='200GB', project='STDD0003')
    cluster.scale(jobs=8)

from distributed import Client
client = Client(cluster)
cluster
```
The primary objective of ncar-jobqueue is to abstract the if statements...
My recommendation is to just use

```python
from ncar_jobqueue import NCARCluster
cluster = NCARCluster(cores=20, processes=10, memory='109GB', project='STDD0003')
```

If you want to have separate configurations (such as memory, cores, etc.) for Casper and/or Cheyenne, you could just put them in the ~/.config/dask/jobqueue.yaml file
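As a rough sketch of what such a file might contain — the key names below follow the dask-jobqueue schema, and the actual section names expected by ncar-jobqueue may differ, so treat this as illustrative only:

```yaml
# Illustrative jobqueue.yaml sketch; verify section names against the
# ncar-jobqueue / dask-jobqueue documentation before using.
jobqueue:
  pbs:             # Cheyenne's scheduler
    cores: 20
    processes: 10
    memory: 109GB
    project: STDD0003
  slurm:           # Casper's scheduler (at the time of this thread)
    cores: 8
    memory: 200GB
    project: STDD0003
```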
OK, that is good information, I will give it a try on both systems and report back if there are problems. Thank you again!
OK, the dashboard works on Cheyenne now, but Dask freezes with errors. It might be related to the operations I am trying, but I still wonder if I have the right package versions for Dask.
I am using these package versions:
```
bokeh          1.4.0     py38h32f6830_1  conda-forge
dask           2.14.0    py_0            conda-forge
dask-core      2.14.0    py_0            conda-forge
dask-jobqueue  0.7.1     py_0            conda-forge
distributed    2.14.0    py38h32f6830_0  conda-forge
ncar-jobqueue  2020.3.4  pypi_0          pypi
```
```
tornado.application - ERROR - Exception in callback <bound method BokehTornado._keep_alive of <bokeh.server.tornado.BokehTornado object at 0x2b3a03b7a790>>
Traceback (most recent call last):
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/tornado.py", line 579, in _keep_alive
    c.send_ping()
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/bokeh/server/connection.py", line 80, in send_ping
    self._socket.ping(codecs.encode(str(self._ping_count), "utf-8"))
  File "/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/tornado/websocket.py", line 447, in ping
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
```
I am also getting back warnings from intake-esm's to_dataset_dict() like this:
```
/glade/u/home/bonnland/miniconda3/envs/lens-conversion/lib/python3.8/site-packages/dask/array/core.py:3911: PerformanceWarning: Increasing number of chunks by factor of 65
  result = blockwise(
```
I don't think I am specifying too few chunks...
UPDATE: I was specifying too few chunks after all, when trying to create a Zarr store. After increasing the number of chunks and being patient, I was able to get past the original errors. The error messages still appear, but do not cause the overall computation to halt.
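For reference, the number of chunks in a Dask array can be increased with rechunk before writing a Zarr store; this is a generic sketch, not the original workflow's code, and the array shape and chunk sizes are made up:

```python
import dask.array as da

# Start from a single large chunk (illustrative shape), then split it
# into smaller chunks so more tasks can run in parallel.
arr = da.ones((4000, 4000), chunks=(4000, 4000))
arr = arr.rechunk((500, 500))
print(arr.numblocks)  # (8, 8) -> 64 chunks in total
```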
Users should note that the Dask dashboard can produce 500: Internal Server Error messages and go "dead" (all buttons turn grey) for minutes at a time, before "coming back alive" (all buttons turn orange). It is possible that the computation is still active while these errors are occurring.
I believe I understand better now why the Dask dashboard is not working for me on Cheyenne, while it is working for me on Casper.
On Casper, I was ssh-tunneling to my own installed version of Jupyter Lab, which is more recent, whereas on Cheyenne I am logging in via JupyterHub and using an older version of Jupyter. Apparently the Jupyter instance behind JupyterHub is either misconfigured or out of date, because the Dask dashboard there constantly "goes dead".
If anyone knows how to run their own JupyterLab instance on Cheyenne, I would really appreciate some tips. Thanks!
you can use the same tunnelling approach on Cheyenne...
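A common pattern for this kind of tunnelling looks like the sketch below; the port number and hostname are illustrative, not official NCAR instructions, so adapt them to your setup:

```shell
# On Cheyenne: start JupyterLab without opening a browser
# (the port number 8888 is arbitrary; pick any free port)
jupyter lab --no-browser --port=8888

# On your local machine: forward that port through ssh
ssh -N -L 8888:localhost:8888 username@cheyenne.ucar.edu
# ...then browse to http://localhost:8888
```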
Ah, thanks I didn't know that!
Last updated: May 16 2025 at 17:14 UTC