Stream: jupyter
Topic: Casper dask dashboard errors
Brian Bonnlander (May 04 2020 at 23:13):
Anyone else having problems displaying the dask dashboard after ssh-tunneling to Casper? I click on the link (e.g. /proxy/8787/status), and I get the error [Errno 111] Connection refused.
Brian Bonnlander (May 04 2020 at 23:16):
It seems that Dask is running properly despite this problem...
Anderson Banihirwe (May 04 2020 at 23:29):
Can you confirm that dask is actually running the dashboard on port 8787?
Brian Bonnlander (May 04 2020 at 23:31):
How do I check?
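One way to check, from the node running the scheduler (for example, a notebook cell in the same JupyterLab session on Casper), is to try opening a TCP connection to the dashboard port. The sketch below is plain Python with no dask dependency. The helper name `dashboard_port_is_open` and the default `localhost`/8787 are illustrative assumptions. A refused connection is the same condition that shows up as `[Errno 111] Connection refused` on Linux. If a dask `Client` is already connected, its `dashboard_link` attribute shows the link dask itself reports.

```python
import socket


def dashboard_port_is_open(host: str = "localhost", port: int = 8787, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper. A ConnectionRefusedError (errno 111 on Linux) means
    nothing is listening there; timeouts and unreachable hosts are also
    OSError subclasses, so all of them are reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("dashboard reachable on localhost:8787:", dashboard_port_is_open())
```

If the dashboard is bound only to a non-loopback interface, the check against `localhost` fails even though the dashboard is running, so it can be worth repeating it with the scheduler's own address.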
Anderson Banihirwe (May 04 2020 at 23:31):
Also, can you confirm that you have jupyter-server-proxy in your environment?
Anderson Banihirwe (May 04 2020 at 23:32):
> How do I check?
How are you setting the dashboard link in dask's configuration?
Anderson Banihirwe (May 04 2020 at 23:32):
dask.config.set({'distributed.dashboard.link': '/proxy/8787/status'})
or
dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})
????
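For context on why the `{port}` form is usually preferable: distributed fills the `distributed.dashboard.link` template with Python `str.format`, using environment variables plus fields such as `host`, `port` and `scheme`. The function below is a simplified, hypothetical stand-in for that step (not distributed's own code), written with only the standard library. With `{port}`, the link follows whatever port the dashboard actually bound to, which matters because distributed falls back to another port when 8787 is already taken; a hard-coded `/proxy/8787/status` would then point at the wrong place.

```python
import os


def format_dashboard_link(template: str, host: str, port: int, scheme: str = "http") -> str:
    """Fill a distributed.dashboard.link-style template (simplified sketch).

    Environment variables are available as fields too, and host/port/scheme
    override any environment variables with the same names.
    """
    fields = dict(os.environ)
    fields.update(scheme=scheme, host=host, port=port)
    return template.format(**fields)


print(format_dashboard_link("/proxy/{port}/status", "10.0.0.5", 8787))
# -> /proxy/8787/status
```

Because environment variables are available as fields, templates such as `{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status` are a common variant on JupyterHub deployments.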
Brian Bonnlander (May 04 2020 at 23:33):
I'm doing the second one, with {port} in it.
Anderson Banihirwe (May 04 2020 at 23:33):
Okay
Anderson Banihirwe (May 04 2020 at 23:33):
> Also, can you confirm that you have jupyter-server-proxy in your environment?
How about this? :point_up:
Brian Bonnlander (May 04 2020 at 23:34):
When I type conda list in my environment, that package is not listed.
Anderson Banihirwe (May 04 2020 at 23:35):
Sounds like we found the issue
Brian Bonnlander (May 04 2020 at 23:35):
OK, I will try that out.
Anderson Banihirwe (May 04 2020 at 23:35):
You need the jupyter-server-proxy package for dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'}) to work.
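The `conda list` check can also be done from Python. This is a minimal sketch using only `importlib`; the helper name `has_module` is hypothetical, and note that the import name is `jupyter_server_proxy` (underscores). Because jupyter-server-proxy is a Jupyter server extension, it has to be installed in the environment that runs JupyterLab itself, which is not necessarily the kernel's environment, and the Jupyter server typically has to be restarted before it is picked up.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable in this environment."""
    return importlib.util.find_spec(name) is not None


# The package "jupyter-server-proxy" is imported as "jupyter_server_proxy".
print("jupyter-server-proxy installed:", has_module("jupyter_server_proxy"))
```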
Anderson Banihirwe (May 05 2020 at 00:26):
@Brian Bonnlander, I am noticing similar behavior even when I have jupyter-server-proxy installed. What versions of dask/distributed/bokeh are you running?
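A minimal way to answer this from inside the notebook, assuming Python 3.8+ for `importlib.metadata` (on Python 3.7 environments, `conda list dask` and `conda list bokeh` as used in this thread do the same job). The helper name `installed_versions` is hypothetical; it reports `None` for packages that are not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["dask", "distributed", "bokeh"]))
```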
Brian Bonnlander (May 05 2020 at 00:47):
# packages in environment at /glade/u/home/bonnland/miniconda3/envs/lens-conversion:
#
# Name                    Version                   Build  Channel
dask                      2.15.0                     py_0    conda-forge
distributed               2.15.2           py38h32f6830_0    conda-forge
bokeh                     1.4.0            py38h32f6830_1    conda-forge
It is our zarrification environment, built a few days ago.
Anderson Banihirwe (May 05 2020 at 00:50):
Try downgrading dask and distributed to 2.14.0 and see if the issue goes away.
Brian Bonnlander (May 05 2020 at 00:54):
So, after changing the environment, is a kernel restart sufficient to get the changes? Or do I have to restart the lab?
Anderson Banihirwe (May 05 2020 at 00:56):
Try refreshing the lab first
Brian Bonnlander (May 05 2020 at 01:03):
I did conda install dask=2.14.0 distributed=2.14.0, which took a while to solve the environment, but after hitting the circular 'Refresh' button in the lab and restarting the kernel, the dashboard works! Downgrading was the answer.
Anderson Banihirwe (May 05 2020 at 01:06):
Great!... Something weird is going on depending on the versions of dask/distributed/bokeh one is using. I was running into a similar issue with the following:
$ conda list dask
# packages in environment at /glade/work/abanihi/softwares/miniconda3/envs/analysis:
#
# Name                    Version                   Build  Channel
dask                      2.15.0                     py_0    conda-forge
dask-core                 2.15.0                     py_0    conda-forge
dask-jobqueue             0.7.1                      py_0    conda-forge
dask-mpi                  2.0.0                    py37_0    conda-forge
$ conda list bokeh
# packages in environment at /glade/work/abanihi/softwares/miniconda3/envs/analysis:
#
# Name                    Version                   Build  Channel
bokeh                     2.0.1            py37hc8dfbb8_0    conda-forge
Anna-Lena Deppenmeier (May 05 2020 at 16:00):
I had a similar problem recently. I wonder whether it might be useful to keep track of working combinations of versions here, so that when someone runs into problems they can try downgrading/upgrading to the "tested" versions first?!
Brian Bonnlander (May 05 2020 at 16:12):
OK, for me these commands got me a working dashboard:
conda activate my-pangeo-environment
conda install dask=2.14.0 distributed=2.14.0 bokeh=1.4.0
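In the spirit of Anna-Lena's suggestion, a known-good combination can be written down and compared against an environment. This is a hypothetical sketch: `KNOWN_GOOD` records only the versions reported to work in this thread (dask/distributed 2.14.0 with bokeh 1.4.0), and the `installed` mapping is assumed to come from `conda list` or a similar version report.

```python
# Combination reported to work on Casper in this thread; others may work too.
KNOWN_GOOD = {"dask": "2.14.0", "distributed": "2.14.0", "bokeh": "1.4.0"}


def version_mismatches(installed, expected=KNOWN_GOOD):
    """Return {package: (installed, expected)} for every package that differs.

    A package missing from `installed` is reported with None as its version.
    """
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


# Versions from Brian's original environment earlier in the thread.
print(version_mismatches({"dask": "2.15.0", "distributed": "2.15.2", "bokeh": "1.4.0"}))
# -> {'dask': ('2.15.0', '2.14.0'), 'distributed': ('2.15.2', '2.14.0')}
```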
Riley Brady (May 22 2020 at 16:58):
FYI, this worked for me as well, after a few frustrating weeks of not seeing a dashboard. I downgraded to the same versions as @Brian Bonnlander and the dashboard is running again. @Anderson Banihirwe, any idea what's going on here? I recall an earlier issue with bokeh, but this seems like a dask thing now as well. Are the developers aware?
Anderson Banihirwe (May 22 2020 at 18:00):
> @Anderson Banihirwe any idea what's going on here? I recall an earlier issue with bokeh but this seems like a dask thing now as well. Are the developers aware?
There were some issues with distributed 2.15.0 and 2.15.1. However, yesterday I ran into a similar issue with the latest version (2.16.0). I haven't had time to narrow down the possible causes....
Anderson Banihirwe (May 22 2020 at 18:01):
I am going to try out different versions of distributed, bokeh, and jupyter-server-proxy to narrow down which combinations of versions are problematic, and then I will open an issue upstream.
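A sweep like this can be enumerated mechanically. The sketch below only generates `conda install` commands for each (distributed, bokeh) pair; the version lists are the ones mentioned in this thread, jupyter-server-proxy is left out because no specific versions of it come up here, and `install_commands` is a hypothetical helper. Each command would still need its own fresh environment and a manual dashboard check.

```python
from itertools import product

# Versions that came up in this thread.
DISTRIBUTED_VERSIONS = ["2.14.0", "2.15.0", "2.15.1", "2.15.2", "2.16.0"]
BOKEH_VERSIONS = ["1.4.0", "2.0.1"]


def install_commands(distributed_versions, bokeh_versions):
    """One conda install command per (distributed, bokeh) combination to test."""
    return [
        f"conda install distributed={d} bokeh={b}"
        for d, b in product(distributed_versions, bokeh_versions)
    ]


for cmd in install_commands(DISTRIBUTED_VERSIONS, BOKEH_VERSIONS):
    print(cmd)
```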
Anderson Banihirwe (Jul 02 2020 at 15:44):
As an update, it turns out that there were some changes in dask's distributed scheduler codebase that broke the dashboard functionality when the network interface was explicitly specified (under the hood, dask-jobqueue explicitly specifies that dask should use the infiniband interface)....
So, for anyone running into this same issue, one way to fix it is to pass dashboard_address='0.0.0.0', which tells the dashboard server to listen on all network interfaces:
cluster = SLURMCluster(..., scheduler_options={"dashboard_address": "0.0.0.0"})
or
cluster = PBSCluster(..., scheduler_options={"dashboard_address": "0.0.0.0"})
or
cluster = NCARCluster(..., scheduler_options={"dashboard_address": "0.0.0.0"})
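To see what `0.0.0.0` changes, the standard-library sketch below binds a listening socket to a given address and then tries to connect through another one. Binding to `0.0.0.0` accepts connections arriving on any IPv4 interface, including loopback (`127.0.0.1`). A server bound only to one interface's address (for example an InfiniBand address) would refuse the loopback connection instead, which is consistent with the `[Errno 111]` error in this thread if, as assumed here, jupyter-server-proxy forwards `/proxy/8787/...` to `localhost:8787`. The helper name `accepts_on` is hypothetical.

```python
import socket


def accepts_on(bind_host: str, probe_host: str) -> bool:
    """Listen on bind_host, then report whether a connection via probe_host succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        try:
            with socket.create_connection((probe_host, port), timeout=2.0):
                return True
        except OSError:  # e.g. ConnectionRefusedError ([Errno 111] on Linux)
            return False


# Listening on 0.0.0.0 covers every IPv4 interface, including loopback.
print(accepts_on("0.0.0.0", "127.0.0.1"))  # -> True
```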
Anderson Banihirwe (Jul 02 2020 at 15:47):
Or you can wait for the next release of distributed (I think it's going to be 2.19.1), which will include a fix for this issue...
Last updated: Jan 30 2022 at 12:01 UTC