Stream: jupyter

Topic: Casper dask dashboard errors


view this post on Zulip Brian Bonnlander (May 04 2020 at 23:13):

Anyone else having problems displaying the dask dashboard after ssh-tunneling to Casper? I click on the link (e.g. /proxy/8787/status), and I get the error [Errno 111] Connection refused.

view this post on Zulip Brian Bonnlander (May 04 2020 at 23:16):

It seems that Dask is running properly despite this problem...

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:29):

Can you confirm that dask is actually running the dashboard on port 8787?

view this post on Zulip Brian Bonnlander (May 04 2020 at 23:31):

How do I check?
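
One rough way to check: [Errno 111] means nothing accepted the connection, so you can probe the port from the node where Jupyter is running. This is only a sketch. The helper name port_is_open is made up for illustration, and 127.0.0.1 and 8787 are assumptions (8787 is just dask's default dashboard port; the scheduler may have chosen another port or bound to a different interface).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        # (for example 111, ECONNREFUSED, on Linux).
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Assumed host and port; adjust to whatever your scheduler reports.
    print(port_is_open("127.0.0.1", 8787))
```

If this prints False while the scheduler is clearly running, the dashboard is probably listening on a different port or network interface than the one the proxy is trying to reach.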

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:31):

Also, can you confirm that you have jupyter-server-proxy in your environment?

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:32):

How do I check?

How are you setting the dashboard link in dask's configuration?

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:32):

dask.config.set({'distributed.dashboard.link': '/proxy/8787/status'})

or

dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'})

????

view this post on Zulip Brian Bonnlander (May 04 2020 at 23:33):

I'm doing the second one with {port} in it.
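
For context, here is a small sketch of the difference between the two settings. As far as I understand it, distributed fills in the link template with ordinary str.format using fields such as scheme, host and port, so the {port} form follows whatever port the dashboard actually got, while the hard-coded form always points at 8787. The function fill_link_template is a hypothetical stand-in written for this example, not dask's own code.

```python
def fill_link_template(template, scheme="http", host="127.0.0.1", port=8787):
    """Fill a dashboard link template the way str.format would."""
    return template.format(scheme=scheme, host=host, port=port)


# The {port} placeholder tracks the real dashboard port.
print(fill_link_template("/proxy/{port}/status", port=45123))
# prints: /proxy/45123/status

# A hard-coded port is left unchanged, even if the dashboard moved.
print(fill_link_template("/proxy/8787/status", port=45123))
# prints: /proxy/8787/status
```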

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:33):

Okay

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:33):

Also, can you confirm that you have jupyter-server-proxy in your environment?

How about this :point_up:

view this post on Zulip Brian Bonnlander (May 04 2020 at 23:34):

When I type conda list in my environment, that package is not listed.

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:35):

Sounds like we found the issue

view this post on Zulip Brian Bonnlander (May 04 2020 at 23:35):

OK, I will try that out.

view this post on Zulip Anderson Banihirwe (May 04 2020 at 23:35):

You need jupyter-server-proxy installed for dask.config.set({'distributed.dashboard.link': '/proxy/{port}/status'}) to work
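
Besides conda list, a quick check from inside a notebook could look like the sketch below. It assumes the kernel runs in the same environment as the Jupyter server (the proxy has to be installed where the server runs), and has_module is a hypothetical helper name.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The jupyter-server-proxy package is imported as jupyter_server_proxy.
    if has_module("jupyter_server_proxy"):
        print("jupyter-server-proxy is importable")
    else:
        print("jupyter-server-proxy not found in this environment")
```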

view this post on Zulip Anderson Banihirwe (May 05 2020 at 00:26):

@Brian Bonnlander, I am noticing similar behavior even when I have jupyter-server-proxy installed. What version of dask/distributed/bokeh are you running?

view this post on Zulip Brian Bonnlander (May 05 2020 at 00:47):

# packages in environment at /glade/u/home/bonnland/miniconda3/envs/lens-conversion:
#
# Name                    Version                   Build  Channel
dask                      2.15.0                     py_0    conda-forge
distributed               2.15.2           py38h32f6830_0    conda-forge
bokeh                     1.4.0            py38h32f6830_1    conda-forge

It is our zarrification environment, built a few days ago.

view this post on Zulip Anderson Banihirwe (May 05 2020 at 00:50):

Try downgrading dask and distributed to 2.14.0, and see if the issue goes away

view this post on Zulip Brian Bonnlander (May 05 2020 at 00:54):

So, after changing the environment, is a kernel restart sufficient to get the changes? Or do I have to restart the lab?

view this post on Zulip Anderson Banihirwe (May 05 2020 at 00:56):

Try refreshing the lab first

view this post on Zulip Brian Bonnlander (May 05 2020 at 01:03):

I did conda install dask=2.14.0 distributed=2.14.0, which took a while to solve the environment, but...after hitting the circular 'Refresh' button on the lab and restarting the kernel, the dashboard works! Downgrading was the answer.

view this post on Zulip Anderson Banihirwe (May 05 2020 at 01:06):

Great!... Something weird is going on depending on the versions of dask/distributed/bokeh one is using. I was running into a similar issue with the following:

$ conda list dask
# packages in environment at /glade/work/abanihi/softwares/miniconda3/envs/analysis:
#
# Name                    Version                   Build  Channel
dask                      2.15.0                     py_0    conda-forge
dask-core                 2.15.0                     py_0    conda-forge
dask-jobqueue             0.7.1                      py_0    conda-forge
dask-mpi                  2.0.0                    py37_0    conda-forge

$ conda list bokeh
# packages in environment at /glade/work/abanihi/softwares/miniconda3/envs/analysis:
#
# Name                    Version                   Build  Channel
bokeh                     2.0.1            py37hc8dfbb8_0    conda-forge

view this post on Zulip Anna-Lena Deppenmeier (May 05 2020 at 16:00):

I had a similar problem recently. I wonder whether it might be useful to keep track of the working (combination) of versions in here, so that when someone runs into problems they can try downgrading / upgrading to the "tested" versions first?!

view this post on Zulip Brian Bonnlander (May 05 2020 at 16:12):

OK, for me these commands got me a working dashboard:

conda activate my-pangeo-environment
conda install dask=2.14.0 distributed=2.14.0 bokeh=1.4.0
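
To compare an environment against that combination, something like the following sketch might help. The pins are the ones reported as working above; the helper names are invented for this example, and importlib.metadata needs Python 3.8 or newer.

```python
from importlib import metadata

# Versions reported as working in this thread.
KNOWN_GOOD = {"dask": "2.14.0", "distributed": "2.14.0", "bokeh": "1.4.0"}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_to_known_good(known_good=KNOWN_GOOD, lookup=installed_version):
    """Return {package: (installed, expected)} for every mismatch."""
    mismatches = {}
    for name, expected in known_good.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches


if __name__ == "__main__":
    for name, (found, expected) in compare_to_known_good().items():
        print(f"{name}: installed {found}, thread reports {expected} working")
```

An empty result only means the versions match the ones listed here, not that the dashboard is guaranteed to work.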

view this post on Zulip Riley Brady (May 22 2020 at 16:58):

FYI this worked for me as well. After a few frustrating weeks of not seeing a dashboard. Downgraded to the same as @Brian Bonnlander and the dashboard is running again. @Anderson Banihirwe any idea what's going on here? I recall an earlier issue with bokeh but this seems like a dask thing now as well. Are the developers aware?

view this post on Zulip Anderson Banihirwe (May 22 2020 at 18:00):

@Anderson Banihirwe any idea what's going on here? I recall an earlier issue with bokeh but this seems like a dask thing now as well. Are the developers aware?

There were some issues with distributed 2.15.0 and 2.15.1. However, yesterday I ran into a similar issue with the latest version (2.16.0). I haven't had time to narrow down the possible causes....

view this post on Zulip Anderson Banihirwe (May 22 2020 at 18:01):

I am going to try out different versions of distributed, bokeh, and jupyter-server-proxy to see if I can come up with a combination of versions that are problematic, and then I will open an issue upstream.

view this post on Zulip Anderson Banihirwe (Jul 02 2020 at 15:44):

As an update, it turns out that there were some changes in dask's distributed scheduler codebase that broke the dashboard functionality when the network interface was explicitly specified (under the hood, dask-jobqueue explicitly specifies that dask should use the infiniband interface)....

So, for anyone who is running into this same issue, one way to fix it is to pass dashboard_address='0.0.0.0', which tells the dashboard server to listen on all network interfaces:

cluster = SLURMCluster(...., scheduler_options={"dashboard_address": "0.0.0.0"})

or

cluster = PBSCluster(...., scheduler_options={"dashboard_address": "0.0.0.0"})

or

cluster = NCARCluster(...., scheduler_options={"dashboard_address": "0.0.0.0"})
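
If you already pass other scheduler_options, a small helper can merge the workaround in without overwriting them. This is only a sketch: with_open_dashboard is a hypothetical name, and the cores and memory values in the commented usage are placeholders, not recommendations.

```python
def with_open_dashboard(cluster_kwargs):
    """Return a copy of cluster_kwargs whose scheduler_options make the
    dashboard listen on all network interfaces."""
    merged = dict(cluster_kwargs)
    # Copy the nested dict so the caller's options are not modified.
    options = dict(merged.get("scheduler_options", {}))
    options["dashboard_address"] = "0.0.0.0"
    merged["scheduler_options"] = options
    return merged


# Possible usage (requires dask-jobqueue; values are placeholders):
# from dask_jobqueue import PBSCluster
# cluster = PBSCluster(**with_open_dashboard({"cores": 1, "memory": "4GB"}))
```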

view this post on Zulip Anderson Banihirwe (Jul 02 2020 at 15:47):

Or you can wait for the next release of distributed (I think it's going to be 2.19.1), which will include a fix for this issue...


Last updated: Jan 30 2022 at 12:01 UTC