scattering data · xarray · Zulip Chat Archive

Stream: xarray

Topic: scattering data

Adrianna Foster (Jan 23 2024 at 20:37):

Since the updates I've been getting this warning message quite a lot. I'm assuming it's because I'm not doing something correctly?

/glade/work/afoster/conda-envs/ml_analysis/lib/python3.11/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 15.50 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(

For this warning, I am executing this command, which does some averaging:

gpp_annual = annual_mean(ds_pft.GPP)

def annual_mean(da):
    cf1, cf2 = cfs[da.name].values()

    days_per_month = da['time.daysinmonth']
    ann_mean = cf1*(days_per_month*da).groupby('time.year').sum().compute()
    ann_mean.name = da.name
    return ann_mean

I do have dask imported, but I guess I'm not sure how to actually "scatter" the data.
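For reference, "scattering" in the warning refers to Client.scatter in dask.distributed, which sends data to the workers ahead of time and returns a future that can be passed to tasks in place of the data itself. A minimal sketch, assuming a throwaway in-process cluster and a made-up NumPy array (neither comes from the notebook in this thread):

```python
import numpy as np
from dask.distributed import Client

# Assumption: a small in-process cluster purely for illustration;
# the thread's real cluster setup is not shown.
client = Client(processes=False, dashboard_address=None)

# Hypothetical stand-in for a large in-memory array.
data = np.arange(1_000_000, dtype="float64")

# Ship the array to the workers once; `future` is a lightweight handle to it.
future = client.scatter(data)

# Tasks receive the future, so the array itself is not embedded in the graph.
total = client.submit(np.sum, future).result()

client.close()
```

Whether this removes the warning depends on what is actually making the graph large, which the thread does not establish.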

Adrianna Foster (Jan 23 2024 at 21:04):

okay so I removed the .compute() from the function and now it runs extremely quickly... I'm not sure if this is a fluke, but it does make sense I think?

Adrianna Foster (Jan 23 2024 at 21:05):

though it's now just slowing down at a different location...
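This behaviour is consistent with dask's lazy evaluation: without .compute(), the expression only builds a task graph, so the function returns almost immediately and the real work happens later, wherever the values are first needed. A small sketch with a made-up dask array (not the GPP data from the thread):

```python
import dask.array as da

# Hypothetical data: a 1000 x 1000 array of ones split into 16 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))

# Building the expression is cheap: nothing is computed yet.
lazy_total = (x * 2).sum()

# The work actually happens here, when the result is requested.
total = lazy_total.compute()
```

Here lazy_total is still a dask collection, while total is a concrete number.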

Michael Levy (Jan 23 2024 at 21:08):

.compute() runs the computation and, I believe, returns the result to the process running the notebook. If you want to do the computation but leave the result distributed (which I think is what the warning about scattering wants you to do), you could do:

ann_mean = cf1*(days_per_month*da).groupby('time.year').sum().persist()
wait(ann_mean)

The wait function needs to be imported from dask.distributed.

Michael Levy (Jan 23 2024 at 21:09):

.persist() will start executing the dask task graph, but it's non-blocking. The wait() call makes it blocking, so the function won't return until the annual mean has been computed.
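The persist-then-wait pattern described above can be sketched end to end as follows. This assumes a throwaway in-process cluster and a made-up dask array rather than the annual-mean calculation from the thread:

```python
import dask.array as da
from dask.distributed import Client, wait

# Assumption: an in-process cluster for illustration only.
client = Client(processes=False, dashboard_address=None)

# Hypothetical data standing in for the real DataArray.
x = da.ones((1000, 1000), chunks=(250, 250))

# persist() returns right away; the chunks are computed in the background
# and stay in the cluster's memory as a dask array.
persisted = (x * 2).sum(axis=0).persist()

# wait() blocks until every chunk behind `persisted` has finished.
wait(persisted)

# Pulling a small piece back to the client is now cheap.
first = persisted[:3].compute()

client.close()
```

The result of persist() is still a dask collection, so later operations on it remain lazy until .compute() is called.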

Adrianna Foster (Jan 23 2024 at 21:10):

thanks!

Last updated: May 16 2025 at 17:14 UTC