Since the updates, I've been getting this warning message quite a lot. I'm assuming it's because I'm doing something incorrectly?
/glade/work/afoster/conda-envs/ml_analysis/lib/python3.11/site-packages/distributed/client.py:3162: UserWarning: Sending large graph of size 15.50 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
warnings.warn(
The warning appears when I run this command, which does some averaging:
gpp_annual = annual_mean(ds_pft.GPP)
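def annual_mean(da):
    # cfs: dict of conversion factors keyed by variable name, defined elsewhere
    cf1, cf2 = cfs[da.name].values()
    days_per_month = da['time.daysinmonth']
    # weight each month by its length, sum within each year, then compute
    ann_mean = cf1*(days_per_month*da).groupby('time.year').sum().compute()
    ann_mean.name = da.name
    return ann_mean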
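To make the day-weighting in annual_mean concrete, here is a small sketch on synthetic monthly data. The `cfs` entry (a 1/365 factor plus a placeholder second value) is hypothetical, since the real conversion factors aren't shown; with that factor the result is a day-weighted annual mean.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical conversion factors; the real `cfs` dict is not shown in the question
cfs = {"GPP": {"cf1": 1 / 365, "cf2": "placeholder"}}

def annual_mean(da):
    cf1, _cf2 = cfs[da.name].values()
    days_per_month = da["time.daysinmonth"]  # 28-31, derived from the time coordinate
    # Weight each monthly value by its number of days, then sum within each calendar year
    ann_mean = cf1 * (days_per_month * da).groupby("time.year").sum()
    ann_mean.name = da.name
    return ann_mean

# Synthetic data: 24 monthly values of 2.0 starting January 2000 (a leap year)
time_index = pd.date_range("2000-01-01", periods=24, freq="MS")
gpp = xr.DataArray(np.full(24, 2.0), coords={"time": time_index}, dims="time", name="GPP")

result = annual_mean(gpp)
print(result.values)  # 2000 has 366 days, so its value is slightly above 2.0
```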
I do have dask imported, but I'm not sure how to actually "scatter".
Okay, so I removed the .compute() from the function and now it runs extremely quickly... I'm not sure if this is a fluke, but I think it does make sense?
Though now it's just slowing down at a different location...
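The speedup after dropping .compute() is expected: on dask-backed data, xarray operations are lazy and only build a task graph, so the cost moves to wherever the result is first computed (for example when plotting or reading .values). A minimal sketch on synthetic, chunked data (the data is made up for illustration):

```python
import dask.array as dsa
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data, chunked so that xarray wraps a dask array
time_index = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.ones(24), coords={"time": time_index}, dims="time").chunk({"time": 12})

# Without .compute(): returns immediately, nothing has been calculated yet
lazy = (da["time.daysinmonth"] * da).groupby("time.year").sum()
print(type(lazy.data))

# With .compute(): the graph runs and the result comes back as an in-memory NumPy array
eager = lazy.compute()
print(type(eager.data))
print(eager.values)
```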
.compute() does the computation and returns the result to the process running the notebook (the client). If you want to do the computation but leave the result distributed (which I think is what the warning about scattering wants you to do), you could do:
ann_mean = cf1*(days_per_month*da).groupby('time.year').sum().persist()
wait(ann_mean)
The wait function needs to be imported from dask.distributed.
.persist() starts executing the dask task graph, but it's non-blocking, and the result stays on the workers. The wait() call makes it blocking, so the function won't return until the annual mean has been computed.
thanks!
Last updated: May 16 2025 at 17:14 UTC