Stream: python-questions
Topic: ValueError using xarray map_blocks to remove climatology
Dafydd Stephenson (Aug 25 2021 at 18:45):
Hi all,
I'm trying to calculate anomalies from a daily climatology using xarray's groupby
, but without rechunking, which is apparently something xarray struggles with. A workaround suggested by Ryan Abernathey here https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114 uses map_blocks
to feed the DataArray to a function which calculates the climatology. My own (near-identical) implementation of this fails, as (I think) within the function a new DataArray is created by groupby
with an additional dimension that is not in the input DA. I've created a short example just using random data:
import numpy as np
import xarray as xr
import datetime as dt
import dask.array as da
times=np.arange(dt.datetime(1992,2,11,0,0,0),dt.datetime(2021,2,11,0,0,0),dt.timedelta(days=1))
lats=np.arange(-90,90,1.875)
lons=np.arange(-180,180,3.75)
X=xr.DataArray(\
da.random.normal(5,1,(len(times),96,96),chunks=(len(times),48,32)),\
dims=['time','lat','lon'],\
coords={'time':times,'lat':lats,'lon':lons},\
name='variable')
def calculate_anomaly(ds):
# from https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114
if len(ds['time']) == 0:
return ds
gb = ds.groupby('time.dayofyear')
clim = gb.mean(dim='time')
return gb - clim
ANOM = xr.map_blocks(calculate_anomaly,X)
ANOM_std = ANOM.std(dim='time').load()
For me, running this (but only with the final line included, triggering the compute) results in:
ValueError: Result from applying user function has unexpected coordinate variables {'dayofyear'}.
I have tried supplying a "template" DA with the extra dimension to map_blocks
but this for some reason grinds everything to a halt.
If anybody has run into this before I'd really appreciate a hand! Thanks
Deepak Cherian (Sep 03 2021 at 15:44):
Hi @Dafydd Stephenson
This works for me. I wrote out the anomaly computation to understand what the output looked. like and made minor changes. (drop_vars
and template=X
)
import datetime as dt
import dask.array as da
import numpy as np
import xarray as xr
times = np.arange(
dt.datetime(1992, 2, 11, 0, 0, 0),
dt.datetime(2021, 2, 11, 0, 0, 0),
dt.timedelta(days=1),
)
lats = np.arange(-90, 90, 1.875)
lons = np.arange(-180, 180, 3.75)
X = xr.DataArray(
da.random.normal(5, 1, (len(times), 96, 96), chunks=(len(times), 48, 32)),
dims=["time", "lat", "lon"],
coords={"time": times, "lat": lats, "lon": lons},
name="variable",
)
def calculate_anomaly(ds):
# from https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114
if len(ds["time"]) == 0:
return ds
gb = ds.groupby("time.dayofyear")
clim = gb.mean(dim="time")
# drop this extra dayofyear because it seems useless
return (gb - clim).drop_vars("dayofyear")
# used this to understand the expected output
#template = (X.groupby("time.dayofyear") - X.groupby("time.dayofyear").mean()).chunk(
# {"time": -1}
#)
ANOM = xr.map_blocks(calculate_anomaly, X, template=X)
ANOM_std = ANOM.std(dim='time')
ANOM_std.compute()
Last updated: Jan 30 2022 at 12:01 UTC