Stream: python-questions

Topic: ValueError using xarray map_blocks to remove climatology


view this post on Zulip Dafydd Stephenson (Aug 25 2021 at 18:45):

Hi all,
I'm trying to calculate anomalies from a daily climatology using xarray's groupby, but without rechunking, which is apparently something xarray struggles with. A workaround suggested by Ryan Abernathey here https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114 uses map_blocks to feed the DataArray to a function which calculates the climatology. My own (near-identical) implementation of this fails, as (I think) within the function a new DataArray is created by groupby with an additional dimension that is not in the input DA. I've created a short example just using random data:

import numpy as np
import xarray as xr
import datetime as dt
import dask.array as da

times=np.arange(dt.datetime(1992,2,11,0,0,0),dt.datetime(2021,2,11,0,0,0),dt.timedelta(days=1))
lats=np.arange(-90,90,1.875)
lons=np.arange(-180,180,3.75)

X=xr.DataArray(\
               da.random.normal(5,1,(len(times),96,96),chunks=(len(times),48,32)),\
               dims=['time','lat','lon'],\
               coords={'time':times,'lat':lats,'lon':lons},\
               name='variable')

def calculate_anomaly(ds):
    # from https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114
    if len(ds['time']) == 0:
        return ds
    gb = ds.groupby('time.dayofyear')
    clim = gb.mean(dim='time')
    return gb - clim

ANOM = xr.map_blocks(calculate_anomaly,X)
ANOM_std = ANOM.std(dim='time').load()

For me, running this (but only with the final line included, triggering the compute) results in:

ValueError: Result from applying user function has unexpected coordinate variables {'dayofyear'}.

I have tried supplying a "template" DA with the extra dimension to map_blocks but this for some reason grinds everything to a halt.
If anybody has run into this before I'd really appreciate a hand! Thanks

view this post on Zulip Deepak Cherian (Sep 03 2021 at 15:44):

Hi @Dafydd Stephenson

This works for me. I wrote out the anomaly computation to understand what the output looked. like and made minor changes. (drop_vars and template=X)

import datetime as dt

import dask.array as da
import numpy as np
import xarray as xr

times = np.arange(
    dt.datetime(1992, 2, 11, 0, 0, 0),
    dt.datetime(2021, 2, 11, 0, 0, 0),
    dt.timedelta(days=1),
)
lats = np.arange(-90, 90, 1.875)
lons = np.arange(-180, 180, 3.75)

X = xr.DataArray(
    da.random.normal(5, 1, (len(times), 96, 96), chunks=(len(times), 48, 32)),
    dims=["time", "lat", "lon"],
    coords={"time": times, "lat": lats, "lon": lons},
    name="variable",
)


def calculate_anomaly(ds):
    # from https://nbviewer.jupyter.org/gist/rabernat/30e7b747f0e3583b5b776e4093266114
    if len(ds["time"]) == 0:
        return ds
    gb = ds.groupby("time.dayofyear")
    clim = gb.mean(dim="time")
    # drop this extra dayofyear because it seems useless
    return (gb - clim).drop_vars("dayofyear")


#  used this to understand the expected output
#template = (X.groupby("time.dayofyear") - X.groupby("time.dayofyear").mean()).chunk(
#    {"time": -1}
#)

ANOM = xr.map_blocks(calculate_anomaly, X, template=X)
ANOM_std = ANOM.std(dim='time')
ANOM_std.compute()

Last updated: Jan 30 2022 at 12:01 UTC