groupyby().cumsum() · xarray · Zulip Chat Archive

I'm trying to generate annual timeseries with the cumulative sum of daily runoff from the CESM2-LE.
This must be a common task, but the code below overloads memory. Is there a more memory-efficient way to do this?

ds.QRUNOFF.groupby('time.year').apply(
lambda x: x.cumsum(dim='time')).compute()

Will Wieder (May 17 2021 at 16:53):

Deepak Cherian (May 18 2021 at 01:52):

import pandas as pd

# calculate full cumsum
cumsum = ds.QRUNOFF.cumsum("time")
# may need xr.cftime_range instead
year_start = pd.date_range("31-Dec-year0", "31-Dec-year1", freq="A")
# index out values at the end of each year and move them to the start of the next year
values_year_start = cumsum.sel(time=year_start, method="nearest")
values_year_start["time"] = values_year_start["time"] + pd.Timedelta("1D")
# reindex the values (not the time coordinate) to the full time vector by forward filling
reindexed = values_year_start.reindex(time=cumsum.time, method="ffill")
# get result: cumulative sum that restarts at the beginning of each year
result = cumsum - reindexed
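
For anyone wanting to try the idea without the CESM2-LE data, here is a minimal sketch of the same "full cumsum minus the total at the end of the previous year" approach on synthetic data. The `runoff` array, its dates, and its all-ones values are made up for illustration, and the offset is looked up with `groupby("time.year").last()` rather than the `sel`/`reindex` steps above, so treat it as one possible variant under those assumptions, not the exact method from the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series over two non-leap years (hypothetical stand-in for ds.QRUNOFF).
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
runoff = xr.DataArray(
    np.ones(time.size), coords={"time": time}, dims="time", name="QRUNOFF"
)

# Running total over the whole record; re-attach the time coordinate explicitly
# so the sketch does not depend on how a given xarray version handles coords here.
cumsum = runoff.cumsum("time").assign_coords(time=runoff["time"])

# Running total at the last time step of each year (one value per year).
year_end_total = cumsum.groupby("time.year").last()

# Offset for each year = total at the end of the previous year (0 for the first year).
offset_by_year = year_end_total.shift(year=1, fill_value=0)

# Broadcast the per-year offset back onto the daily time axis.
offset = offset_by_year.sel(year=runoff["time"].dt.year)

# Cumulative sum that restarts at the beginning of each year.
result = cumsum - offset
```

With all-ones input, `result` counts the day of the year, which makes it easy to check by eye. The only per-group operation here is the small `last()` reduction, so this may be lighter on memory than applying `cumsum` inside each group, though that has not been measured here.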

Stream: xarray

Topic: groupyby().cumsum()

Will Wieder (May 17 2021 at 16:01):
