Stream: xarray

Topic: groupby().cumsum()


Will Wieder (May 17 2021 at 16:01):

I'm trying to generate annual timeseries with the cumulative sum of daily runoff from the CESM2-LE.
This must be a common task, but the code below overloads memory. Is there a more memory-efficient way to do this?

ds.QRUNOFF.groupby('time.year').apply(
lambda x: x.cumsum(dim='time')).compute()

Will Wieder (May 17 2021 at 16:53):

@Deepak Cherian, I also found this thread on GitHub, but I can't tell whether the issue has been fixed: https://github.com/pydata/xarray/issues/3141

Deepak Cherian (May 18 2021 at 01:52):

It's still open so it hasn't been fixed.

I would do

import pandas as pd

# calculate full cumsum
cumsum = ds.QRUNOFF.cumsum("time")
# year0 and year1 are placeholders; may need xr.cftime_range instead
year_start = pd.date_range("31-Dec-year0", "31-Dec-year1", freq="A")
# index out values at the end of each year and move them to start of the next year
values_year_start = cumsum.sel(time=year_start, method="nearest")
values_year_start["time"] = values_year_start["time"] + pd.Timedelta("1D")
# reindex to full time vector by forward filling
reindexed = values_year_start.reindex(time=cumsum.time, method="ffill")
# get result
result = cumsum - reindexed
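For anyone wanting to try the idea on something small first, here is a minimal sketch of the same approach on synthetic data. It is only an illustration under some assumptions: the two-year daily series of ones is a made-up stand-in for ds.QRUNOFF, the calendar is a standard one handled by pandas (a CESM no-leap calendar would need cftime handling instead), and the year-end steps are picked with a month/day mask rather than a nearest-neighbour lookup. The first year has no preceding year-end value, so its offset is filled with zero.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds.QRUNOFF: two full years of daily values, all 1.0
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
runoff = xr.DataArray(
    np.ones(time.size), coords={"time": time}, dims="time", name="QRUNOFF"
)

# Running total over the whole record (no groupby involved)
cumsum = runoff.cumsum("time")

# Integer positions of the last day of each year
is_year_end = (runoff.time.dt.month == 12) & (runoff.time.dt.day == 31)
year_end_idx = np.flatnonzero(is_year_end.values)

# Accumulated totals at each year end, relabelled to the first day of the next year
offsets = cumsum.isel(time=year_end_idx)
offsets = offsets.assign_coords(time=offsets.indexes["time"] + pd.Timedelta("1D"))

# Forward fill the offsets onto the full time axis; the first year gets 0
reindexed = offsets.reindex(time=cumsum.time, method="ffill").fillna(0)

# Cumulative sum that restarts at the beginning of every year
result = cumsum - reindexed
```

With this toy input, result should count up from 1 within each year and restart on 1 January. On dask-backed data the same steps stay lazy until .compute() is called, though whether that actually solves the memory problem for the full CESM2-LE output is not something this sketch demonstrates.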

Last updated: May 16 2025 at 17:14 UTC