I'm trying to generate annual timeseries with the cumulative sum of daily runoff from the CESM2-LE.
This must be a common task, but the code below overloads memory. Is there a more memory-efficient way to do this?
ds.QRUNOFF.groupby('time.year').apply(
lambda x: x.cumsum(dim='time')).compute()
I also found this thread on GitHub, @Deepak Cherian, but I can't tell whether the issue has been fixed: https://github.com/pydata/xarray/issues/3141
It's still open, so it hasn't been fixed.
I would do:
import pandas as pd
# calculate full cumsum
cumsum = ds.QRUNOFF.cumsum("time")
# year0 / year1 are placeholders for the first and last years
# may need xr.cftime_range instead; newer pandas (>= 2.2) prefers freq="YE" over "A"
year_start = pd.date_range("31-Dec-year0", "31-Dec-year1", freq="A")
# index out values at the end of each year and move them to start of the next year
values_year_start = cumsum.sel(time=year_start, method="nearest")
values_year_start["time"] = values_year_start["time"] + pd.Timedelta("1D")
# reindex to full time vector by forward filling
# (reindex the values, not the time coordinate)
reindexed = values_year_start.reindex(time=cumsum.time, method="ffill")
# get result
result = cumsum - reindexed
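The same idea (one cumulative sum over the whole record, then subtract the running total carried in from the previous year) can be sketched on synthetic data. This is only an illustration, not a tested fix for the CESM2-LE case: `runoff` is a made-up stand-in for `ds.QRUNOFF`, and the year-end totals are picked with `groupby(...).last()` plus `shift` rather than with a hand-built date range, which is a variation on the approach above. Whether it lowers memory use on a dask-backed dataset would need to be checked.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds.QRUNOFF: two years of daily values, all 1.0
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
runoff = xr.DataArray(
    np.ones(time.size), dims="time", coords={"time": time}, name="QRUNOFF"
)

# One cumulative sum over the full record; reattach the time coordinate
# explicitly so the later steps do not depend on cumsum keeping it
cumsum = runoff.cumsum("time").assign_coords(time=runoff["time"])

# Running total at the last time step of each year (dimension: "year")
year_end = cumsum.groupby("time.year").last()

# Offset for year n is the total at the end of year n-1; zero for the first year
offset = year_end.shift(year=1, fill_value=0)

# Broadcast the per-year offsets back onto the daily time axis
offset_daily = offset.sel(year=cumsum["time"].dt.year)

# Cumulative sum that restarts at the beginning of every year
annual_cumsum = cumsum - offset_daily
```

With all-ones input, `annual_cumsum` should simply count the day of the year, which makes the reset at each 1 January easy to check. Selecting the year through `.dt.year` avoids building the year-boundary dates by hand, which may help with non-standard calendars, though that is not exercised here.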
Last updated: May 16 2025 at 17:14 UTC