Stream: python-questions

Topic: xarray: seasonal-mean timeseries from a monthly timeseries


view this post on Zulip Stephen Yeager (Nov 22 2020 at 21:32):

I'm struggling to find clear web documentation on how to convert a monthly DataArray (time: 720) into a yearly DataArray(time:60) that represents a seasonal average ('DJF' or 'JFM', etc). Apparently, this requires using the resample method, but I find the documentation of xarray.Dataset.resample to be unhelpful. Can anyone share a clear example of how this is done?

view this post on Zulip Elizabeth Maroon (Nov 22 2020 at 21:50):

I'm sure there's a slicker way to do this with resample and/or groupby('time.season'), but here's how I've done this in the past using rolling to calculate running 3-month-means then grabbing every 12th mean:

startmonth=11 #for DJF, if first month = January; D=11 w/0-indexing
endmonth=len(ds['time'])

djfmean=ds.rolling(time=3).mean().isel(time=slice(startmonth,endmonth,12))

view this post on Zulip Elizabeth Maroon (Nov 22 2020 at 22:04):

Oops - look like I had the startmonth off; for DJF should be 13 (for February). By default, rolling using the right side of the window as the time index.

view this post on Zulip Stephen Yeager (Nov 22 2020 at 23:19):

I came up with the following which seems to work:

def jfm_mean(ds):
    month_length = ds.time.dt.days_in_month
    result = ((ds * month_length).resample(time='QS-JAN').sum() /
          month_length.resample(time='QS-JAN').sum())
    return result.sel(time=result['time.month']==1)

view this post on Zulip Elizabeth Maroon (Nov 22 2020 at 23:24):

Appears to work on a test dataset over here.

view this post on Zulip Mira Berdahl (Dec 16 2022 at 23:11):

I'm following up on this old post, as I've tried following along with @Stephen Yeager 's method, but getting an outcome I don't understand (it takes a long time to run, and then the output is an array with time=0, size 0. )

def djf_mean(ds):
    # Make a DataArray with the number of days in each month, size = len(time)
    month_length = ds.time.dt.days_in_month

    result = ((ds * month_length).resample(time='QS-DEC').sum() /
          month_length.resample(time='QS-DEC').sum())
    return result.sel(time=result['time.month']==1)

I've taken a screenshot of my input array into this function (2m air temp, 1000 years of monthly data), as well as the result after applying the function. I think the issue arises in the calculation of the numerator, but when I break it down into smaller pieces to test, its not obvious what is going wrong, other than this operation produces an array of size 0 even though both ds and month_length look good:

(ds * month_length).resample(time='QS-DEC').sum()

Thoughts? Screen-Shot-2022-12-16-at-3.10.01-PM-2.png

view this post on Zulip Heather Craker (Dec 19 2022 at 16:08):

@Mira Berdahl since the older post was created, GeoCAT-comp has created some functions that I believe do what you want. Are you looking for climatological seasonal averages across all years of your data or do you want the average for the seasons in each year?

view this post on Zulip Heather Craker (Dec 19 2022 at 16:38):

Depending on what kind of average you want, you can use either climatology_average or calendar_average. I've linked to the documentation page for both

view this post on Zulip Mira Berdahl (Dec 19 2022 at 18:42):

@Heather Craker Thanks for the tips on using GeoCAT. I was looking for the calendar_average method, so that's good to know it exists. In the meantime, I found my mistake in the function I was trying to use, the last line should read :

return result.sel(time=result['time.moth']==12)

Since I'm looking for DJF instead of JFM average.

view this post on Zulip Heather Craker (Dec 19 2022 at 18:51):

Great! I'm glad you found a solution! If you need any help with geocat functions, just let me know! You should be able to ping @geocat to get help from any of us on the team

view this post on Zulip David Bailey (Jan 24 2024 at 20:31):

Hey Steve,

Is there documentation somewhere about the time='QS-JAN' option to resample? Does this syntax always mean JFM?

Dave

Stephen Yeager said:

I came up with the following which seems to work:

def jfm_mean(ds):
    month_length = ds.time.dt.days_in_month
    result = ((ds * month_length).resample(time='QS-JAN').sum() /
          month_length.resample(time='QS-JAN').sum())
    return result.sel(time=result['time.month']==1)

view this post on Zulip Michael Levy (Jan 24 2024 at 21:01):

@David Bailey it's documented with xr.Dataset.resample() -- the QS part means "quarterly", and the -JAN part means "starting on January 1"

view this post on Zulip Katelyn FitzGerald (Jan 24 2024 at 21:01):

There's a not so obvious (took me a bit to find it a while back, but is really helpful) reference and link to the Pandas time/offset/freq aliases in the xarray.resample docs. This should help: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#anchored-offsets and there's more general info on that page.

view this post on Zulip Michael Levy (Jan 24 2024 at 21:04):

@Katelyn FitzGerald that's far more extensive documentation! So QE-[mon] instead of QS-[mon] would be "quarterly, ending in [mon]" (presumably on the ladst of the month?); I was wondering why the S was included in the argument :)

view this post on Zulip David Bailey (Jan 24 2024 at 21:13):

This is all very helpful. It is still not quite was I was hoping for. So when I do the following:

month_length = ds1.time.dt.days_in_month
weights_season = month_length.groupby(seasons) / month_length.groupby(seasons).sum()
weights_monthly = month_length.groupby("time.year") / month_length.groupby("time.year").sum()

aice1_seas = (ds1['aice'] * weights_monthly).resample(time="QS-JAN").sum(dim="time")

I get an array that is 152 x ni x nj where it is 38 years with 4 seasons. I was kind of hoping for a four dimensional array that was 4 x 38 x ni x nj. I had been trying to use groupby, but it produces just 4 time dimensions (seasons) and averages over all the years. I guess I can work with the array and just use the step of 4.

view this post on Zulip Katelyn FitzGerald (Jan 24 2024 at 22:46):

If you're looking to average over the seasons of all years, another option is something like DataArray.groupby("time.season").mean() (maybe this is what you're talking about).

I'd have to think a bit more about season/year/i/j dim solution. I think you'd need to reshape/reindex.

view this post on Zulip David Bailey (Jan 24 2024 at 22:49):

Right. I was trying to do that with the weights_season example here, but with custom seasons.

months = ds1.time.dt.month
seasons = xr.full_like(months, fill_value="none", dtype="U4")
seasons.name = "season"
seasons[months.isin([1, 2, 3])] = "JFM"
seasons[months.isin([4, 5, 6])] = "AMJ"
seasons[months.isin([7, 8, 9])] = "JAS"
seasons[months.isin([10, 11, 12])] = "OND"

I think the resample is pretty much what I want.

Dave


Last updated: May 16 2025 at 17:14 UTC