Stream: python-questions

Topic: trouble with weighted_temporal_mean function


Mira Berdahl (Dec 07 2022 at 23:01):

Hi,
I'm trying to use the following function on some CESM2 atmospheric data.

import numpy as np
import xarray as xr

def weighted_temporal_mean(ds, var):
    """
    weight by days in each month
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month
    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()

    # Make sure the weights in each year add up to 1
    np.testing.assert_allclose(wgts.groupby("time.year").sum(xr.ALL_DIMS), 1.0)
    # Subset our dataset for our variable
    obs = ds[var]
    # Setup our masking for nan values
    cond = obs.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # Calculate the numerator
    obs_sum = (obs * wgts).resample(time="AS").sum(dim="time")
    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")
    # Return the weighted average
    return obs_sum / ones_out

However, when I use it:

Annual_AirT_PIControl = weighted_temporal_mean(mfds, "TS")

I get the following error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

It seems to appear when taking the .sum in the line:

 obs_sum = (obs * wgts).resample(time="AS").sum(dim="time")

In the past I've been able to use this on other atmospheric and ocean variables without this issue. Any thoughts?

I've attached a screenshot of the details of mfds variable.

Screen-Shot-2022-12-07-at-3.00.01-PM.png

Julia Kent (Dec 07 2022 at 23:16):

Hmm, I don't have a clear answer, but can you check whether your dataset contains NaNs that would be better handled as a different value?
https://pandas.pydata.org/docs/reference/api/pandas.isna.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
These methods might be useful to you.

Is there a larger error message?

Mira Berdahl (Dec 07 2022 at 23:26):

Since I am using xarray, pd.isna won't work on my array.
The xarray equivalent (mfds.TS.isnull) doesn't seem to give any info on whether or not I have NaNs in my array:

<bound method DataWithCoords.isnull of <xarray.DataArray 'TS' (time: 48000, lat: 96, lon: 144)>
dask.array<concatenate, shape=(48000, 96, 144), dtype=float32, chunksize=(6, 96, 144), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 -90.0 -88.11 -86.21 -84.32 ... 84.32 86.21 88.11 90.0
  * lon      (lon) float64 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * time     (time) object 0001-01-16 12:00:00 ... 4000-12-16 12:00:00
Attributes:
    units:         K
    long_name:     Surface temperature (radiative)
    cell_methods:  time: mean>

And no, the only error I get when trying to use the weighted_temporal_mean function is the one about isnan not being supported.
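
Note on the output above: it shows a bound method because isnull was referenced without parentheses, so the check never actually ran. A minimal sketch of how the NaN check could look follows. The small array here is a hypothetical stand-in for mfds.TS, not the real data.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for mfds.TS: a small float32 array with one missing value
ts = xr.DataArray(
    np.array([[280.0, np.nan], [285.0, 290.0]], dtype="float32"),
    dims=("time", "lat"),
    name="TS",
)

# isnull is a method, so it needs parentheses; the result is a boolean array
mask = ts.isnull()

# Reduce the boolean mask to a single yes/no answer and a count of missing values
has_nan = bool(mask.any())
nan_count = int(mask.sum())
print(has_nan, nan_count)
```

On a dask-backed array like the one shown above, converting the result with bool() or int() triggers the computation, so on a large dataset this step may take a while.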

Deepak Cherian (Dec 10 2022 at 03:45):

Hmm, maybe one of your data variables is of string type or some other non-numeric type (though datetime and cftime should work just fine).
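
One way to act on this suggestion is to list the dtype of every data variable and pick out the non-numeric ones. The sketch below uses a hypothetical two-variable dataset. The variable names are assumptions for illustration and are not taken from mfds.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: one numeric variable and one string variable
ds = xr.Dataset(
    {
        "TS": (("time",), np.array([280.0, 281.5], dtype="float32")),
        "date_written": (("time",), np.array(["01/16/01", "02/16/01"])),
    }
)

# Collect the names of variables whose dtype is not a numeric NumPy type
non_numeric = [
    name
    for name, da in ds.data_vars.items()
    if not np.issubdtype(da.dtype, np.number)
]
print(non_numeric)
```

This check also flags datetime and object (cftime) variables, which should be fine according to the message above, so the result is a list of candidates to look at rather than a verdict.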

Deepak Cherian (Dec 12 2022 at 16:40):

Turns out the bug was in flox v0.3.2 (a package for faster groupby). Either disable flox with

with xr.set_options(use_flox=False):
    # do groupbys/resamples here

Or update flox to the latest version. We figured this out by seeing that the error was raised within flox and not in xarray.
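
As a rough illustration of the workaround, the sketch below runs an annual resample with flox switched off and reports the installed flox version, if there is one. The monthly series is made-up data. It uses "YS" rather than the "AS" in the original function, on the assumption that both mean year-start frequency and that newer pandas releases prefer "YS".

```python
from importlib.metadata import PackageNotFoundError, version

import numpy as np
import pandas as pd
import xarray as xr

# Report the installed flox version, if flox is installed at all
try:
    flox_version = version("flox")
except PackageNotFoundError:
    flox_version = None
print("flox version:", flox_version)

# Hypothetical monthly series covering two years (values 0..23)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24, dtype="float64"), coords={"time": time}, dims="time")

# Run the resample with flox disabled so xarray uses its own groupby code path
with xr.set_options(use_flox=False):
    annual = da.resample(time="YS").sum(dim="time")

print(annual.values)
```

Because the option is set through a context manager, only the groupby and resample calls inside the with block are affected.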


Last updated: May 16 2025 at 17:14 UTC