Hi,
I'm trying to use the following function on some CESM2 atmospheric data.
def weighted_temporal_mean(ds, var):
    """
    weight by days in each month
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month
    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()
    # Make sure the weights in each year add up to 1
    np.testing.assert_allclose(wgts.groupby("time.year").sum(xr.ALL_DIMS), 1.0)
    # Subset our dataset for our variable
    obs = ds[var]
    # Setup our masking for nan values
    cond = obs.isnull()
    ones = xr.where(cond, 0.0, 1.0)
    # Calculate the numerator
    obs_sum = (obs * wgts).resample(time="AS").sum(dim="time")
    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")
    # Return the weighted average
    return obs_sum / ones_out
However, when I use it:
Annual_AirT_PIControl = weighted_temporal_mean(mfds, "TS")
I get the following error:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
It seems to appear when taking the .sum in the line:
obs_sum = (obs * wgts).resample(time="AS").sum(dim="time")
In the past I've been able to use this on other atmospheric and ocean variables without this issue. Any thoughts?
I've attached a screenshot of the details of mfds variable.
Screen-Shot-2022-12-07-at-3.00.01-PM.png
Hmm, I don't have a clear answer, but can you check whether your dataset contains NaNs that would be better handled by a different value?
https://pandas.pydata.org/docs/reference/api/pandas.isna.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
These methods might be useful to you.
Is there a larger error message?
Since I am using xarray, pd.isna won't work on my array.
The xarray equivalent (mfds.TS.isnull) doesn't seem to tell me whether I have NaNs in my array:
<bound method DataWithCoords.isnull of <xarray.DataArray 'TS' (time: 48000, lat: 96, lon: 144)>
dask.array<concatenate, shape=(48000, 96, 144), dtype=float32, chunksize=(6, 96, 144), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 -90.0 -88.11 -86.21 -84.32 ... 84.32 86.21 88.11 90.0
  * lon      (lon) float64 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * time     (time) object 0001-01-16 12:00:00 ... 4000-12-16 12:00:00
Attributes:
    units:         K
    long_name:     Surface temperature (radiative)
    cell_methods:  time: mean>
And no, the only error I get when trying to use the weighted_temporal_mean function is the one about isnan not being supported for the input types.
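Note that mfds.TS.isnull above is written without parentheses, so the output is only the repr of the bound method, not a NaN mask. A minimal sketch of the actual check follows, using a small synthetic array as a stand-in for mfds.TS (the values and the name ts are made up for illustration). On a dask-backed array like the one above, the reductions would trigger a computation over the whole array.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for mfds.TS: a tiny float32 array with one NaN
ts = xr.DataArray(
    np.array([[280.0, np.nan], [285.0, 290.0]], dtype="float32"),
    dims=("time", "lat"),
    name="TS",
)

# isnull is a method, so it has to be called to get a boolean mask
mask = ts.isnull()

# Reduce the mask to plain Python values
has_nan = bool(mask.any())   # whether any element is NaN
nan_count = int(mask.sum())  # how many elements are NaN

print(has_nan, nan_count)
```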
Hmm maybe one of your data variables is of string type or some other non-numeric type (though datetime and cftime should work just fine).
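One way to check this suggestion is to list the dtype of every data variable and flag the non-numeric ones. This is only a sketch on a made-up dataset: the variable date_written and its values are assumptions for illustration, not taken from the actual mfds.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: one numeric variable and one string variable
ds = xr.Dataset(
    {
        "TS": (("time",), np.array([280.0, 281.0], dtype="float32")),
        "date_written": (("time",), np.array(["12/07/22", "12/08/22"])),
    }
)

# Collect the names of data variables whose dtype is not numeric
non_numeric = [
    name
    for name, var in ds.data_vars.items()
    if not np.issubdtype(var.dtype, np.number)
]

print(non_numeric)
```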
Turns out the bug was in flox v0.3.2 (a package for faster groupby operations). Either disable flox with
with xr.set_options(use_flox=False):
    # do groupbys/resamples here
or update flox to the latest version. We figured this out by noticing that the error was raised within flox and not in xarray.
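As a rough sketch of the workaround, the snippet below first looks up the installed flox version (if any) and then runs a yearly groupby with flox disabled. The monthly series is synthetic and the names da and annual are made up for illustration. In the original function, the two resample(...).sum(...) calls would go inside the with block instead.

```python
from importlib.metadata import PackageNotFoundError, version

import numpy as np
import pandas as pd
import xarray as xr

# Look up the installed flox version; None if flox is not installed
try:
    flox_version = version("flox")
except PackageNotFoundError:
    flox_version = None

# Hypothetical two-year monthly series standing in for the real data
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(
    np.arange(24, dtype="float32"),
    coords={"time": time},
    dims="time",
    name="TS",
)

# Inside this block xarray uses its own groupby code path instead of flox
with xr.set_options(use_flox=False):
    annual = da.groupby("time.year").sum()

print(flox_version, annual.values)
```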
Last updated: May 16 2025 at 17:14 UTC