Stream: python-questions
Topic: ufunc for scipy.stats
Will Wieder (Jul 01 2020 at 12:34):
Does anyone have an example of getting ufunc to work for scipy.stats? Specifically I'm trying to calculate a correlation coefficient (r) and p-values for two xarray data arrays. This example is good for for the correlation coefficient, http://xarray.pydata.org/en/stable/dask.html, but I'd like to just use scipy.stats
Will Wieder (Jul 01 2020 at 12:50):
xarray even has a .corr function now, but still not sure how to get significance levels from this?
Keith Lindsay (Jul 01 2020 at 12:59):
@Will Wieder , I use xr.apply_ufunc
to compute regression slope in this code. I'm passing regression_slope_np_1d_2d
, a function I wrote that works on plain numpy arrays, to xr.apply_ufunc
. This might serve as a template for you, where you would similarly pass a scipy.stats function to xr.apply_ufunc
.
Deepak Cherian (Jul 01 2020 at 14:49):
@Will Wieder do you have an example notebook with your attempt? One complication here is that apply_ufunc
cannot deal with multiple return values with dask arrays yet. so you'll have to write a wrapper function that stacks r
and p
into a (2xN) array and return that. And then unpack it to two dataarrays.
This example (https://xarray.pydata.org/en/stable/examples/apply_ufunc_vectorize_1d.html) discusses a lot of the errors you may run in to while doing this.
Riley Brady (Jul 01 2020 at 15:45):
@Will Wieder --
(1) A non-apply-ufunc solution would be to use xskillscore
(https://github.com/raybellwaves/xskillscore/). I've done a lot of development here, and we have it working pretty efficiently. We use apply_ufunc
and pure numpy functions under the hood. And it works with dask
. So you could just do r = xskillscore.pearson_r(da1, da2, 'time')
to get gridded results of correlation coefficients. Then p = xskillscore.pearson_r_p_value(da1, da2, 'time')
to get the significance values.
(2) Getting at what @Deepak Cherian is saying, you can get around the multiple output issue with dask
/apply_ufunc
with the following:
from scipy import stats def new_linregress(x, y): # Wrapper around scipy linregress to use in apply_ufunc slope, intercept, r_value, p_value, std_err = stats.linregress(x, y) return np.array([slope, intercept, r_value, p_value, std_err]) # return a new DataArray stats = xr.apply_ufunc(new_linregress, ds[x], ds[y], input_core_dims=[['time'], ['time']], output_core_dims=[["parameter"]], vectorize=True, dask="parallelized", output_dtypes=['float64'], output_sizes={"parameter": 5}, )
This will return a dataset with dimension parameter
that has slope, intercept, etc. in it.
Will Wieder (Jul 01 2020 at 16:43):
Thanks for the suggestions. I'll give this a try. For the the purposes of my current project I just manually calculated the critical values for p<0.05 and masked out grid cells that didn't meet this criteria.
Will Wieder (Jul 01 2020 at 17:27):
@Riley Brady thanks for this example. It's really helpful! Can more examples like this be provided in the xarray documentation?!
Last updated: Jan 30 2022 at 12:01 UTC