I've got some old but useful code that uses esmlab. I noticed today that it is failing with Python 3.10:
```
ImportError: cannot import name 'Mapping' from 'collections'
```
That could be fixed by changing the import statement to `from collections.abc import Mapping`, but the esmlab project has been archived, and the repo says the functionality is now in GeoCAT.
I'm very happy to use GeoCAT, but I'm not totally convinced that the functionality is actually there. The functions I'm using are just the weighted statistics. I know that xarray can do weighted sum/mean/std/var, but not correlation. I don't see this in GeoCAT. I can make my own weighted correlation, but it's convenient to have it in libraries I'm already importing.
So my question, which is probably for @Anderson Banihirwe and/or @Orhan Eroglu and/or @Deepak Cherian, is whether the recommended path is to use the Xarray weighted class to do weighted computations, and whether GeoCAT already has, or plans to add, additional weighted computations such as correlation?
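For context, the built-in weighted statistics mentioned above reduce to simple formulas. Here is a pure-NumPy sketch (with made-up data, not from the thread) of the weighted mean and biased weighted variance that `DataArray.weighted(...).mean()`/`.var()` compute:

```python
import numpy as np

# Made-up 1-D data and positive weights, purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
w = rng.uniform(0.5, 1.5, size=100)

# Weighted mean: sum(w * x) / sum(w)
wmean = (w * x).sum() / w.sum()

# Biased weighted variance: weighted mean of squared deviations.
wvar = (w * (x - wmean) ** 2).sum() / w.sum()
```

With uniform weights these reduce to the ordinary mean and variance, which makes them easy to sanity-check.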
Hi @Brian Medeiros, thanks a lot for bringing this to our attention! Here is some background info and a few suggestions:
The climatology functionality of ESMLab has been refactored and incorporated into GeoCAT. Please see the functions under the `geocat.comp.climatologies` namespace in the GeoCAT-comp User API. This work was implemented by @Heather Craker from GeoCAT.
- I am not sure, though, whether you were using ESMLab's basic statistics functions for something else or for climatology computations of your own.
However, I think adding weighted correlation (and other potential functions that are not Xarray built-ins) to GeoCAT would be very useful for its users. The usability of such functions was the motivation behind incorporating ESMLab functionality into GeoCAT-comp. @Heather Craker, any thoughts or existing plans on this? Don't worry if not; we can definitely discuss a plan for this in the GeoCAT meeting.
In this case I wasn't using the climatology functions, but I did see those, and they seem really useful!
@Orhan Eroglu I personally am not working on adding weighted correlation functionality to GeoCAT while I'm still working part time; however, I can make it a priority once I start working full time in the summer. @Brian Medeiros if you need the functionality sooner rather than later, I recommend opening an issue on our GeoCAT-comp repository requesting this functionality with the expected input and output. That way someone on the team can work on it if they are able, even if I cannot do so at the moment!
Thanks @Heather Craker -- I'm fine for the time being. I just made my own function to do the weighted correlation, leveraging the weighted class in xarray to do most of the work:
```python
import numpy as np


def _weighted_corr(x, y, w, dim=None):
    xw = x.weighted(w)
    yw = y.weighted(w)
    # weighted covariance:
    devx = x - xw.mean(dim=dim)
    devy = y - yw.mean(dim=dim)
    covxy = (devx * devy).weighted(w).mean(dim=dim)
    # normalize by the weighted standard deviations:
    denom = np.sqrt(xw.var(dim=dim) * yw.var(dim=dim))
    return covxy / denom
```
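As a quick sanity check of the formula above, here is an equivalent plain-NumPy version (no xarray needed; the data and the `weighted_corr_np` name are made up for illustration). With uniform weights it should reduce to the ordinary Pearson correlation:

```python
import numpy as np


def weighted_corr_np(x, y, w):
    """Weighted Pearson correlation: weighted covariance divided by the
    product of the weighted standard deviations."""
    mx = (w * x).sum() / w.sum()
    my = (w * y).sum() / w.sum()
    cov = (w * (x - mx) * (y - my)).sum() / w.sum()
    vx = (w * (x - mx) ** 2).sum() / w.sum()
    vy = (w * (y - my) ** 2).sum() / w.sum()
    return cov / np.sqrt(vx * vy)


# Synthetic correlated data; with uniform weights the result should
# match np.corrcoef.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.5, size=200)
r = weighted_corr_np(x, y, np.ones_like(x))
```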
That's good to hear! When I do get around to adding this to GeoCAT, would you mind if I base it off of your work? I'll credit you of course.
Of course, feel free to use it! I should admit that I basically copied that from ESMLab's code, so they should get the credit.
xskillscore provides weighted correlation metrics: https://xskillscore.readthedocs.io/en/stable/api.html#correlation-metrics
Thanks, @Deepak Cherian . I have used xskillscore indirectly before and it seemed good. I'll look at bringing that into my scripts.
Brian Medeiros said:
> Thanks @Heather Craker -- I'm fine for the time being. I just made my own function to do the weighted correlation, leveraging the weighted class in xarray to do most of the work:
Hi Brian. Now that I'm back full time, I'm working on adding this functionality to GeoCAT. Currently, I'm comparing the speeds of different functions that already exist for weighted correlations. Do you have sample input and output data that I can use to run these performance tests?
Hi @Brian Medeiros. If you are able to share with me the inputs and desired outputs you were getting from your function, that would help me add this functionality to GeoCAT. If you can't, a description of the size and shape of the inputs and outputs would help as well.
Hi @Heather Craker
Typically the inputs would be (latitude, longitude) arrays being correlated, and the weights would account for the area of grid cells (i.e., cos(latitude)). In that case, the result would be a scalar: the correlation coefficient (possibly with auxiliary data like statistical significance). Often there could be additional dimensions, and the expected output is to reduce over the spatial dimensions.
Does that make sense? I can provide a specific example using data on glade if that is helpful.
Thank you Brian, that helps clarify exactly what you want this function to do with the extra dimensions. Example data from glade would also help. If you can send me a file path, that should work!
Hi @Brian Medeiros. I'm partway through creating a proof of concept and I'd like to do some more thorough testing. If you could send me file paths to example inputs and outputs at your earliest convenience, that would be immensely helpful. Thank you!
I'll go look for some examples now, and send them your way shortly.
Just curious: Is there something that xskillscore cannot do for a weighted correlation?
That's what I'm trying to check right now. If xskillscore does everything we want it to do with sufficient speed, then it will be wrapped into GeoCAT-comp for convenience.
If it doesn't, then the GeoCAT team can then design something more appropriate. Once I get the example data and can more clearly see how Brian wants the xarray dimensions to be reduced/handled, then I can see if xskillscore is sufficient
Sorry ... got distracted with other things. Just some data to start with might be climatology files generated as part of the atmosphere diagnostics:
Location: `/glade/scratch/brianpm/cam_diag_climo/files`
There are 2 cases in that directory:
- `b.e20.BHIST.f09_g16.20thC.125.02`
- `b.e20.BHIST.f09_g17.20thC.297_05`
Each case has several variables. Examples of model-to-model correlations would be the spatial correlations of any of them. For example:

`COR[b.e20.BHIST.f09_g16.20thC.125.02/b.e20.BHIST.f09_g16.20thC.125.02_PRECC_climo.nc, b.e20.BHIST.f09_g17.20thC.297_05/b.e20.BHIST.f09_g17.20thC.297_05_PRECC_climo.nc]`

where the variable to be correlated is `PRECC` in each of those files. It is shaped (12, 192, 288), so I'd expect to be able to get either the total correlation over months and space (i.e., a scalar value) or an array of 12 values that give the spatial correlation for each month, depending on some kwarg like `dim`.
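The reduction described here can be sketched in plain NumPy. This is a hypothetical illustration of the requested behavior (the `spatial_corr` name, its `dim` values, and the tiny test grid are all made up; this is not GeoCAT's API):

```python
import numpy as np


def spatial_corr(a, b, lat, dim="space"):
    """cos(lat)-weighted correlation of two (month, lat, lon) arrays.

    dim="space" reduces over lat/lon only (12 monthly correlations);
    dim=None reduces over months and space (a single scalar).
    """
    w = np.cos(np.deg2rad(lat))[None, :, None]   # area weights, broadcastable
    w = np.broadcast_to(w, a.shape)
    axes = (1, 2) if dim == "space" else None
    ws = w.sum(axis=axes)
    ma = (w * a).sum(axis=axes) / ws
    mb = (w * b).sum(axis=axes) / ws
    if dim == "space":
        # re-add the reduced dims so the deviations broadcast per month
        ma = ma[:, None, None]
        mb = mb[:, None, None]
    cov = (w * (a - ma) * (b - mb)).sum(axis=axes) / ws
    va = (w * (a - ma) ** 2).sum(axis=axes) / ws
    vb = (w * (b - mb) ** 2).sum(axis=axes) / ws
    return cov / np.sqrt(va * vb)


# Hypothetical example: 12 months on a tiny 4x6 grid.
rng = np.random.default_rng(1)
lat = np.linspace(-60, 60, 4)
a = rng.normal(size=(12, 4, 6))
monthly = spatial_corr(a, a, lat)           # array of 12 correlations
total = spatial_corr(a, a, lat, dim=None)   # single scalar
```

Correlating a field with itself gives 1 for every month, which makes the `dim` behavior easy to verify at both output shapes.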
I also was messing around with the weighted correlation in this notebook:
`/glade/u/home/brianpm/Code/hacknostics/Notebooks/weighted_correlation.ipynb`
My previous function has something wrong with it (though I don't know exactly what). The xskillscore version seems to work correctly, and it's fast for this small problem.
Thanks Brian! I'm glad to hear that XSkillScore is working for you. I'll see if it still performs fast enough with larger datasets and if it can benefit from parallelization.
Last updated: May 16 2025 at 17:14 UTC