Calculating Temporal Averages with GeoCAT-comp vs Xarray#

With temporally large datasets, computing seasonal and annual averages is a great way to summarize the data and make it easier to manage and understand. You may want to take hourly, daily, or monthly data and compute seasonal or annual averages.

Challenges#

When using data that has a daily or finer resolution (e.g. hourly), calculating an annual average is simple. Every day and hour has the same length, so an unweighted average will work.

But when using monthly data, things can get a bit tricky. Not every month is created equal: February has 28 or 29 days, while March has 31. Since monthly data has one value for each month, and each value represents a different number of days, those points can’t be averaged in the usual way. A weighted average is needed, with each month weighted by its length in days.
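To make the arithmetic concrete, here is a minimal sketch using only the Python standard library. The three monthly temperatures are made-up values chosen for illustration and are not taken from the dataset used in this post.

```python
import calendar

# Hypothetical monthly mean temperatures (degC) for one winter:
# December 2001, January 2002, February 2002
monthly_means = {(2001, 12): 26.0, (2002, 1): 25.0, (2002, 2): 24.0}

# calendar.monthrange returns (weekday of first day, number of days in month)
days = {ym: calendar.monthrange(*ym)[1] for ym in monthly_means}

# Unweighted: every month counts equally
unweighted = sum(monthly_means.values()) / len(monthly_means)

# Weighted: every month counts in proportion to its length in days
weighted = sum(monthly_means[ym] * days[ym] for ym in monthly_means) / sum(days.values())

print(f"unweighted: {unweighted:.4f} degC")
print(f"weighted:   {weighted:.4f} degC")
```

With these made-up numbers the unweighted mean is 25.0 degC, while the weighted mean is about 25.03 degC, because the shorter February contributes slightly less to the season.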

While it is tempting to quickly compute monthly to annual averages with Xarray’s resample or groupby functions, we need to be careful to specify the weights. Unfortunately, Xarray doesn’t support weighted resample or groupby at the time this post was created, but geocat-comp.climatology_average builds upon Xarray to compute the weights for you.
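For readers curious what computing the weights by hand looks like, the sketch below follows the general pattern from Xarray’s own seasonal-average example: use the number of days in each month as weights, normalize them within each season, then sum. It runs on a small synthetic series (the values are placeholders, not real SST data) and is not a reproduction of how climatology_average is implemented.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series for a single non-leap year (placeholder values 0..11)
time = pd.date_range("2001-01-01", periods=12, freq="MS")
sst = xr.DataArray(
    np.arange(12, dtype=float), coords={"time": time}, dims="time", name="sst"
)

# Number of days in each month, used as the weights
days = sst.time.dt.days_in_month

# Normalize the weights so that they sum to 1 within each season
weights = days.groupby("time.season") / days.groupby("time.season").sum()

# Weighted seasonal mean: scale each month by its weight, then sum per season
weighted = (sst * weights).groupby("time.season").sum(dim="time")

# Naive seasonal mean: every month counts equally
unweighted = sst.groupby("time.season").mean(dim="time")

print(weighted.sel(season="DJF").item(), unweighted.sel(season="DJF").item())
```

Note that this groups January, February, and December of the same calendar year into DJF; handling winters that span two calendar years is one of the details that a dedicated function can take care of for you.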

Below is a plot showing the difference between computing the winter average temperature from monthly data using the incorrect unweighted average and the correct weighted average.

Demonstration#

In this post, I’ll show how to compute seasonal averages from monthly data the naive way (with unweighted averages) and the correct way (with weighted averages).

Imports#

import cartopy as cart
import geocat.comp as gc
import geocat.viz as gv
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

Helper function to make all of the plots the same way but with different data#

def custom_plot(data, title):
    # Generate figure (set its size (width, height) in inches)
    plt.figure(figsize=(14, 7))

    # Generate axes, using Cartopy
    projection = cart.crs.PlateCarree()
    ax = plt.axes(projection=projection)
    ax.add_feature(cart.feature.LAND, zorder=10, edgecolor='k')

    # Draw coastlines
    ax.coastlines()
    ax.gridlines(alpha=0.5)

    if 'Difference' in title:
        # Contourf-plot data (for filled contours)
        p = data.plot.contourf(
            ax=ax,
            vmin=-0.1,
            vmax=0.1,
            levels=11,
            cmap='bwr',
            add_colorbar=False,
            transform=projection,
            extend='neither',
        )

        # Add horizontal colorbar
        cbar = plt.colorbar(p, orientation='horizontal', shrink=0.5)
        cbar.ax.tick_params(labelsize=14)
        cbar.set_ticks(np.linspace(-0.1, 0.1, 6))
    else:
        # Contourf-plot data (for filled contours)
        p = data.plot.contourf(
            ax=ax,
            vmin=20,
            vmax=30,
            levels=11,
            cmap='inferno',
            add_colorbar=False,
            transform=projection,
            extend='neither',
        )

        # Add horizontal colorbar
        cbar = plt.colorbar(p, orientation='horizontal', shrink=0.5)
        cbar.ax.tick_params(labelsize=14)
        cbar.set_ticks(np.linspace(20, 30, 6))

    # Use geocat.viz.util convenience function to set axes tick values
    gv.set_axes_limits_and_ticks(
        ax,
        xlim=(-180, -70),
        ylim=(-20, 20),
        xticks=np.arange(-180, -70, 10),
        yticks=np.arange(-20, 20, 5),
    )

    # Use geocat.viz.util convenience function to make plots look like NCL plots by using latitude, longitude tick labels
    gv.add_lat_lon_ticklabels(ax)

    # Use geocat.viz.util convenience function to add minor and major tick lines
    gv.add_major_minor_ticks(ax, labelsize=12)

    # Use geocat.viz.util convenience function to add titles to left and right of the plot axis.
    gv.set_titles_and_labels(
        ax,
        maintitle=title,
        lefttitle="Winter Average",
        lefttitlefontsize=16,
        righttitle=data.units,
        righttitlefontsize=16,
        xlabel="",
        ylabel="",
    )

    # Show the plot
    plt.show()

Read in and format data#

The data we will be using is a subset of RDA dataset ds277.0 - ‘NOAA NCEP Optimum Interpolation Sea Surface Temperature Analysis’. It contains monthly average sea surface temperatures over the eastern equatorial Pacific from 1982 to 1986. We will compute seasonal averages from this data and compare the two methods of doing the calculation.

ds = xr.open_dataset('603321.sst.sst.mnmean.nc')
ds = ds.sst  # Pull out the sea surface temperature data
ds = ds.isel(
    time=range(1, 49)
)  # Remove the first data point so that we have an equal number of data points from each month

So what’s the difference?#

It is hard to see the difference between the correct and incorrect ways of calculating the seasonal averages. If we plot the difference between the two results, the computational errors become easier to see.

diff = seasonal_average_weighted_correctly - seasonal_average_weighted_incorrectly
diff = diff.assign_attrs({'units': 'delta degC'})  # provide the units

custom_plot(diff.isel(season=0), 'Difference: Correct Average - Incorrect Average')

[Figure: map of the difference between the correct (weighted) and incorrect (unweighted) winter averages]

What we learned#

The incorrect averages deviate from the correct averages by up to 0.1 degrees Celsius in this example, but it wasn’t obvious before we computed the difference! While these differences are very small, they aren’t small enough to be negligible for scientific purposes. It’s really easy to assume that an unweighted average will give you the correct climatology values and end up with hard-to-find errors in your calculations.

This example covered the correct way to compute seasonal climatologies from monthly data using GeoCAT-comp’s climatology_average and the discrepancies that result from using unweighted averages. Not every calculation needs a weighted average, but be sure to consider what kind of average you need before doing your calculations to avoid a debugging headache!

Additional Info#

If you want finer control over the averaging than GeoCAT-comp allows with climatology_average, check out this post from Xarray about computing seasonal averages from monthly means. That tutorial is a detailed explanation of the process that climatology_average is based on. We also discussed this same computational challenge in an older blog post, written before climatology_average was implemented, that may be of interest.


29 November 2022
