{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "# Calculating Temporal Averages with GeoCAT-comp vs Xarray\n", "\n", "With temporally large datasets, computing seasonal and annual averages are a great ways to summarize the data and make it easier to manage and understand. You may want to take hourly, daily, or monthly data and compute seasonal or annual averages." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Challenges\n", "When using data that has a daily or finer resolution (e.g. hourly), calculating an annual average is simple. Every day and hour has the same length, so an unweighted average will work.\n", "\n", "But when using data that is monthly, things can get a bit tricky. Not every month is created equal. February has 28 or 29 days and March has 31 days. Since monthly data has one value for each month, those points can't be averaged in the usual way. A weighted average is needed.\n", "\n", "While it is tempting to quickly compute monthly to annual averages with `Xarray`'s `resample` or `groupby` functions, we need to be careful to specify the weights. Unfortunately, `Xarray` doesn't support weighted `resample` or `groupby` at the time this post was created, but `geocat-comp.climatology_average` builds upon `Xarray` to compute the weights for you.\n", "\n", "Below is a plot showing the difference between computing the winter average temperature from monthly data using the incorrect unweighted average and the correct weighted average.\n", "\n", "![xarray_vs_geocat_climatology.png](../images/xarray_vs_geocat_climatology.png)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## Demonstration\n", "In this post, I'll show how to compute seasonal averages from monthly data the naive way (with unweighted averages) and the correct way (with weighted averages).\n", "\n", "### Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "import cartopy as cart\n", "import geocat.comp as gc\n", "import geocat.viz as gv\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import xarray as xr" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Helper function to make all of the plots the same way but with different data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "def custom_plot(data, title):\n", " # Generate figure (set its size (width, height) in inches)\n", " plt.figure(figsize=(14, 7))\n", "\n", " # Generate axes, using Cartopy\n", " projection = cart.crs.PlateCarree()\n", " ax = plt.axes(projection=projection)\n", " ax.add_feature(cart.feature.LAND, zorder=10, edgecolor='k')\n", "\n", " # Draw coastlines\n", " ax.coastlines()\n", " ax.gridlines(alpha=0.5)\n", "\n", " if 'Difference' in title:\n", " # Contourf-plot data (for filled contours)\n", " p = data.plot.contourf(\n", " ax=ax,\n", " vmin=-0.1,\n", " vmax=0.1,\n", " levels=11,\n", " cmap='bwr',\n", " add_colorbar=False,\n", " transform=projection,\n", " extend='neither',\n", " )\n", "\n", " # Add horizontal colorbar\n", " cbar = plt.colorbar(p, orientation='horizontal', shrink=0.5)\n", " cbar.ax.tick_params(labelsize=14)\n", " cbar.set_ticks(np.linspace(-0.1, 0.1, 6))\n", " else:\n", " # Contourf-plot data (for filled contours)\n", " p = data.plot.contourf(\n", " ax=ax,\n", " vmin=20,\n", " vmax=30,\n", " levels=11,\n", " cmap='inferno',\n", " add_colorbar=False,\n", " transform=projection,\n", " extend='neither',\n", " )\n", "\n", " # Add horizontal colorbar\n", " cbar = plt.colorbar(p, orientation='horizontal', shrink=0.5)\n", " cbar.ax.tick_params(labelsize=14)\n", " cbar.set_ticks(np.linspace(20, 30, 6))\n", "\n", " # Use geocat.viz.util convenience function to set axes tick values\n", " gv.set_axes_limits_and_ticks(\n", " ax,\n", " xlim=(-180, -70),\n", " ylim=(-20, 20),\n", " xticks=np.arange(-180, -70, 10),\n", " yticks=np.arange(-20, 20, 5),\n", " )\n", "\n", " # Use geocat.viz.util convenience function to make plots look like NCL plots by using latitude, longitude tick labels\n", " gv.add_lat_lon_ticklabels(ax)\n", "\n", " # Use geocat.viz.util convenience function to add minor and major tick lines\n", " gv.add_major_minor_ticks(ax, labelsize=12)\n", "\n", " # Use geocat.viz.util convenience function to add titles to left and right of the plot axis.\n", " gv.set_titles_and_labels(\n", " ax,\n", " maintitle=title,\n", " lefttitle=\"Winter Average\",\n", " lefttitlefontsize=16,\n", " righttitle=data.units,\n", " righttitlefontsize=16,\n", " xlabel=\"\",\n", " ylabel=\"\",\n", " )\n", "\n", " # Show the plot\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### Read in and format data\n", "The data we will be using is a subset from RDA dataset ds277.0 - 'NOAA NCEP Optimum Interpolation Sea Surface Temperature Analysis'. It contains monthly average sea surface temperatures over the eastern equitorial Pacific from 1982 to 1986. We will be computing seasonal averages from this data and comparing the two different methods for doing this calculation." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "ds = xr.open_dataset('603321.sst.sst.mnmean.nc')\n", "ds = ds.sst # Pull out the sea surface temperature data\n", "ds = ds.isel(\n", " time=range(1, 49)\n", ") # Remove the first data point so that we have an equal number of data points from each month" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "### The incorrect way to compute seasonal averages from monthly data" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "It's easy to compute an unweighted average using `xarray` functionality; however, this generates inaccurate results. Here is what the ***incorrect*** way of doing this looks like. Notice that the result has a `season` dimension. These seasons are quarters of the year representing the meteorological seasons. (e.g. December, January, and February for winter)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray 'sst' (season: 4, lat: 40, lon: 110)>\n", "array([[[25.660833, 25.619997, 25.572502, ..., 26.779167, 26.494165,\n", " 26.465834],\n", " [25.914999, 25.876663, 25.829168, ..., 27.035002, 26.868334,\n", " 26.679167],\n", " [26.064165, 26.026667, 25.981667, ..., 27.000832, 26.989166,\n", " 26.899164],\n", " ...,\n", " [28.206667, 28.167498, 28.148336, ..., 23.69333 , 23.731667,\n", " 23.705002],\n", " [27.877502, 27.8225 , 27.790833, ..., 23.844164, 23.805834,\n", " 23.6825 ],\n", " [27.518335, 27.458334, 27.4225 , ..., 23.766668, 23.775 ,\n", " 23.638334]],\n", "\n", " [[27.340834, 27.276665, 27.225832, ..., 28.585833, 28.35 ,\n", " 28.225832],\n", " [27.399168, 27.331665, 27.276667, ..., 28.417498, 28.280005,\n", " 28.213333],\n", " [27.421667, 27.354164, 27.295 , ..., 28.247498, 28.249998,\n", " 28.216665],\n", "...\n", " [28.158333, 28.155 , 28.11167 , ..., 22.666666, 22.707499,\n", " 22.724998],\n", " [27.8175 , 27.8125 , 27.760834, ..., 22.8975 , 22.81 ,\n", " 22.76 ],\n", " [27.436666, 27.419168, 27.356668, ..., 22.8225 , 22.7325 ,\n", " 22.658333]],\n", "\n", " [[27.836668, 27.792501, 27.725 , ..., 28.742498, 28.539167,\n", " 28.454165],\n", " [27.945831, 27.901667, 27.828333, ..., 28.66333 , 28.4975 ,\n", " 28.419165],\n", " [28.007498, 27.959997, 27.886667, ..., 28.534166, 28.47083 ,\n", " 28.421667],\n", " ...,\n", " [26.004168, 25.980001, 26.002497, ..., 18.877499, 19.0925 ,\n", " 19.121664],\n", " [25.459167, 25.431665, 25.445 , ..., 18.905832, 19.098333,\n", " 19.135 ],\n", " [24.853333, 24.831667, 24.836664, ..., 18.72083 , 18.925 ,\n", " 18.979166]]], dtype=float32)\n", "Coordinates:\n", " * lat (lat) float32 19.5 18.5 17.5 16.5 15.5 ... -16.5 -17.5 -18.5 -19.5\n", " * lon (lon) float32 180.5 181.5 182.5 183.5 ... 286.5 287.5 288.5 289.5\n", " * season (season) object 'DJF' 'JJA' 'MAM' 'SON'\n", "Attributes: (12/13)\n", " long_name: Monthly Mean of Sea Surface Temperature\n", " unpacked_valid_range: [-5. 40.]\n", " actual_range: [-1.7999996 35.56862 ]\n", " units: degC\n", " precision: 2\n", " var_desc: Sea Surface Temperature\n", " ... ...\n", " level_desc: Surface\n", " statistic: Mean\n", " parent_stat: Weekly Mean\n", " standard_name: sea_surface_temperature\n", " cell_methods: time: mean (monthly from weekly values interpolate...\n", " valid_range: [-500 4000]