Comparing Atmospheric Model Output with Observations Using Intake-ESM#
Comparing models and observations is a critical component of climate diagnostic packages. This process can be challenging though - given the number of observational datasets to compare against, and the difference in spatiotemporal resolutions. In the previous iteration of the diagnostics package used for atmospheric data from the Community Earth System Model (CESM), they used pre-computed, observational datasets stored in a directory on the GLADE filesystem (/glade/p/cesm/amwg/amwg_diagnostics/obs_data
)
Within this example, we walk though generating an intake-esm
catalog from the observational data, reading in CESM data, and compare models and observations
Imports#
import ast
import glob
import pathlib
import traceback
import warnings
from datetime import datetime
import cartopy.crs as ccrs
import geocat.comp
import geoviews.feature as gf
import holoviews as hv
import hvplot
import hvplot.xarray
import intake
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
from distributed import Client
from ecgtools import Builder
from ecgtools.builder import INVALID_ASSET, TRACEBACK
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')
Build an Intake-ESM
catalog from the observational dataset#
Before we jump into building the catalog, let’s take a look at what data we are working with
Investigate the File Structure and Sample Dataset#
files = sorted(glob.glob('/glade/p/cesm/amwg/amwg_diagnostics/obs_data/*'))
files[::20]
['/glade/p/cesm/amwg/amwg_diagnostics/obs_data/AIRS_01_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ARM_annual_cycle_twp_c2_cmbe_sound_p_f.cdf',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES-EBAF_01_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES2_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CLOUDSATCOSP_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CLOUDSAT_10_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ECMWF_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/EP.ERAI_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERAI_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERBE_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERS_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/GPCP_JJA_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_CL_03_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_PD_02_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_PI_05_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCPCOSP_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCPFD_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCP_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/JRA25_SON_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/LEGATES_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MERRAW_19x2_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MERRA_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MISRCOSP_JJA_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MODIS_ANN_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/NVAP_03_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/PRECL_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/SSMI_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/SSMI_SEAICE_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/TRMM_MAM_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/WARREN_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/WILLMOTT_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/XIEARKIN_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlsg_10_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlso_ANN_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlsw_MAM_climo.nc']
Observational datasetsets in this directory follow the convention source_(month or season)_climo.nc
.
Let’s open up one of those datasets
ds = xr.open_dataset('/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES-EBAF_01_climo.nc')
ds
<xarray.Dataset> Dimensions: (lon: 360, lat: 180, time: 1) Coordinates: * lon (lon) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5 * lat (lat) float32 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5 * time (time) float32 1.0 Data variables: SOLIN (time, lat, lon) float32 495.0 495.0 495.0 495.0 ... 0.0 0.0 0.0 FLUT (time, lat, lon) float32 187.4 187.4 187.4 ... 170.7 170.7 170.7 FLUTC (time, lat, lon) float32 188.8 188.8 188.8 ... 178.9 178.9 178.9 FSNTOA (time, lat, lon) float32 147.8 147.8 147.8 ... -0.049 -0.049 -0.049 FSNTOAC (time, lat, lon) float32 150.9 150.9 150.9 ... -0.006 -0.006 -0.006 SWCF (time, lat, lon) float32 -3.149 -3.149 -3.149 ... -0.043 -0.043 LWCF (time, lat, lon) float32 1.391 1.391 1.391 ... 8.272 8.272 8.272 RESTOA (time, lat, lon) float32 -39.6 -39.6 -39.6 ... -170.7 -170.7 -170.7 ALBEDO (time, lat, lon) float32 0.7015 0.7015 0.7015 ... nan nan nan ALBEDOC (time, lat, lon) float32 0.6951 0.6951 0.6951 ... nan nan nan gw (lat) float64 0.0001523 0.0004569 0.0007613 ... 0.0004569 0.0001523 Attributes: version: This is version 2.8: March 7, 2014 institution: NASA Langley Research Center comment: Data is from East to West and South to North. Climat... title: CERES EBAF (Energy Balanced and Filled) Fluxes. Mont... AMWG_author: Cecile Hannay AMWG_creation_date: Thu Jul 24 16:08:10 MDT 2014 for AMWG package history: Thu Jul 24 16:08:10 2014: ncks -A -v gw CERES2_01_cl... NCO: 20140724
- lon: 360
- lat: 180
- time: 1
- lon(lon)float320.5 1.5 2.5 ... 357.5 358.5 359.5
- long_name :
- longitude
- standard_name :
- longitude
- units :
- degrees_east
- valid_range :
- [-180. 360.]
array([ 0.5, 1.5, 2.5, ..., 357.5, 358.5, 359.5], dtype=float32)
- lat(lat)float32-89.5 -88.5 -87.5 ... 88.5 89.5
- long_name :
- latitude
- standard_name :
- latitude
- units :
- degrees_north
- valid_range :
- [-90. 90.]
- actual_range :
- [-89.5 89.5]
array([-89.5, -88.5, -87.5, -86.5, -85.5, -84.5, -83.5, -82.5, -81.5, -80.5, -79.5, -78.5, -77.5, -76.5, -75.5, -74.5, -73.5, -72.5, -71.5, -70.5, -69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5, -59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5, -49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5, -39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5, -29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5, -19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5, -9.5, -8.5, -7.5, -6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5, 74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5, 85.5, 86.5, 87.5, 88.5, 89.5], dtype=float32)
- time(time)float321.0
array([1.], dtype=float32)
- SOLIN(time, lat, lon)float32...
- average_op_ncl :
- dim_avg_n over dimension(s): time
- long_name :
- Incoming Solar Flux, Monthly Means
- standard_name :
- Incoming Solar Flux
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 800.000
array([[[495.0077 , 495.0077 , ..., 495.0077 , 495.0077 ], [494.86923, 494.86923, ..., 494.86923, 494.86923], ..., [ 0. , 0. , ..., 0. , 0. ], [ 0. , 0. , ..., 0. , 0. ]]], dtype=float32)
- FLUT(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Longwave Flux, Monthly Means, All-Sky conditions
- standard_name :
- TOA Longwave Flux - All-Sky
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 400.000
array([[[187.38461, 187.38461, ..., 187.38461, 187.38461], [190.79231, 190.79231, ..., 190.79231, 190.79231], ..., [170.49231, 170.49231, ..., 170.49231, 170.49231], [170.66924, 170.66924, ..., 170.66924, 170.66924]]], dtype=float32)
- FLUTC(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Longwave Flux, Monthly Means, Clear-Sky conditions
- standard_name :
- TOA Longwave Flux - Clear-Sky
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 400.000
array([[[188.78462, 188.78462, ..., 188.78462, 188.78462], [192.3923 , 192.3923 , ..., 192.3923 , 192.3923 ], ..., [179.29231, 179.29231, ..., 179.29231, 179.29231], [178.93077, 178.93077, ..., 178.93077, 178.93077]]], dtype=float32)
- FSNTOA(time, lat, lon)float32...
- lunits :
- W m-2
- long_name :
- TOA net shortwave
array([[[ 1.477692e+02, 1.477692e+02, ..., 1.477692e+02, 1.477692e+02], [ 1.536615e+02, 1.536615e+02, ..., 1.536615e+02, 1.536615e+02], ..., [-4.900000e-02, -4.900000e-02, ..., -4.900000e-02, -4.900000e-02], [-4.900000e-02, -4.900000e-02, ..., -4.900000e-02, -4.900000e-02]]], dtype=float32)
- FSNTOAC(time, lat, lon)float32...
- long_name :
- TOA clear-sky net shortwave
- lunits :
- W m-2
array([[[ 1.509154e+02, 1.509154e+02, ..., 1.509154e+02, 1.509154e+02], [ 1.565077e+02, 1.565077e+02, ..., 1.565077e+02, 1.565077e+02], ..., [-6.000000e-03, -6.000000e-03, ..., -6.000000e-03, -6.000000e-03], [-6.000000e-03, -6.000000e-03, ..., -6.000000e-03, -6.000000e-03]]], dtype=float32)
- SWCF(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Cloud Radiative Effects Shortwave Flux, Monthly Means
- standard_name :
- TOA CRE Shortwave Flux
- units :
- W m-2
- valid_min :
- -400.000
- valid_max :
- 100.000
array([[[-3.149 , -3.149 , ..., -3.149 , -3.149 ], [-2.844985, -2.844985, ..., -2.844985, -2.844985], ..., [-0.043 , -0.043 , ..., -0.043 , -0.043 ], [-0.043 , -0.043 , ..., -0.043 , -0.043 ]]], dtype=float32)
- LWCF(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Cloud Radiative Effects Longwave Flux, Monthly Means
- standard_name :
- TOA CRE Longwave Flux
- units :
- W m-2
- valid_min :
- -100.000
- valid_max :
- 300.000
array([[[1.391205, 1.391205, ..., 1.391205, 1.391205], [1.616079, 1.616079, ..., 1.616079, 1.616079], ..., [8.813915, 8.813915, ..., 8.813915, 8.813915], [8.271692, 8.271692, ..., 8.271692, 8.271692]]], dtype=float32)
- RESTOA(time, lat, lon)float32...
- long_name :
- residual energy at TOA
- standard_name :
- TOA Net Flux - All-Sky
- units :
- W m-2
- valid_min :
- -400.000
- valid_max :
- 400.000
array([[[ -39.603077, -39.603077, ..., -39.603077, -39.603077], [ -37.104614, -37.104614, ..., -37.104614, -37.104614], ..., [-170.55385 , -170.55385 , ..., -170.55385 , -170.55385 ], [-170.71538 , -170.71538 , ..., -170.71538 , -170.71538 ]]], dtype=float32)
- ALBEDO(time, lat, lon)float32...
- long_name :
- TOA albedo
array([[[0.701483, 0.701483, ..., 0.701483, 0.701483], [0.689493, 0.689493, ..., 0.689493, 0.689493], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]]], dtype=float32)
- ALBEDOC(time, lat, lon)float32...
- long_name :
- TOA clear-sky albedo
array([[[0.695126, 0.695126, ..., 0.695126, 0.695126], [0.683734, 0.683734, ..., 0.683734, 0.683734], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]]], dtype=float32)
- gw(lat)float64...
- long_name :
- latitude weight
array([0.000152, 0.000457, 0.000761, 0.001065, 0.001369, 0.001673, 0.001976, 0.002278, 0.00258 , 0.002881, 0.003181, 0.00348 , 0.003778, 0.004074, 0.00437 , 0.004664, 0.004957, 0.005248, 0.005538, 0.005826, 0.006112, 0.006397, 0.006679, 0.006959, 0.007238, 0.007514, 0.007788, 0.008059, 0.008328, 0.008594, 0.008858, 0.009119, 0.009378, 0.009633, 0.009886, 0.010135, 0.010381, 0.010625, 0.010865, 0.011102, 0.011335, 0.011565, 0.011791, 0.012014, 0.012233, 0.012448, 0.01266 , 0.012868, 0.013072, 0.013271, 0.013467, 0.013659, 0.013846, 0.01403 , 0.014209, 0.014384, 0.014554, 0.01472 , 0.014881, 0.015038, 0.01519 , 0.015338, 0.015481, 0.015619, 0.015753, 0.015882, 0.016006, 0.016125, 0.016239, 0.016348, 0.016452, 0.016551, 0.016645, 0.016734, 0.016818, 0.016897, 0.016971, 0.017039, 0.017103, 0.017161, 0.017214, 0.017261, 0.017304, 0.017341, 0.017373, 0.017399, 0.017421, 0.017436, 0.017447, 0.017452, 0.017452, 0.017447, 0.017436, 0.017421, 0.017399, 0.017373, 0.017341, 0.017304, 0.017261, 0.017214, 0.017161, 0.017103, 0.017039, 0.016971, 0.016897, 0.016818, 0.016734, 0.016645, 0.016551, 0.016452, 0.016348, 0.016239, 0.016125, 0.016006, 0.015882, 0.015753, 0.015619, 0.015481, 0.015338, 0.01519 , 0.015038, 0.014881, 0.01472 , 0.014554, 0.014384, 0.014209, 0.01403 , 0.013846, 0.013659, 0.013467, 0.013271, 0.013072, 0.012868, 0.01266 , 0.012448, 0.012233, 0.012014, 0.011791, 0.011565, 0.011335, 0.011102, 0.010865, 0.010625, 0.010381, 0.010135, 0.009886, 0.009633, 0.009378, 0.009119, 0.008858, 0.008594, 0.008328, 0.008059, 0.007788, 0.007514, 0.007238, 0.006959, 0.006679, 0.006397, 0.006112, 0.005826, 0.005538, 0.005248, 0.004957, 0.004664, 0.00437 , 0.004074, 0.003778, 0.00348 , 0.003181, 0.002881, 0.00258 , 0.002278, 0.001976, 0.001673, 0.001369, 0.001065, 0.000761, 0.000457, 0.000152])
- version :
- This is version 2.8: March 7, 2014
- institution :
- NASA Langley Research Center
- comment :
- Data is from East to West and South to North. Climatology from 03/2000 to 02/2013.
- title :
- CERES EBAF (Energy Balanced and Filled) Fluxes. Monthly Averages and 13-year Climatology.
- AMWG_author :
- Cecile Hannay
- AMWG_creation_date :
- Thu Jul 24 16:08:10 MDT 2014 for AMWG package
- history :
- Thu Jul 24 16:08:10 2014: ncks -A -v gw CERES2_01_climo.nc CERES-EBAF_01_climo.nc
- NCO :
- 20140724
We see that this dataset is gridded on a global 0.5° grid, with several variables related to solar fluxes (ex. TOA net shortwave
)
Build a parser using ecgtools
#
We can use the ecgtools
package to help build an intake-esm
catalog for this dataset!
We start first by using pathlib
to generically look at the filepaths
path = pathlib.Path(files[0])
path.stem
'AIRS_01_climo'
This path can be split using .split('_')
, separates the path into the following:
observational dataset source
month number or season
“climo”
path.stem.split('_')
['AIRS', '01', 'climo']
We can also gather useful insight by opening the file!
ds = xr.open_dataset(files[0])
Let’s look at the variable “Temperature” (T
)
ds.T
<xarray.DataArray 'T' (time: 1, lev: 13, lat: 94, lon: 192)> [234624 values with dtype=float32] Coordinates: * lat (lat) float64 -88.54 -86.65 -84.75 -82.85 ... 84.75 86.65 88.54 * time (time) int32 1 * lev (lev) float32 1e+03 925.0 850.0 700.0 ... 200.0 150.0 100.0 70.0 * lon (lon) float32 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1 Attributes: units: K long_name: Temperature climatology: AIRS monthly climatology 9/2002-8/2006
- time: 1
- lev: 13
- lat: 94
- lon: 192
- ...
[234624 values with dtype=float32]
- lat(lat)float64-88.54 -86.65 ... 86.65 88.54
- units :
- degrees_north
- long_name :
- latitude
array([-88.541946, -86.653168, -84.753227, -82.850769, -80.947357, -79.04348 , -77.139351, -75.235054, -73.330658, -71.426186, -69.52166 , -67.617104, -65.712509, -63.807896, -61.903259, -59.998611, -58.093948, -56.189278, -54.284599, -52.379913, -50.47522 , -48.570518, -46.665817, -44.761108, -42.8564 , -40.951687, -39.04697 , -37.14225 , -35.23753 , -33.332806, -31.428082, -29.523355, -27.618628, -25.7139 , -23.80917 , -21.90444 , -19.999708, -18.094976, -16.190243, -14.28551 , -12.380776, -10.476042, -8.571308, -6.666573, -4.761838, -2.857103, -0.952368, 0.952368, 2.857103, 4.761838, 6.666573, 8.571308, 10.476042, 12.380776, 14.28551 , 16.190243, 18.094976, 19.999708, 21.90444 , 23.80917 , 25.7139 , 27.618628, 29.523355, 31.428082, 33.332806, 35.23753 , 37.14225 , 39.04697 , 40.951687, 42.8564 , 44.761108, 46.665817, 48.570518, 50.47522 , 52.379913, 54.284599, 56.189278, 58.093948, 59.998611, 61.903259, 63.807896, 65.712509, 67.617104, 69.52166 , 71.426186, 73.330658, 75.235054, 77.139351, 79.04348 , 80.947357, 82.850769, 84.753227, 86.653168, 88.541946])
- time(time)int321
array([1], dtype=int32)
- lev(lev)float321e+03 925.0 850.0 ... 100.0 70.0
- fill_value :
- 1e+36
- units :
- hPa
- long_name :
- Pressure
array([1000., 925., 850., 700., 600., 500., 400., 300., 250., 200., 150., 100., 70.], dtype=float32)
- lon(lon)float320.0 1.875 3.75 ... 356.2 358.1
- units :
- degrees_east
- long_name :
- longitude
array([ 0. , 1.875, 3.75 , 5.625, 7.5 , 9.375, 11.25 , 13.125, 15. , 16.875, 18.75 , 20.625, 22.5 , 24.375, 26.25 , 28.125, 30. , 31.875, 33.75 , 35.625, 37.5 , 39.375, 41.25 , 43.125, 45. , 46.875, 48.75 , 50.625, 52.5 , 54.375, 56.25 , 58.125, 60. , 61.875, 63.75 , 65.625, 67.5 , 69.375, 71.25 , 73.125, 75. , 76.875, 78.75 , 80.625, 82.5 , 84.375, 86.25 , 88.125, 90. , 91.875, 93.75 , 95.625, 97.5 , 99.375, 101.25 , 103.125, 105. , 106.875, 108.75 , 110.625, 112.5 , 114.375, 116.25 , 118.125, 120. , 121.875, 123.75 , 125.625, 127.5 , 129.375, 131.25 , 133.125, 135. , 136.875, 138.75 , 140.625, 142.5 , 144.375, 146.25 , 148.125, 150. , 151.875, 153.75 , 155.625, 157.5 , 159.375, 161.25 , 163.125, 165. , 166.875, 168.75 , 170.625, 172.5 , 174.375, 176.25 , 178.125, 180. , 181.875, 183.75 , 185.625, 187.5 , 189.375, 191.25 , 193.125, 195. , 196.875, 198.75 , 200.625, 202.5 , 204.375, 206.25 , 208.125, 210. , 211.875, 213.75 , 215.625, 217.5 , 219.375, 221.25 , 223.125, 225. , 226.875, 228.75 , 230.625, 232.5 , 234.375, 236.25 , 238.125, 240. , 241.875, 243.75 , 245.625, 247.5 , 249.375, 251.25 , 253.125, 255. , 256.875, 258.75 , 260.625, 262.5 , 264.375, 266.25 , 268.125, 270. , 271.875, 273.75 , 275.625, 277.5 , 279.375, 281.25 , 283.125, 285. , 286.875, 288.75 , 290.625, 292.5 , 294.375, 296.25 , 298.125, 300. , 301.875, 303.75 , 305.625, 307.5 , 309.375, 311.25 , 313.125, 315. , 316.875, 318.75 , 320.625, 322.5 , 324.375, 326.25 , 328.125, 330. , 331.875, 333.75 , 335.625, 337.5 , 339.375, 341.25 , 343.125, 345. , 346.875, 348.75 , 350.625, 352.5 , 354.375, 356.25 , 358.125], dtype=float32)
- units :
- K
- long_name :
- Temperature
- climatology :
- AIRS monthly climatology 9/2002-8/2006
You’ll notice that we have additional attributes about the dataset which we can use, including:
units
long_name
climatology, which includes more information about what dates were used
Adding the parsing function#
def parse_amwg_obs(file):
"""Atmospheric observational data used within the AMWG diagnostic package
Parameters
----------
file: str
filepath from the data directory including observational climatologies
Returns
-------
info: dict
Dictionary with information to add as columns in the data catalog
"""
file = pathlib.Path(file)
info_list = []
info = {}
try:
stem = file.stem
split = stem.split('_')
info['source'] = split[0]
temporal = split[-2]
if len(split[-2]) == 2:
month_number = int(temporal)
info['time_period'] = 'monthly'
info['temporal'] = datetime(2020, month_number, 1).strftime('%b').upper()
elif len(temporal) == 3:
info['time_period'] = 'seasonal'
info['temporal'] = temporal
else:
info['time_period'] = np.nan
info['temporal'] = np.nan
with xr.open_dataset(file, chunks={}, decode_times=False) as ds:
variables = [v for v, da in ds.variables.items() if 'time' in da.dims]
info['variables'] = variables
info['path'] = str(path)
return info
except Exception:
return {INVALID_ASSET: file, TRACEBACK: traceback.format_exc()}
'time' in ds['T'].dims
True
info = parse_amwg_obs(files[0])
info
{'source': 'AIRS',
'time_period': 'monthly',
'temporal': 'JAN',
'variables': ['T', 'time', 'RELHUM', 'O3', 'SHUM'],
'path': '/glade/p/cesm/amwg/amwg_diagnostics/obs_data/AIRS_01_climo.nc'}
pd.DataFrame(info)
source | time_period | temporal | variables | path | |
---|---|---|---|---|---|
0 | AIRS | monthly | JAN | T | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
1 | AIRS | monthly | JAN | time | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
2 | AIRS | monthly | JAN | RELHUM | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3 | AIRS | monthly | JAN | O3 | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
4 | AIRS | monthly | JAN | SHUM | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
We can test this on a single file!
This dictionary can easily be converted into a pandas.Dataframe
pd.DataFrame([info, info])
source | time_period | temporal | variables | path | |
---|---|---|---|---|---|
0 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
1 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
Use ecgtools
to build the catalog#
Now that we have our parser ready to go, we can use ecgtools
to build the catalog
b = Builder(
# Directory with the output
f"/glade/p/cesm/amwg/amwg_diagnostics/obs_data/",
depth=3,
# Number of jobs to execute - should be equal to # threads you are using
njobs=-1,
)
Once we set up the builder, can call .build
, passing in the parser we built
b.build(parse_amwg_obs)
b.df
source | time_period | temporal | variables | path | |
---|---|---|---|---|---|
0 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
1 | AIRS | monthly | FEB | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
2 | AIRS | monthly | MAR | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3 | AIRS | monthly | APR | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
4 | AIRS | monthly | MAY | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
... | ... | ... | ... | ... | ... |
3778 | mlsw | seasonal | ANN | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3779 | mlsw | seasonal | DJF | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3780 | mlsw | seasonal | JJA | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3781 | mlsw | seasonal | MAM | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
3782 | mlsw | seasonal | SON | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
1243 rows × 5 columns
Save the catalog#
b.save(
# File path - could save as .csv (uncompressed csv) or .csv.gz (compressed csv)
"/glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.csv",
# Column name including filepath
path_column_name='path',
# Column name including variables
variable_column_name='variables',
# Data file format - could be netcdf or zarr (in this case, netcdf)
data_format="netcdf",
# Which attributes to groupby when reading in variables using intake-esm
groupby_attrs=["source", "time_period"],
# Aggregations which are fed into xarray when reading in data using intake
aggregations=[
{
'type': 'join_new',
'attribute_name': 'temporal',
'options': {'coords': 'minimal', 'compat': 'override'},
},
],
)
Saved catalog location: /glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.json and /glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.csv
/glade/scratch/mgrover/ipykernel_140729/3449059562.py:1: UserWarning: Unable to parse 2541 assets/files. A list of these assets can be found in /glade/work/mgrover/intake-esm-catalogs/invalid_assets_amwg_obs_datasets.csv.
b.save(
Read in Observational Data from the Catalog#
We use intake-esm
to read in the observational data from the catalog we just created
obs_catalog = intake.open_esm_datastore(
"/glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.json",
csv_kwargs={"converters": {"variables": ast.literal_eval}},
sep="/",
)
We are interested in Temperature (T
), from the AIRS
dataset, which comes from NASA
obs_catalog_subset = obs_catalog.search(variables='T', source='AIRS')
dsets = obs_catalog_subset.to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
'source/time_period'
Our dictionary of datasets has two keys, one for seasonal and the other for monthly climatologies
dsets.keys()
dict_keys(['AIRS/seasonal', 'AIRS/monthly'])
seasonal_obs_ds = dsets['AIRS/seasonal']
monthly_obs_ds = dsets['AIRS/monthly']
Plot Observational Temperature Climatologies#
Now that we read in our data, we can plot up our results!
Plot Seasonal Temperature Climatologies from Observations#
seasonal_obs_plot = (
seasonal_obs_ds.isel(time=0, lev=range(3)).T.hvplot.quadmesh(
rasterize=True, groupby=['lev', 'temporal'], cmap='magma', projection=ccrs.Robinson()
)
* gf.coastline
)
seasonal_obs_plot
Plot Monthly Temperature Climatologies from Observations#
monthly_obs_plot = (
monthly_obs_ds.isel(time=0, lev=range(3)).T.hvplot.quadmesh(
rasterize=True, groupby=['lev', 'temporal'], cmap='magma', projection=ccrs.Robinson()
)
* gf.coastline
)
monthly_obs_plot
Read in CESM Output#
cesm_data_catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)
/glade/work/mgrover/miniconda3/envs/cesm-collections-dev/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3441: DtypeWarning: Columns (8,9) have mixed types.Specify dtype option on import or set low_memory=False.
exec(code_obj, self.user_global_ns, self.user_ns)
cluster = NCARCluster(memory='10 GB')
cluster.scale(20)
client = Client(cluster)
client
Client
Client-a916e23c-077a-11ec-83b6-3cecef1b11fa
Connection method: Cluster object | Cluster type: PBSCluster |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/37300/status |
Cluster Info
PBSCluster
c5c7baf9
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/37300/status | Workers: 20 |
Total threads: 40 | Total memory: 186.20 GiB |
Scheduler Info
Scheduler
Scheduler-21aaa4ef-3fb4-4cf3-8445-824a2fbf35c5
Comm: tcp://10.12.206.54:39511 | Workers: 20 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/37300/status | Total threads: 40 |
Started: Just now | Total memory: 186.20 GiB |
Workers
Worker: 0
Comm: tcp://10.12.206.33:44592 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/37012/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.33:43117 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-m222qwsu |
Worker: 1
Comm: tcp://10.12.206.40:37640 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/38447/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.40:33049 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-4d1rk9dp |
Worker: 2
Comm: tcp://10.12.206.36:35739 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/45826/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:33354 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-vufi_4f6 |
Worker: 3
Comm: tcp://10.12.206.36:38619 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/39734/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:36501 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-y402nce5 |
Worker: 4
Comm: tcp://10.12.206.57:33624 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/32910/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.57:45128 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-2r47banl |
Worker: 5
Comm: tcp://10.12.206.52:43348 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/40363/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.52:33827 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-sioaa0mt |
Worker: 6
Comm: tcp://10.12.206.36:35941 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/41518/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:36320 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-__saonvw |
Worker: 7
Comm: tcp://10.12.206.36:45373 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/42370/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:46702 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-f6zb330g |
Worker: 8
Comm: tcp://10.12.206.36:34834 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/42757/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:36298 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-mecj2384 |
Worker: 9
Comm: tcp://10.12.206.40:46231 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/35795/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.40:42007 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-xmf87s1j |
Worker: 10
Comm: tcp://10.12.206.53:44356 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/39194/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.53:34434 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-qsv6hesl |
Worker: 11
Comm: tcp://10.12.206.36:39578 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/41349/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:39417 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-azjmpit9 |
Worker: 12
Comm: tcp://10.12.206.36:38517 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/46172/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:40589 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-nwj08h26 |
Worker: 13
Comm: tcp://10.12.206.57:39800 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/39636/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.57:41925 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-kibi9ldk |
Worker: 14
Comm: tcp://10.12.206.36:41381 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/39753/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:33598 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-qssg0k7_ |
Worker: 15
Comm: tcp://10.12.206.52:45124 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/44387/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.52:40194 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-gfevyho4 |
Worker: 16
Comm: tcp://10.12.206.32:40429 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/35112/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.32:34670 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-yv6ztsjs |
Worker: 17
Comm: tcp://10.12.206.36:45101 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/39483/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:40381 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-l9t684r1 |
Worker: 18
Comm: tcp://10.12.206.36:44174 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/44713/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:45567 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-gjnimb52 |
Worker: 19
Comm: tcp://10.12.206.36:40561 | Total threads: 2 |
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/41639/status | Memory: 9.31 GiB |
Nanny: tcp://10.12.206.36:44220 | |
Local directory: /glade/scratch/mgrover/dask/casper-dav/local-dir/dask-worker-space/worker-gngr7asx |
Query for monthly temperature (T
) values, using the historical experiment#
cesm_data_catalog_subset = cesm_data_catalog.search(
variable='T', control_branch_year=1001, frequency='month_1', experiment='historical'
)
Since we taking the average over time, we choose to chunk by vertical levels (lev
)
dsets = cesm_data_catalog_subset.to_dataset_dict(cdf_kwargs={'chunks': {'lev': 5}})
--> The keys in the returned dictionary of datasets are constructed as follows:
'component.experiment.stream.forcing_variant.control_branch_year.variable'
We only have a single key - with the component.experiment.stream.forcing_variant.control_branch_year.variable
ds = dsets['atm.historical.cam.h0.cmip6.1001.T']
Our dataset has a chunk size of ~125 mb which is around the optimal size!
ds.T
<xarray.DataArray 'T' (member_id: 1, time: 1980, lev: 32, lat: 192, lon: 288)> dask.array<broadcast_to, shape=(1, 1980, 32, 192, 288), dtype=float32, chunksize=(1, 120, 5, 192, 288), chunktype=numpy.ndarray> Coordinates: * lat (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0 * lon (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8 * lev (lev) float64 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6 * time (time) object 1860-02-01 00:00:00 ... 1980-01-01 00:00:00 * member_id (member_id) <U11 'r1i1001p1f1' Attributes: mdims: 1 units: K long_name: Temperature cell_methods: time: mean
- member_id: 1
- time: 1980
- lev: 32
- lat: 192
- lon: 288
- dask.array<chunksize=(1, 120, 5, 192, 288), meta=np.ndarray>
Array Chunk Bytes 13.05 GiB 126.56 MiB Shape (1, 1980, 32, 192, 288) (1, 120, 5, 192, 288) Count 374 Tasks 119 Chunks Type float32 numpy.ndarray - lat(lat)float64-90.0 -89.06 -88.12 ... 89.06 90.0
- long_name :
- latitude
- units :
- degrees_north
array([-90. , -89.057592, -88.115183, -87.172775, -86.230366, -85.287958, -84.34555 , -83.403141, -82.460733, -81.518325, -80.575916, -79.633508, -78.691099, -77.748691, -76.806283, -75.863874, -74.921466, -73.979058, -73.036649, -72.094241, -71.151832, -70.209424, -69.267016, -68.324607, -67.382199, -66.439791, -65.497382, -64.554974, -63.612565, -62.670157, -61.727749, -60.78534 , -59.842932, -58.900524, -57.958115, -57.015707, -56.073298, -55.13089 , -54.188482, -53.246073, -52.303665, -51.361257, -50.418848, -49.47644 , -48.534031, -47.591623, -46.649215, -45.706806, -44.764398, -43.82199 , -42.879581, -41.937173, -40.994764, -40.052356, -39.109948, -38.167539, -37.225131, -36.282723, -35.340314, -34.397906, -33.455497, -32.513089, -31.570681, -30.628272, -29.685864, -28.743455, -27.801047, -26.858639, -25.91623 , -24.973822, -24.031414, -23.089005, -22.146597, -21.204188, -20.26178 , -19.319372, -18.376963, -17.434555, -16.492147, -15.549738, -14.60733 , -13.664921, -12.722513, -11.780105, -10.837696, -9.895288, -8.95288 , -8.010471, -7.068063, -6.125654, -5.183246, -4.240838, -3.298429, -2.356021, -1.413613, -0.471204, 0.471204, 1.413613, 2.356021, 3.298429, 4.240838, 5.183246, 6.125654, 7.068063, 8.010471, 8.95288 , 9.895288, 10.837696, 11.780105, 12.722513, 13.664921, 14.60733 , 15.549738, 16.492147, 17.434555, 18.376963, 19.319372, 20.26178 , 21.204188, 22.146597, 23.089005, 24.031414, 24.973822, 25.91623 , 26.858639, 27.801047, 28.743455, 29.685864, 30.628272, 31.570681, 32.513089, 33.455497, 34.397906, 35.340314, 36.282723, 37.225131, 38.167539, 39.109948, 40.052356, 40.994764, 41.937173, 42.879581, 43.82199 , 44.764398, 45.706806, 46.649215, 47.591623, 48.534031, 49.47644 , 50.418848, 51.361257, 52.303665, 53.246073, 54.188482, 55.13089 , 56.073298, 57.015707, 57.958115, 58.900524, 59.842932, 60.78534 , 61.727749, 62.670157, 63.612565, 64.554974, 65.497382, 66.439791, 67.382199, 68.324607, 69.267016, 70.209424, 71.151832, 72.094241, 73.036649, 73.979058, 74.921466, 75.863874, 76.806283, 77.748691, 78.691099, 79.633508, 80.575916, 81.518325, 82.460733, 83.403141, 84.34555 , 85.287958, 86.230366, 87.172775, 88.115183, 89.057592, 90. ])
- lon(lon)float640.0 1.25 2.5 ... 356.2 357.5 358.8
- long_name :
- longitude
- units :
- degrees_east
array([ 0. , 1.25, 2.5 , ..., 356.25, 357.5 , 358.75])
- lev(lev)float643.643 7.595 14.36 ... 976.3 992.6
- long_name :
- hybrid level at midpoints (1000*(A+B))
- units :
- hPa
- positive :
- down
- standard_name :
- atmosphere_hybrid_sigma_pressure_coordinate
- formula_terms :
- a: hyam b: hybm p0: P0 ps: PS
array([ 3.643466, 7.59482 , 14.356632, 24.61222 , 35.92325 , 43.19375 , 51.677499, 61.520498, 73.750958, 87.82123 , 103.317127, 121.547241, 142.994039, 168.22508 , 197.908087, 232.828619, 273.910817, 322.241902, 379.100904, 445.992574, 524.687175, 609.778695, 691.38943 , 763.404481, 820.858369, 859.534767, 887.020249, 912.644547, 936.198398, 957.48548 , 976.325407, 992.556095])
- time(time)object1860-02-01 00:00:00 ... 1980-01-...
- long_name :
- time
- bounds :
- time_bnds
array([cftime.datetime(1860, 2, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True), cftime.datetime(1860, 3, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True), cftime.datetime(1860, 4, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True), ..., cftime.datetime(1979, 11, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True), cftime.datetime(1979, 12, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True), cftime.datetime(1980, 1, 1, 0, 0, 0, 0, calendar='noleap', has_year_zero=True)], dtype=object)
- member_id(member_id)<U11'r1i1001p1f1'
array(['r1i1001p1f1'], dtype='<U11')
- mdims :
- 1
- units :
- K
- long_name :
- Temperature
- cell_methods :
- time: mean
Compute Monthly and Seasonal Averages#
Over the summer, additional functionality was added to GeoCAT, with functions supporting climatology
calculations
cesm_monthly_mean_temperature = geocat.comp.climatology(ds.T, 'month')
cesm_seasonal_mean_temperature = geocat.comp.climatology(ds.T, 'season')
cesm_monthly_mean_temperature.load()
cesm_seasonal_mean_temperature.load()
<xarray.DataArray 'T' (member_id: 1, season: 4, lev: 32, lat: 192, lon: 288)> array([[[[[271.27997, 271.27997, 271.27997, ..., 271.27997, 271.27997, 271.27997], [271.22708, 271.22745, 271.22787, ..., 271.226 , 271.22632, 271.2267 ], [271.1701 , 271.1706 , 271.1711 , ..., 271.16895, 271.16928, 271.1697 ], ..., [204.17365, 204.15831, 204.14384, ..., 204.22487, 204.20694, 204.18985], [204.95549, 204.9503 , 204.94557, ..., 204.97383, 204.96724, 204.96115], [206.02718, 206.02718, 206.02718, ..., 206.02718, 206.02718, 206.02718]], [[255.13527, 255.13527, 255.13527, ..., 255.13527, 255.13527, 255.13527], [255.05142, 255.05226, 255.05318, ..., 255.0489 , 255.04971, 255.05055], [254.94571, 254.94685, 254.94814, ..., 254.94264, 254.94356, 254.9446 ], ... [265.77487, 265.7838 , 265.7927 , ..., 265.747 , 265.75638, 265.76566], [265.74924, 265.7531 , 265.757 , ..., 265.7369 , 265.7411 , 265.7452 ], [265.73276, 265.73276, 265.73276, ..., 265.7328 , 265.73276, 265.73276]], [[217.79984, 217.79984, 217.79984, ..., 217.79984, 217.79984, 217.79984], [218.30846, 218.30481, 218.30156, ..., 218.32202, 218.31757, 218.3122 ], [218.94579, 218.91786, 218.88835, ..., 219.02962, 219. , 218.97177], ..., [265.71143, 265.72287, 265.73416, ..., 265.67624, 265.6879 , 265.69962], [265.69626, 265.701 , 265.70557, ..., 265.68204, 265.68677, 265.6915 ], [265.68402, 265.68405, 265.68408, ..., 265.68396, 265.684 , 265.684 ]]]]], dtype=float32) Coordinates: * lat (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0 * lon (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8 * lev (lev) float64 3.643 7.595 14.36 24.61 ... 936.2 957.5 976.3 992.6 * member_id (member_id) <U11 'r1i1001p1f1' * season (season) object 'DJF' 'JJA' 'MAM' 'SON' Attributes: mdims: 1 units: K long_name: Temperature cell_methods: time: mean
- member_id: 1
- season: 4
- lev: 32
- lat: 192
- lon: 288
- 271.3 271.3 271.3 271.3 271.3 271.3 ... 265.7 265.7 265.7 265.7 265.7
array([[[[[271.27997, 271.27997, 271.27997, ..., 271.27997, 271.27997, 271.27997], [271.22708, 271.22745, 271.22787, ..., 271.226 , 271.22632, 271.2267 ], [271.1701 , 271.1706 , 271.1711 , ..., 271.16895, 271.16928, 271.1697 ], ..., [204.17365, 204.15831, 204.14384, ..., 204.22487, 204.20694, 204.18985], [204.95549, 204.9503 , 204.94557, ..., 204.97383, 204.96724, 204.96115], [206.02718, 206.02718, 206.02718, ..., 206.02718, 206.02718, 206.02718]], [[255.13527, 255.13527, 255.13527, ..., 255.13527, 255.13527, 255.13527], [255.05142, 255.05226, 255.05318, ..., 255.0489 , 255.04971, 255.05055], [254.94571, 254.94685, 254.94814, ..., 254.94264, 254.94356, 254.9446 ], ... [265.77487, 265.7838 , 265.7927 , ..., 265.747 , 265.75638, 265.76566], [265.74924, 265.7531 , 265.757 , ..., 265.7369 , 265.7411 , 265.7452 ], [265.73276, 265.73276, 265.73276, ..., 265.7328 , 265.73276, 265.73276]], [[217.79984, 217.79984, 217.79984, ..., 217.79984, 217.79984, 217.79984], [218.30846, 218.30481, 218.30156, ..., 218.32202, 218.31757, 218.3122 ], [218.94579, 218.91786, 218.88835, ..., 219.02962, 219. , 218.97177], ..., [265.71143, 265.72287, 265.73416, ..., 265.67624, 265.6879 , 265.69962], [265.69626, 265.701 , 265.70557, ..., 265.68204, 265.68677, 265.6915 ], [265.68402, 265.68405, 265.68408, ..., 265.68396, 265.684 , 265.684 ]]]]], dtype=float32)
- lat(lat)float64-90.0 -89.06 -88.12 ... 89.06 90.0
- long_name :
- latitude
- units :
- degrees_north
array([-90. , -89.057592, -88.115183, -87.172775, -86.230366, -85.287958, -84.34555 , -83.403141, -82.460733, -81.518325, -80.575916, -79.633508, -78.691099, -77.748691, -76.806283, -75.863874, -74.921466, -73.979058, -73.036649, -72.094241, -71.151832, -70.209424, -69.267016, -68.324607, -67.382199, -66.439791, -65.497382, -64.554974, -63.612565, -62.670157, -61.727749, -60.78534 , -59.842932, -58.900524, -57.958115, -57.015707, -56.073298, -55.13089 , -54.188482, -53.246073, -52.303665, -51.361257, -50.418848, -49.47644 , -48.534031, -47.591623, -46.649215, -45.706806, -44.764398, -43.82199 , -42.879581, -41.937173, -40.994764, -40.052356, -39.109948, -38.167539, -37.225131, -36.282723, -35.340314, -34.397906, -33.455497, -32.513089, -31.570681, -30.628272, -29.685864, -28.743455, -27.801047, -26.858639, -25.91623 , -24.973822, -24.031414, -23.089005, -22.146597, -21.204188, -20.26178 , -19.319372, -18.376963, -17.434555, -16.492147, -15.549738, -14.60733 , -13.664921, -12.722513, -11.780105, -10.837696, -9.895288, -8.95288 , -8.010471, -7.068063, -6.125654, -5.183246, -4.240838, -3.298429, -2.356021, -1.413613, -0.471204, 0.471204, 1.413613, 2.356021, 3.298429, 4.240838, 5.183246, 6.125654, 7.068063, 8.010471, 8.95288 , 9.895288, 10.837696, 11.780105, 12.722513, 13.664921, 14.60733 , 15.549738, 16.492147, 17.434555, 18.376963, 19.319372, 20.26178 , 21.204188, 22.146597, 23.089005, 24.031414, 24.973822, 25.91623 , 26.858639, 27.801047, 28.743455, 29.685864, 30.628272, 31.570681, 32.513089, 33.455497, 34.397906, 35.340314, 36.282723, 37.225131, 38.167539, 39.109948, 40.052356, 40.994764, 41.937173, 42.879581, 43.82199 , 44.764398, 45.706806, 46.649215, 47.591623, 48.534031, 49.47644 , 50.418848, 51.361257, 52.303665, 53.246073, 54.188482, 55.13089 , 56.073298, 57.015707, 57.958115, 58.900524, 59.842932, 60.78534 , 61.727749, 62.670157, 63.612565, 64.554974, 65.497382, 66.439791, 67.382199, 68.324607, 69.267016, 70.209424, 71.151832, 72.094241, 73.036649, 73.979058, 74.921466, 75.863874, 76.806283, 77.748691, 78.691099, 79.633508, 80.575916, 81.518325, 82.460733, 83.403141, 84.34555 , 85.287958, 86.230366, 87.172775, 88.115183, 89.057592, 90. ])
- lon(lon)float640.0 1.25 2.5 ... 356.2 357.5 358.8
- long_name :
- longitude
- units :
- degrees_east
array([ 0. , 1.25, 2.5 , ..., 356.25, 357.5 , 358.75])
- lev(lev)float643.643 7.595 14.36 ... 976.3 992.6
- long_name :
- hybrid level at midpoints (1000*(A+B))
- units :
- hPa
- positive :
- down
- standard_name :
- atmosphere_hybrid_sigma_pressure_coordinate
- formula_terms :
- a: hyam b: hybm p0: P0 ps: PS
array([ 3.643466, 7.59482 , 14.356632, 24.61222 , 35.92325 , 43.19375 , 51.677499, 61.520498, 73.750958, 87.82123 , 103.317127, 121.547241, 142.994039, 168.22508 , 197.908087, 232.828619, 273.910817, 322.241902, 379.100904, 445.992574, 524.687175, 609.778695, 691.38943 , 763.404481, 820.858369, 859.534767, 887.020249, 912.644547, 936.198398, 957.48548 , 976.325407, 992.556095])
- member_id(member_id)<U11'r1i1001p1f1'
array(['r1i1001p1f1'], dtype='<U11')
- season(season)object'DJF' 'JJA' 'MAM' 'SON'
array(['DJF', 'JJA', 'MAM', 'SON'], dtype=object)
- mdims :
- 1
- units :
- K
- long_name :
- Temperature
- cell_methods :
- time: mean
Plot CESM2-LE Temperature Climatologies#
Plot up the Seasonal Mean Temperature#
We select for our single member_id
, and choose the last few vertical levels since the vertical dimensions are sorted from the top of the atmosphere, to the bottom, which is the inverse of how the observational data is sorted
cesm_seasonal_temperature_plot = (
cesm_seasonal_mean_temperature.isel(member_id=0, lev=range(-5, -1)).T.hvplot.quadmesh(
rasterize=True,
groupby=['lev', 'season'],
x='lon',
y='lat',
cmap='magma',
projection=ccrs.Robinson(),
project=True,
)
* gf.coastline
)
cesm_seasonal_temperature_plot
Plot up the Monthly Mean Temperature#
cesm_monthly_temperature_plot = (
cesm_monthly_mean_temperature.isel(member_id=0, lev=range(-5, -1)).T.hvplot.quadmesh(
rasterize=True,
groupby=['lev', 'month'],
x='lon',
y='lat',
cmap='magma',
projection=ccrs.Robinson(),
project=True,
)
* gf.coastline
)
cesm_monthly_temperature_plot
Conclusion#
Throughout this example, we showed how useful intake-esm
can be when querying datasets, both observational and model datasets! We also showed you can use hvPlot to plot interactive maps, even transforming to different coordinate reference systems. While the observational and model datasets did not have matching spatial or vertical coordinates, plotting these comparisons as a first cut can still be useful!
If you are interested in using the data catalogs, go for it! Here are the paths:
AMWG Observational Comparison Catalog -
/glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.json
CESM2 Large Ensemble Catalog =
/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json