Comparing Atmospheric Model Output with Observations Using Intake-ESM#
Comparing models and observations is a critical component of climate diagnostic packages. This process can be challenging though - given the number of observational datasets to compare against, and the difference in spatiotemporal resolutions. In the previous iteration of the diagnostics package used for atmospheric data from the Community Earth System Model (CESM), they used pre-computed, observational datasets stored in a directory on the GLADE filesystem (/glade/p/cesm/amwg/amwg_diagnostics/obs_data)
Within this example, we walk though generating an intake-esm catalog from the observational data, reading in CESM data, and compare models and observations
Imports#
import ast
import glob
import pathlib
import traceback
import warnings
from datetime import datetime
import cartopy.crs as ccrs
import geocat.comp
import geoviews.feature as gf
import holoviews as hv
import hvplot
import hvplot.xarray
import intake
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr
from distributed import Client
from ecgtools import Builder
from ecgtools.builder import INVALID_ASSET, TRACEBACK
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')
Build an Intake-ESM catalog from the observational dataset#
Before we jump into building the catalog, let’s take a look at what data we are working with
Investigate the File Structure and Sample Dataset#
files = sorted(glob.glob('/glade/p/cesm/amwg/amwg_diagnostics/obs_data/*'))
files[::20]
['/glade/p/cesm/amwg/amwg_diagnostics/obs_data/AIRS_01_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ARM_annual_cycle_twp_c2_cmbe_sound_p_f.cdf',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES-EBAF_01_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES2_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CLOUDSATCOSP_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CLOUDSAT_10_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ECMWF_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/EP.ERAI_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERAI_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERBE_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ERS_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/GPCP_JJA_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_CL_03_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_PD_02_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/HadISST_PI_05_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCPCOSP_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCPFD_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/ISCCP_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/JRA25_SON_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/LEGATES_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MERRAW_19x2_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MERRA_12_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MISRCOSP_JJA_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/MODIS_ANN_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/NVAP_03_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/PRECL_07_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/SSMI_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/SSMI_SEAICE_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/TRMM_MAM_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/WARREN_DJF_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/WILLMOTT_04_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/XIEARKIN_09_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlsg_10_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlso_ANN_climo.nc',
'/glade/p/cesm/amwg/amwg_diagnostics/obs_data/mlsw_MAM_climo.nc']
Observational datasetsets in this directory follow the convention source_(month or season)_climo.nc.
Let’s open up one of those datasets
ds = xr.open_dataset('/glade/p/cesm/amwg/amwg_diagnostics/obs_data/CERES-EBAF_01_climo.nc')
ds
<xarray.Dataset>
Dimensions: (lon: 360, lat: 180, time: 1)
Coordinates:
* lon (lon) float32 0.5 1.5 2.5 3.5 4.5 ... 355.5 356.5 357.5 358.5 359.5
* lat (lat) float32 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
* time (time) float32 1.0
Data variables:
SOLIN (time, lat, lon) float32 495.0 495.0 495.0 495.0 ... 0.0 0.0 0.0
FLUT (time, lat, lon) float32 187.4 187.4 187.4 ... 170.7 170.7 170.7
FLUTC (time, lat, lon) float32 188.8 188.8 188.8 ... 178.9 178.9 178.9
FSNTOA (time, lat, lon) float32 147.8 147.8 147.8 ... -0.049 -0.049 -0.049
FSNTOAC (time, lat, lon) float32 150.9 150.9 150.9 ... -0.006 -0.006 -0.006
SWCF (time, lat, lon) float32 -3.149 -3.149 -3.149 ... -0.043 -0.043
LWCF (time, lat, lon) float32 1.391 1.391 1.391 ... 8.272 8.272 8.272
RESTOA (time, lat, lon) float32 -39.6 -39.6 -39.6 ... -170.7 -170.7 -170.7
ALBEDO (time, lat, lon) float32 0.7015 0.7015 0.7015 ... nan nan nan
ALBEDOC (time, lat, lon) float32 0.6951 0.6951 0.6951 ... nan nan nan
gw (lat) float64 0.0001523 0.0004569 0.0007613 ... 0.0004569 0.0001523
Attributes:
version: This is version 2.8: March 7, 2014
institution: NASA Langley Research Center
comment: Data is from East to West and South to North. Climat...
title: CERES EBAF (Energy Balanced and Filled) Fluxes. Mont...
AMWG_author: Cecile Hannay
AMWG_creation_date: Thu Jul 24 16:08:10 MDT 2014 for AMWG package
history: Thu Jul 24 16:08:10 2014: ncks -A -v gw CERES2_01_cl...
NCO: 20140724- lon: 360
- lat: 180
- time: 1
- lon(lon)float320.5 1.5 2.5 ... 357.5 358.5 359.5
- long_name :
- longitude
- standard_name :
- longitude
- units :
- degrees_east
- valid_range :
- [-180. 360.]
array([ 0.5, 1.5, 2.5, ..., 357.5, 358.5, 359.5], dtype=float32)
- lat(lat)float32-89.5 -88.5 -87.5 ... 88.5 89.5
- long_name :
- latitude
- standard_name :
- latitude
- units :
- degrees_north
- valid_range :
- [-90. 90.]
- actual_range :
- [-89.5 89.5]
array([-89.5, -88.5, -87.5, -86.5, -85.5, -84.5, -83.5, -82.5, -81.5, -80.5, -79.5, -78.5, -77.5, -76.5, -75.5, -74.5, -73.5, -72.5, -71.5, -70.5, -69.5, -68.5, -67.5, -66.5, -65.5, -64.5, -63.5, -62.5, -61.5, -60.5, -59.5, -58.5, -57.5, -56.5, -55.5, -54.5, -53.5, -52.5, -51.5, -50.5, -49.5, -48.5, -47.5, -46.5, -45.5, -44.5, -43.5, -42.5, -41.5, -40.5, -39.5, -38.5, -37.5, -36.5, -35.5, -34.5, -33.5, -32.5, -31.5, -30.5, -29.5, -28.5, -27.5, -26.5, -25.5, -24.5, -23.5, -22.5, -21.5, -20.5, -19.5, -18.5, -17.5, -16.5, -15.5, -14.5, -13.5, -12.5, -11.5, -10.5, -9.5, -8.5, -7.5, -6.5, -5.5, -4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5, 74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5, 85.5, 86.5, 87.5, 88.5, 89.5], dtype=float32) - time(time)float321.0
array([1.], dtype=float32)
- SOLIN(time, lat, lon)float32...
- average_op_ncl :
- dim_avg_n over dimension(s): time
- long_name :
- Incoming Solar Flux, Monthly Means
- standard_name :
- Incoming Solar Flux
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 800.000
array([[[495.0077 , 495.0077 , ..., 495.0077 , 495.0077 ], [494.86923, 494.86923, ..., 494.86923, 494.86923], ..., [ 0. , 0. , ..., 0. , 0. ], [ 0. , 0. , ..., 0. , 0. ]]], dtype=float32) - FLUT(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Longwave Flux, Monthly Means, All-Sky conditions
- standard_name :
- TOA Longwave Flux - All-Sky
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 400.000
array([[[187.38461, 187.38461, ..., 187.38461, 187.38461], [190.79231, 190.79231, ..., 190.79231, 190.79231], ..., [170.49231, 170.49231, ..., 170.49231, 170.49231], [170.66924, 170.66924, ..., 170.66924, 170.66924]]], dtype=float32) - FLUTC(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Longwave Flux, Monthly Means, Clear-Sky conditions
- standard_name :
- TOA Longwave Flux - Clear-Sky
- units :
- W m-2
- valid_min :
- 0.00000
- valid_max :
- 400.000
array([[[188.78462, 188.78462, ..., 188.78462, 188.78462], [192.3923 , 192.3923 , ..., 192.3923 , 192.3923 ], ..., [179.29231, 179.29231, ..., 179.29231, 179.29231], [178.93077, 178.93077, ..., 178.93077, 178.93077]]], dtype=float32) - FSNTOA(time, lat, lon)float32...
- lunits :
- W m-2
- long_name :
- TOA net shortwave
array([[[ 1.477692e+02, 1.477692e+02, ..., 1.477692e+02, 1.477692e+02], [ 1.536615e+02, 1.536615e+02, ..., 1.536615e+02, 1.536615e+02], ..., [-4.900000e-02, -4.900000e-02, ..., -4.900000e-02, -4.900000e-02], [-4.900000e-02, -4.900000e-02, ..., -4.900000e-02, -4.900000e-02]]], dtype=float32) - FSNTOAC(time, lat, lon)float32...
- long_name :
- TOA clear-sky net shortwave
- lunits :
- W m-2
array([[[ 1.509154e+02, 1.509154e+02, ..., 1.509154e+02, 1.509154e+02], [ 1.565077e+02, 1.565077e+02, ..., 1.565077e+02, 1.565077e+02], ..., [-6.000000e-03, -6.000000e-03, ..., -6.000000e-03, -6.000000e-03], [-6.000000e-03, -6.000000e-03, ..., -6.000000e-03, -6.000000e-03]]], dtype=float32) - SWCF(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Cloud Radiative Effects Shortwave Flux, Monthly Means
- standard_name :
- TOA CRE Shortwave Flux
- units :
- W m-2
- valid_min :
- -400.000
- valid_max :
- 100.000
array([[[-3.149 , -3.149 , ..., -3.149 , -3.149 ], [-2.844985, -2.844985, ..., -2.844985, -2.844985], ..., [-0.043 , -0.043 , ..., -0.043 , -0.043 ], [-0.043 , -0.043 , ..., -0.043 , -0.043 ]]], dtype=float32) - LWCF(time, lat, lon)float32...
- long_name :
- Top of The Atmosphere Cloud Radiative Effects Longwave Flux, Monthly Means
- standard_name :
- TOA CRE Longwave Flux
- units :
- W m-2
- valid_min :
- -100.000
- valid_max :
- 300.000
array([[[1.391205, 1.391205, ..., 1.391205, 1.391205], [1.616079, 1.616079, ..., 1.616079, 1.616079], ..., [8.813915, 8.813915, ..., 8.813915, 8.813915], [8.271692, 8.271692, ..., 8.271692, 8.271692]]], dtype=float32) - RESTOA(time, lat, lon)float32...
- long_name :
- residual energy at TOA
- standard_name :
- TOA Net Flux - All-Sky
- units :
- W m-2
- valid_min :
- -400.000
- valid_max :
- 400.000
array([[[ -39.603077, -39.603077, ..., -39.603077, -39.603077], [ -37.104614, -37.104614, ..., -37.104614, -37.104614], ..., [-170.55385 , -170.55385 , ..., -170.55385 , -170.55385 ], [-170.71538 , -170.71538 , ..., -170.71538 , -170.71538 ]]], dtype=float32) - ALBEDO(time, lat, lon)float32...
- long_name :
- TOA albedo
array([[[0.701483, 0.701483, ..., 0.701483, 0.701483], [0.689493, 0.689493, ..., 0.689493, 0.689493], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]]], dtype=float32) - ALBEDOC(time, lat, lon)float32...
- long_name :
- TOA clear-sky albedo
array([[[0.695126, 0.695126, ..., 0.695126, 0.695126], [0.683734, 0.683734, ..., 0.683734, 0.683734], ..., [ nan, nan, ..., nan, nan], [ nan, nan, ..., nan, nan]]], dtype=float32) - gw(lat)float64...
- long_name :
- latitude weight
array([0.000152, 0.000457, 0.000761, 0.001065, 0.001369, 0.001673, 0.001976, 0.002278, 0.00258 , 0.002881, 0.003181, 0.00348 , 0.003778, 0.004074, 0.00437 , 0.004664, 0.004957, 0.005248, 0.005538, 0.005826, 0.006112, 0.006397, 0.006679, 0.006959, 0.007238, 0.007514, 0.007788, 0.008059, 0.008328, 0.008594, 0.008858, 0.009119, 0.009378, 0.009633, 0.009886, 0.010135, 0.010381, 0.010625, 0.010865, 0.011102, 0.011335, 0.011565, 0.011791, 0.012014, 0.012233, 0.012448, 0.01266 , 0.012868, 0.013072, 0.013271, 0.013467, 0.013659, 0.013846, 0.01403 , 0.014209, 0.014384, 0.014554, 0.01472 , 0.014881, 0.015038, 0.01519 , 0.015338, 0.015481, 0.015619, 0.015753, 0.015882, 0.016006, 0.016125, 0.016239, 0.016348, 0.016452, 0.016551, 0.016645, 0.016734, 0.016818, 0.016897, 0.016971, 0.017039, 0.017103, 0.017161, 0.017214, 0.017261, 0.017304, 0.017341, 0.017373, 0.017399, 0.017421, 0.017436, 0.017447, 0.017452, 0.017452, 0.017447, 0.017436, 0.017421, 0.017399, 0.017373, 0.017341, 0.017304, 0.017261, 0.017214, 0.017161, 0.017103, 0.017039, 0.016971, 0.016897, 0.016818, 0.016734, 0.016645, 0.016551, 0.016452, 0.016348, 0.016239, 0.016125, 0.016006, 0.015882, 0.015753, 0.015619, 0.015481, 0.015338, 0.01519 , 0.015038, 0.014881, 0.01472 , 0.014554, 0.014384, 0.014209, 0.01403 , 0.013846, 0.013659, 0.013467, 0.013271, 0.013072, 0.012868, 0.01266 , 0.012448, 0.012233, 0.012014, 0.011791, 0.011565, 0.011335, 0.011102, 0.010865, 0.010625, 0.010381, 0.010135, 0.009886, 0.009633, 0.009378, 0.009119, 0.008858, 0.008594, 0.008328, 0.008059, 0.007788, 0.007514, 0.007238, 0.006959, 0.006679, 0.006397, 0.006112, 0.005826, 0.005538, 0.005248, 0.004957, 0.004664, 0.00437 , 0.004074, 0.003778, 0.00348 , 0.003181, 0.002881, 0.00258 , 0.002278, 0.001976, 0.001673, 0.001369, 0.001065, 0.000761, 0.000457, 0.000152])
- version :
- This is version 2.8: March 7, 2014
- institution :
- NASA Langley Research Center
- comment :
- Data is from East to West and South to North. Climatology from 03/2000 to 02/2013.
- title :
- CERES EBAF (Energy Balanced and Filled) Fluxes. Monthly Averages and 13-year Climatology.
- AMWG_author :
- Cecile Hannay
- AMWG_creation_date :
- Thu Jul 24 16:08:10 MDT 2014 for AMWG package
- history :
- Thu Jul 24 16:08:10 2014: ncks -A -v gw CERES2_01_climo.nc CERES-EBAF_01_climo.nc
- NCO :
- 20140724
We see that this dataset is gridded on a global 0.5° grid, with several variables related to solar fluxes (ex. TOA net shortwave)
Build a parser using ecgtools#
We can use the ecgtools package to help build an intake-esm catalog for this dataset!
We start first by using pathlib to generically look at the filepaths
path = pathlib.Path(files[0])
path.stem
'AIRS_01_climo'
This path can be split using .split('_'), separates the path into the following:
observational dataset source
month number or season
“climo”
path.stem.split('_')
['AIRS', '01', 'climo']
We can also gather useful insight by opening the file!
ds = xr.open_dataset(files[0])
Let’s look at the variable “Temperature” (T)
ds.T
<xarray.DataArray 'T' (time: 1, lev: 13, lat: 94, lon: 192)>
[234624 values with dtype=float32]
Coordinates:
* lat (lat) float64 -88.54 -86.65 -84.75 -82.85 ... 84.75 86.65 88.54
* time (time) int32 1
* lev (lev) float32 1e+03 925.0 850.0 700.0 ... 200.0 150.0 100.0 70.0
* lon (lon) float32 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
Attributes:
units: K
long_name: Temperature
climatology: AIRS monthly climatology 9/2002-8/2006- time: 1
- lev: 13
- lat: 94
- lon: 192
- ...
[234624 values with dtype=float32]
- lat(lat)float64-88.54 -86.65 ... 86.65 88.54
- units :
- degrees_north
- long_name :
- latitude
array([-88.541946, -86.653168, -84.753227, -82.850769, -80.947357, -79.04348 , -77.139351, -75.235054, -73.330658, -71.426186, -69.52166 , -67.617104, -65.712509, -63.807896, -61.903259, -59.998611, -58.093948, -56.189278, -54.284599, -52.379913, -50.47522 , -48.570518, -46.665817, -44.761108, -42.8564 , -40.951687, -39.04697 , -37.14225 , -35.23753 , -33.332806, -31.428082, -29.523355, -27.618628, -25.7139 , -23.80917 , -21.90444 , -19.999708, -18.094976, -16.190243, -14.28551 , -12.380776, -10.476042, -8.571308, -6.666573, -4.761838, -2.857103, -0.952368, 0.952368, 2.857103, 4.761838, 6.666573, 8.571308, 10.476042, 12.380776, 14.28551 , 16.190243, 18.094976, 19.999708, 21.90444 , 23.80917 , 25.7139 , 27.618628, 29.523355, 31.428082, 33.332806, 35.23753 , 37.14225 , 39.04697 , 40.951687, 42.8564 , 44.761108, 46.665817, 48.570518, 50.47522 , 52.379913, 54.284599, 56.189278, 58.093948, 59.998611, 61.903259, 63.807896, 65.712509, 67.617104, 69.52166 , 71.426186, 73.330658, 75.235054, 77.139351, 79.04348 , 80.947357, 82.850769, 84.753227, 86.653168, 88.541946]) - time(time)int321
array([1], dtype=int32)
- lev(lev)float321e+03 925.0 850.0 ... 100.0 70.0
- fill_value :
- 1e+36
- units :
- hPa
- long_name :
- Pressure
array([1000., 925., 850., 700., 600., 500., 400., 300., 250., 200., 150., 100., 70.], dtype=float32) - lon(lon)float320.0 1.875 3.75 ... 356.2 358.1
- units :
- degrees_east
- long_name :
- longitude
array([ 0. , 1.875, 3.75 , 5.625, 7.5 , 9.375, 11.25 , 13.125, 15. , 16.875, 18.75 , 20.625, 22.5 , 24.375, 26.25 , 28.125, 30. , 31.875, 33.75 , 35.625, 37.5 , 39.375, 41.25 , 43.125, 45. , 46.875, 48.75 , 50.625, 52.5 , 54.375, 56.25 , 58.125, 60. , 61.875, 63.75 , 65.625, 67.5 , 69.375, 71.25 , 73.125, 75. , 76.875, 78.75 , 80.625, 82.5 , 84.375, 86.25 , 88.125, 90. , 91.875, 93.75 , 95.625, 97.5 , 99.375, 101.25 , 103.125, 105. , 106.875, 108.75 , 110.625, 112.5 , 114.375, 116.25 , 118.125, 120. , 121.875, 123.75 , 125.625, 127.5 , 129.375, 131.25 , 133.125, 135. , 136.875, 138.75 , 140.625, 142.5 , 144.375, 146.25 , 148.125, 150. , 151.875, 153.75 , 155.625, 157.5 , 159.375, 161.25 , 163.125, 165. , 166.875, 168.75 , 170.625, 172.5 , 174.375, 176.25 , 178.125, 180. , 181.875, 183.75 , 185.625, 187.5 , 189.375, 191.25 , 193.125, 195. , 196.875, 198.75 , 200.625, 202.5 , 204.375, 206.25 , 208.125, 210. , 211.875, 213.75 , 215.625, 217.5 , 219.375, 221.25 , 223.125, 225. , 226.875, 228.75 , 230.625, 232.5 , 234.375, 236.25 , 238.125, 240. , 241.875, 243.75 , 245.625, 247.5 , 249.375, 251.25 , 253.125, 255. , 256.875, 258.75 , 260.625, 262.5 , 264.375, 266.25 , 268.125, 270. , 271.875, 273.75 , 275.625, 277.5 , 279.375, 281.25 , 283.125, 285. , 286.875, 288.75 , 290.625, 292.5 , 294.375, 296.25 , 298.125, 300. , 301.875, 303.75 , 305.625, 307.5 , 309.375, 311.25 , 313.125, 315. , 316.875, 318.75 , 320.625, 322.5 , 324.375, 326.25 , 328.125, 330. , 331.875, 333.75 , 335.625, 337.5 , 339.375, 341.25 , 343.125, 345. , 346.875, 348.75 , 350.625, 352.5 , 354.375, 356.25 , 358.125], dtype=float32)
- units :
- K
- long_name :
- Temperature
- climatology :
- AIRS monthly climatology 9/2002-8/2006
You’ll notice that we have additional attributes about the dataset which we can use, including:
units
long_name
climatology, which includes more information about what dates were used
Adding the parsing function#
def parse_amwg_obs(file):
"""Atmospheric observational data used within the AMWG diagnostic package
Parameters
----------
file: str
filepath from the data directory including observational climatologies
Returns
-------
info: dict
Dictionary with information to add as columns in the data catalog
"""
file = pathlib.Path(file)
info_list = []
info = {}
try:
stem = file.stem
split = stem.split('_')
info['source'] = split[0]
temporal = split[-2]
if len(split[-2]) == 2:
month_number = int(temporal)
info['time_period'] = 'monthly'
info['temporal'] = datetime(2020, month_number, 1).strftime('%b').upper()
elif len(temporal) == 3:
info['time_period'] = 'seasonal'
info['temporal'] = temporal
else:
info['time_period'] = np.nan
info['temporal'] = np.nan
with xr.open_dataset(file, chunks={}, decode_times=False) as ds:
variables = [v for v, da in ds.variables.items() if 'time' in da.dims]
info['variables'] = variables
info['path'] = str(path)
return info
except Exception:
return {INVALID_ASSET: file, TRACEBACK: traceback.format_exc()}
'time' in ds['T'].dims
True
info = parse_amwg_obs(files[0])
info
{'source': 'AIRS',
'time_period': 'monthly',
'temporal': 'JAN',
'variables': ['T', 'time', 'RELHUM', 'O3', 'SHUM'],
'path': '/glade/p/cesm/amwg/amwg_diagnostics/obs_data/AIRS_01_climo.nc'}
pd.DataFrame(info)
| source | time_period | temporal | variables | path | |
|---|---|---|---|---|---|
| 0 | AIRS | monthly | JAN | T | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 1 | AIRS | monthly | JAN | time | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 2 | AIRS | monthly | JAN | RELHUM | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3 | AIRS | monthly | JAN | O3 | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 4 | AIRS | monthly | JAN | SHUM | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
We can test this on a single file!
This dictionary can easily be converted into a pandas.Dataframe
pd.DataFrame([info, info])
| source | time_period | temporal | variables | path | |
|---|---|---|---|---|---|
| 0 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 1 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
Use ecgtools to build the catalog#
Now that we have our parser ready to go, we can use ecgtools to build the catalog
b = Builder(
# Directory with the output
f"/glade/p/cesm/amwg/amwg_diagnostics/obs_data/",
depth=3,
# Number of jobs to execute - should be equal to # threads you are using
njobs=-1,
)
Once we set up the builder, can call .build, passing in the parser we built
b.build(parse_amwg_obs)
b.df
| source | time_period | temporal | variables | path | |
|---|---|---|---|---|---|
| 0 | AIRS | monthly | JAN | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 1 | AIRS | monthly | FEB | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 2 | AIRS | monthly | MAR | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3 | AIRS | monthly | APR | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 4 | AIRS | monthly | MAY | [T, time, RELHUM, O3, SHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| ... | ... | ... | ... | ... | ... |
| 3778 | mlsw | seasonal | ANN | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3779 | mlsw | seasonal | DJF | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3780 | mlsw | seasonal | JJA | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3781 | mlsw | seasonal | MAM | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
| 3782 | mlsw | seasonal | SON | [T, time, Z3, H2O, O3, RELHUM] | /glade/p/cesm/amwg/amwg_diagnostics/obs_data/A... |
1243 rows × 5 columns
Save the catalog#
b.save(
# File path - could save as .csv (uncompressed csv) or .csv.gz (compressed csv)
"/glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.csv",
# Column name including filepath
path_column_name='path',
# Column name including variables
variable_column_name='variables',
# Data file format - could be netcdf or zarr (in this case, netcdf)
data_format="netcdf",
# Which attributes to groupby when reading in variables using intake-esm
groupby_attrs=["source", "time_period"],
# Aggregations which are fed into xarray when reading in data using intake
aggregations=[
{
'type': 'join_new',
'attribute_name': 'temporal',
'options': {'coords': 'minimal', 'compat': 'override'},
},
],
)
Saved catalog location: /glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.json and /glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.csv
/glade/scratch/mgrover/ipykernel_140729/3449059562.py:1: UserWarning: Unable to parse 2541 assets/files. A list of these assets can be found in /glade/work/mgrover/intake-esm-catalogs/invalid_assets_amwg_obs_datasets.csv.
b.save(
Read in Observational Data from the Catalog#
We use intake-esm to read in the observational data from the catalog we just created
obs_catalog = intake.open_esm_datastore(
"/glade/work/mgrover/intake-esm-catalogs/amwg_obs_datasets.json",
csv_kwargs={"converters": {"variables": ast.literal_eval}},
sep="/",
)
We are interested in Temperature (T), from the AIRS dataset, which comes from NASA
obs_catalog_subset = obs_catalog.search(variables='T', source='AIRS')
dsets = obs_catalog_subset.to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
'source/time_period'
Our dictionary of datasets has two keys, one for seasonal and the other for monthly climatologies
dsets.keys()
dict_keys(['AIRS/seasonal', 'AIRS/monthly'])
seasonal_obs_ds = dsets['AIRS/seasonal']
monthly_obs_ds = dsets['AIRS/monthly']
Plot Observational Temperature Climatologies#
Now that we read in our data, we can plot up our results!
Plot Seasonal Temperature Climatologies from Observations#
seasonal_obs_plot = (
seasonal_obs_ds.isel(time=0, lev=range(3)).T.hvplot.quadmesh(
rasterize=True, groupby=['lev', 'temporal'], cmap='magma', projection=ccrs.Robinson()
)
* gf.coastline
)
seasonal_obs_plot
Plot Monthly Temperature Climatologies from Observations#
monthly_obs_plot = (
monthly_obs_ds.isel(time=0, lev=range(3)).T.hvplot.quadmesh(
rasterize=True, groupby=['lev', 'temporal'], cmap='magma', projection=ccrs.Robinson()
)
* gf.coastline
)
monthly_obs_plot