Using Intake-ESM’s New Derived Variable Functionality#

Last week, Anderson Banihirwe added a new feature to intake-esm, enabling users to add “derived variables” to catalogs! This is an exciting addition, and although this has not been included in a release yet, you are welcome to test out the functionality!

Keep in mind that this is an initial prototype and the API is likely to change.

What is a “Derived Variable”#

A “derived variable” in this case is a variable that doesn’t itself exist in an intake-esm catalog, but can be computed (i.e., “derived”) from variables that do exist in the catalog. Currently, the derived variable implementation requires variables on the same grid, etc.; i.e., it assumes that all variables involved can be merged within the same dataset.

An example of a derived variable could be temperature in degrees Fahrenheit. Often times, climate model models write temperature in Celsius or Kelvin, but the user may want degrees Fahrenheit! This is a really simple example; derived variables could include more sophsticated diagnostic output like aggregations of terms in a tracer budget or gradients in a particular field.

A traditional workflow for “derived variables” might consist of the following:

Load the data
Apply some function to the loaded datasets
Plot the output

This has shortcomings in the context of an automated process. For example, imagine the following.

for 

But what if we could couple those first two steps? What if we could have some set of **variable definitions**, consisting of variable requirements, such as `dependent variables`, and a function which derives the quantity. This is what the `derived_variable` funtionality offers in `intake-esm`! This enables users to share a "registry" of derived variables across catalogs!

Let's get started with an example!

Installing the Development Version of Intake-ESM#

Since this has not yet been released in a version of Intake-ESM, you can install the development version using the following:

pip install git+https://github.com/intake/intake-esm.git

Imports#

import holoviews as hv
import hvplot
import hvplot.xarray
import intake
import numpy as np
from distributed import Client
from intake_esm.derived import DerivedVariableRegistry
from ncar_jobqueue import NCARCluster

hv.extension('bokeh')

Spin up a Dask Cluster#

cluster = NCARCluster()
cluster.scale(10)
client = Client(cluster)
client

/glade/work/mgrover/miniconda3/envs/cesm-collections-dev/lib/python3.9/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40217 instead
  warnings.warn(

Client

Client-a73eb1b3-3352-11ec-8fa4-3cecef1b11fa

Connection method: Cluster object	Cluster type: dask_jobqueue.PBSCluster
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/40217/status

Cluster Info

PBSCluster

0bcacd1c

Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/40217/status	Workers: 0
Total threads: 0	Total memory: 0 B

Scheduler Info

Scheduler

Scheduler-1d915812-9bca-44c2-aa26-a00dc4ddb19f

Comm: tcp://10.12.206.54:38580	Workers: 0
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/40217/status	Total threads: 0
Started: Just now	Total memory: 0 B

Workers

How to add a Derived Variable#

Let’s compute a derived variable - wind speed! This can be derived from using the zonal (U) and meridional (V) components of the wind.

def calc_wind_speed(u, v):
    return np.sqrt(u**2 + v**2)

Creating our Derived Variable Registry#

We need to instantiate our derived variable registry, which will store our derived variable information! We use the variable dvr for this (DerivedVariableRegistry).

dvr = DerivedVariableRegistry()

In order to register this derived variable we need to add a decorator for our function, as seen below. This allows us to define our derived variable, dependent variables, and the function associated with the calculation.

@dvr.register(variable='wind_speed', dependent_variables=['U', 'V'])
def calc_wind_speed(ds):
    ds['wind_speed'] = np.sqrt(ds.U**2 + ds.V**2)
    return ds

You’ll notice dvr now has a registered variable, wind_speed, which was defined in the cell above!

dvr

DerivedVariableRegistry({'wind_speed': DerivedVariable(func=<function calc_wind_speed at 0x2b84d46c84c0>, variable='wind_speed', dependent_variables=['U', 'V'])})

Read in Data with our Registry#

In this case, we will use data from the CESM Large Ensemble. We load in our derived variable catalog using the registry argument.

data_catalog = intake.open_esm_datastore(
    'https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json',
    registry=dvr,
)

You’ll notice we have a new field - derived_variable which has 1 unique value.

data_catalog

aws-cesm1-le catalog with 56 dataset(s) from 442 asset(s):

	unique
variable	78
long_name	75
component	5
experiment	4
frequency	6
vertical_levels	3
spatial_domain	5
units	25
start_time	12
end_time	13
path	427
derived_variable	1

Let’s also subset for monthly frequency, as well as the 20th century (20C) and RCP 8.5 (RCP85) experiments.

catalog_subset = data_catalog.search(
    variable=['wind_speed'], frequency='monthly', experiment=['20C', 'RCP85']
)

catalog_subset

aws-cesm1-le catalog with 2 dataset(s) from 4 asset(s):

	unique
variable	2
long_name	2
component	1
experiment	2
frequency	1
vertical_levels	1
spatial_domain	1
units	1
start_time	2
end_time	2
path	4
derived_variable	1

Calling `to_dataset_dict` to Load in the Data#

We load in the data, which lazily adds our calculation for wind_speed to the datasets!

dsets = catalog_subset.to_dataset_dict(
    xarray_open_kwargs={'backend_kwargs': {'storage_options': {'anon': True}}}
)

--> The keys in the returned dictionary of datasets are constructed as follows:
	'component.experiment.frequency'

100.00% [2/2 00:00<00:00]

ds = dsets['atm.RCP85.monthly']
ds

<xarray.Dataset>
Dimensions:     (member_id: 40, time: 1140, lev: 30, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat         (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lev         (lev) float64 3.643 7.595 14.36 24.61 ... 957.5 976.3 992.6
  * lon         (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * member_id   (member_id) int64 1 2 3 4 5 6 7 8 ... 34 35 101 102 103 104 105
  * time        (time) object 2006-01-16 12:00:00 ... 2100-12-16 12:00:00
    time_bnds   (time, nbnd) object dask.array<chunksize=(1140, 2), meta=np.ndarray>
Dimensions without coordinates: nbnd
Data variables:
    U           (member_id, time, lev, lat, lon) float32 dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>
    V           (member_id, time, lev, lat, lon) float32 dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>
    wind_speed  (member_id, time, lev, lat, lon) float32 dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>
Attributes: (12/21)
    Conventions:                       CF-1.0
    NCO:                               4.3.4
    Version:                           $Name$
    host:                              tcs-f02n07
    important_note:                    This data is part of the project 'Blin...
    initial_file:                      b.e11.B20TRC5CNBDRD.f09_g16.105.cam.i....
    ...                                ...
    intake_esm_attrs/vertical_levels:  30.0
    intake_esm_attrs/spatial_domain:   global
    intake_esm_attrs/units:            m/s
    intake_esm_attrs/start_time:       2006-01-16 12:00:00
    intake_esm_attrs/end_time:         2100-12-16 12:00:00
    intake_esm_dataset_key:            atm.RCP85.monthly

xarray.Dataset

Dimensions:
- member_id: 40
- time: 1140
- lev: 30
- lat: 192
- lon: 288
- nbnd: 2

Coordinates: (6)

lat

(lat)

float64

-90.0 -89.06 -88.12 ... 89.06 90.0

axis :: Y
bounds :: lat_bnds
long_name :: latitude
standard_name :: latitude
units :: degrees_north

array([-90.      , -89.057594, -88.115181, -87.172775, -86.23037 , -85.287956,
       -84.345551, -83.403145, -82.460732, -81.518326, -80.575912, -79.633507,
       -78.691101, -77.748688, -76.806282, -75.863876, -74.921463, -73.979057,
       -73.036652, -72.094238, -71.151833, -70.209427, -69.267014, -68.324608,
       -67.382202, -66.439789, -65.497383, -64.554977, -63.612564, -62.670158,
       -61.727749, -60.785339, -59.842934, -58.900524, -57.958115, -57.015705,
       -56.073299, -55.13089 , -54.18848 , -53.246075, -52.303665, -51.361256,
       -50.41885 , -49.47644 , -48.534031, -47.591621, -46.649216, -45.706806,
       -44.764397, -43.821991, -42.879581, -41.937172, -40.994766, -40.052357,
       -39.109947, -38.167538, -37.225132, -36.282722, -35.340313, -34.397907,
       -33.455498, -32.513088, -31.570681, -30.628273, -29.685863, -28.743456,
       -27.801046, -26.858639, -25.916231, -24.973822, -24.031414, -23.089005,
       -22.146597, -21.204189, -20.26178 , -19.319372, -18.376963, -17.434555,
       -16.492147, -15.549738, -14.607329, -13.664922, -12.722513, -11.780105,
       -10.837696,  -9.895288,  -8.95288 ,  -8.010471,  -7.068063,  -6.125654,
        -5.183246,  -4.240838,  -3.298429,  -2.356021,  -1.413613,  -0.471204,
         0.471204,   1.413613,   2.356021,   3.298429,   4.240838,   5.183246,
         6.125654,   7.068063,   8.010471,   8.95288 ,   9.895288,  10.837696,
        11.780105,  12.722513,  13.664922,  14.607329,  15.549738,  16.492147,
        17.434555,  18.376963,  19.319372,  20.26178 ,  21.204189,  22.146597,
        23.089005,  24.031414,  24.973822,  25.916231,  26.858639,  27.801046,
        28.743456,  29.685863,  30.628273,  31.570681,  32.513088,  33.455498,
        34.397907,  35.340313,  36.282722,  37.225132,  38.167538,  39.109947,
        40.052357,  40.994766,  41.937172,  42.879581,  43.821991,  44.764397,
        45.706806,  46.649216,  47.591621,  48.534031,  49.47644 ,  50.41885 ,
        51.361256,  52.303665,  53.246075,  54.18848 ,  55.13089 ,  56.073299,
        57.015705,  57.958115,  58.900524,  59.842934,  60.785339,  61.727749,
        62.670158,  63.612564,  64.554977,  65.497383,  66.439789,  67.382202,
        68.324608,  69.267014,  70.209427,  71.151833,  72.094238,  73.036652,
        73.979057,  74.921463,  75.863876,  76.806282,  77.748688,  78.691101,
        79.633507,  80.575912,  81.518326,  82.460732,  83.403145,  84.345551,
        85.287956,  86.23037 ,  87.172775,  88.115181,  89.057594,  90.      ])

lev

(lev)

float64

3.643 7.595 14.36 ... 976.3 992.6

formula_terms :: a: hyam b: hybm p0: P0 ps: PS
long_name :: hybrid level at midpoints (1000*(A+B))
positive :: down
standard_name :: atmosphere_hybrid_sigma_pressure_coordinate
units :: level

array([  3.643466,   7.59482 ,  14.356632,  24.61222 ,  38.2683  ,  54.59548 ,
        72.012451,  87.82123 , 103.317127, 121.547241, 142.994039, 168.22508 ,
       197.908087, 232.828619, 273.910817, 322.241902, 379.100904, 445.992574,
       524.687175, 609.778695, 691.38943 , 763.404481, 820.858369, 859.534767,
       887.020249, 912.644547, 936.198398, 957.48548 , 976.325407, 992.556095])

lon
(lon)
float64
0.0 1.25 2.5 ... 356.2 357.5 358.8
long_name :
longitude
units :
degrees_east
```
array([  0.  ,   1.25,   2.5 , ..., 356.25, 357.5 , 358.75])
```

member_id

(member_id)

int64

1 2 3 4 5 6 ... 101 102 103 104 105

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,
        15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,
        29,  30,  31,  32,  33,  34,  35, 101, 102, 103, 104, 105])

time

(time)

object

2006-01-16 12:00:00 ... 2100-12-...

bounds :: time_bnds
long_name :: time

array([cftime.DatetimeNoLeap(2006, 1, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2006, 2, 15, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2006, 3, 16, 12, 0, 0, 0, has_year_zero=True),
       ...,
       cftime.DatetimeNoLeap(2100, 10, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2100, 11, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeNoLeap(2100, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
      dtype=object)

time_bnds

(time, nbnd)

object

dask.array<chunksize=(1140, 2), meta=np.ndarray>

long_name :: time interval endpoints

	Array	Chunk
Bytes	17.81 kiB	17.81 kiB
Shape	(1140, 2)	(1140, 2)
Count	5 Tasks	1 Chunks
Type	object	numpy.ndarray

Data variables: (3)

U

(member_id, time, lev, lat, lon)

float32

dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>

cell_methods :: time: mean
long_name :: Zonal wind
mdims :: 1
units :: m/s

	Array	Chunk
Bytes	281.80 GiB	113.91 MiB
Shape	(40, 1140, 30, 192, 288)	(1, 18, 30, 192, 288)
Count	2561 Tasks	2560 Chunks
Type	float32	numpy.ndarray

V

(member_id, time, lev, lat, lon)

float32

dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>

cell_methods :: time: mean
long_name :: Meridional wind
mdims :: 1
units :: m/s

	Array	Chunk
Bytes	281.80 GiB	113.91 MiB
Shape	(40, 1140, 30, 192, 288)	(1, 18, 30, 192, 288)
Count	2561 Tasks	2560 Chunks
Type	float32	numpy.ndarray

wind_speed

(member_id, time, lev, lat, lon)

float32

dask.array<chunksize=(1, 18, 30, 192, 288), meta=np.ndarray>

intake_esm_attrs/derivation :: dependent_variables: ['U', 'V']

	Array	Chunk
Bytes	281.80 GiB	113.91 MiB
Shape	(40, 1140, 30, 192, 288)	(1, 18, 30, 192, 288)
Count	15362 Tasks	2560 Chunks
Type	float32	numpy.ndarray

Attributes: (21)
Conventions :
CF-1.0
NCO :
4.3.4
Version :
$Name$
host :
tcs-f02n07
important_note :
This data is part of the project 'Blind Evaluation of Lossy Data-Compression in LENS'. Please exercise caution before using this data for other purposes.
initial_file :
b.e11.B20TRC5CNBDRD.f09_g16.105.cam.i.2006-01-01-00000.nc
logname :
mudryk
nco_openmp_thread_number :
1
revision_Id :
$Id$
source :
CAM
title :
UNSET
topography_file :
/scratch/p/pjk/mudryk/cesm1_1_2_LENS/inputdata/atm/cam/topo/USGS-gtopo30_0.9x1.25_remap_c051027.nc
intake_esm_attrs/component :
atm
intake_esm_attrs/experiment :
RCP85
intake_esm_attrs/frequency :
monthly
intake_esm_attrs/vertical_levels :
30.0
intake_esm_attrs/spatial_domain :
global
intake_esm_attrs/units :
m/s
intake_esm_attrs/start_time :
2006-01-16 12:00:00
intake_esm_attrs/end_time :
2100-12-16 12:00:00
intake_esm_dataset_key :
atm.RCP85.monthly

Apply an Annual Mean Calculation#

Let’s apply an annual average to the data - since all the years are 365 days long, we do not need any special weighting

annual_mean = ds.groupby('time.year').mean('time')

Plot the Output#

We use hvPlot here to plot a global map, selecting only the bottom 8 levels of the atmosphere.

annual_mean.isel(lev=range(-8, 0)).wind_speed.hvplot.quadmesh(
    x='lon',
    y='lat',
    geo=True,
    rasterize=True,
    cmap='viridis',
    coastline=True,
    title='Wind Speed (m/s)',
)

Reading WRF data into Xarray and Visualizing the Output using hvPlot Advanced Plotting Tutorial

22 October 2021

Recent Posts

Archives

Using Intake-ESM’s New Derived Variable Functionality#

What is a “Derived Variable”#

Installing the Development Version of Intake-ESM#

Imports#

Spin up a Dask Cluster#

Client

Cluster Info

PBSCluster

Scheduler Info

Scheduler

Workers

How to add a Derived Variable#

Creating our Derived Variable Registry#

Read in Data with our Registry#

Calling `to_dataset_dict` to Load in the Data#

Apply an Annual Mean Calculation#

Plot the Output#

22 October 2021

Recent Posts

Archives

Using Intake-ESM’s New Derived Variable Functionality#

What is a “Derived Variable”#

Installing the Development Version of Intake-ESM#

Imports#

Spin up a Dask Cluster#

Client

Cluster Info

PBSCluster

Scheduler Info

Scheduler

Workers

How to add a Derived Variable#

Creating our Derived Variable Registry#

Read in Data with our Registry#

Calling to_dataset_dict to Load in the Data#

Apply an Annual Mean Calculation#

Plot the Output#

Calling `to_dataset_dict` to Load in the Data#