Creating Visualizations of Intake-ESM Catalogs#

A common first task when working with a new dataset is figuring out what data is available. This is especially true for climate ensembles with several components and output frequencies (e.g. the Community Earth System Model Large Ensemble, CESM-LE). Here, we will examine different methods of exploring the CESM-LE catalog.

Imports#

Here, we will use intake-esm and graphviz, which can be installed along with jupyterlab using the following command:

conda install -c conda-forge jupyterlab intake-esm graphviz

Once you install these packages, open jupyterlab!

import intake
from graphviz import Digraph

Read in intake-esm catalog#

col = intake.open_esm_datastore(
    'https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json'
)

Typically, the first step is to read in the dataframe containing the metadata, but it can be tough to see from it what data is available.

col.df

	variable	long_name	component	experiment	frequency	vertical_levels	spatial_domain	units	start_time	end_time	path
0	FLNS	net longwave flux at surface	atm	20C	daily	1.0	global	W/m2	1920-01-01 12:00:00	2005-12-31 12:00:00	s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS....
1	FLNSC	clearsky net longwave flux at surface	atm	20C	daily	1.0	global	W/m2	1920-01-01 12:00:00	2005-12-31 12:00:00	s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC...
2	FLUT	upwelling longwave flux at top of model	atm	20C	daily	1.0	global	W/m2	1920-01-01 12:00:00	2005-12-31 12:00:00	s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLUT....
3	FSNS	net solar flux at surface	atm	20C	daily	1.0	global	W/m2	1920-01-01 12:00:00	2005-12-31 12:00:00	s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNS....
4	FSNSC	clearsky net solar flux at surface	atm	20C	daily	1.0	global	W/m2	1920-01-01 12:00:00	2005-12-31 12:00:00	s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNSC...
...	...	...	...	...	...	...	...	...	...	...	...
430	WVEL	vertical velocity	ocn	RCP85	monthly	60.0	global_ocean	centimeter/s	2006-01-16 12:00:00	2100-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-RCP85-W...
431	NaN	NaN	ocn	CTRL	static	NaN	global_ocean	NaN	NaN	NaN	s3://ncar-cesm-lens/ocn/static/grid.zarr
432	NaN	NaN	ocn	HIST	static	NaN	global_ocean	NaN	NaN	NaN	s3://ncar-cesm-lens/ocn/static/grid.zarr
433	NaN	NaN	ocn	RCP85	static	NaN	global_ocean	NaN	NaN	NaN	s3://ncar-cesm-lens/ocn/static/grid.zarr
434	NaN	NaN	ocn	20C	static	NaN	global_ocean	NaN	NaN	NaN	s3://ncar-cesm-lens/ocn/static/grid.zarr

435 rows × 11 columns

You can search the catalog with intake-esm using the following syntax:

cat = col.search(experiment='20C', frequency='monthly')

Here again, it is tough to see everything that is available. Searching also requires knowing in advance which experiments are in the dataset and which frequency you are looking for.

cat.df

	variable	long_name	component	experiment	frequency	vertical_levels	spatial_domain	units	start_time	end_time	path
0	FLNS	net longwave flux at surface	atm	20C	monthly	1.0	global	W/m2	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FLN...
1	FLNSC	clearsky net longwave flux at surface	atm	20C	monthly	1.0	global	W/m2	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FLN...
2	FLUT	upwelling longwave flux at top of model	atm	20C	monthly	1.0	global	W/m2	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FLU...
3	FSNS	net solar flux at surface	atm	20C	monthly	1.0	global	W/m2	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FSN...
4	FSNSC	clearsky net solar flux at surface	atm	20C	monthly	1.0	global	W/m2	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FSN...
...	...	...	...	...	...	...	...	...	...	...	...
60	VNT	flux of heat in grid-y direction	ocn	20C	monthly	60.0	global_ocean	degC/s	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-VNT...
61	VVEL	velocity in grid-y direction	ocn	20C	monthly	60.0	global_ocean	centimeter/s	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-VVE...
62	WTS	salt flux across top face	ocn	20C	monthly	60.0	global_ocean	gram/kilogram/s	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-WTS...
63	WTT	heat flux across top face	ocn	20C	monthly	60.0	global_ocean	degC/s	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-WTT...
64	WVEL	vertical velocity	ocn	20C	monthly	60.0	global_ocean	centimeter/s	1920-01-16 12:00:00	2005-12-16 12:00:00	s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-WVE...

65 rows × 11 columns
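One way to see which values are valid before searching is to list the unique entries in the columns you plan to filter on. The sketch below uses a small hand-made dataframe as a stand-in for col.df (the rows are illustrative, not the real catalog); with the actual catalog you would start from df = col.df instead.

```python
import pandas as pd

# Hypothetical stand-in for col.df: a few rows shaped like the catalog above
df = pd.DataFrame(
    {
        "variable": ["FLNS", "FLNS", "WVEL", "WVEL"],
        "component": ["atm", "atm", "ocn", "ocn"],
        "experiment": ["20C", "20C", "20C", "RCP85"],
        "frequency": ["daily", "monthly", "monthly", "monthly"],
    }
)

# Unique values for each column you might pass to col.search(...)
search_values = {
    column: sorted(df[column].dropna().unique())
    for column in ["experiment", "component", "frequency"]
}
print(search_values)

# Number of catalog entries for each experiment/component/frequency combination
counts = df.groupby(["experiment", "component", "frequency"]).size()
print(counts)
```

This gives a compact summary, but it is still a table. The graph built in the following sections shows the same hierarchy visually.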

Using Graphviz in a Jupyter Notebook#

Graphviz offers an interface for creating network graphs.

Main “components” of Graphviz#

Digraph class
- The main class used to build the visualization. It is typically assigned to a variable named dot, but any name works.
Node
- The “bubbles” in the graph. Each has a name that identifies it (e.g. '1') and a label that is displayed (e.g. 'HIST').
- Nodes can be connected together. The name must be a unique string; numbered strings such as '1' and '2' are used here for convenience.
Edge
- Edges connect nodes by name (e.g. .edge('1', '3') connects the first and third nodes).

Example of case visualization#

# Create Digraph object
dot = Digraph()

# Create the first node which serves as the main parent
dot.node('1', label='HIST')

# Add an ocn component node and connect to experiment parent
dot.node('2', label='ocn')
dot.edge('1', '2')

# Add a monthly child from the ocn component parent
dot.node('3', label='monthly')
dot.edge('2', '3')

# Add a daily child from the ocn component parent
dot.node('4', label='daily')
dot.edge('2', '4')

# Add an atm component node and connect to experiment parent
dot.node('5', label='atm')
dot.edge('1', '5')

# Add a monthly child from the atm component parent
dot.node('6', label='monthly')
dot.edge('5', '6')

# Add a weekly child from the atm component parent
dot.node('7', label='weekly')
dot.edge('5', '7')

# Visualize the graph
dot

[Figure: graph with HIST at the top, connected to the ocn and atm components, each connected to its frequency nodes]
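Outside of a notebook, ending a cell with dot does not display anything. A minimal sketch of one alternative is to print the DOT source that the Digraph generates, which needs only the graphviz Python package. The node names 'exp' and 'comp' and the file name 'case_graph' are made up for this illustration.

```python
from graphviz import Digraph

dot = Digraph()

# Node names can be any unique strings; 'exp' and 'comp' are illustrative
dot.node('exp', label='HIST')
dot.node('comp', label='ocn')
dot.edge('exp', 'comp')

# The DOT source is plain text describing the nodes and edges
print(dot.source)

# Writing an image file needs the Graphviz binaries installed, for example:
# dot.render('case_graph', format='svg')
```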

Looping through the CESM-LE catalog#

Let’s apply this to our data catalog, assigning the dataframe of dataset attributes to df.

df = col.df

# Create Digraph object - use the left to right orientation instead of vertical
dot = Digraph(graph_attr={'rankdir': 'LR'})

# Start counting at one for node numbers
num_node = 1

# Loop through the different experiments
for experiment in df.experiment.unique():
    exp_i = num_node
    dot.node(str(exp_i), label=experiment)
    num_node += 1

    # Loop through the different components in each experiment
    for component in df.loc[df.experiment == experiment].component.unique():
        comp_i = num_node
        dot.node(str(comp_i), label=component)
        dot.edge(str(exp_i), str(comp_i))
        num_node += 1

        # Loop through the frequency in each component within each experiment
        for frequency in df.loc[
            (df.experiment == experiment) & (df.component == component)
        ].frequency.unique():
            freq_i = num_node
            dot.node(str(freq_i), label=frequency)
            dot.edge(str(comp_i), str(freq_i))
            num_node += 1

dot

[Figure: left-to-right graph linking each experiment in the CESM-LE catalog to its components and their output frequencies]
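The nested loops above are tied to three specific columns. As a rough sketch, assuming you want the same tree for any ordered list of columns, the bookkeeping can be moved into a helper. The names build_tree and to_digraph are hypothetical helpers written for this example, not part of intake-esm or graphviz, and the rows below are illustrative rather than taken from the real catalog.

```python
def build_tree(rows, levels):
    """Return (nodes, edges) for a hierarchy of catalog columns.

    rows   : iterable of dicts, e.g. df.to_dict("records")
    levels : column names ordered from root to leaf
    """
    nodes = {}  # path tuple -> (node name, label)
    edges = []  # (parent node name, child node name)
    for row in rows:
        for depth in range(1, len(levels) + 1):
            path = tuple(row[level] for level in levels[:depth])
            if path in nodes:
                continue
            # Number nodes in the order they are first seen
            nodes[path] = (str(len(nodes) + 1), str(path[-1]))
            if depth > 1:
                edges.append((nodes[path[:-1]][0], nodes[path][0]))
    return nodes, edges


def to_digraph(nodes, edges):
    """Turn the output of build_tree into a left-to-right Digraph."""
    # Imported here so build_tree can be used without graphviz installed
    from graphviz import Digraph

    dot = Digraph(graph_attr={"rankdir": "LR"})
    for name, label in nodes.values():
        dot.node(name, label=label)
    for parent, child in edges:
        dot.edge(parent, child)
    return dot


# Hypothetical rows; with the real catalog use rows = df.to_dict("records")
rows = [
    {"experiment": "20C", "component": "atm", "frequency": "daily"},
    {"experiment": "20C", "component": "atm", "frequency": "monthly"},
    {"experiment": "20C", "component": "ocn", "frequency": "monthly"},
    {"experiment": "RCP85", "component": "ocn", "frequency": "monthly"},
]
nodes, edges = build_tree(rows, ["experiment", "component", "frequency"])
print(len(nodes), "nodes and", len(edges), "edges")
```

Calling to_digraph(nodes, edges) in a notebook should draw a graph with the same structure as the loop version, although the node numbers may be assigned in a different order. Keying nodes by their full path keeps a label such as monthly separate under each component, which is why the same label can appear more than once.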

Conclusion#

Graphviz can be a helpful tool when visualizing what data is within your data catalog - I hope this provides a good starting point in terms of using this with intake-esm catalogs!


11 June 2021
