Creating Model Documentation Using Jupyterbook and Intake-esm#

A common step in any project is documenting your data and your data workflow. Fortunately, open tools in the scientific Python ecosystem make that much easier! In this example, we will cover creating your Github repo, creating the catalog, visualizing the catalog, and generating a static webpage you can share with collaborators!

Fair Warning#

This week’s post is quite detailed, so just a warning! If you would like to skip ahead to the finished product, the published book is linked at the end of this post.

By the end of this post, we will have covered how to build a webpage that looks like this CESM book page

Create your Github Repository#

Go to Github and select “New” in the top lefthand corner next to “Repositories” - this will pull up the following window. Once you are here, go ahead and name your repository!

Be sure to add:

  • Repository name

  • Description

  • README

  • Gitignore (use the python template)

  • Choose a license


Clone your Repository#

At this point, you can go ahead and clone your repository! You can either clone to your local machine, or to some Jupyterhub (such as the NCAR Jupyterhub), which is what we will do in this case.

Clone to your machine!#

We want to clone the repository within the Jupyterhub, so after logging on, we open a terminal and run git clone with the repository link, using the following syntax

git clone https://github.com/mgrover1/cesm-test-data.git

Create a docs directory#

Now that you have cloned the repository, move into it and create a docs directory using the following

cd cesm-test-data
mkdir docs

Build your Catalog#

Open a new Jupyter Notebook called model_documentation.ipynb within the docs directory and select a development environment which includes the following:

  • jupyter-book

  • intake-esm

  • graphviz

  • ecgtools

If you haven’t installed these yet, you can use conda and pip (ecgtools is not yet on conda-forge)

conda install -c conda-forge jupyter-book intake-esm graphviz python-graphviz
pip install ecgtools
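Before opening the notebook, it can help to confirm that the environment actually contains everything the later cells import. The following is a minimal sketch using only the standard library; the helper name missing_modules and the list of import names are illustrative assumptions, not part of any of the packages above.

```python
from importlib.util import find_spec

# Import names (not install names) for the packages used in this post.
# This list is an assumption for illustration; adjust it to your environment.
REQUIRED_MODULES = ["jupyter_book", "intake", "intake_esm", "graphviz", "ecgtools"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the conda and pip commands above in the same environment the notebook kernel uses.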

To build the data catalog, follow the instructions in the Building an Intake-esm catalog from CESM2 History Files post

Read the Catalog and Visualize the Components and Frequency#

A couple of weeks ago, we covered Creating Visualizations of Intake-ESM Catalogs, which is helpful for understanding how Graphviz works!

Imports#

import intake
from graphviz import Digraph

Read in the Test History Catalog#

col = intake.open_esm_datastore('/glade/work/mgrover/cesm-hist-test.json')

We will assign the dataframe from the catalog to its own variable

df = col.df
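The visualization in the next section reads four columns from this dataframe: case, component, frequency, and stream. As a hedged sketch, a small check like the one below can catch a catalog that was built without one of them. The helper missing_columns is hypothetical, and a plain list of column names stands in for df.columns so the example runs on its own.

```python
# Columns the visualization loop relies on
REQUIRED_COLUMNS = ("case", "component", "frequency", "stream")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# With the catalog dataframe from above, this would be called as
# missing_columns(df.columns); a made-up list stands in for it here.
example_columns = ["case", "component", "stream", "variable", "path"]
print(missing_columns(example_columns))
```

An empty list means the dataframe has everything the loop needs.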

Visualize the Catalog#

Using the Digraph object from the Graphviz library, we set up a loop to create the visualization using the three categories

  • Case

  • Component

  • Frequency

# Create Digraph object - use the left to right orientation instead of vertical
dot = Digraph(graph_attr={'rankdir': 'LR'})

# Save the catalog as a pdf
dot.format = 'pdf'

# Start counting at one for node numbers
num_node = 1

# Loop through the different cases
for case in df.case.unique():
    case_i = num_node
    dot.node(str(case_i), label=case)
    num_node += 1

    # Loop through the different components in each case
    for component in df.loc[df.case == case].component.unique():
        comp_i = num_node
        dot.node(str(comp_i), label=component)
        dot.edge(str(case_i), str(comp_i))
        num_node += 1

        # Loop through the frequencies of each component within each case
        for frequency in df.loc[(df.case == case) & (df.component == component)].frequency.unique():
            freq_i = num_node

            # Pull out the stream information
            stream = df.loc[
                (df.case == case) & (df.component == component) & (df.frequency == frequency)
            ].stream.values[0]

            # Add both stream and frequency information to these bubbles
            dot.node(str(freq_i), label=f'stream: {stream} \n frequency: {frequency}')
            dot.edge(str(comp_i), str(freq_i))
            num_node += 1

Now visualize it inline by running a cell with just the dot object

dot
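To make the node numbering in the loop easier to follow, here is a sketch of the same case, component, and frequency traversal written with only the standard library. It emits DOT source text directly instead of using the Digraph object, and it works on a list of dictionaries rather than the catalog dataframe. The function catalog_to_dot and the example rows are hypothetical and exist only for illustration.

```python
def catalog_to_dot(records):
    """Build DOT source for a case -> component -> stream/frequency tree.

    `records` is a list of dicts with the keys case, component, frequency
    and stream (a stand-in for the rows of the catalog dataframe).
    """
    lines = ["digraph {", "    rankdir=LR"]
    num_node = 1

    def unique(values):
        # Keep first-seen order, similar to pandas' Series.unique()
        return list(dict.fromkeys(values))

    for case in unique(r["case"] for r in records):
        case_i = num_node
        lines.append(f'    {case_i} [label="{case}"]')
        num_node += 1
        in_case = [r for r in records if r["case"] == case]

        for component in unique(r["component"] for r in in_case):
            comp_i = num_node
            lines.append(f'    {comp_i} [label="{component}"]')
            lines.append(f"    {case_i} -> {comp_i}")
            num_node += 1
            in_comp = [r for r in in_case if r["component"] == component]

            for frequency in unique(r["frequency"] for r in in_comp):
                # Use the first stream found for this frequency
                stream = next(r["stream"] for r in in_comp if r["frequency"] == frequency)
                lines.append(f'    {num_node} [label="stream: {stream}\\nfrequency: {frequency}"]')
                lines.append(f"    {comp_i} -> {num_node}")
                num_node += 1

    lines.append("}")
    return "\n".join(lines)


# Made-up rows, for illustration only
rows = [
    {"case": "test_case", "component": "atm", "frequency": "month_1", "stream": "cam.h0"},
    {"case": "test_case", "component": "atm", "frequency": "day_1", "stream": "cam.h1"},
    {"case": "test_case", "component": "ocn", "frequency": "month_1", "stream": "pop.h"},
]
print(catalog_to_dot(rows))
```

Each node gets the next integer as its identifier, and each edge connects a parent identifier saved before descending into the inner loop, which is the same bookkeeping the Digraph version performs with case_i, comp_i, and freq_i.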

Save the Visualization#

In the block of code above, we specified dot.format = 'pdf', which ensures that when we save the graph, it is in PDF format. Other options include (but are not limited to) svg and png!

The Digraph method for saving is .render(), with the filename as the argument (within the parentheses)

dot.render('cesm_test_catalog')
'cesm_test_catalog.pdf'

This saved a file within your directory called cesm_test_catalog.pdf! You can double click this within your file browser to take a look

Add Jupyterbook files#

If you are curious about Jupyterbook, be sure to check out their official documentation, specifically their building your first book tutorial!

The main two files we need now are

  • The table of contents (_toc.yml)

  • The config file (_config.yml)

Go ahead and create text files within your docs directory using those exact filenames

Adding to your Config (_config.yml) file#

Within your _config.yml file, input the following

title: "CESM Test Data"
logo: images/cesm.jpg
execute:
  execute_notebooks: "off"

Add to your Table of Contents (_toc.yml) file#

This is where you place your content - in this case, the model_documentation.ipynb notebook. Jupyterbook does not require you to specify the file type here - so leave off the .ipynb

- file: model_documentation
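If you prefer to create both files from a notebook cell or script rather than by hand, the sketch below writes them with pathlib. The config text is copied from above. The table of contents text is an assumption: it uses the layout with format and root keys that, as far as I am aware, more recent Jupyter Book releases expect, whereas the single-entry list shown above is the older layout. The function name write_book_files is hypothetical.

```python
from pathlib import Path

# Same settings as the _config.yml shown above
CONFIG_TEXT = """\
title: "CESM Test Data"
logo: images/cesm.jpg
execute:
  execute_notebooks: "off"
"""

# Assumption: newer table-of-contents layout with `format` and `root` keys.
# If your Jupyter Book version uses the older layout, keep the list form above.
TOC_TEXT = """\
format: jb-book
root: model_documentation
"""


def write_book_files(docs_dir):
    """Write _config.yml and _toc.yml into docs_dir and return their paths."""
    docs = Path(docs_dir)
    docs.mkdir(parents=True, exist_ok=True)
    config_path = docs / "_config.yml"
    toc_path = docs / "_toc.yml"
    config_path.write_text(CONFIG_TEXT)
    toc_path.write_text(TOC_TEXT)
    return config_path, toc_path


# From the repository root, this would be called as:
# write_book_files("docs")
```

Check which layout your installed version expects before building, since a mismatch shows up as an error at build time.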

Build your Book!#

Now that you have your content, config file, and table of contents, it’s time to build the book. Make sure you are in your repository root directory, and run the following

jupyter-book build docs

If it built correctly, you should see the following

===============================================================================

Finished generating HTML for book.
Your book's HTML pages are here:
    docs/_build/html/
You can look at your book by opening this file in a browser:
    docs/_build/html/index.html
Or paste this line directly into your browser bar:
    file:///glade/work/mgrover/git_repos/cesm-test-data/docs/_build/html/index.html            

===============================================================================
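A quick way to confirm the build produced the page named in that message is to look for index.html under docs/_build/html. This is a minimal sketch with a hypothetical helper, find_built_index, that only checks the path reported in the build output.

```python
from pathlib import Path


def find_built_index(docs_dir="docs"):
    """Return the path to the built index.html, or None if the book is not built."""
    index = Path(docs_dir) / "_build" / "html" / "index.html"
    return index if index.is_file() else None


index = find_built_index("docs")
if index is None:
    print("No built book found; run `jupyter-book build docs` first.")
else:
    # as_uri() needs an absolute path, so resolve it first
    print("Open this in a browser:", index.resolve().as_uri())
```

Run it from the repository root so that the relative docs path resolves correctly.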

View the Book on Github#

It can be difficult to view the book on the Jupyterhub, but fortunately we can use Github for publishing this online!

The Jupyterbook publish your book online docs are helpful here; we follow the second part of that guide, which describes publishing with Github Pages

Install Github Pages Import#

If you have not done so already, install the following

pip install ghp-import

Build the book and push to your Github Pages branch#

Move to your docs directory again, and run the following (after building your book)

ghp-import -n -p -f _build/html

Rebuilding your book#

If you make changes to your notebook or want to rebuild your book, run the following within your project root directory

jupyter-book build docs

and this within your docs directory

ghp-import -n -p -f _build/html
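Since the two commands run from different directories, it is easy to run one from the wrong place. As a sketch, the steps can be wrapped in a small Python script that sets the working directory for each command. The function names publish_steps and rebuild_and_publish are hypothetical; the commands themselves are the ones shown above.

```python
import subprocess
from pathlib import Path


def publish_steps(repo_root="."):
    """Return (command, working_directory) pairs for a rebuild and publish."""
    root = Path(repo_root)
    return [
        # Build from the repository root
        (["jupyter-book", "build", "docs"], root),
        # Push the built HTML from within the docs directory
        (["ghp-import", "-n", "-p", "-f", "_build/html"], root / "docs"),
    ]


def rebuild_and_publish(repo_root="."):
    """Run each step in order, stopping at the first failure."""
    for command, cwd in publish_steps(repo_root):
        # check=True raises CalledProcessError if a command exits non-zero
        subprocess.run(command, cwd=cwd, check=True)


# To run for real, from the repository root:
# rebuild_and_publish(".")
```

Stopping at the first failure means a broken build is never pushed to the Github Pages branch.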

Go check out your book!#

Your book will be published at a URL with the following structure

{github_username}.github.io/{repository_name}

For this example, the book can be found here

https://mgrover1.github.io/cesm-test-data/
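The same URL can be derived from the clone link used at the start of this post. The sketch below assumes an HTTPS clone URL of the form https://github.com/{github_username}/{repository_name}.git and a default Github Pages project site with no custom domain; the function name pages_url is hypothetical.

```python
from urllib.parse import urlparse


def pages_url(clone_url):
    """Derive the default Github Pages URL from an HTTPS clone URL."""
    path = urlparse(clone_url).path.strip("/")
    username, repository = path.split("/")[:2]
    # Drop the trailing ".git" that clone URLs usually carry
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return f"https://{username}.github.io/{repository}/"


print(pages_url("https://github.com/mgrover1/cesm-test-data.git"))
```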

Conclusion#

This was a fairly in-depth post which covered content from previous ESDS blog posts, but I hope it provides a starting point for documenting your data, visualizing the data available, and sharing your data documentation with others!