{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating Model Documentation Using Jupyterbook and Intake-esm\n", "\n", "A common step in any project is documenting your data and your data workflow. Fortunately, open tools in the scientific Python ecosystem make that much easier! In this example, we will cover creating your Github repository, creating the catalog, visualizing the catalog, and generating a static webpage you can share with collaborators!\n", "\n", "## Fair Warning\n", "This week's post is quite detailed, so just a warning! If you would like to look at the finished product, check out the following\n", "\n", "* [Github repository with the content built here](https://github.com/mgrover1/cesm-test-data)\n", "* [Finished website with content](https://mgrover1.github.io/cesm-test-data/)\n", "\n", "By the end of this post, we will have built a webpage that looks like this\n", "![CESM book page](../images/cesm_book_page.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create your Github Repository\n", "\n", "Go to [Github](https://github.com/) and select \"New\" in the top left-hand corner next to \"Repositories\" - this will pull up the following window. Once you are here, go ahead and name your repository!\n", "\n", "Be sure to add:\n", "* Repository name\n", "* Description\n", "* README\n", "* Gitignore (use the Python template)\n", "* License\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![screen_grab](../images/github_screen_grab.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clone your Repository\n", "At this point, you can go ahead and clone your repository! 
You can either clone to your local machine, or to some Jupyterhub (such as the [NCAR Jupyterhub](https://jupyterhub.ucar.edu)), which we will do in this case.\n", "\n", "### Copy the link from Github\n", "Copy the link from Github by clicking on the green \"Code\" button\n", "![Github Clone Link](../images/github_clone_link.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clone to your machine!\n", "We want to clone the repository within the [Jupyterhub](https://jupyterhub.ucar.edu), so once logged on, we open a terminal and paste the link using the following syntax\n", "\n", "```bash\n", "git clone https://github.com/mgrover1/cesm-test-data.git\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a docs directory\n", "Now that you have cloned the repository, move into it and create a `docs` directory using the following\n", "\n", "```bash\n", "cd cesm-test-data\n", "mkdir docs\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build your Catalog\n", "Open a new Jupyter Notebook called `model_documentation.ipynb` within the `docs` directory and select a development environment which includes the following:\n", "- jupyter-book\n", "- intake-esm\n", "- graphviz (with its Python bindings, `python-graphviz`)\n", "- ecgtools\n", "\n", "If you haven't installed these yet, you can use conda and pip (ecgtools is not yet on conda-forge)\n", "\n", "```bash\n", "conda install -c conda-forge jupyter-book intake-esm graphviz python-graphviz\n", "pip install ecgtools\n", "```\n", "\n", "In this case, the [Building an Intake-esm catalog from CESM2 History Files](https://ncar.github.io/esds/posts/ecgtools-history-files-example/) post provides the instructions for building the data catalog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read the Catalog and Visualize the Components and Frequency\n", "A couple of weeks ago, we covered [Creating Visualizations of Intake-ESM Catalogs](https://ncar.github.io/esds/posts/graphviz_example/), which is helpful for understanding 
how [`Graphviz`](https://graphviz.readthedocs.io/en/stable/manual.html) works!\n", "\n", "### Imports" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import intake\n", "from graphviz import Digraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read in the Test History Catalog" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "col = intake.open_esm_datastore('/glade/work/mgrover/cesm-hist-test.json')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will assign the dataframe from the catalog to its own variable" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "df = col.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the Catalog\n", "Using the `Digraph` object from the [`Graphviz` library](https://graphviz.readthedocs.io/en/stable/manual.html), we set up a loop to create the visualization using the three categories\n", "* Case\n", "* Component\n", "* Frequency" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Create Digraph object - use the left to right orientation instead of vertical\n", "dot = Digraph(graph_attr={'rankdir': 'LR'})\n", "\n", "# Save the catalog as a pdf\n", "dot.format = 'pdf'\n", "\n", "# Start counting at one for node numbers\n", "num_node = 1\n", "\n", "# Loop through the different cases\n", "for case in df.case.unique():\n", "    case_i = num_node\n", "    dot.node(str(case_i), label=case)\n", "    num_node += 1\n", "\n", "    # Loop through the different components in each case\n", "    for component in df.loc[df.case == case].component.unique():\n", "        comp_i = num_node\n", "        dot.node(str(comp_i), label=component)\n", "        dot.edge(str(case_i), str(comp_i))\n", "        num_node += 1\n", "\n", "        # Loop through the frequencies in each component within each case\n", "        for frequency in df.loc[(df.case == case) & (df.component == component)].frequency.unique():\n", "            freq_i = num_node\n", "\n", "            # Pull out the stream information\n", "            stream = df.loc[\n", "                (df.case == case) & (df.component == component) & (df.frequency == frequency)\n", "            ].stream.values[0]\n", "\n", "            # Add both stream and frequency information to these bubbles\n", "            dot.node(str(freq_i), label=f'stream: {stream} \\n frequency: {frequency}')\n", "            dot.edge(str(comp_i), str(freq_i))\n", "            num_node += 1\n", "        comp_i += 1\n", "    case_i += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now visualize it inline by running a cell with just the `dot` object" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Rendered Digraph, left to right: case b.e20.B1850.f19_g17.test -> components atm, ocn, lnd, glc, rof, ice -> stream/frequency nodes: atm: cam.h0 (month_1); ocn: pop.h (month_1), pop.h.ecosys.nday1 (day_1), pop.h.ecosys.nyear1 (year_1), pop.h (once); lnd: clm2.h0 (month_1); glc: cism.h (year_1); rof: mosart.h0 (month_1); ice: cice.h (month_1)]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save the Visualization\n", "In the block of code above, we specified `dot.format = 'pdf'`, which ensures that the graph is saved in PDF format. Other options include (but are not limited to) `svg` and `png`!\n", "\n", "The `Digraph` method for saving is `.render()`, which takes the filename as its argument (within the parentheses)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'cesm_test_catalog.pdf'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot.render('cesm_test_catalog')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This saved a file within your directory called `cesm_test_catalog.pdf`! 
You can double-click this within your file browser to take a look" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add Jupyterbook files\n", "If you are curious about [Jupyterbook](https://jupyterbook.org/intro.html), be sure to check out their [official documentation](https://jupyterbook.org/intro.html), specifically their [building your first book](https://jupyterbook.org/start/your-first-book.html) tutorial!\n", "\n", "The two main files we need now are\n", "* The table of contents (`_toc.yml`)\n", "* The config file (`_config.yml`)\n", "\n", "Go ahead and create text files within your `docs` directory using those exact filenames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download a sample CESM logo\n", "I uploaded a copy of the CESM logo to Github, which can be downloaded using the following (be sure to save it to an `images` directory in `docs`, since the config file below points to `images/cesm.jpg`)\n", "\n", "```bash\n", "wget https://raw.githubusercontent.com/mgrover1/cesm-workflow/main/images/cesm.jpg\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding to your Config (`_config.yml`) file\n", "Within your `_config.yml` file, input the following\n", "\n", "```yaml\n", "title: \"CESM Test Data\"\n", "logo: images/cesm.jpg\n", "execute:\n", "  execute_notebooks: \"off\"\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add to your Table of Contents (`_toc.yml`) file\n", "\n", "This is where you place your content - in this case, the `model_documentation.ipynb` notebook. Jupyterbook does not require you to specify the file type here - so leave off the `.ipynb`\n", "```yaml\n", "- file: model_documentation\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build your Book!\n", "Now that you have your content, config file, and table of contents, it's time to build the book. 
Make sure you are in your repository root directory, and run the following\n", "```bash\n", "jupyter-book build docs\n", "```\n", "\n", "If it built correctly, you should see the following\n", "```\n", "===============================================================================\n", "\n", "Finished generating HTML for book.\n", "Your book's HTML pages are here:\n", "    docs/_build/html/\n", "You can look at your book by opening this file in a browser:\n", "    docs/_build/html/index.html\n", "Or paste this line directly into your browser bar:\n", "    file:///glade/work/mgrover/git_repos/cesm-test-data/docs/_build/html/index.html \n", "\n", "===============================================================================\n", "```\n", "\n", "## View the Book on Github\n", "It can be difficult to view the book on the Jupyterhub, but fortunately we can use Github to publish it online!\n", "\n", "The [Jupyterbook publish your book online](https://jupyterbook.org/start/publish.html) docs are helpful here; we follow the second part, which [describes using Github Pages](https://jupyterbook.org/start/publish.html#publish-your-book-online-with-github-pages)\n", "\n", "### Install Github Pages Import\n", "\n", "If you have not done so already, install the following\n", "\n", "```bash\n", "pip install ghp-import\n", "```\n", "\n", "### Build the book and push to your Github Pages branch\n", "\n", "Move to your `docs` directory again, and run the following (after building your book)\n", "\n", "```bash\n", "ghp-import -n -p -f _build/html\n", "```\n", "\n", "### Rebuilding your book\n", "If you make changes to your notebook or want to rebuild your book, run the following within your project root directory\n", "```bash\n", "jupyter-book build docs\n", "```\n", "\n", "and this within your `docs` directory\n", "```bash\n", "ghp-import -n -p -f _build/html\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Go check out your book!\n", "Your book will be published 
along the following URL structure\n", "\n", "\n", "`{github_username}.github.io/{repository_name}`\n", "\n", "For this example, the book can be found here\n", "\n", "[**https://mgrover1.github.io/cesm-test-data/**](https://mgrover1.github.io/cesm-test-data/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "This was a fairly in-depth post that covered content from previous [ESDS blog posts](https://ncar.github.io/esds/blog/), but I hope this provides a starting point for documenting your data, visualizing the data available, and sharing your data documentation with others!" ] } ], "metadata": { "author": "Max Grover", "date": "2021-06-25", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" }, "tags": "jupyter,cesm,intake,documentation", "title": "Creating Model Documentation Using Jupyterbook and Intake-esm" }, "nbformat": 4, "nbformat_minor": 4 }