A collection of Jupyter notebooks that stream Earth System Science data from Open Science Data Federation (OSDF) origins using PelicanFS, and run analysis on a variety of HPC and cloud platforms.
Browse the rendered book: https://
New to OSDF or PelicanFS? Project Pythia’s OSDF Cookbook is the recommended introduction — its first chapters cover the OSDF concept and PelicanFS in depth. For background on how NCAR integrated OSDF with its data infrastructure, see Integration of OSDF with NCAR’s data infrastructure: Interim Project Report (Oct 2025).
Quick Start
git clone https://github.com/NCAR/osdf-examples.git
cd osdf-examples
python -m venv .venv && source .venv/bin/activate # or use conda
pip install -r requirements.txt
jupyter lab

New here? Start with notebooks/simple_aws_example.ipynb (runs on a laptop, no credentials required).
What’s inside
The repository is organized by data origin — the OSDF origin a notebook streams data from. Each notebook also indicates the compute platform it was tested on. Browse the Notebook Gallery for the full, tagged list.
Data origins
GDEX / NCAR Data Origin — datasets streamed from NCAR’s OSDF origins, which are read from NCAR’s Geoscience Data Exchange (GDEX). Covers CESM2 LENS, ERA5, JRA-3Q, DART, CONUS404, NA-CORDEX, SAAG, HadISST, and more.
AWS Open Data — CESM2 LENS, CMIP6 zarr (~27 GCMs), HRRR, NOAA SONAR, Sentinel-2 streamed via the AWS open-data origin.
Cross-origin workflows — examples that combine two or more origins (e.g. bias-correcting a CESM AWS dataset against an NCAR ERA5 dataset).
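As a rough illustration of what streaming from an origin looks like, the sketch below reads one object through PelicanFS. It assumes the `pelicanfs` package is installed and that `PelicanFileSystem` is importable from `pelicanfs.core` with `pelican://osg-htc.org` as the OSDF discovery URL, as the PelicanFS documentation describes. The helper names `normalize_namespace_path` and `read_object` are hypothetical and do not come from this repository, and the object path is left as a placeholder.

```python
def normalize_namespace_path(path: str) -> str:
    """Return an absolute federation path with duplicate slashes removed."""
    parts = [part for part in path.split("/") if part]
    return "/" + "/".join(parts)


def read_object(path: str, federation: str = "pelican://osg-htc.org") -> bytes:
    """Fetch the raw bytes of one object from an OSDF origin (needs network)."""
    # Imported lazily so the path helper above works without pelicanfs installed.
    from pelicanfs.core import PelicanFileSystem

    fs = PelicanFileSystem(federation)
    # cat() is the generic fsspec call that returns a file's contents as bytes.
    return fs.cat(normalize_namespace_path(path))


# Hypothetical usage; substitute a real namespace and object name:
# data = read_object("/<namespace>/<object>")
```

The notebooks themselves typically hand such paths to xarray or Zarr rather than reading raw bytes; this only shows the lowest-level access pattern.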
Compute platforms covered
NCAR Casper · TACC Stampede3 · Indiana Jetstream2 · OSPool · laptop
Most notebooks are designed to run on a user’s own machine via a Dask LocalCluster. The compute-platform mentions and platform: tags indicate where each notebook was verified (e.g. via PBS on Casper), not the only place it can run; flip the cluster switch in the notebook to use a LocalCluster instead.
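The cluster switch could look something like the sketch below. This is not the code used in the notebooks: `cluster_kind` and `start_cluster` are hypothetical names, the resource values are placeholders, and it assumes `dask.distributed` (and, for the PBS branch, `dask-jobqueue`) is installed. Only Casper is mapped to PBS here; every other platform falls back to a local cluster.

```python
def cluster_kind(platform: str) -> str:
    """Map a platform tag to the kind of Dask cluster to start."""
    # Assumption: only Casper uses PBS in this sketch; everything else runs locally.
    return "pbs" if platform == "casper" else "local"


def start_cluster(platform: str = "laptop", n_workers: int = 2):
    """Start a Dask cluster for the platform and return (cluster, client)."""
    # Lazy imports keep cluster_kind() usable without Dask installed.
    from dask.distributed import Client, LocalCluster

    if cluster_kind(platform) == "pbs":
        from dask_jobqueue import PBSCluster

        # Placeholder resources; real jobs usually also need site-specific
        # settings such as a queue and an account.
        cluster = PBSCluster(cores=1, memory="4GiB", walltime="00:30:00")
        cluster.scale(jobs=n_workers)
    else:
        cluster = LocalCluster(n_workers=n_workers, threads_per_worker=1)
    return cluster, Client(cluster)


# Hypothetical usage on a laptop:
# cluster, client = start_cluster("laptop")
```

Keeping the choice in one function means the rest of a notebook only ever sees a `Client`, so the analysis cells do not change between platforms.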
Workflow types
Bias correction · climatology · ML (logistic-regression Niño 3.4 prediction) · benchmarking · diagnostic visualization · equilibrium climate sensitivity.
Finding a notebook
Each notebook is tagged in its frontmatter with a faceted scheme so you can filter by axis instead of guessing keywords:
| Facet | Examples |
|---|---|
| origin: | aws, ncar-posix, ncar-object-store |
| platform: | casper, stampede3, jetstream2, ospool, laptop |
| dataset: | cesm, cmip6, era5, conus404, na-cordex, hrrr, dart, jra3q, hadisst |
| task: | bias-correction, climatology, ml, benchmark, visualization, ecs |
| level: | beginner, intermediate, advanced |
The rendered Jupyter Book exposes these tags as filters. See the Notebook Gallery for a tagged index, or Contributing to OSDF-Examples for the tag conventions when adding new notebooks.
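To make the faceted scheme concrete, here is a small sketch of filtering by facet. It assumes tags are written as `facet:value` strings (for example `origin:aws`); the index contents, the second notebook name, and the function names are invented for illustration and are not read from the repository.

```python
from collections import defaultdict


def parse_tags(tags):
    """Group 'facet:value' tags into a {facet: {values}} mapping."""
    facets = defaultdict(set)
    for tag in tags:
        facet, sep, value = tag.partition(":")
        if sep:  # skip tags that have no facet prefix
            facets[facet.strip()].add(value.strip())
    return dict(facets)


def matches(tags, **wanted):
    """True if the tags satisfy every requested facet=value pair."""
    facets = parse_tags(tags)
    return all(value in facets.get(facet, set()) for facet, value in wanted.items())


def find_notebooks(index, **wanted):
    """Return the sorted names of notebooks whose tags match all facets."""
    return sorted(name for name, tags in index.items() if matches(tags, **wanted))


# Hypothetical index; the tag assignments are illustrative only.
NOTEBOOKS = {
    "simple_aws_example.ipynb": ["origin:aws", "platform:laptop", "level:beginner"],
    "hypothetical_era5_climatology.ipynb": [
        "origin:ncar-posix", "platform:casper", "dataset:era5",
        "task:climatology", "level:intermediate",
    ],
}

print(find_notebooks(NOTEBOOKS, origin="aws"))
```

Because each facet is matched independently, combining filters (say `platform="casper"` with `task="climatology"`) narrows the result instead of requiring an exact keyword.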
Repository structure
docs/ Markdown overviews and the notebook gallery
notebooks/ All workflow notebooks (subfolders for ML and NDC workflows)
scripts/ Non-notebook code (e.g. OSPool batch examples)
myst.yml Jupyter Book configuration / table of contents

How to contribute
Contributions are welcome from anyone — you do not need an NCAR HPC account. Notebooks that run on a laptop, on the cloud, or on any HPC system are all in scope, as long as they demonstrate accessing data via OSDF/PelicanFS.
Fork the repository.
Create a feature branch: git checkout -b example/my-amazing-example
Add your notebook with the standard frontmatter and tags (see Contributing to OSDF-Examples).
Open a pull request describing the dataset, origin, and compute platform.
If you’re contributing a workflow that requires NCAR HPC access, please note that in the notebook so external readers know what to expect.
Citing
If you use any workflow in this repository, please cite via the DOI badge above.
Support
Bug reports and feature requests: please open a GitHub Issue.
- Harsha Hampapura, Riley Conroy, Emma Turetsky, & Joanmarie Del Vecchio. (2025). NCAR/osdf_examples: osdf-example-workflows-1.0.1. Zenodo. https://doi.org/10.5281/ZENODO.16863133