Introduction

Welcome to the OSDF Examples repository! This repository provides example notebooks and scripts that demonstrate how to access data via the Open Science Data Federation (OSDF) using PelicanFS. Each notebook streams geoscience data into a workflow and then performs a calculation or visualization on it.

A short primer on OSDF and PelicanFS

If accessing scientific data still feels like “download a giant archive, then analyze it locally,” OSDF is the alternative. The Open Science Data Federation is an NSF-funded content-distribution layer for science: it sits in front of existing repositories and streams data over HTTPS to wherever your code is running.

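Two pieces of jargon are worth knowing: an origin is the service that connects an existing data repository to the federation, and a cache is a server that keeps copies of requested data closer to where the computation runs.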

You don’t have to think about origins and caches when you read data; the Pelican packages handle this transparently. In this repository, we use PelicanFS, the Pelican Python client. It is an FSSpec implementation, so it plugs into anything that already speaks FSSpec: xarray, intake, intake-esm, pandas. You will see two URL schemes throughout this book:

| Scheme | Format | Used for |
| --- | --- | --- |
| osdf | osdf:///<namespace-path> | OSDF data (note the three slashes) |
| pelican | pelican://<federation-host>/<namespace-path> | Other Pelican federations |
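
The difference between the two schemes is easiest to see by splitting a URL into its parts. The following is a minimal sketch using only the Python standard library, not PelicanFS itself; the helper name `describe`, the file name `example.nc`, and the host `federation.example.org` are hypothetical placeholders.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split an osdf:// or pelican:// URL into scheme, host, and namespace path."""
    parts = urlsplit(url)
    if parts.scheme not in ("osdf", "pelican"):
        raise ValueError(f"not an OSDF/Pelican URL: {url!r}")
    return {
        "scheme": parts.scheme,
        # An osdf:/// URL has an empty host slot, hence the three slashes.
        "federation_host": parts.netloc or None,
        "namespace_path": parts.path,
    }


# Hypothetical object under a namespace prefix mentioned in this book.
osdf = describe("osdf:///ncar/gdex/example.nc")

# Hypothetical federation host, for illustration only.
pel = describe("pelican://federation.example.org/ncar/gdex/example.nc")

# With only two slashes, the first path component is parsed as the host.
typo = describe("osdf://ncar/gdex/example.nc")
```

Here `osdf["federation_host"]` is `None`, while `typo["federation_host"]` is `"ncar"`, which is why dropping the third slash changes what the URL points to.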

Common namespaces in this book:

A typical xarray + zarr call looks like:

import xarray as xr
# Assumes the pelicanfs package is installed so that fsspec can resolve osdf:// URLs.
ds = xr.open_zarr("osdf:///aws-opendata/us-west-2/cmip6-pds/.../...")

For a deeper introduction with executable examples, see Project Pythia’s OSDF Cookbook — its first chapters cover the OSDF concept and PelicanFS usage in detail. To learn how NCAR integrated OSDF with its data infrastructure, see Integration of OSDF with NCAR’s data infrastructure: Interim Project Report (Oct 2025).

Find a notebook

The collection is organized by data origin rather than a fixed list of notebooks, so it scales as new examples are added. Use whichever entry point matches what you have:

The full tagged index lives in the Notebook Gallery.

How notebooks are tagged

Every notebook carries a faceted set of tags in its frontmatter so users can filter by axis (compute platform, data origin, dataset, task, level). The facets are:

| Facet | Examples |
| --- | --- |
| origin: | aws, ncar-posix, ncar-object-store |
| platform: | casper, stampede3, jetstream2, ospool, laptop |
| dataset: | cesm, cmip6, era5, conus404, na-cordex, hrrr, dart, jra3q, hadisst |
| task: | bias-correction, climatology, ml, benchmark, visualization, ecs |
| level: | beginner, intermediate, advanced |
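
To make the facet idea concrete, here is a small sketch of how `facet:value` tags could be grouped and filtered in Python. It assumes each tag is a plain string of the form `facet:value`; the notebook names and their tag lists are hypothetical, and this is not the code the book's search or Tag Index uses.

```python
from collections import defaultdict


def parse_tags(tags):
    """Group 'facet:value' strings into {facet: {values}}."""
    facets = defaultdict(set)
    for tag in tags:
        facet, sep, value = tag.partition(":")
        if sep:  # ignore strings that are not in facet:value form
            facets[facet].add(value)
    return dict(facets)


def matches(tags, **wanted):
    """True if the tag list carries every requested facet=value pair."""
    facets = parse_tags(tags)
    return all(value in facets.get(facet, set()) for facet, value in wanted.items())


# Hypothetical notebooks and tags, for illustration only.
NOTEBOOKS = {
    "example-a.ipynb": ["origin:aws", "platform:laptop", "dataset:cmip6",
                        "task:climatology", "level:beginner"],
    "example-b.ipynb": ["origin:ncar-posix", "platform:casper", "dataset:era5",
                        "task:visualization", "level:intermediate"],
}

beginner = [name for name, tags in NOTEBOOKS.items() if matches(tags, level="beginner")]
on_casper_era5 = [name for name, tags in NOTEBOOKS.items()
                  if matches(tags, platform="casper", dataset="era5")]
```

Passing several keyword arguments to `matches` narrows the result along more than one axis at once.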

NCAR has two OSDF origins. ncar-posix is POSIX storage, served under the namespace osdf:///ncar/gdex/... (older notebooks may use osdf:///ncar/rda/..., which is the same origin under its previous name). ncar-object-store is NCAR’s object storage, currently called Boreas, served under the namespace osdf:///ncar-gdex/....
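
Because the two NCAR namespaces differ by a single character, a lookup from URL prefix to origin: tag can help when reading older notebooks. This is a sketch built only from the prefixes named above; the function name `origin_tag` and the file names in the examples are hypothetical, and other origins are deliberately left unmapped.

```python
# Namespace prefixes taken from the paragraph above.
ORIGIN_BY_PREFIX = {
    "/ncar/gdex/": "ncar-posix",
    "/ncar/rda/": "ncar-posix",  # previous name of the same origin
    "/ncar-gdex/": "ncar-object-store",
}


def origin_tag(url: str):
    """Return the origin: tag for an osdf:/// URL, or None if the prefix is unknown."""
    if not url.startswith("osdf:///"):
        raise ValueError(f"expected an osdf:/// URL, got {url!r}")
    path = url[len("osdf://"):]  # keep the leading slash of the namespace path
    for prefix, origin in ORIGIN_BY_PREFIX.items():
        if path.startswith(prefix):
            return f"origin:{origin}"
    return None


# Hypothetical file names, for illustration only.
old_style = origin_tag("osdf:///ncar/rda/example.nc")
object_store = origin_tag("osdf:///ncar-gdex/example.zarr")
```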

Searching by tag

Tags are full-text indexed by the book’s search bar (the magnifying-glass icon at the top of every page, or press / on your keyboard). Type a tag value to find every notebook that carries it. For example:

Each visible tag pill on a notebook page (or in the gallery) is also a clickable link into the Tag Index, where you can see every notebook that shares that tag in one place.

A note on platform: tags. Most notebooks are designed to run on a user’s own machine via a Dask LocalCluster, and only opt into PBS/Slurm when a flag is set. The platform: tag therefore documents where the notebook has been verified to run, not the only place it can run. A notebook tagged platform:casper was tested on Casper using PBS; flip the cluster switch in the notebook (e.g. USE_PBS_SCHEDULER = False) and the same notebook runs locally.
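
The cluster switch described above might look roughly like the following. Only the flag name `USE_PBS_SCHEDULER` comes from the text; the helper names and the resource values are assumptions for illustration, and the actual notebooks may structure this differently. The Dask packages are imported lazily, so the local path does not require `dask_jobqueue`.

```python
import importlib

USE_PBS_SCHEDULER = False  # flag name from the text; False means "run locally"


def cluster_spec(use_pbs: bool):
    """Return (module name, class name, kwargs) for the selected Dask cluster type."""
    if use_pbs:
        # Placeholder resources; a real site usually also needs a queue and an account.
        return ("dask_jobqueue", "PBSCluster",
                {"cores": 1, "memory": "4GiB", "walltime": "01:00:00"})
    return ("dask.distributed", "LocalCluster",
            {"n_workers": 2, "threads_per_worker": 1})


def make_cluster(use_pbs: bool = USE_PBS_SCHEDULER):
    """Import the chosen backend on demand and create the cluster object."""
    module_name, class_name, kwargs = cluster_spec(use_pbs)
    cluster_cls = getattr(importlib.import_module(module_name), class_name)
    return cluster_cls(**kwargs)
```

In a notebook, `cluster = make_cluster()` followed by `Client(cluster)` from `dask.distributed` would then connect to whichever backend the flag selects.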

For the full taxonomy and conventions, see CONTRIBUTING.md.

How is the repository organized?

This repository is organized into sections, mostly by the data origin that serves the data and by the computational platform used to run the notebooks.

Access methods

Some notebooks use intake/intake-ESM catalogs in conjunction with PelicanFS to stream data. Others use PelicanFS directly to load data into xarray.

Repository structure