GDEX Intake ESM catalogs

Code for generating the intake-ESM catalogs that provide access to datasets in NCAR’s GDEX, along with examples of using the generated catalogs.

Overview

This repository contains tools and scripts for generating intake-ESM catalogs that provide unified access to diverse Earth science datasets. While intake-ESM was originally designed for Earth System Model output, we extend its use to observations, reanalysis data, and other Earth science datasets.

Repo Usage

The primary tool is generator/create_catalog.py. It generates an intake-ESM catalog for a specific dataset directory.

Basic CLI

python generator/create_catalog.py <directory> \
    [--out <output directory>] \
    [--catalog_name <name>] \
    [--description <description>] \
    [--exclude <glob> ...] \
    [--include <glob> ...] \
    [--depth <int>] \
    [--ignore_vars <var name> ...] \
    [--var_metadata <json string|filename>] \
    [--global_metadata <json string|filename>] \
    [--output_format <csv_and_json|single_json>] \
    [--data_format <netcdf|zarr|reference>] \
    [--make_remote]

Options (brief)

Example

python generator/create_catalog.py \
  /gdex/data/need/to/be/cataloged/ \
  --data_format reference \
  --out /data/path/to/store/catalog \
  --output_format csv_and_json \
  --catalog_name intake_catalog \
  --description "reference catalog" \
  --depth 0 \
  --include "*.zarr" \
  --exclude "*.tmp" \
  --ignore_vars utc_date \
  --var_metadata var_meta.json \
  --global_metadata global_meta.json \
  --make_remote
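
When the generator is driven from another script, the invocation above can be assembled as an argument list. The following is a minimal sketch: `build_command` is a hypothetical helper, not part of this repository. It uses only flags shown in the CLI synopsis, assumes multi-valued options accept several values after a single flag, and assumes the command is run from the repository root.

```python
import shlex
import sys
from typing import List, Optional, Sequence


def build_command(
    directory: str,
    out: Optional[str] = None,
    catalog_name: Optional[str] = None,
    data_format: Optional[str] = None,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    make_remote: bool = False,
) -> List[str]:
    """Assemble an argv list for generator/create_catalog.py (hypothetical helper)."""
    cmd = [sys.executable, "generator/create_catalog.py", directory]
    # Single-valued options are added only when a value is given.
    single_valued = (
        ("--out", out),
        ("--catalog_name", catalog_name),
        ("--data_format", data_format),
    )
    for flag, value in single_valued:
        if value is not None:
            cmd += [flag, value]
    # Multi-valued options: one flag followed by one or more globs (assumption).
    if include:
        cmd += ["--include", *include]
    if exclude:
        cmd += ["--exclude", *exclude]
    if make_remote:
        cmd.append("--make_remote")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "/gdex/data/need/to/be/cataloged/",
        out="/data/path/to/store/catalog",
        catalog_name="intake_catalog",
        data_format="reference",
        include=["*.zarr"],
        exclude=["*.tmp"],
        make_remote=True,
    )
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute the generator. Building a list rather than a single string avoids shell-quoting problems with glob patterns such as `*.zarr`.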

Notes

Key Features

1. Custom Catalog Generation Tools (ecgtools)

We use a custom fork of ecgtools, currently pinned to commit 0b3d5b5d0082812e85c821c00c2d619eed0ae3cd, along with custom scripts to generate our catalogs. This allows us to:

2. Broad Dataset Support

Although intake-ESM is primarily meant for Earth System Model output, we leverage the package to generate catalogs for:

We strive to match our vocabulary (column names) with conventions used by other major data providers including:

3. Multiple Access Methods

Our catalogs support different data access patterns through three main flavors:

a) POSIX

Direct filesystem access for users on NCAR HPC systems (Casper, Derecho)

b) HTTPS

Web-based access for remote users over standard HTTPS

c) OSDF (Open Science Data Federation)

Distributed access through the Open Science Data Federation for broader community access
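
One way to picture the three flavors is as the same asset listed under three different path prefixes. The sketch below is illustrative only: `to_flavor`, `FLAVOR_PREFIXES`, the HTTPS and OSDF prefixes, and the sample file name are placeholders invented for this example, not the real GDEX endpoints, and the actual catalogs may encode access differently.

```python
from pathlib import PurePosixPath

# Placeholder prefixes (assumptions for illustration, not real GDEX endpoints).
POSIX_ROOT = "/gdex/data"
FLAVOR_PREFIXES = {
    "posix": POSIX_ROOT,
    "https": "https://data.example.org/files",
    "osdf": "osdf:///example-namespace/files",
}


def to_flavor(posix_path: str, flavor: str) -> str:
    """Rewrite a POSIX asset path for the requested access flavor."""
    if flavor not in FLAVOR_PREFIXES:
        raise ValueError(f"unknown flavor: {flavor!r}")
    # relative_to raises ValueError if the path is not under POSIX_ROOT.
    relative = PurePosixPath(posix_path).relative_to(POSIX_ROOT)
    return f"{FLAVOR_PREFIXES[flavor]}/{relative}"


if __name__ == "__main__":
    # Hypothetical file name, used only to show the three resulting paths.
    for flavor in FLAVOR_PREFIXES:
        print(flavor, to_flavor("/gdex/data/d000000/sample.nc", flavor))
```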

Catalog Usage Examples

For comprehensive usage examples and tutorials on the generated catalogs:
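
As a starting point, a generated catalog can be opened with intake-esm along the following lines. This is a sketch under stated assumptions, not the project's documented workflow: `open_catalog` and `preview` are hypothetical helpers, the catalog path is whatever JSON file the generator wrote, and the column names usable in a query depend on the catalog, so inspect `cat.df.columns` before searching.

```python
from typing import Any, Dict, Optional


def open_catalog(catalog_json: str, query: Optional[Dict[str, Any]] = None):
    """Open an intake-ESM catalog and optionally filter it by column values."""
    import intake  # requires the intake-esm package to be installed

    cat = intake.open_esm_datastore(catalog_json)
    # search() takes catalog column names as keyword arguments.
    return cat.search(**query) if query else cat


def preview(cat, n: int = 5) -> None:
    """Print the searchable columns and the first n rows of the catalog table."""
    print(cat.df.columns.tolist())
    print(cat.df.head(n))
```

After narrowing the catalog with a query, `cat.to_dataset_dict()` loads the matching assets as a dictionary of xarray datasets.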

Support and Contributions

Issues and Feature Requests

We welcome feedback from the community! Please use GitHub issues for:

Note: While we appreciate all feature requests, please understand that we may not be able to fulfill all requests due to resource constraints and project priorities.

Getting Help

  1. Check the documentation and examples linked above

  2. Search existing GitHub issues for similar problems

  3. Open a new issue with detailed information about your use case

Repository Structure

├── README.md
├── requirements.txt
├── generator/          # Core catalog generation tools
│   ├── create_catalog.py
│   └── modify_catalog.py
├── notebooks/          # Example notebooks and development work
└── test/              # Test scripts

Installation

git clone https://github.com/NCAR/gdex-intake-esm.git
cd gdex-intake-esm
pip install -r requirements.txt