Introduction

Welcome to the GDEX ARCO Kerchunk documentation. This project provides tools and scripts to generate Kerchunk reference files and associated metadata, enabling efficient, cloud‑optimized access to datasets served on NCAR’s GDEX infrastructure.

What is GDEX ARCO Kerchunk?

ARCO (Analysis‑Ready Cloud‑Optimized) describes datasets and workflows prepared for efficient analysis in cloud or object‑store environments. The aim is to make data discoverable and directly usable without heavy preprocessing or repeated full downloads.

Kerchunk is a lightweight approach that preserves the original file format (e.g., netCDF) by creating a JSON “reference” that maps logical array chunks to byte ranges inside the original files. Users (via fsspec/xarray or Zarr-compatible tools) use that reference to read only the needed chunks over HTTP/S3, as if the data were stored chunk‑by‑chunk in a native Zarr store.
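To make the chunk-to-byte-range mapping concrete, here is a minimal standard-library sketch. It is not how kerchunk or fsspec are implemented; it only mimics the shape of a Kerchunk version 1 JSON reference, where each chunk key maps to `[url, offset, length]`. The file `example.bin`, the variable name `temp`, and the helper `read_chunk` are hypothetical.

```python
import json
import os
import tempfile

# Build a small stand-in "data file" holding two 4-byte chunks back to back.
tmpdir = tempfile.mkdtemp()
data_path = os.path.join(tmpdir, "example.bin")
with open(data_path, "wb") as f:
    f.write(b"HEAD")  # bytes 0-3: pretend file header
    f.write(b"AAAA")  # bytes 4-7: chunk 0
    f.write(b"BBBB")  # bytes 8-11: chunk 1

# Reference in the style of a Kerchunk version 1 JSON: each chunk key maps
# to [url, offset, length]. The variable name "temp" is hypothetical.
reference = {
    "version": 1,
    "refs": {
        "temp/0": [data_path, 4, 4],
        "temp/1": [data_path, 8, 4],
    },
}


def read_chunk(refs, key):
    """Return only the bytes the reference points at for one chunk key."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# The reference itself is plain JSON, so it can be saved and shared
# separately from the data file it describes.
reference_json = json.dumps(reference)
chunk_one = read_chunk(json.loads(reference_json), "temp/1")
```

In real use the `url` would typically point at an HTTP or S3 object and the seek would become a ranged request, which is why only the needed chunks are transferred.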

Key Features

🛠️ Custom Kerchunk Reference Generation

The script src/create_kerchunk.py produces Kerchunk reference files for datasets and offers flexible configuration to match different workflows. You can create either a single aggregated (combined) reference that maps many data files into one logical store, or individual reference files per source file.

Modes

Use combined references when you need a single logical view of many files; use per-file references when minimizing reference size and fetch time is the priority.
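The difference between the two modes can be illustrated with a toy sketch. This is an assumption-laden simplification, not the logic in `src/create_kerchunk.py`: it treats each per-file reference as a plain mapping from chunk key to `[url, offset, length]` and builds a combined view by renumbering chunks in file order. A real combine step must also rewrite array metadata and coordinates, which is omitted here. The file names `a.nc` and `b.nc`, the variable `temp`, and the helper `combine_references` are hypothetical.

```python
def combine_references(per_file_refs, variable):
    """Merge per-file chunk references into one logical store.

    Each input maps "<variable>/<i>" to [url, offset, length]. Chunks are
    renumbered in file order so the result reads as one array concatenated
    along its first axis.
    """
    combined = {}
    next_index = 0
    for refs in per_file_refs:
        # Sort by numeric chunk index so ordering within a file is kept.
        keys = sorted(
            (k for k in refs if k.startswith(variable + "/")),
            key=lambda k: int(k.rsplit("/", 1)[1]),
        )
        for key in keys:
            combined[f"{variable}/{next_index}"] = refs[key]
            next_index += 1
    return combined


# Two small per-file references (hypothetical offsets and lengths).
file_a = {"temp/0": ["a.nc", 100, 50], "temp/1": ["a.nc", 150, 50]}
file_b = {"temp/0": ["b.nc", 100, 50]}

combined = combine_references([file_a, file_b], "temp")
```

The combined mapping grows with the number of source files, which is the trade-off noted above: one logical view at the cost of a larger reference to fetch and parse.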

🌐 Multiple Access Methods

Generated Kerchunk reference files support three access patterns:

Quick Start

Generate a combined Kerchunk reference:

python src/create_kerchunk.py \
    --action combine \
    --directory /path/to/input_directory \
    --output_directory /path/to/output_directory \
    --extensions <data source file format> \
    --filename <output_filename> \
    --output_format <parquet|json|zarr> \
    [--make_remote]
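If you prefer to drive the script from Python, for example in a batch job, the same command can be assembled as an argument list. This is a sketch that uses only the flags shown above; the helper `build_command`, the example paths, the output name `combined_ref`, and the extension value `nc` are assumptions for illustration, not documented defaults.

```python
import shlex


def build_command(directory, output_directory, extensions, filename,
                  output_format, make_remote=False):
    """Assemble the Quick Start command as an argument list."""
    if output_format not in {"parquet", "json", "zarr"}:
        raise ValueError(f"unsupported output format: {output_format}")
    cmd = [
        "python", "src/create_kerchunk.py",
        "--action", "combine",
        "--directory", directory,
        "--output_directory", output_directory,
        "--extensions", extensions,
        "--filename", filename,
        "--output_format", output_format,
    ]
    if make_remote:
        # Optional flag, shown in brackets in the command above.
        cmd.append("--make_remote")
    return cmd


# Hypothetical values; replace them with your own paths and file format.
cmd = build_command("/data/in", "/data/out", "nc", "combined_ref", "json")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would execute it; building a list rather than a shell string avoids quoting problems with paths that contain spaces.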

For comprehensive usage examples:

Repository Structure

Content

This documentation covers:

  1. Understanding the Kerchunk reference file generation process

  2. Accessing generated Kerchunk reference files through different methods