Accessing NetCDF and GRIB file collections as cloud-native virtual datasets using Kerchunk
The Kerchunk library maps the internal layout and chunking of gridded scientific data files to virtual Zarr datasets. By creating standardised reference files, users can directly access the compressed chunks of data held in the original files as a single virtual dataset.
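To make the idea of a reference file concrete, the sketch below builds a tiny reference set by hand and resolves chunk keys to byte ranges in an "original" file. It uses only the Python standard library. The file name, chunk keys and byte offsets are made up for illustration, and the layout assumes the version 1 reference structure: a `refs` mapping from Zarr keys to either inline content or `[url, offset, length]` triples. Kerchunk generates such references by scanning real files, so this is an illustration of the concept rather than its implementation.

```python
import json
import os
import tempfile

# Stand-in for an existing data file: a 4-byte header followed by
# two 8-byte "chunks". Real files would hold compressed array data.
payload = b"HEAD" + b"AAAAAAAA" + b"BBBBBBBB"
tmp_dir = tempfile.mkdtemp()
source_path = os.path.join(tmp_dir, "example.dat")  # hypothetical file name
with open(source_path, "wb") as f:
    f.write(payload)

# Hand-written reference set. Metadata keys hold inline JSON strings;
# chunk keys point at [path, offset, length] in the original file.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/0": [source_path, 4, 8],
        "temp/1": [source_path, 12, 8],
    },
}


def read_key(refs, key):
    """Return the bytes that a reference entry resolves to.

    This sketch handles plain inline strings and byte-range triples only;
    it ignores base64-encoded inline entries.
    """
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Inline content stored directly in the reference file.
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


first_chunk = read_key(references, "temp/0")
second_chunk = read_key(references, "temp/1")
group_meta = json.loads(read_key(references, ".zgroup"))
```

The original file is never rewritten or copied: a reader only needs the reference set and the ability to fetch byte ranges from wherever the file lives.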
The Kerchunk package provides two key functionalities: (1) Users can create standardised reference files for data held in existing formats (HDF5, NetCDF, GRIB, FITS, TIFF, GeoTIFF). These reference files can then be used to access the data in a cloud-optimised manner: if the data has been well chunked in the existing files, Kerchunk allows read performance comparable to that of newer formats like Zarr. (2) Reference files created by Kerchunk can be used to build virtual datasets that map to data within a single file or across multiple files in different locations, and the data attributes stored in the reference files can be modified. This lets end users access analysis-ready datasets simply by opening a reference file, with no changes to the original source data.
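The second functionality can be illustrated with a conceptual sketch: references from two files in different locations are merged into one virtual array, and an attribute is changed in the reference set only. This is not Kerchunk's own code (in practice its `MultiZarrToZarr` class in `kerchunk.combine` handles combining). The function name `combine_along_first_axis`, the helper `file_refs`, the URLs and the byte offsets are all hypothetical, and the sketch assumes each file holds exactly one chunk along the first axis.

```python
import json


def combine_along_first_axis(per_file_refs, variable, attrs_override):
    """Merge per-file references for one variable into one reference set.

    Assumes each file holds a single chunk along the first axis, so the
    chunks from file i are re-keyed to start with index i.
    """
    prefix = variable + "/"
    combined = {".zgroup": json.dumps({"zarr_format": 2})}

    for file_index, refs in enumerate(per_file_refs):
        for key, target in refs.items():
            if not key.startswith(prefix):
                continue  # another variable
            name = key[len(prefix):]
            if name.startswith(".z"):
                continue  # metadata is handled separately below
            # "0.3" -> "<file_index>.3": shift the chunk along the first axis.
            rest = name.split(".")[1:]
            combined[prefix + ".".join([str(file_index)] + rest)] = target

    # Array metadata: as in the first file, but longer along the first axis.
    zarray = json.loads(per_file_refs[0][prefix + ".zarray"])
    zarray["shape"][0] = zarray["chunks"][0] * len(per_file_refs)
    combined[prefix + ".zarray"] = json.dumps(zarray)

    # Attributes live in the reference set, so they can be edited here
    # while the source files stay unchanged.
    zattrs = json.loads(per_file_refs[0].get(prefix + ".zattrs", "{}"))
    zattrs.update(attrs_override)
    combined[prefix + ".zattrs"] = json.dumps(zattrs)
    return {"version": 1, "refs": combined}


def file_refs(url, offset):
    """Hypothetical per-file references for a 1 x 4 float32 array "temp"."""
    zarray = {
        "zarr_format": 2, "shape": [1, 4], "chunks": [1, 4], "dtype": "<f4",
        "compressor": None, "fill_value": None, "filters": None, "order": "C",
    }
    return {
        "temp/.zarray": json.dumps(zarray),
        "temp/.zattrs": json.dumps({"units": "K"}),
        "temp/0.0": [url, offset, 16],  # 4 values x 4 bytes
    }


virtual = combine_along_first_axis(
    [
        file_refs("s3://example-bucket/day1.nc", 2048),
        file_refs("https://example.org/day2.nc", 4096),
    ],
    "temp",
    {"long_name": "air temperature"},
)
```

The resulting `virtual` reference set describes a single 2 x 4 array whose two chunks live in different stores, with a `long_name` attribute that exists only in the reference set.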
Last updated: May 16 2025 at 17:14 UTC