Posts by Deepak Cherian

Thinking through CESM data access

We want to read a large number of netCDF files, combine them to form a single dataset, and then analyze that. How do we think about it?

In pseudocode we want

Read more ...


Analyzing and Visualizing CAM-SE Output in Python

We demonstrate a variety of options for analyzing and visualizing output from the Community Atmosphere Model (CAM) with the spectral element (SE) grid in Python. This notebook was developed for the ESDS Collaborative Work Time on Unstructured Grids, which took place on April 17, 2023. A recap of the related CAM-SE discussion can be found here.

Regrid CAM-SE output using map file


Read more ...


Recap: Unstructured Grid Collaborative Work Time

ESDS hosted our first Collaborative Work Time event on April 17, 2023. The topic of the session was “Working With Unstructured Grids”. Our goal is to encourage cross-lab collaboration and build lasting science-software partnerships.

The event was hybrid, with in-person attendees in the Damon Room at the Mesa Lab. A lucky overlap with the Improving Scientific Software conference meant that collaborators from the Department of Energy were also able to attend in person.

Photo of unstructured grids collaborative work time session.

Read more ...


Using Kerchunk with CESM Timeseries Data on the Cloud

We benchmark reading a subset of the CESM2-Large Ensemble stored as a collection of netCDF files on the cloud (Amazon / AWS) from Casper. We use a single ensemble member historical experiment with daily data from 1850 to 2009, with a total dataset size of 600+ GB, from 13 netCDF4 files.

We read in two ways:

Read more ...


Virtual aggregate CESM MOM6 datasets with kerchunk

This notebook is adapted from the work by Lucas Sterzinger (an NCAR SIParCS intern in 2021).

This notebook was updated to

Read more ...


Regridding using xESMF and an existing weights file

A fairly common request is to use an existing ESMF weights file to regrid an Xarray Dataset (1, 2). Applying weights should, in general, be easy: read the weights, then apply them to the input dataset using dot or tensordot.
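The "read weights, then apply them" idea can be sketched in plain NumPy. This is a minimal illustration, not the method used in the post: it assumes the weights are stored in the usual ESMF sparse layout (1-based `row` and `col` index arrays plus a weight array `S`), and the helper name `apply_weights` and the toy weights are hypothetical.

```python
import numpy as np


def apply_weights(src, row, col, S, n_b):
    """Apply sparse regridding weights along the last axis of `src`.

    Computes dst[..., i] = sum over entries k with row[k] == i of
    S[k] * src[..., col[k]].

    Assumption: `row` and `col` are 1-based, as in ESMF weight files.
    `n_b` is the number of destination cells.
    """
    src = np.asarray(src, dtype=float)
    r = np.asarray(row) - 1  # convert to 0-based destination indices
    c = np.asarray(col) - 1  # convert to 0-based source indices

    # Contribution of every nonzero weight, with the weight axis moved first.
    contrib = np.moveaxis(src[..., c] * np.asarray(S, dtype=float), -1, 0)

    # np.add.at accumulates correctly when a destination index repeats.
    dst = np.zeros((n_b,) + src.shape[:-1])
    np.add.at(dst, r, contrib)
    return np.moveaxis(dst, 0, -1)


# Toy (made-up) weights: each of 2 destination cells averages 2 source cells.
row = [1, 1, 2, 2]
col = [1, 2, 3, 4]
S = [0.5, 0.5, 0.5, 0.5]

# Source data with a leading "time" dimension and 4 source columns.
src = np.array([[1.0, 3.0, 5.0, 7.0],
                [2.0, 2.0, 4.0, 4.0]])

regridded = apply_weights(src, row, col, S, n_b=2)
print(regridded)
```

This is equivalent to multiplying by a sparse weight matrix of shape (destination cells, source cells); a sparse-matrix library would do the same job for large grids.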

In the Xarray/Dask/Pangeo ecosystem, xESMF provides an interface to ESMF for convenient regridding, including parallelization with Dask. Here we demonstrate how to use an existing ESMF weights file with xESMF, specifically for CAM-SE.


Read more ...


Debugging dask workflows: Detrending

Detrending (subtracting a trend, commonly a linear fit, from the data) along the time dimension is a common workflow in the climate sciences.
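As a rough sketch of what linear detrending along time involves, the snippet below fits a straight line to each time series with `np.polyfit` and subtracts it. It is an in-memory NumPy illustration under simple assumptions (time is the first axis, no missing values), not the Dask workflow discussed in the post; the helper name `detrend_linear` and the sample data are hypothetical.

```python
import numpy as np


def detrend_linear(data, time):
    """Remove a least-squares linear trend along the first (time) axis.

    Assumptions: `data` has time as its first axis and contains no NaNs.
    """
    data = np.asarray(data, dtype=float)
    time = np.asarray(time, dtype=float)

    # Flatten all non-time dimensions so each column is one time series.
    flat = data.reshape(time.size, -1)

    # polyfit accepts a 2D `y` and fits every column independently;
    # for deg=1 it returns one row of slopes and one row of intercepts.
    slope, intercept = np.polyfit(time, flat, deg=1)

    trend = time[:, None] * slope + intercept
    return (flat - trend).reshape(data.shape)


# Made-up example: two purely linear series, so the residual is ~0.
time = np.arange(10.0)
data = np.stack([2.0 * time + 1.0, -0.5 * time + 4.0], axis=1)

residual = detrend_linear(data, time)
print(np.abs(residual).max())
```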

Here’s an example


Read more ...


Sparse arrays and the CESM land model component

An underappreciated feature of Xarray + Dask is the ability to plug in different array types. Usually we work with Xarray wrapping a Dask array, which in turn uses NumPy arrays for each block, or with Xarray wrapping NumPy arrays directly. NumPy arrays are dense in-memory arrays. Other array types exist:

sparse for sparse arrays


Read more ...