I am trying to write a workflow using pangeo-forge-recipes, which will hopefully run on NCAR HPC. Has anyone here used pangeo-forge-recipes on Casper/Derecho? I am not finding any examples and am struggling to build a workflow. The main issue seems to be how to run Apache Beam on the HPC systems and/or how to use Dask clusters with this library.
@Harsha Hampapura, is this anything you've run into / explored in your work?
@Negin Sobhani not sure if you might have some thoughts here as well.
Unfortunately, I haven't used pangeo-forge-recipes on Casper/Derecho before.
Thanks so much for tagging me @Katelyn FitzGerald
@Kevin Sampson If you can use a Dask cluster, that is preferable to an Apache Beam workflow.
I discussed this with the developers last year, and at the time there was some talk of extending the Apache Beam backend with Flink to support distributed systems, but I am not sure what the current status is. Please feel free to email cislhelp@ucar.edu if you would like the HPC consultants to explore this further.
Thank you @Negin Sobhani. I can use Dask/xarray workflows, but they seem wildly inefficient (extremely large task graphs, long waits for workers to receive tasks from the scheduler), so I was exploring the more recently developed pangeo-forge-recipes as a way of testing alternative workflows. This is all centered around using kerchunk to produce Zarr-like arrays from many (300k+) netCDF files.
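For what it's worth, one common way to keep the graph small with this many files is to submit one task per batch of files rather than one task per file. Below is a minimal sketch of that batching idea using only the Python standard library, so it runs anywhere. Everything in it is an assumption for illustration: `batched`, `index_batch`, and `run_batched` are hypothetical names, the thread pool stands in for whatever executor is actually used (for example a Dask client), and `index_batch` is a placeholder for the real per-file work such as scanning each netCDF file with kerchunk.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def index_batch(paths: list[str]) -> dict:
    """Hypothetical placeholder for the per-batch work.

    A real workflow would scan each file here and combine the per-file
    results into one object. This stub only summarizes the batch.
    """
    return {"first": paths[0], "last": paths[-1], "n_files": len(paths)}


def run_batched(
    paths: Sequence[str],
    batch_size: int,
    worker: Callable[[list[str]], dict] = index_batch,
    max_workers: int = 4,
) -> list[dict]:
    """Run `worker` once per batch, so the task count is about
    len(paths) / batch_size instead of len(paths)."""
    batches = list(batched(paths, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        return list(pool.map(worker, batches))


if __name__ == "__main__":
    # Made-up file names, only to show the shape of the result.
    demo_paths = [f"file_{i:06d}.nc" for i in range(300_000)]
    demo_results = run_batched(demo_paths, batch_size=1_000)
    print(len(demo_results), "tasks for", len(demo_paths), "files")
```

With a batch size of 1,000, the 300k files become 300 tasks, which a scheduler generally handles far more comfortably than 300k tiny ones. The right batch size depends on per-file cost and worker memory, so treat the number above as a starting point rather than a recommendation.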
Last updated: May 16 2025 at 17:14 UTC