Stream: python-questions

Topic: Pangeo-Forge-Recipes on NCAR HPC


Kevin Sampson (Aug 20 2024 at 16:14):

I am trying to write a workflow using pangeo-forge-recipes, which will hopefully run on NCAR HPC. Has anyone here used pangeo-forge-recipes on Casper/Derecho? I am not finding any examples and am struggling to build a workflow. The main issue seems to be how to work with Apache Beam on the HPC systems and/or how to use Dask clusters with this library.

Katelyn FitzGerald (Aug 21 2024 at 17:25):

@Harsha Hampapura, is this anything you've run into / explored in your work?

@Negin Sobhani not sure if you might have some thoughts here as well.

Harsha Hampapura (Aug 21 2024 at 17:35):

Unfortunately, I haven't used pangeo-forge-recipes on Casper/Derecho before.

Negin Sobhani (Aug 21 2024 at 19:41):

Thanks so much for tagging me @Katelyn FitzGerald

@Kevin Sampson If you can use a Dask cluster, that is preferred over an Apache Beam workflow.

I discussed this with the developers last year, and at the time there was some talk of extending the Apache Beam backend with Flink to support distributed systems, but I am not sure what the current status of that is. Please feel free to send an email to cislhelp@ucar.edu if you want the HPC consultants to explore this further.

Kevin Sampson (Aug 21 2024 at 20:05):

Thank you @Negin Sobhani. I can use Dask/xarray workflows, but they seem wildly inefficient (extremely large graphs being produced, long wait times for workers to get jobs from the scheduler), so I was exploring the more recently developed pangeo-forge-recipes as a way of testing alternative workflows. This is all centered around using kerchunk to produce Zarr-like arrays from many (300k+) netCDF files.


Last updated: May 16 2025 at 17:14 UTC