Stream: python-questions

Topic: Sparse PFT-level output in Python


James King (Jan 03 2025 at 15:09):

Hi all,

I've previously used this nifty bit of code by @Deepak Cherian and @Katie Dagon to process 1D output from CLM5 (i.e. carbon variables at PFT level). I last did this a couple of years ago and am now having problems when I return to it. I can read in an input file and process it into an xarray Dataset with no issues. However, attempting fairly basic processing immediately kills my Jupyter notebook (which has 16GB of memory). This happens if, for example, I attempt to take the global sum of some variable and then either plot the output as a timeseries or dump the values as a numpy array.

Illustrative example below, which takes as input a file containing PFT-level CLM5 output on a 1deg global grid covering 2025-2100 and using the functions defined in the link:

import xarray as xr

ifile = xr.open_dataset("file.nc")
pft_constants = xr.open_dataset("clm5_params_file.nc")
pftnames = pft_constants.pftname

sparse_data = convert_pft_variables_to_sparse(ifile, pftnames)
out = sparse_data.FireEmis_TOT.isel(vegtype=1).sum(("lon", "lat"))
print(out.values)

At this point, the notebook crashes. Should I expect large memory requirements for this?

Katelyn FitzGerald (Jan 03 2025 at 18:42):

I wonder if some of the operations you're doing are converting large arrays to non-sparse array types. Have you tested out the analysis operations you're interested in on a smaller subset to see if the resulting arrays are still sparse array types? You should be able to see the array type in the description of the Xarray DataArray in your notebook.

Will Wieder (Jan 07 2025 at 18:17):

@James King I've also noticed that the pft regridding is a bit of a memory hog (and I'm only looking at a decade of data, not 75 years). I can't remember if the script is friendly with dask, but I'd recommend throwing more memory at the problem, either by asking for more when you log onto JupyterHub or by using dask workers?
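One way dask helps here is by keeping the reduction chunked so the full multi-decade array is never materialised at once. A minimal sketch with synthetic data (the shape and chunk sizes are illustrative, not the real output dimensions):

```python
import dask.array as da

# Synthetic stand-in: (time, lat, lon) on a coarse grid, chunked along time
x = da.ones((900, 192, 288), chunks=(120, 192, 288))

total = x.sum(axis=(1, 2))  # lazy task graph; nothing computed yet
result = total.compute()    # evaluated chunk by chunk, bounding peak memory
print(result[:3])
```

The same idea applies when opening the file: passing `chunks=...` to `xr.open_dataset` gives dask-backed variables, so reductions stream through memory instead of loading the whole record.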

James King (Jan 10 2025 at 09:50):

Thanks @Katelyn FitzGerald and @Will Wieder - alas I'm working on a non-NCAR system which doesn't allow me to request any more memory than that for a Jupyter notebook! Testing on a smaller subset of data sounds wise, this is probably a case of me being impatient to produce some nice plots...

James King (Jan 10 2025 at 15:03):

And yes, the script makes use of dask.


Last updated: May 16 2025 at 17:14 UTC