Hi all,
I've previously used this nifty bit of code by @Deepak Cherian and @Katie Dagon to process 1D output from CLM5 (i.e. carbon variables at PFT level). I last did this a couple of years ago and am now having problems when I return to it. I can read in an input file and convert it to an xarray Dataset with no issues. However, attempting even fairly basic processing immediately kills my Jupyter notebook (which has 16GB of memory). This happens if, for example, I take the global sum of some variable and then either plot the result as a timeseries or dump the values to a numpy array.
Illustrative example below, which takes as input a file containing PFT-level CLM5 output on a 1deg global grid covering 2025-2100 and using the functions defined in the link:
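import xarray as xr
# convert_pft_variables_to_sparse is one of the functions defined in the linked code
ifile = xr.open_dataset("file.nc")  # PFT-level CLM5 output
pft_constants = xr.open_dataset("clm5_params_file.nc")  # CLM5 parameter file
pftnames = pft_constants.pftname
sparse_data = convert_pft_variables_to_sparse(ifile, pftnames)
# Global sum of one variable for a single PFT
out = sparse_data.FireEmis_TOT.isel(vegtype=1).sum(("lon", "lat"))
print(out.values)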
At this point, the notebook crashes. Should I expect large memory requirements for this?
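For a rough sense of scale, here is a back-of-envelope sketch of how large a single variable would be if it ended up as a dense (time, vegtype, lat, lon) array. Every size in it is an assumption rather than something read from the actual files: monthly output, 79 PFT entries, a 192 x 288 grid for the nominal 1deg resolution, and float32 values.

```python
# Back-of-envelope size of ONE variable if it were held as a dense array.
# All sizes below are assumptions, not values read from the actual files.
n_time = (2100 - 2025 + 1) * 12   # assumes monthly output for 2025-2100
n_vegtype = 79                    # assumed number of entries in pftname
n_lat, n_lon = 192, 288           # assumed nominal 1deg (0.9x1.25) grid
bytes_per_value = 4               # assumes float32

dense_bytes = n_time * n_vegtype * n_lat * n_lon * bytes_per_value
dense_gib = dense_bytes / 2**30
print(f"dense size of one variable: {dense_gib:.1f} GiB")
```

Under these assumptions the dense array comes to roughly 15 GiB, which is already close to a 16GB limit before any intermediate copies are made. Daily output or float64 values would make it larger still.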
I wonder if some of the operations you're doing are converting large arrays to non-sparse array types. Have you tested out the analysis operations you're interested in on a smaller subset to see if the resulting arrays are still sparse array types? You should be able to see the array type in the description of the Xarray DataArray in your notebook.
@James King I've also noticed that the pft regridding is a bit of a memory hog (and I'm only looking at a decade of data, not 75 years). I can't remember if the script is friendly with dask, but I'd recommend throwing more memory at the problem, either by asking for more when you log onto JupyterHub or by using dask workers.
Thanks @Katelyn FitzGerald and @Will Wieder - alas, I'm working on a non-NCAR system which doesn't allow me to request any more memory than that for a Jupyter notebook! Testing on a smaller subset of data sounds wise; this is probably a case of me being impatient to produce some nice plots...
And yes, the script makes use of dask
Last updated: May 16 2025 at 17:14 UTC