Sparse-gridding CLM PFTs: Too many dimensions · python-questions

The functions to_sparse and convert_pft_variables_to_sparse that @Deepak Cherian and @Katie Dagon described at https://ncar.github.io/esds/posts/2022/sparse-PFT-gridding/ are awesome, but despite a comment in the former saying otherwise, they only work with variables of up to two dimensions. Would some hero like to generalize them to work with an arbitrary number of dimensions?

As an example, I added some error handling to show this for the file /glade/work/samrabin/ctsm53019_f09_BNF_hist/lnd/hist/ctsm53019_f09_BNF_hist.clm2.h5.1851-01-01-00000.nc (using pftnames from the file /glade/campaign/cesm/cesmdata/cseg/inputdata/lnd/clm2/paramdata/ctsm60_params.c241119.nc):

Processed pfts1d_lon with dims ('pft',)
Processed pfts1d_lat with dims ('pft',)
Processed pfts1d_ixy with dims ('pft',)
Processed pfts1d_jxy with dims ('pft',)
Processed pfts1d_gi with dims ('pft',)
Processed pfts1d_li with dims ('pft',)
Processed pfts1d_ci with dims ('pft',)
Processed pfts1d_wtgcell with dims ('pft',)
Processed pfts1d_wtlunit with dims ('pft',)
Processed pfts1d_wtcol with dims ('pft',)
Processed pfts1d_itype_veg with dims ('pft',)
Processed pfts1d_itype_col with dims ('pft',)
Processed pfts1d_itype_lunit with dims ('pft',)
Processed pfts1d_active with dims ('pft',)
Processed GDD20_BASELINE with dims ('time', 'pft')
Processed GDD20_SEASON_END with dims ('time', 'pft')
Processed GDD20_SEASON_START with dims ('time', 'pft')
Processed GRAINC_TO_FOOD_ANN with dims ('time', 'pft')
Processed GRAINC_TO_SEED_ANN with dims ('time', 'pft')
Processed GRAINN_TO_FOOD_ANN with dims ('time', 'pft')
Processed GRAINN_TO_SEED_ANN with dims ('time', 'pft')
Can't handle GDDACCUM_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GDDHARV_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINC_TO_FOOD_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINC_TO_SEED_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINN_TO_FOOD_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINN_TO_SEED_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle HARVEST_REASON_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle HDATES with dims ('time', 'mxharvests', 'pft')
Can't handle HUI_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SDATES_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SOWING_REASON_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SYEARS_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SDATES with dims ('time', 'mxsowings', 'pft')
Can't handle SWINDOW_ENDS with dims ('time', 'mxsowings', 'pft')
Can't handle SWINDOW_STARTS with dims ('time', 'mxsowings', 'pft')
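
For what it's worth, here is a minimal sketch of how the coordinate construction could be generalized to any number of leading dimensions. It assumes, as the comment in to_sparse does, that the stacked pft dimension comes last, and that jxy and ixy are 1-based. The name build_sparse_coords and the toy inputs are hypothetical; this is not the code from the blog post, and it uses only NumPy so it can be checked without the sparse package.

```python
import numpy as np


def build_sparse_coords(data, vegtype, jxy, ixy):
    """Build COO-style coordinates for data whose LAST axis is the stacked pft axis.

    Hypothetical helper: returns (coords, values), where coords has one row per
    output dimension (leading dims..., vegtype, lat, lon).
    """
    data = np.asarray(data)
    lead_shape = data.shape[:-1]  # e.g. (time,) or (time, mxharvests); () for 1D
    npft = data.shape[-1]
    n_lead = int(np.prod(lead_shape, dtype=int))  # 1 when there are no leading dims

    # Index along each leading axis, repeated so it lines up with data.ravel()
    # (C order: the pft axis varies fastest).
    lead_idx = [np.repeat(ix.ravel(), npft) for ix in np.indices(lead_shape)]

    # The per-pft indices are the same for every leading-axis combination.
    # jxy and ixy are assumed to be 1-based, hence the "- 1".
    pft_idx = [
        np.tile(np.asarray(a), n_lead)
        for a in (vegtype, np.asarray(jxy) - 1, np.asarray(ixy) - 1)
    ]

    coords = np.stack(lead_idx + pft_idx, axis=0)
    return coords, data.ravel()


# Toy example with dims (time=2, mxharvests=3, pft=4) on a 2x2 grid with 2 veg types.
data = np.arange(24.0).reshape(2, 3, 4)
vegtype = np.array([0, 1, 0, 1])
jxy = np.array([1, 1, 2, 2])
ixy = np.array([1, 2, 1, 2])

coords, values = build_sparse_coords(data, vegtype, jxy, ixy)

# Densify with plain NumPy just to check the result.
dense = np.full(data.shape[:-1] + (2, 2, 2), np.nan)
dense[tuple(coords)] = values
```

The same coords and values could presumably be passed to sparse.COO(coords=coords, data=values, shape=data.shape[:-1] + shape, fill_value=np.nan). This is untested inside convert_pft_variables_to_sparse, which may need its own adjustments for the extra dimensions.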

Sam Rabin (Feb 03 2025 at 19:32):

Moreover, the resulting Dataset fails to compute() if the input Dataset contains more than one timestep (e.g., after calling xr.open_mfdataset() on the 1851 and 1852 files instead of just the 1851 file):

Traceback (most recent call last):
  File "/glade/u/home/samrabin/sparse_array_testing/sparse_array_testing.py", line 160, in <module>
    ds_sel = ds_gridded["GRAINC_TO_FOOD_ANN"].isel(time=0, vegtype=17).compute()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataarray.py", line 1206, in compute
    return new.load(**kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataarray.py", line 1174, in load
    ds = self._to_temp_dataset().load(**kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataset.py", line 900, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
                                                       ^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/dask/base.py", line 662, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/u/home/samrabin/sparse_array_testing/sparse_array_testing.py", line 50, in to_sparse
    coords = np.stack([itime] + tostack, axis=0)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/numpy/core/shape_base.py", line 449, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
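
One possible explanation, which is a guess and not confirmed anywhere in this thread: with its long-standing default of data_vars="all", xr.open_mfdataset() concatenates every data variable along time, so index variables such as pfts1d_ixy may pick up a time dimension and reach to_sparse as 2D arrays. The NumPy-only sketch below mimics that situation. How itime and tostack are built here is an assumption (only the np.stack line appears in the traceback), and the sizes are made up.

```python
import numpy as np

ntime, npft = 2, 5  # made-up sizes

# Time index for each flattened (time, pft) value: [0 0 0 0 0 1 1 1 1 1]
itime = np.repeat(np.arange(ntime), npft)


def stack_coords(index_arrays):
    # Assumed construction: repeat each index array once per timestep, then stack.
    tostack = [np.concatenate([a] * ntime) for a in index_arrays]
    return np.stack([itime] + tostack, axis=0)


# Case 1: index arrays are 1D over pft (single file); stacking works.
idx_1d = [np.zeros(npft, dtype=int)] * 3
coords = stack_coords(idx_1d)  # shape (4, ntime * npft)

# Case 2: index arrays carry a time dimension; the shapes no longer match.
idx_2d = [np.zeros((ntime, npft), dtype=int)] * 3
try:
    stack_coords(idx_2d)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

If that is what is happening, opening the files with data_vars="minimal" in xr.open_mfdataset(), so that variables without a time dimension are not concatenated, might avoid the error. That has not been tested on these files.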

Sam Rabin (Feb 03 2025 at 19:44):

(The latter error can be worked around by just gridding each file one-by-one, then concatenating them along time.)

Deepak Cherian (Feb 05 2025 at 23:07):

Sam Rabin (Feb 07 2025 at 16:53):

Thanks, Deepak! I'll give it a go when I get a chance to come back to this work, but it does indeed look more flexible.

Sam Rabin (Feb 03 2025 at 19:04):
