The functions to_sparse and convert_pft_variables_to_sparse that @Deepak Cherian and @Katie Dagon described at https://ncar.github.io/esds/posts/2022/sparse-PFT-gridding/ are awesome, but despite a comment in the former, they only work with variables of up to 2 dimensions. Would some hero like to generalize them to work with an arbitrary number of dimensions?
As an example, I added some error handling to show this for the file /glade/work/samrabin/ctsm53019_f09_BNF_hist/lnd/hist/ctsm53019_f09_BNF_hist.clm2.h5.1851-01-01-00000.nc (using pftnames from the file /glade/campaign/cesm/cesmdata/cseg/inputdata/lnd/clm2/paramdata/ctsm60_params.c241119.nc):
Processed pfts1d_lon with dims ('pft',)
Processed pfts1d_lat with dims ('pft',)
Processed pfts1d_ixy with dims ('pft',)
Processed pfts1d_jxy with dims ('pft',)
Processed pfts1d_gi with dims ('pft',)
Processed pfts1d_li with dims ('pft',)
Processed pfts1d_ci with dims ('pft',)
Processed pfts1d_wtgcell with dims ('pft',)
Processed pfts1d_wtlunit with dims ('pft',)
Processed pfts1d_wtcol with dims ('pft',)
Processed pfts1d_itype_veg with dims ('pft',)
Processed pfts1d_itype_col with dims ('pft',)
Processed pfts1d_itype_lunit with dims ('pft',)
Processed pfts1d_active with dims ('pft',)
Processed GDD20_BASELINE with dims ('time', 'pft')
Processed GDD20_SEASON_END with dims ('time', 'pft')
Processed GDD20_SEASON_START with dims ('time', 'pft')
Processed GRAINC_TO_FOOD_ANN with dims ('time', 'pft')
Processed GRAINC_TO_SEED_ANN with dims ('time', 'pft')
Processed GRAINN_TO_FOOD_ANN with dims ('time', 'pft')
Processed GRAINN_TO_SEED_ANN with dims ('time', 'pft')
Can't handle GDDACCUM_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GDDHARV_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINC_TO_FOOD_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINC_TO_SEED_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINN_TO_FOOD_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle GRAINN_TO_SEED_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle HARVEST_REASON_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle HDATES with dims ('time', 'mxharvests', 'pft')
Can't handle HUI_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SDATES_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SOWING_REASON_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SYEARS_PERHARV with dims ('time', 'mxharvests', 'pft')
Can't handle SDATES with dims ('time', 'mxsowings', 'pft')
Can't handle SWINDOW_ENDS with dims ('time', 'mxsowings', 'pft')
Can't handle SWINDOW_STARTS with dims ('time', 'mxsowings', 'pft')
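For anyone picking this up: one way the coordinate construction might generalize to variables like the ('time', 'mxharvests', 'pft') ones above is sketched here with plain NumPy. This is not the blog post's to_sparse; build_coo_coords is a hypothetical helper, and it assumes that pft is the last dimension and that jxy and ixy are 1-based indices. The resulting array could presumably be passed as coords to sparse.COO together with data.ravel(), but I have not tested that against real history files.

```python
import numpy as np


def build_coo_coords(data_shape, vegtype, jxy, ixy):
    """Return integer coordinates of shape (n_leading + 3, data.size).

    Column k is the (leading..., vegtype, lat, lon) target location of
    element k of data.ravel() in C order. Hypothetical helper; assumes the
    last dimension of the data is "pft".
    """
    *lead_shape, npft = data_shape
    n_lead = len(lead_shape)
    n_total = int(np.prod(data_shape))

    # Per-pft target indices. Assumption: jxy and ixy are 1-based.
    codes = np.stack(
        [
            np.asarray(vegtype, dtype=np.int64),
            np.asarray(jxy, dtype=np.int64) - 1,
            np.asarray(ixy, dtype=np.int64) - 1,
        ]
    )
    if codes.shape != (3, npft):
        raise ValueError("vegtype, jxy and ixy must each have length npft")

    # Index grids for every leading dimension (time, mxharvests, ...).
    lead = np.indices(data_shape)[:n_lead].reshape(n_lead, n_total)

    # Repeat the per-pft codes across all leading index combinations.
    tail_shape = (3, *([1] * n_lead), npft)
    tail = np.broadcast_to(codes.reshape(tail_shape), (3, *data_shape))
    tail = tail.reshape(3, n_total)

    return np.concatenate([lead, tail], axis=0)


# Toy example: dims (time=2, mxharvests=2, pft=3) on a 2 x 2 x 2
# (vegtype, lat, lon) target grid.
data = np.arange(12.0).reshape(2, 2, 3)
coords = build_coo_coords(data.shape, vegtype=[0, 1, 1], jxy=[1, 1, 2], ixy=[1, 2, 2])

# Scatter into a dense array to check where the values land.
dense = np.full((2, 2, 2, 2, 2), np.nan)
dense[tuple(coords)] = data.ravel()
```

np.indices materializes full index grids, so this trades memory for simplicity; an iterator-based version would be lighter for large variables.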
Moreover, the resulting Dataset fails to compute() if the input Dataset comprises more than one timestep (e.g., having done xr.open_mfdataset() on the 1851 and 1852 files instead of just on the 1851 file):
Traceback (most recent call last):
File "/glade/u/home/samrabin/sparse_array_testing/sparse_array_testing.py", line 160, in <module>
ds_sel = ds_gridded["GRAINC_TO_FOOD_ANN"].isel(time=0, vegtype=17).compute()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataarray.py", line 1206, in compute
return new.load(**kwargs)
^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataarray.py", line 1174, in load
ds = self._to_temp_dataset().load(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/core/dataset.py", line 900, in load
evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
return compute(*data, **kwargs) # type: ignore[no-untyped-call, no-any-return]
^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/dask/base.py", line 662, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/u/home/samrabin/sparse_array_testing/sparse_array_testing.py", line 50, in to_sparse
coords = np.stack([itime] + tostack, axis=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/work/samrabin/conda-envs/cupid-analysis/lib/python3.11/site-packages/numpy/core/shape_base.py", line 449, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
(The latter error can be worked around by just gridding each file one-by-one, then concatenating them along time.)
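A rough sketch of that workaround, in case it helps. grid_each_then_concat is a hypothetical helper, and grid_one stands in for whatever gridding function is used (for instance convert_pft_variables_to_sparse with its extra arguments bound); the toy datasets and the doubling lambda are placeholders only. I'm assuming here that xr.concat behaves the same way on sparse-backed results as it does on the NumPy-backed toy data.

```python
import numpy as np
import xarray as xr


def grid_each_then_concat(datasets, grid_one, dim="time"):
    """Apply grid_one to each single-file Dataset, then concatenate along dim."""
    gridded = [grid_one(ds) for ds in datasets]
    return xr.concat(gridded, dim=dim)


# Toy stand-ins for the 1851 and 1852 files, one timestep each.
ds_1851 = xr.Dataset(
    {"GRAINC_TO_FOOD_ANN": (("time", "pft"), np.ones((1, 3)))},
    coords={"time": [1851]},
)
ds_1852 = xr.Dataset(
    {"GRAINC_TO_FOOD_ANN": (("time", "pft"), np.ones((1, 3)))},
    coords={"time": [1852]},
)

# Placeholder "gridding" step; a real call would convert pft to (vegtype, lat, lon).
ds_all = grid_each_then_concat([ds_1851, ds_1852], grid_one=lambda ds: ds * 2)
```

With real files, each Dataset would come from xr.open_dataset() on one history file rather than from xr.open_mfdataset() on all of them.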
I think the version here does work for n dimensions, but obviously I haven't looked at it in 4 years: https://github.com/NCAR/ctsm_python_gallery/blob/master/notebooks/sparse-PFT-gridding.ipynb
Thanks, Deepak! I'll give it a go when I get a chance to come back to this work, but it does indeed look more flexible.
Last updated: May 16 2025 at 17:14 UTC