Hi all,
I've been working with some PFT-level data from CLM5 (e.g. NPP for each PFT) and have been using Python code from the NCAR-ESDS documentation to do the processing (Sparse arrays and the CESM land model component — NCAR-ESDS 0.1 documentation). This worked great until this week, but I am now running into strange errors that I don't understand. Following the example code, I convert the PFT output to a sparse matrix and then to an Xarray Dataset called sparse_data1. The following line:
data1 = sparse_data1.NPP.isel(vegtype=14).groupby("time.year").mean().sel(year=2015)
does what you would expect. However, I'm trying to do time series analysis, and this line:
data1 = sparse_data3.NPP.isel(vegtype=14).groupby("time.year").mean()
returns an error when I try to extract the output or do any operations on it.
print(data1.values)
[...]
ValueError: This operation requires consistent fill-values, but argument 1 had a fill value of 7.0, which is different from a fill_value of 5.0 in the first argument.
Attempting to slice the years using
sel(year=slice(2015,2100))
has the same issue. I'm unable to convert the output into a NumPy array, and viewing its properties shows a text description of the data where the actual values should be. This code all worked fine last week, so it's probably something to do with updates to the various Python packages involved, but I wondered whether anyone else working with PFT-level output has encountered this problem or found a solution?
Cheers,
James
@James King I haven't experienced this issue, but given that you were able to get it working previously it makes sense that package versions could be part of it. Tagging @Deepak Cherian and @Daniel Kennedy here in case they have any additional insight.
Nice to see that code is being used! =)
Can you show us the output of xr.show_versions(), please?
Rerunning that blog post notebook with the latest versions:
cartopy : 0.19.0.post1
numpy : 1.23.1
xarray : 2022.3.0
matplotlib: 3.5.1
json : 2.0.9
sys : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18) [GCC 9.3.0]
sparse : 0.13.0
works for me...
Thanks @Katie Dagon and @Deepak Cherian, it's a great bit of code! The output of xr.show_versions() is below.
xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.6
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.1.0
bottleneck: 1.3.4
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 7.33.0
sphinx: None
I'm trying to plot spatial means and sums of various PFT-level variables over time as line plots rather than spatial maps. Step 1 is getting them into an array with only a time dimension. The code seemingly lets me do that, but the error above is thrown whenever I try to look at or plot the data (e.g. when putting it into a pandas DataFrame to calculate a running mean).
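After a variable has been reduced to a single year dimension and converted to a dense array, the running-mean step is ordinary xarray/pandas. The sketch below uses made-up values, and `ts` is a hypothetical dense annual series. It converts `ts` to a pandas Series and takes a centred 5-year running mean.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dense annual series (e.g. a spatial-mean NPP), 2015-2024.
years = np.arange(2015, 2025)
ts = xr.DataArray(np.arange(10, dtype=float), coords={"year": years}, dims="year", name="NPP")

# Convert to pandas (index named "year") and take a centred 5-year running mean.
series = ts.to_series()
running = series.rolling(window=5, center=True).mean()

print(running.loc[2017])  # 2.0
```

The same result can be obtained without leaving xarray via `ts.rolling(year=5, center=True).mean()`. Going through pandas is only convenient if the rest of the analysis is already in a DataFrame.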
Oh hmm... I found this dask issue
Are you ever computing std or var in the pipeline? Or only mean?
Yes, I saw that too! It looks like a recent problem there. I'm only calculating means and sums with this code at the moment, depending on the variable.
This seems to work:
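data1_yearmean = data1.groupby("time.year").mean()   # annual means, still sparse-backed
data1_chunk = data1_yearmean.chunk({"year": 100})     # wrap the sparse data in dask chunks
ts1 = data1_chunk.mean(("lat", "lon"))                # lazy, unweighted mean over lat/lon
ts2 = ts1.compute()                                   # evaluate; result is a small sparse array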
ts3 = ts2.copy(data=ts2.data.todense())               # convert the computed result to dense NumPy
though it's fairly slow.
(Don't worry, the mean will be area-weighted now that I seem to have a solution to implement!)
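On the area weighting: once a lat/lon field is dense (NaN where the PFT is absent), `DataArray.weighted` computes a weighted mean that skips NaN cells and leaves their weights out of the denominator. The sketch below assumes cos(latitude) weights on a regular grid as a stand-in. For CLM output, the gridcell area from the history file (possibly combined with land fraction) would likely be a better weight, but that is an assumption to check against your own files.

```python
import numpy as np
import xarray as xr

# Hypothetical dense field after the sparse -> dense step (NaN = no PFT there).
lat = np.array([-30.0, 0.0, 60.0])
lon = np.array([0.0, 120.0])
values = np.array([[1.0, np.nan],
                   [2.0, 4.0],
                   [np.nan, 3.0]])
field = xr.DataArray(values, coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))

# Assumed weights: cos(latitude), a rough proxy for grid-cell area.
weights = np.cos(np.deg2rad(field.lat))

# NaN cells are skipped, and their weights are excluded from the denominator.
weighted_mean = field.weighted(weights).mean(("lat", "lon"))
print(float(weighted_mean))
```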
Last updated: May 16 2025 at 17:14 UTC