Stream: python-questions

Topic: Dealing with PFT-level output from CLM


view this post on Zulip James King (Jul 11 2022 at 17:44):

Hi all,

I've been working with some PFT-level data from CLM5 (e.g. NPP for each PFT) and have been using Python code from the NCAR-ESDS documentation to do the processing (Sparse arrays and the CESM land model component — NCAR-ESDS 0.1 documentation). This worked great until this week, but I am now running into strange errors I don't understand. Following the example code, I convert the PFT output to a sparse matrix, then to an xarray Dataset called sparse_data1. The following line:

data1 = sparse_data1.NPP.isel(vegtype=14).groupby("time.year").mean().sel(year=2015)

does what you would expect. However, I'm trying to do time series analysis, and this line:

data1 = sparse_data1.NPP.isel(vegtype=14).groupby("time.year").mean()

returns an error when I try to extract the output or do any operations on it.

print(data1.values)
[...]
ValueError: This operation requires consistent fill-values, but argument 1 had a fill value of 7.0, which is different from a fill_value of 5.0 in the first argument.

Attempting to slice the years using

sel(year=slice(2015,2100))

has the same issue. I'm unable to convert the output into a numpy array, and viewing its properties suggests there
is a text description of the data where the actual values should be. This code all worked fine last week so it's probably something to do with updates to the various Python packages involved, but I wondered if anyone else working with PFT-level output has encountered this problem or found a solution?
Cheers,
James

view this post on Zulip Katie Dagon (Jul 11 2022 at 20:59):

@James King I haven't experienced this issue, but given that you were able to get it working previously it makes sense that package versions could be part of it. Tagging @Deepak Cherian and @Daniel Kennedy here in case they have any additional insight.

view this post on Zulip Deepak Cherian (Jul 11 2022 at 21:25):

Nice to see that code is being used! =)

Can you show us the output of xr.show_versions(), please?

view this post on Zulip Deepak Cherian (Jul 11 2022 at 21:26):

Rerunning that blog post notebook with the latest versions:

cartopy   : 0.19.0.post1
numpy     : 1.23.1
xarray    : 2022.3.0
matplotlib: 3.5.1
json      : 2.0.9
sys       : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
sparse    : 0.13.0

works for me...

view this post on Zulip James King (Jul 12 2022 at 09:03):

Thanks @Katie Dagon and @Deepak Cherian, it's a great bit of code! The output of xr.show_versions() is below:

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.6
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.1.0
bottleneck: 1.3.4
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 7.33.0
sphinx: None

view this post on Zulip James King (Jul 12 2022 at 14:20):

I'm trying to plot spatial means and sums of various PFT-level variables over time as line plots rather than spatial maps. Step 1 is getting them into an array with only a time dimension, and the code seemingly allows me to do that, but that error is thrown whenever I try to look at or plot the data (e.g. putting it into a pandas DataFrame to calculate a running mean).
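For the running-mean step, once the data are dense and one-dimensional in time, a pandas sketch might look like this (values and window size are illustrative, not CLM output):

```python
import numpy as np
import pandas as pd

# Synthetic annual series standing in for the densified NPP time series.
years = np.arange(2015, 2025)
ts = pd.Series(np.linspace(1.0, 2.0, years.size), index=years)

# 5-year centred running mean; the window choice is an assumption.
running = ts.rolling(window=5, center=True).mean()
```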

view this post on Zulip Deepak Cherian (Jul 12 2022 at 14:58):

Oh hmm... I found this dask issue.

view this post on Zulip Deepak Cherian (Jul 12 2022 at 15:02):

Are you ever computing std or var in the pipeline? Or only mean?

view this post on Zulip James King (Jul 12 2022 at 15:58):

Yes, I saw that too! Looks like it's a recent problem there. I'm only calculating means and sums with this code at the moment, depending on the variable.

view this post on Zulip James King (Jul 12 2022 at 16:02):

This seems to work:

data1_yearmean = data1.groupby("time.year").mean()
data1_chunk = data1_yearmean.chunk({"year": 100})
ts1 = data1_chunk.mean(("lat", "lon"))
ts2 = ts1.compute()
ts3 = ts1.copy(data=ts1.data.todense())

though it's fairly slow

view this post on Zulip James King (Jul 12 2022 at 16:03):

(don't worry, the mean will be area-weighted now that I seem to have a solution to implement!)
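The area weighting can be done with xarray's .weighted() before the spatial mean; a cosine-of-latitude sketch (grid and weights are illustrative, not the actual CLM land-area weights):

```python
import numpy as np
import xarray as xr

lat = xr.DataArray([-40.0, 0.0, 40.0], dims="lat", name="lat")
data = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=("year", "lat", "lon"),
    coords={"year": [2015, 2016], "lat": lat},
)

# Weight each latitude band by cos(lat) before averaging over space.
weights = np.cos(np.deg2rad(lat))
ts = data.weighted(weights).mean(("lat", "lon"))
```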

view this post on Zulip James King (Jul 13 2022 at 19:01):

(Correction to the last line of the earlier snippet:)

ts3 = ts2.copy(data=ts2.data.todense())

Last updated: May 16 2025 at 17:14 UTC