Stream: python-questions

Topic: Dealing with PFT-level output from CLM


view this post on Zulip James King (Jul 11 2022 at 17:44):

Hi all,

I've been working with some PFT-level data from CLM5 (e.g. NPP for each PFT) and have been using Python code from the NCAR-ESDS documentation to do the processing (Sparse arrays and the CESM land model component — NCAR-ESDS 0.1 documentation). This worked great until this week, but I am now running into strange errors I don't understand. Following the example code, I convert the PFT output to a sparse matrix, then to an xarray Dataset called sparse_data1. The following line:

data1 = sparse_data1.NPP.isel(vegtype=14).groupby("time.year").mean().sel(year=2015)

does what you would expect. However, I'm trying to do time series analysis, and this line:

data1 = sparse_data1.NPP.isel(vegtype=14).groupby("time.year").mean()

returns an error when I try to extract the output or do any operations on it.

print(data1.values)
[...]
ValueError: This operation requires consistent fill-values, but argument 1 had a fill value of 7.0, which is different from a fill_value of 5.0 in the first argument.

Attempting to slice the years using

sel(year=slice(2015,2100))

has the same issue. I'm unable to convert the output into a numpy array, and viewing its properties suggests there
is a text description of the data where the actual values should be. This code all worked fine last week so it's probably something to do with updates to the various Python packages involved, but I wondered if anyone else working with PFT-level output has encountered this problem or found a solution?
Cheers,
James

view this post on Zulip Katie Dagon (Jul 11 2022 at 20:59):

@James King I haven't experienced this issue, but given that you were able to get it working previously it makes sense that package versions could be part of it. Tagging @Deepak Cherian and @Daniel Kennedy here in case they have any additional insight.

view this post on Zulip Deepak Cherian (Jul 11 2022 at 21:25):

Nice to see that code is being used! =)

Can you show us the output of xr.show_versions(), please?

view this post on Zulip Deepak Cherian (Jul 11 2022 at 21:26):

Rerunning that blog post notebook with the latest versions:

cartopy   : 0.19.0.post1
numpy     : 1.23.1
xarray    : 2022.3.0
matplotlib: 3.5.1
json      : 2.0.9
sys       : 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
sparse    : 0.13.0

works for me...

view this post on Zulip James King (Jul 12 2022 at 09:03):

Thanks @Katie Dagon and @Deepak Cherian, it's a great bit of code! The output of xr.show_versions() is below:

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.21.6
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.1.0
bottleneck: 1.3.4
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.18
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.1.2
conda: None
pytest: 7.1.2
IPython: 7.33.0
sphinx: None

view this post on Zulip James King (Jul 12 2022 at 14:20):

I'm trying to plot spatial means and sums of various PFT-level variables over time as line plots rather than spatial maps. Step 1 is getting them into an array with only a time dimension, and the code seemingly allows me to do that, but that error is thrown whenever I try to look at or plot the data (e.g. putting it into a pandas DataFrame to calculate a running mean).
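For the running-mean step, once the data are dense and one-dimensional in time, a pandas sketch might look like this (values and window size are illustrative, not CLM output):

```python
import numpy as np
import pandas as pd

# Synthetic annual series standing in for the densified NPP time series.
years = np.arange(2015, 2025)
ts = pd.Series(np.linspace(1.0, 2.0, years.size), index=years)

# 5-year centred running mean; the window choice is an assumption.
running = ts.rolling(window=5, center=True).mean()
```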

view this post on Zulip Deepak Cherian (Jul 12 2022 at 14:58):

Oh hmm... I found this dask issue.

view this post on Zulip Deepak Cherian (Jul 12 2022 at 15:02):

Are you ever computing std or var in the pipeline? Or only mean?

view this post on Zulip James King (Jul 12 2022 at 15:58):

Yes, I saw that too! Looks like it's a recent problem there. I'm only calculating means and sums with this code at the moment, depending on the variable.

view this post on Zulip James King (Jul 12 2022 at 16:02):

This seems to work:

data1_yearmean = data1.groupby("time.year").mean()
data1_chunk = data1_yearmean.chunk({"year": 100})
ts1 = data1_chunk.mean(("lat", "lon"))
ts2 = ts1.compute()
ts3 = ts1.copy(data=ts1.data.todense())

though it's fairly slow

view this post on Zulip James King (Jul 12 2022 at 16:03):

(don't worry, the mean will be area-weighted now that I seem to have a solution to implement!)
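The area weighting can be done with xarray's .weighted() before the spatial mean; a cosine-of-latitude sketch (grid and weights are illustrative, not the actual CLM land-area weights):

```python
import numpy as np
import xarray as xr

lat = xr.DataArray([-40.0, 0.0, 40.0], dims="lat", name="lat")
data = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=("year", "lat", "lon"),
    coords={"year": [2015, 2016], "lat": lat},
)

# Weight each latitude band by cos(lat) before averaging over space.
weights = np.cos(np.deg2rad(lat))
ts = data.weighted(weights).mean(("lat", "lon"))
```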

view this post on Zulip James King (Jul 13 2022 at 19:01):

(Correction to the last line of the earlier snippet:)

ts3 = ts2.copy(data=ts2.data.todense())

Last updated: May 16 2025 at 17:14 UTC