Stream: python-questions

Topic: Reading in an ensemble with unequal # of files


Isla Simpson (Aug 12 2021 at 20:34):

Hello! To read in an ensemble of simulations I have been doing something like...

import glob
import xarray as xr

filelist = [sorted(glob.glob(topdir + "b.e11.BRCP85C5CNBDRD.f09_g16." + imem + ".*.nc")) for imem in memstr]
dat = xr.open_mfdataset(filelist, combine="nested", concat_dim=["M", "time"],
                        coords="minimal", data_vars=["TREFHT", "time_bnds"], compat="override")

where "memstr" is the list of the members. However, I'm running into issues when the members don't all have the same number of files, even though they have the same time axis: e.g., the record for member 1 is divided into two files while the record for member 2 is in just one file.

I get the following error...

ValueError: The supplied objects do not form a hypercube because sub-lists do not have consistent depths

It seems related to this post https://github.com/pydata/xarray/issues/3648 but I'm not following what the solution would be in the context of open_mfdataset.

Thanks in advance for any help on this.
Isla
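[Editor's note: a minimal sketch of the nested file list described above, with made-up file names, to show why combine="nested" rejects it:]

```python
# Made-up file names illustrating the situation: member 1's record is
# split across two files, member 2's record is in a single file.
filelist = [
    ["member1.1990-1999.nc", "member1.2000-2009.nc"],  # member 1: two files
    ["member2.1990-2009.nc"],                          # member 2: one file
]

# combine="nested" with concat_dim=["M", "time"] expects these sub-lists to
# form a regular 2-D grid of datasets (a "hypercube"); the unequal sub-list
# lengths are what trigger the ValueError.
print({len(sub) for sub in filelist})  # the sub-list lengths differ: {1, 2}
```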

Deepak Cherian (Aug 12 2021 at 21:35):

Can you try using combine="by_coords"? That will try to do something sensible by looking at coordinate values along each dimension. It sounds like it could work for your case.

Isla Simpson (Aug 12 2021 at 22:09):

Hmm, that doesn't seem to work. If I simply switch to combine="by_coords" in the above, it gives me an array of (time, lat, lon), whereas I'm looking to get an array of (M, time, lat, lon), where M is the member number. I'm not quite sure what it's doing with all the other members in this case. I'm showing a simplified example with only two members in the attached concatissue.png

Thanks!

Isla Simpson (Aug 12 2021 at 22:22):

I could, of course, just loop over the members and fill the M dimension of an empty xarray DataArray, but I found it very convenient that open_mfdataset would do that for me when each member had an equal number of files.

Deepak Cherian (Aug 12 2021 at 22:24):

Is M an existing dimension in these files?

Isla Simpson (Aug 12 2021 at 22:27):

No, it's not. It's a new dimension that I want to create for the different members. But if I remove M from concat_dim, the same thing happens. I'm basically trying to read in those two lists of files, dat1(time,lat,lon) and dat2(time,lat,lon), and end up with dat(M,time,lat,lon), where M has size 2, M=0 is dat1, and M=1 is dat2.

Deepak Cherian (Aug 12 2021 at 22:33):

Ah, I think this is a case open_mfdataset cannot handle yet.

I think you want something like

members = [xr.open_mfdataset(single_member, combine="nested", concat_dim="time") for single_member in filelist]
xr.concat(members, dim="M")
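[Editor's note: a toy check of this pattern, using two small in-memory datasets in place of the per-member open_mfdataset results; the variable name TREFHT comes from the thread, everything else is made up:]

```python
import numpy as np
import xarray as xr

time = np.arange(4)
# Stand-ins for the per-member open_mfdataset results: same time axis each,
# but filled with the member number so we can tell them apart.
members = [
    xr.Dataset({"TREFHT": (("time",), np.full(4, float(m)))},
               coords={"time": time})
    for m in range(2)
]

# Stack the members along a brand-new "M" dimension.
dat = xr.concat(members, dim="M")
print(dat["TREFHT"].dims)  # ('M', 'time')
print(dat.sizes["M"])      # 2
```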

Alternatively, you could try writing a preprocess function (passed via the preprocess argument) that assigns a member number based on the file name, together with combine="by_coords".
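[Editor's note: a rough sketch of that alternative. open_mfdataset records each file's path in ds.encoding["source"], which a preprocess function can parse; the dot-separated position of the member id below is a guess and would need adjusting to the real paths:]

```python
import xarray as xr

def add_member(ds):
    # Hypothetical: pull the member id out of the file path that
    # open_mfdataset stores in ds.encoding["source"]. For names like
    # "b.e11.BRCP85C5CNBDRD.f09_g16.001.TREFHT.1990.nc" the member id
    # ("001") sits at dot-separated position 4 -- adjust as needed.
    member = int(ds.encoding["source"].split(".")[4])
    # Attach a length-1 "M" dimension so the members can be lined up.
    return ds.expand_dims(M=[member])
```

This would then be passed as, e.g., xr.open_mfdataset(all_files, combine="by_coords", preprocess=add_member).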

Isla Simpson (Aug 12 2021 at 22:34):

Ah, thanks. Your solution is more elegant than the loop I was thinking of. That will do nicely! Thanks once again for your help!

Isla Simpson (Aug 13 2021 at 19:23):

Sorry to bother you again, but I'm having a problem that I thought I knew how to solve, and I'm failing. The issue is that some of the members have slightly different latitudes far down the decimal places. Normally I would set compat='override' and coords='minimal', and that uses the latitudes from the first dataset. But in this case that's not working. If I do...

import glob
import xarray as xr

filelist = [sorted(glob.glob(topdir + "b.e11.B20TRC5CNBDRD.f09_g16." + imem + ".*.nc")) for imem in memstr]
members = [xr.open_mfdataset(f, combine="nested", compat="override", coords="minimal",
                             concat_dim="time") for f in filelist]
dat = xr.concat(members, dim="M", data_vars=["TREFHT", "time_bnds"],
                compat="override", coords="minimal", combine_attrs="override")

even though all the files have 192 latitudes, I end up with 289 latitudes in my dat dataset, because it's merging in all the latitudes that aren't exactly equal to the original ones. Any idea why the compat='override' and coords='minimal' options don't avoid this issue in this case?

Deepak Cherian (Aug 13 2021 at 20:00):

If latitude is a dimension coordinate, then you want join="override". compat controls non-dimension coordinate variables.
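[Editor's note: a toy reproduction of this, with a made-up 3-point latitude grid standing in for the 192-point one. With concat's default join="outer", the two slightly different latitude indexes are unioned; join="override" reuses the first dataset's latitudes instead:]

```python
import numpy as np
import xarray as xr

lat = np.array([0.0, 1.0, 2.0])
lat_jittered = lat + 1e-10  # the same grid, off far down the decimal places

a = xr.Dataset({"TREFHT": (("time", "lat"), np.zeros((2, 3)))},
               coords={"time": [0, 1], "lat": lat})
b = xr.Dataset({"TREFHT": (("time", "lat"), np.ones((2, 3)))},
               coords={"time": [0, 1], "lat": lat_jittered})

outer = xr.concat([a, b], dim="M")                   # default join="outer"
fixed = xr.concat([a, b], dim="M", join="override")  # take a's latitudes

print(outer.sizes["lat"])  # 6: the union of the two almost-equal grids
print(fixed.sizes["lat"])  # 3: the first dataset's latitudes, reused
```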

view this post on Zulip Isla Simpson (Aug 13 2021 at 20:09):

Ohhh, yup - that was it. Awesome. Thanks!


Last updated: Jan 30 2022 at 12:01 UTC