Stream: dask

Topic: MergeError when opening multiple files with dask


Mira Berdahl (Jun 02 2022 at 23:34):

Hi,

I have a method that I've used without issue to open multiple ocean temperature files and sort them.

The way I read my data in is as follows:

```python
import xarray as xr
from glob import glob

# READ TEMP iTrace runs
ddir = '/glade/scratch/mberdahl/iTrace/Ocean/TEMP/'
# use sorted to make sure the files are in order for concatenation
dfiles = sorted(glob(ddir + '*.TEMP.*.nc'))
mfds4 = xr.open_mfdataset(dfiles, combine='by_coords', parallel=True, chunks={'time': 6},
                          data_vars=['TEMP', 'time_bound'], decode_times=False)
mfds4 = xr.decode_cf(fixmonth(mfds4))  # fixmonth: the poster's own helper, not shown in the thread
TEMP_iTrace = mfds4.TEMP
##################################################################################
```

I just tried using this with a new dataset (iTrace runs) and am running into a MergeError. The error says:
`MergeError: conflicting values for variable 'REGION_MASK' on objects to be combined. You can skip this check by specifying compat='override'.`

I tried the suggestion to add compat='override', but it doesn't help; instead it produces a new error:
`ValueError: Cannot specify both coords='different' and compat='override'.`

Does anyone have advice on how to get around this? If I process the data in smaller subsets (1000 years at a time), the error does not appear, but then I still have to merge the subsets afterwards, so that doesn't really help. Thanks for any help!

Deepak Cherian (Jun 03 2022 at 15:19):

For some reason the REGION_MASK variable differs between these files. To pass compat="override", you also have to pass coords="minimal". This skips the comparison of REGION_MASK and any other variable that is not being concatenated, and just takes its values from the first file. By default coords="different", which means xarray checks whether each coordinate variable's values differ between files; if they do, it adds the concatenation dimension to that variable and concatenates it.
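A minimal, self-contained sketch of the behaviour described above, assuming two small in-memory datasets stand in for the iTrace files (the helper `make_file`, the 3x4 grid, and the mask values 1 and 2 are hypothetical). It calls `xr.combine_by_coords`, which accepts the same `data_vars`, `coords`, and `compat` keywords that `open_mfdataset` forwards when `combine='by_coords'`, so no netCDF files or dask are needed. All three keywords are passed explicitly so the sketch does not depend on library defaults.

```python
import numpy as np
import xarray as xr


def make_file(t0, mask_value):
    """Hypothetical stand-in for one TEMP file: 2 time steps on a 3x4 grid.

    REGION_MASK has no time dimension; its values are set per "file" so that
    two files can disagree, as in the thread.
    """
    return xr.Dataset(
        data_vars={
            "TEMP": (("time", "nlat", "nlon"), np.zeros((2, 3, 4))),
            "REGION_MASK": (("nlat", "nlon"), np.full((3, 4), mask_value)),
        },
        coords={"time": [t0, t0 + 1]},
    )


files = [make_file(0, mask_value=1), make_file(2, mask_value=2)]

# 1) compat="no_conflicts" compares REGION_MASK across files and raises
#    MergeError because the values differ.
try:
    xr.combine_by_coords(files, data_vars=["TEMP"], coords="different", compat="no_conflicts")
    merge_error = None
except xr.MergeError as err:
    merge_error = err

# 2) compat="override" on its own is rejected while coords="different".
try:
    xr.combine_by_coords(files, data_vars=["TEMP"], coords="different", compat="override")
    override_error = None
except ValueError as err:
    override_error = err

# 3) The fix from the thread: coords="minimal" plus compat="override".
#    TEMP is still concatenated along time; REGION_MASK comes from the first file.
combined = xr.combine_by_coords(files, data_vars=["TEMP"], coords="minimal", compat="override")

print(repr(merge_error))
print(repr(override_error))
print(combined)
```

Applied to the original call, the same idea means adding `coords='minimal', compat='override'` to the `xr.open_mfdataset(...)` arguments. Because `compat='override'` trusts the first file without checking the others, it may be worth confirming separately that the first file's REGION_MASK is the one you want.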

Mira Berdahl (Jun 03 2022 at 16:36):

Looks like this works now -- thanks so much @Deepak Cherian


Last updated: May 16 2025 at 17:14 UTC