Stream: python-questions

Topic: chunking while loading files


Anna-Lena Deppenmeier (Apr 02 2021 at 18:04):

Hi all,
I have an issue with chunking while loading my dataset. I read my datasets like this:

import xarray as xr

def read_dat(files, variables, pop=False, chunks={'time':100, 'nlat':100, 'nlon':100, 'z_t':-1}):
    def preprocess(ds):
        # drop=True removes the non-index coordinates entirely instead of
        # keeping them around as data variables
        return ds[variables].reset_coords(drop=True)
    ds = xr.open_mfdataset(files, parallel=True, preprocess=preprocess,
                           chunks=chunks,
                           combine='by_coords')
    if pop:
        # pull the POP grid coordinates back in from the first file
        file0 = xr.open_dataset(files[0], chunks=chunks)
        ds.update(file0[['ULONG', 'ULAT', 'TLONG', 'TLAT']])
        file0.close()
    return ds

but somehow this doesn't produce the desired chunk size (~120 MB each) when I use it:

import glob

flist = glob.glob('/project/oce/deppenme/process-dat/more_years/Pac_POP0.1_JRA_IAF_*.nc')
ds_misc = read_dat(flist, ['TEMP', 'WVEL', 'UVEL', 'VVEL', 'DIA_IMPVF_TEMP', 'Q'], pop=True,
                   chunks={'time':100, 'nlat':100, 'nlon':100, 'z_t':-1})

[pasted image]
Any ideas on how to do this better?
(I am currently trying to chunk while just reading with xr.open_mfdataset(), and it takes far too long.)
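
For reference, a quick way to inspect the chunking dask actually produced (a minimal sketch, assuming the ds_misc from the call above):

# inspect the chunks dask actually produced
da = ds_misc['TEMP']
print(da.chunks)                 # block sizes along each dimension
print(da.data.chunksize)         # shape of the largest chunk
print(da.data.nbytes / da.data.npartitions / 1e6)  # rough MB per chunk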

Deepak Cherian (Apr 02 2021 at 18:09):

It looks like you have one file per timestep, so open_mfdataset cannot apply chunking along time: the chunks argument is applied per file, and each of your files only contains a single timestep. I would chunk the other three dimensions to make bigger chunks, if you can. If you really need to chunk along time, you'll have to do it after reading the files; see the sketch below.
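
A minimal sketch of that approach (the spatial chunk sizes here are placeholders, not a recommendation):

import glob
import xarray as xr

flist = glob.glob('/project/oce/deppenme/process-dat/more_years/Pac_POP0.1_JRA_IAF_*.nc')

# per-file chunking: only the in-file dimensions can usefully be chunked here
ds = xr.open_mfdataset(flist, parallel=True, combine='by_coords',
                       chunks={'nlat': 600, 'nlon': 600, 'z_t': -1})

# if chunks along time are really needed, rechunk after the combine;
# this rewrites the dask task graph and can be expensive
ds = ds.chunk({'time': 100})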

Anna-Lena Deppenmeier (Apr 02 2021 at 18:22):

Thanks! Trying other chunks to get to the desired chunk size.
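
For picking chunk shapes, the arithmetic is just element count times itemsize; a hypothetical example targeting a ~120 MB chunk for float32 data (the 62 vertical levels here are an assumption):

import numpy as np

# hypothetical chunk shape: no time chunking (one file = one timestep),
# full depth of, say, 62 levels, and 500x500 in the horizontal
chunk_shape = (1, 62, 500, 500)
size_mb = np.prod(chunk_shape) * np.dtype('float32').itemsize / 1e6
print(size_mb)  # ~62 MB; scale nlat/nlon up or down to hit the target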

