Stream: python-questions
Topic: Basic read of an ensemble variable
Andrew Gettelman (Aug 17 2021 at 22:32):
Greetings, sorry for the silly question, but I'm having a really hard time trying to read an ensemble of files. I'm just trying to read timeseries files of a standard 1-degree atmosphere variable: 60 time samples, but 250 ensemble members.
I can do a standard loop with single-level variables, but with a multi-level variable the whole thing comes to a halt.
So I make an array of files and case names and have tried to apply some dask techniques. I thought it worked, but now it keeps failing. And it never seems to use more than 1 GB of memory, which may be the problem.
The read snippet is here:
import xarray as xr  # numens, neall, ens, pth, psuffix, opth are defined earlier in the notebook

varn = ['CCN3']
# varn = ['TGCLDLWP', 'SWCF', 'ACTNL', 'ACTREL']
nv = len(varn)
write = False
for r in range(numens):
    prefix = ''
    ne = neall[r]
    for v in range(nv):
        print(varn[v])
        files = []
        cases = []
        for d in range(ne):
            enum = str(d + 1).rjust(3, '0')
            ecase = prefix + ens[r] + '.' + enum
            infile = pth + ecase + psuffix + 'cc_' + ecase + '.h0.' + varn[v] + '_comp2.nc'
            files.append(infile)
            cases.append(ecase)
        dset = xr.open_mfdataset(sorted(files), concat_dim='ensemble',
                                 combine="nested", parallel=True, data_vars=[varn[v]],
                                 engine="netcdf4", chunks={'time': 10})
        # Add coordinate labels for the newly created 'ensemble' dimension
        dset["ensemble"] = cases
        # Vertical sum for 3D variables, then zonal mean
        if varn[v] in ('CCN3', 'so4_a1', 'soa_a1'):
            dout = dset[varn[v]].sum(dim='lev').mean(dim='lon')
        # Single level (or no sum)
        else:
            dout = dset[varn[v]].mean(dim='lon')
        del dset, cases, files
        # Write
        if write:
            dout.load().to_netcdf(opth + 'zmall_' + ens[r] + '.' + varn[v] + '.nc')
And the whole notebook is here:
/glade/u/home/andrew/python/ppe/read_ppe.ipynb
Why is this so tough? What am I doing wrong?
Thanks for the help.
Andrew
Anderson Banihirwe (Aug 17 2021 at 23:08):
@Andrew Gettelman, it appears that access to the data in question is restricted, i.e. I'm getting a permission denied error. Are you able to adjust the read permissions?
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
PermissionError: [Errno 13] Permission denied: b'/glade/campaign/cgd/projects/ppe/cam_ppe/PPE_250/control/control_timeseries/PPE_250_ensemble.001/atm/hist/cc_PPE_250_ensemble.001.h0.CCN3_comp2.nc'
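(Not from the thread: a quick way to see which files are unreadable before handing them to open_mfdataset, assuming the same files list built in the snippet above and using only the standard library.)

import os

# List any ensemble files the current user cannot read (illustrative check only)
unreadable = [f for f in files if not os.access(f, os.R_OK)]
print(len(unreadable), 'unreadable files')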
Andrew Gettelman (Aug 17 2021 at 23:25):
Thanks Anderson. Sorry about that. I cannot change the permissions, but let me see if I can get them changed for read access.
Andrew Gettelman (Aug 18 2021 at 02:25):
@Anderson Banihirwe , the permissions have been changed. Note that after some further digging the variable 'CCN3' does not work, but another variable (e.g. varn='soa_a1') does work. I'm wondering if I can speed it up more than I have, or whether there is a better way. Thanks!
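(Aside, not from the thread: one common way to make parallel=True do useful work, and to use more than about 1 GB of memory, is to start a Dask distributed client before calling open_mfdataset. A minimal sketch for a local machine; on an HPC system a dask-jobqueue cluster would be the usual choice instead.)

from dask.distributed import Client

# Start a small local cluster; open_mfdataset(parallel=True) then opens files on the
# workers, and chunked reductions run in parallel when .load()/.compute() is called.
client = Client(n_workers=4, threads_per_worker=1, memory_limit='4GB')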
Deepak Cherian (Aug 20 2021 at 02:17):
Usually this means that the coordinate variables in your files don't line up exactly. (The various warnings about splitting large chunks are a sign that something has gone wrong and you need to investigate.)
I checked this by adding join="exact" to your open_mfdataset call. That raised an error saying that the values for lat in the different files didn't agree exactly. (Unfortunately xarray is a bit stupid here for now and doesn't allow you to change the tolerance for this comparison yet.)
I then read two different files at random, and it looked like all the dimensions (lat, lon, time, lev) are exactly the same, and all you are doing is concatenating along a new dimension ensemble.
So I set join="override"... this avoids any comparisons, only uses coordinate values from the first file, and only checks that sizes along all dimensions are the same in each file (which is true).
Then I can read CCN3.
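(A minimal sketch of the adjusted call, assuming the same files list and variable name as in the snippet above:)

import xarray as xr

# join="override" takes coordinate values from the first file and only checks that
# dimension sizes match, avoiding the exact lat comparison that was failing
dset = xr.open_mfdataset(sorted(files), concat_dim='ensemble',
                         combine="nested", join="override", parallel=True,
                         data_vars=['CCN3'], engine="netcdf4", chunks={'time': 10})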
Andrew Gettelman (Aug 20 2021 at 19:45):
Thanks Deepak! This is very helpful to know. I appreciate the help very much!