Stream: python-questions
Topic: Basic read of an ensemble variable
Andrew Gettelman (Aug 17 2021 at 22:32):
Greetings, sorry for the silly question, but I'm having a really hard time trying to read an ensemble of files. I'm just trying to read timeseries files of a standard 1-degree atmosphere variable: 60 time samples, but 250 ensemble members.
I can do a standard loop with single-level variables, but with a multi-level variable the whole thing comes to a halt.
So I make an array of files and case names and have tried to apply some dask techniques. I thought it worked, but now it keeps failing. And it never seems to use more than 1 GB of memory, which may be the problem.
The read snippet is here:
import xarray as xr  # numens, neall, ens, pth, psuffix, opth are defined earlier in the notebook

varn = ['CCN3']
# varn = ['TGCLDLWP', 'SWCF', 'ACTNL', 'ACTREL']
nv = len(varn)
write = False
for r in range(numens):
    prefix = ''
    ne = neall[r]
    for v in range(nv):
        print(varn[v])
        files = []
        cases = []
        for d in range(ne):
            enum = str(d + 1).rjust(3, '0')
            ecase = prefix + ens[r] + '.' + enum
            infile = pth + ecase + psuffix + 'cc_' + ecase + '.h0.' + varn[v] + '_comp2.nc'
            files.append(infile)
            cases.append(ecase)
        dset = xr.open_mfdataset(sorted(files), concat_dim='ensemble',
                                 combine="nested", parallel=True, data_vars=[varn[v]],
                                 engine="netcdf4", chunks={'time': 10})
        # Add coordinate labels for the newly created 'ensemble' dimension
        dset["ensemble"] = cases
        # Vertical sum for 3D variables, then zonal mean
        if varn[v] in ('CCN3', 'so4_a1', 'soa_a1'):
            dout = dset[varn[v]].sum(dim='lev').mean(dim='lon')
        # Single level (or no sum)
        else:
            dout = dset[varn[v]].mean(dim='lon')
        del dset, cases, files
        # Write
        if write:
            dout.load().to_netcdf(opth + 'zmall_' + ens[r] + '.' + varn[v] + '.nc')
And the whole notebook is here:
/glade/u/home/andrew/python/ppe/read_ppe.ipynb
Why is this so tough? What am I doing wrong?
Thanks for the help.
Andrew
Anderson Banihirwe (Aug 17 2021 at 23:08):
@Andrew Gettelman, it appears that access to the data in question is restricted, i.e. I'm getting a permission denied error. Are you able to adjust the read permissions?
src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
PermissionError: [Errno 13] Permission denied: b'/glade/campaign/cgd/projects/ppe/cam_ppe/PPE_250/control/control_timeseries/PPE_250_ensemble.001/atm/hist/cc_PPE_250_ensemble.001.h0.CCN3_comp2.nc'
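(Not from the thread: a quick way to see which files are unreadable before handing them to open_mfdataset, assuming the same files list built in the snippet above and using only the standard library.)

import os

# List any ensemble files the current user cannot read (illustrative check only)
unreadable = [f for f in files if not os.access(f, os.R_OK)]
print(len(unreadable), 'unreadable files')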
Andrew Gettelman (Aug 17 2021 at 23:25):
Thanks Anderson. Sorry about that. I cannot change the permissions, but let me see if I can get them changed for read access.
Andrew Gettelman (Aug 18 2021 at 02:25):
@Anderson Banihirwe , the permissions have been changed. Note that after some further digging the variable 'CCN3' does not work, but another variable (e.g. varn='soa_a1') does work. I'm wondering if I can speed it up more than I have, or whether there is a better way. Thanks!
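(Aside, not from the thread: one common way to make parallel=True do useful work, and to use more than about 1 GB of memory, is to start a Dask distributed client before calling open_mfdataset. A minimal sketch for a local machine; on an HPC system a dask-jobqueue cluster would be the usual choice instead.)

from dask.distributed import Client

# Start a small local cluster; open_mfdataset(parallel=True) then opens files on the
# workers, and chunked reductions run in parallel when .load()/.compute() is called.
client = Client(n_workers=4, threads_per_worker=1, memory_limit='4GB')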
Deepak Cherian (Aug 20 2021 at 02:17):
Usually this means that the coordinate variables in your files don't line up exactly. (The various warnings about splitting large chunks are a sign that something has gone wrong and you need to investigate.)
I checked this by adding join="exact" to your open_mfdataset call. That raised an error saying that the values for lat in the different files didn't agree exactly. (Unfortunately xarray is a bit stupid here for now and doesn't allow you to change the tolerance for this comparison yet.)
I then read two different files at random, and it looked like all the dimensions (lat, lon, time, lev) are exactly the same, and all you are doing is concatenating along a new dimension ensemble.
So I set join="override"... this avoids any comparisons, only uses coordinate values from the first file, and only checks that sizes along all dimensions are the same in each file (which is true).
Then I can read CCN3.
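(A minimal sketch of the adjusted call, assuming the same files list and variable name as in the snippet above:)

import xarray as xr

# join="override" takes coordinate values from the first file and only checks that
# dimension sizes match, avoiding the exact lat comparison that was failing
dset = xr.open_mfdataset(sorted(files), concat_dim='ensemble',
                         combine="nested", join="override", parallel=True,
                         data_vars=['CCN3'], engine="netcdf4", chunks={'time': 10})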
Andrew Gettelman (Aug 20 2021 at 19:45):
Thanks Deepak! This is very helpful to know. I appreciate the help very much!