Reading in CESM2-LE ensemble data
Hi! I am a bit stuck with reading in CESM2-LE ensemble data. Is there a standard script for opening it? There should be quite a few users :). Best, Erko
This might be what you're looking for? https://cookbooks.projectpythia.org/cesm-lens-aws-cookbook/README.html
@Erko Jakobson
https://ncar.github.io/esds/posts/2021/intake-cesm2-le-glade-example/
This should also be useful!
I would also suggest looking at one of these examples https://ncar.github.io/cesm2-le-aws/kay_et_al_lens2.html
You can create an interactive diagnostic plot like the one in the attached screenshot, with all the steps outlined.
One thing to also consider is whether you will be working on the NCAR HPC environment or not, since the data is hosted on /glade/.
Path to the catalog on GLADE: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json
Thank you all for your help. Still, there is a strange error with the code:
catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)
gives the error: FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'
But the file is in the same folder. Any idea where might be the problem?
@Erko Jakobson
Are you working locally or on Casper/Cheyenne?
I am working on Cheyenne
That's strange, I was working with the data earlier and didn't have this error (on Casper though, but that shouldn't matter).
I tried Casper, but I didn't find a suitable kernel for importing intake and NCARCluster.
On Cheyenne, I use the kernel "Notebook Gallery 2019.12" and the code is the following:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import intake
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas, hvplot.xarray
import holoviews as hv
from distributed import LocalCluster, Client
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')
cluster = NCARCluster()
cluster.scale(40)
client = Client(cluster)
catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)
And the full error traceback is:
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-22448c0d5d6a> in <module>
1 catalog = intake.open_esm_datastore(
----> 2 '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
3 )
4 catalog
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in __init__(self, esmcol_path, progressbar, log_level, **kwargs)
79 self.progressbar = progressbar
80 self._col_data = _fetch_and_parse_file(esmcol_path)
---> 81 self.df = self._fetch_catalog()
82 self._entries = {}
83 self.urlpath = ''
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in _fetch_catalog(self)
127 """Get the catalog file and cache it.
128 """
--> 129 return pd.read_csv(self._col_data['catalog_file'])
130
131 def serialize(self, name, directory=None):
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
683 )
684
--> 685 return _read(filepath_or_buffer, kwds)
686
687 parser_f.__name__ = name
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
455
456 # Create the parser.
--> 457 parser = TextFileReader(fp_or_buf, **kwds)
458
459 if chunksize or iterator:
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options["has_index_names"] = kwds["has_index_names"]
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1133 def _make_engine(self, engine="c"):
1134 if engine == "c":
-> 1135 self._engine = CParserWrapper(self.f, **self.options)
1136 else:
1137 if engine == "python":
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1915 kwds["usecols"] = self.usecols
1916
-> 1917 self._reader = parsers.TextReader(src, **kwds)
1918 self.unnamed_cols = self._reader.unnamed_cols
1919
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
161 mode += 'b'
162 if fileobj is None:
--> 163 fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
164 if filename is None:
165 filename = getattr(fileobj, 'name', '')
FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'
Try creating your own conda environment and using that as your kernel. I presume that the 2019 kernel might be a bit outdated, which could be leading to some issues with intake not recognizing/finding certain files.
Let me know if you need help setting it up and getting it working with JupyterHub! The two additional packages I install whenever setting up an environment on Casper/Cheyenne are ipykernel and nb_conda_kernels.
As you might have guessed, I have never created my own conda environment or kernel. How do I do it?
Does your project have an environment file you can start with?
Here is a useful Conda cheatsheet
If you do have a file, you can type in the terminal: conda env create --file environment.txt
(or whatever the name of your file is)
If not, type conda create --name myenv python
where myenv is the name of your new environment.
Then download any desired package with conda install packagename
This is great advice from Julia; I would just insert a step between her two suggestions:
Julia Kent said:
If not, type conda create --name myenv python where myenv is the name of your new environment.
Run conda activate myenv to switch from the (base) environment to your new environment.
Julia Kent said:
Then download any desired package with conda install packagename
And I wouldn't be surprised if the issue you were running into is that the version of intake-esm in Notebook Gallery 2019.12 expects a full pathname for catalog_file; I don't remember when Anderson dropped that requirement for how the json file is structured. So another possibility would be to copy /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json to your home directory or work directory, and then edit the third line to add the full path to the catalog:
glade-cesm2-le.json
{
- "catalog_file": "glade-cesm2-le.csv.gz",
+ "catalog_file": "/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.csv.gz",
With that change, you should be able to read that json file with the older environment.
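If you'd rather patch the file programmatically than edit it by hand, a rough sketch could look like the lines below; the copy under your home directory is just an example location, and the exact json layout may differ between intake-esm versions:
import json
import shutil
from pathlib import Path
import intake

# Original catalog json on GLADE, and an example copy in your home directory
src = Path('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json')
dst = Path.home() / 'glade-cesm2-le.json'
shutil.copy(src, dst)

# Point "catalog_file" at the full path of the csv.gz that sits next to the original json
spec = json.loads(dst.read_text())
spec['catalog_file'] = str(src.parent / 'glade-cesm2-le.csv.gz')
dst.write_text(json.dumps(spec, indent=2))

catalog = intake.open_esm_datastore(str(dst))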
For some reason, conda create resulted in an unexpected error. But changing the catalog_file path in glade-cesm2-le.json worked and I got the catalog listings, thank you.
Still, there is a new problem:
dsets = catalog_subset.to_dataset_dict(storage_options={'anon':True})
gave FileNotFoundError: [Errno 2] No such file or directory: b'/glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc'
And indeed, the path seems incorrect; it feels like glade-cesm2-le.csv.gz is outdated. How can I solve it?
@Erko Jakobson the file exists, and it looks like it should be globally readable:
$ ls -l /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc
-rw-r--r--+ 1 strandwg cesm 461900521 Mar 25 2021 /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc
Are you running your notebook on Cheyenne or Casper? Cheyenne does not mount campaign storage, so you need to be on Casper to access this data. If you're having trouble on Casper, I can make sure permissions are set correctly on all the subdirectories in the path.
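Once you are on Casper, a minimal sketch of the subsetting and loading step could look like the lines below; the search values (component, variable, frequency) are only illustrative, and you can check catalog.df.columns for the actual column names. Note that storage_options={'anon': True} is only needed for the cloud-hosted copy of the data, not for files on GLADE:
# Illustrative subset; check catalog.df.columns for the real column names and values
catalog_subset = catalog.search(
    component='atm',
    variable='TREFHT',
    frequency='day_1',
)

# No storage_options needed on GLADE; the netCDF files are read straight from the filesystem
dsets = catalog_subset.to_dataset_dict()
print(list(dsets.keys()))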
This seems like it might be easier to solve during an office hours appointment. Erko, do you still have a UCAR login?
I just opened a PR to unhide office hours so, once merged, Erko will no longer need a UCAR login to be able to schedule an appointment
@Erko Jakobson You can make an appointment here: https://ncar.github.io/esds/office-hours/
I have a UCAR login. I worked on Cheyenne, since on Casper I get an error with "from ncar_jobqueue import NCARCluster". And I made an appointment for online help.
Erko Jakobson said:
I have a UCAR login. I worked on Cheyenne, since on Casper I get an error with "from ncar_jobqueue import NCARCluster". And I made an appointment for online help.
If you're using an environment from 2019, ncar_jobqueue is probably configured to use the SLURM queue manager for Casper, but CISL moved Casper over to PBS (the same queue manager that Cheyenne has always used). You could try using from dask_jobqueue import PBSCluster instead, but it would probably be best to walk through that change during your office hours appointment.
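For reference, a rough sketch of that on Casper could look like the lines below; the queue name, project code, and resource numbers are placeholders to adjust for your allocation, and older dask-jobqueue releases take project= instead of account=:
from dask_jobqueue import PBSCluster
from distributed import Client

# Placeholders: adjust the queue, project code, and resource requests for your allocation
cluster = PBSCluster(
    queue='casper',
    account='PROJECT_CODE',
    cores=4,
    memory='20GB',
    walltime='01:00:00',
)
cluster.scale(10)
client = Client(cluster)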
Hi all. Erko made an ESDS office hour appointment with me, and I also suggested using the PBSCluster. Unfortunately we ran into some errors with that approach. I'm not very familiar with working on the HPC systems, so I'm not of much help. I'm looking around online for a solution, but in the meantime if someone else wants to have a go at this during an office hour appointment, please do.