Stream: python-questions

Topic: Reading in CESM2-LE ensemble data


view this post on Zulip Erko Jakobson (Oct 05 2022 at 15:33):

Hi! I am a bit stuck with reading in CESM2-LE ensemble data. Is there a standard script for opening it? There must be quite a few users :). Best, Erko

view this post on Zulip Julia Kent (Oct 05 2022 at 15:36):

This might be what you're looking for? https://cookbooks.projectpythia.org/cesm-lens-aws-cookbook/README.html

view this post on Zulip Philip Chmielowiec (Oct 05 2022 at 16:02):

@Erko Jakobson

https://ncar.github.io/esds/posts/2021/intake-cesm2-le-glade-example/

This should also be useful!

view this post on Zulip Max Grover (Oct 05 2022 at 16:23):

I would also suggest looking at one of these examples https://ncar.github.io/cesm2-le-aws/kay_et_al_lens2.html

view this post on Zulip Max Grover (Oct 05 2022 at 16:24):

You can create an interactive diagnostic plot that looks like this, with all the steps outlined: [image: Screen-Shot-2022-10-05-at-11.24.06-AM.png]

view this post on Zulip Philip Chmielowiec (Oct 05 2022 at 16:26):

One thing to also consider is whether you will be working on the NCAR HPC environment or not, since the data is hosted on /glade/.

view this post on Zulip Max Grover (Oct 05 2022 at 16:29):

Path to the catalog on GLADE: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json

view this post on Zulip Erko Jakobson (Oct 06 2022 at 18:50):

Thank you all for your help. There is still a strange error with this code:

catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)

gives the error: FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'
But the file is in the same folder. Any idea where the problem might be?

view this post on Zulip Philip Chmielowiec (Oct 06 2022 at 19:42):

@Erko Jakobson

Are you working locally or on Casper/Cheyenne?

view this post on Zulip Erko Jakobson (Oct 06 2022 at 19:47):

I am working on Cheyenne

view this post on Zulip Philip Chmielowiec (Oct 06 2022 at 19:50):

That's strange, I was working with the data earlier and didn't have this error (On Casper though, but shouldn't matter)

view this post on Zulip Erko Jakobson (Oct 06 2022 at 20:51):

I tried Casper, but I didn't find a suitable kernel for importing intake and NCARCluster.
On Cheyenne, I use the kernel "Notebook Gallery 2019.12", and the code is as follows:

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import intake
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas, hvplot.xarray
import holoviews as hv
from distributed import LocalCluster, Client
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')

cluster = NCARCluster()
cluster.scale(40)
client = Client(cluster)

catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)

And the full traceback is:


FileNotFoundError Traceback (most recent call last)
<ipython-input-3-22448c0d5d6a> in <module>
1 catalog = intake.open_esm_datastore(
----> 2 '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
3 )
4 catalog

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in __init__(self, esmcol_path, progressbar, log_level, **kwargs)
79 self.progressbar = progressbar
80 self._col_data = _fetch_and_parse_file(esmcol_path)
---> 81 self.df = self._fetch_catalog()
82 self._entries = {}
83 self.urlpath = ''

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in _fetch_catalog(self)
127 """Get the catalog file and cache it.
128 """
--> 129 return pd.read_csv(self._col_data['catalog_file'])
130
131 def serialize(self, name, directory=None):

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
683 )
684
--> 685 return _read(filepath_or_buffer, kwds)
686
687 parser_f.__name__ = name

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
455
456 # Create the parser.
--> 457 parser = TextFileReader(fp_or_buf, **kwds)
458
459 if chunksize or iterator:

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options["has_index_names"] = kwds["has_index_names"]
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1133 def _make_engine(self, engine="c"):
1134 if engine == "c":
-> 1135 self._engine = CParserWrapper(self.f, **self.options)
1136 else:
1137 if engine == "python":

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1915 kwds["usecols"] = self.usecols
1916
-> 1917 self._reader = parsers.TextReader(src, **kwds)
1918 self.unnamed_cols = self._reader.unnamed_cols
1919

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
161 mode += 'b'
162 if fileobj is None:
--> 163 fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
164 if filename is None:
165 filename = getattr(fileobj, 'name', '')

FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'

view this post on Zulip Philip Chmielowiec (Oct 06 2022 at 20:57):

Try creating your own conda environment and using that as your kernel. I presume the 2019 kernel is a bit outdated, which could be causing intake to not recognize or find certain files.

view this post on Zulip Philip Chmielowiec (Oct 06 2022 at 21:00):

Let me know if you need help setting it up and getting it working with JupyterHub! The two additional packages I install whenever I set up an environment on Casper/Cheyenne are ipykernel and nb_conda_kernels.
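
Before switching kernels, it can help to confirm which interpreter the notebook is actually running and which of the packages mentioned in this thread it can see. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the list of module names are illustrative assumptions, not part of any NCAR tooling.

```python
import sys
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# The interpreter path shows which environment the notebook kernel is really using.
print("Kernel interpreter:", sys.executable)

# Module names assumed from this thread; adjust the list to your own workflow.
wanted = ["intake", "intake_esm", "ipykernel", "nb_conda_kernels", "dask_jobqueue"]
print("Not installed here:", missing_packages(wanted))
```

Running this in a notebook cell under each candidate kernel gives a quick view of what still needs to be installed.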

view this post on Zulip Erko Jakobson (Oct 06 2022 at 21:09):

As you might have guessed, I have never created my own conda environment or kernel. How do I do it?

view this post on Zulip Julia Kent (Oct 06 2022 at 21:17):

Does your project have an environment file you can start with?

Here is a useful Conda cheatsheet

view this post on Zulip Julia Kent (Oct 06 2022 at 21:19):

If you do have a file you can type in the terminal conda env create --file environment.txt (or whatever the name of your file is)

If not type conda create --name myenv python where myenv is the name of your new environment.

view this post on Zulip Julia Kent (Oct 06 2022 at 21:20):

Then download any desired package with conda install packagename

view this post on Zulip Michael Levy (Oct 06 2022 at 21:31):

This is great advice from Julia, I would just insert a step between her two suggestions:

Julia Kent said:

If not type conda create --name myenv python where myenv is the name of your new environment.

Run conda activate myenv to switch from the (base) environment to your new environment

Julia Kent said:

Then download any desired package with conda install packagename

view this post on Zulip Michael Levy (Oct 06 2022 at 21:33):

And I wouldn't be surprised if the issue you were running into is that the version of intake-esm in Notebook Gallery 2019.12 expects a full pathname for catalog_file; I don't remember when Anderson dropped that requirement for how the json file is structured. So another possibility would be to copy /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json to your home directory or work directory, and then edit the third line to add the full path to the catalog:

glade-cesm2-le.json
{
-  "catalog_file": "glade-cesm2-le.csv.gz",
+  "catalog_file": "/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.csv.gz",

you should be able to read that json file with the older environment
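
The manual edit above can also be scripted. This is a minimal sketch, assuming the catalog JSON stores a local file name under the `catalog_file` key (as in the diff above) and that a bare name is meant relative to the directory holding the JSON; the helper name `write_local_catalog` is hypothetical.

```python
import json
import os


def write_local_catalog(source_json, dest_json):
    """Copy a catalog JSON, rewriting a relative 'catalog_file' as an absolute path."""
    with open(source_json) as f:
        spec = json.load(f)

    catalog_file = spec["catalog_file"]
    # Assumption: catalog_file is a local path, not a URL.
    if not os.path.isabs(catalog_file):
        # Resolve the bare file name against the directory of the original JSON.
        source_dir = os.path.dirname(os.path.abspath(source_json))
        spec["catalog_file"] = os.path.join(source_dir, catalog_file)

    with open(dest_json, "w") as f:
        json.dump(spec, f, indent=2)
    return spec["catalog_file"]


# Possible usage on GLADE (not executed here):
# write_local_catalog(
#     "/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json",
#     os.path.expanduser("~/glade-cesm2-le.json"),
# )
```

The copy in the home directory can then be passed to `intake.open_esm_datastore` in place of the original path.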

view this post on Zulip Erko Jakobson (Oct 07 2022 at 13:13):

For some reason, conda create failed with an unexpected error. But changing the catalog_file path in glade-cesm2-le.json worked and I got the lists, thank you.
Still, there is a new problem:
dsets = catalog_subset.to_dataset_dict(storage_options={'anon':True})
gave FileNotFoundError: [Errno 2] No such file or directory: b'/glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc'

And indeed, the path is incorrect; it seems that glade-cesm2-le.csv.gz is outdated. How can I solve this?

view this post on Zulip Michael Levy (Oct 07 2022 at 14:10):

@Erko Jakobson the file exists, and it looks like it should be globally readable:

$ ls -l /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc
-rw-r--r--+ 1 strandwg cesm 461900521 Mar 25  2021 /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc

are you running your notebook on cheyenne or casper? Cheyenne does not mount campaign storage, so you need to be on casper to access this data -- if you're having trouble on casper, I can make sure permissions are set correctly on all the subdirectories in the path
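
One way to tell a missing mount from a stale catalog entry is to walk the path from the root and report the first component the current node cannot see. This is a minimal standard-library sketch for POSIX paths; the helper name `first_missing_component` is hypothetical.

```python
import os


def first_missing_component(path):
    """Return the shortest leading part of `path` that is not visible, or None."""
    path = os.path.abspath(path)
    current = os.sep
    for part in path.strip(os.sep).split(os.sep):
        current = os.path.join(current, part)
        # os.path.exists is also False when permissions prevent a stat call.
        if not os.path.exists(current):
            return current
    return None


# Possible usage with the file from this thread (not executed here):
# first_missing_component(
#     "/glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/"
#     "b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc"
# )
```

If the result is a directory near the top of the path, such as `/glade/campaign`, the filesystem is probably not mounted on that node; a result deeper in the tree points to a permissions problem or a stale catalog entry instead.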

view this post on Zulip Deepak Cherian (Oct 07 2022 at 15:33):

This seems like it might be easier to solve during an office hours appointment. Erko, do you still have a UCAR login?

view this post on Zulip Julia Kent (Oct 07 2022 at 16:26):

I just opened a PR to unhide office hours so, once merged, Erko will no longer need a UCAR login to be able to schedule an appointment

view this post on Zulip Julia Kent (Oct 07 2022 at 16:41):

@Erko Jakobson You can make an appointment here: https://ncar.github.io/esds/office-hours/

view this post on Zulip Erko Jakobson (Oct 10 2022 at 12:52):

I have a UCAR login. I worked on Cheyenne because on Casper I get an error with "from ncar_jobqueue import NCARCluster". I have made an appointment for online help.

view this post on Zulip Michael Levy (Oct 10 2022 at 14:06):

Erko Jakobson said:

I have a UCAR login. I worked on Cheyenne because on Casper I get an error with "from ncar_jobqueue import NCARCluster". I have made an appointment for online help.

If you're using an environment from 2019, ncar_jobqueue is probably configured to use the SLURM queue manager for casper but CISL moved casper over to PBS (same queue manager that cheyenne has always used). You could try using from dask_jobqueue import PBSCluster but it would probably be best to walk through that change during your office hours appointment
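
For orientation, a switch to `PBSCluster` might look roughly like the sketch below. It is untested on Casper, and the account code, queue name, and resource values are all placeholders to be replaced with real ones. The helper names are hypothetical, and the `account` keyword is an assumption about recent dask_jobqueue releases (older ones call it `project`).

```python
def pbs_cluster_kwargs(account, queue, cores=1, memory="10GB", walltime="01:00:00"):
    """Collect keyword arguments for PBSCluster; all defaults are placeholder values."""
    return {
        "account": account,  # assumption: older dask_jobqueue releases name this `project`
        "queue": queue,
        "cores": cores,
        "memory": memory,
        "walltime": walltime,
    }


def make_pbs_cluster(**kwargs):
    """Create a PBSCluster from the given keyword arguments."""
    # Imported lazily so this sketch loads even where dask_jobqueue is not installed.
    from dask_jobqueue import PBSCluster

    return PBSCluster(**kwargs)


# Possible usage (placeholders, not executed here):
# cluster = make_pbs_cluster(**pbs_cluster_kwargs("MY_ACCOUNT_CODE", "my_queue"))
# cluster.scale(4)
```

Keeping the arguments in a plain dictionary makes it easy to print and compare them against the site documentation before any jobs are submitted.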

view this post on Zulip Heather Craker (Oct 11 2022 at 17:54):

Hi all. Erko made an ESDS office hour appointment with me, and I also suggested using the PBSCluster. Unfortunately we ran into some errors with that approach. I'm not very familiar with working on the HPC systems, so I'm not of much help. I'm looking around online for a solution, but in the meantime if someone else wants to have a go at this during an office hour appointment, please do.


Last updated: May 16 2025 at 17:14 UTC