Reading in CESM2-LE ensemble data
Hi! I am a bit stuck with reading in CESM2-LE ensemble data. Is there a standard script for opening it? There should be quite a few users :). Best, Erko
This might be what you're looking for? https://cookbooks.projectpythia.org/cesm-lens-aws-cookbook/README.html
@Erko Jakobson
https://ncar.github.io/esds/posts/2021/intake-cesm2-le-glade-example/
This should also be useful!
I would also suggest looking at one of these examples https://ncar.github.io/cesm2-le-aws/kay_et_al_lens2.html
You can create an interactive diagnostic plot like the one in the attached screenshot, with all the steps outlined.
One thing to also consider is whether you will be working on the NCAR HPC environment or not, since the data is hosted on /glade/.
Path to the catalog on GLADE: /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json
Thank you all for your help. Still, there is a strange error with the code:
catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)
gives the error: FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'
But the file is in the same folder. Any idea where might be the problem?
@Erko Jakobson
Are you working locally or on Casper/Cheyenne?
I am working on Cheyenne
That's strange, I was working with the data earlier and didn't have this error (on Casper though, but that shouldn't matter).
I tried Casper, but I didn't find a suitable kernel for importing intake and NCARCluster.
On Cheyenne, I use the kernel "Notebook Gallery 2019.12" and the code is the following:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import intake
import numpy as np
import pandas as pd
import xarray as xr
import hvplot.pandas, hvplot.xarray
import holoviews as hv
from distributed import LocalCluster, Client
from ncar_jobqueue import NCARCluster
hv.extension('bokeh')
cluster = NCARCluster()
cluster.scale(40)
client = Client(cluster)
catalog = intake.open_esm_datastore(
'/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)
And the full error traceback is:
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-22448c0d5d6a> in <module>
1 catalog = intake.open_esm_datastore(
----> 2 '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
3 )
4 catalog
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in __init__(self, esmcol_path, progressbar, log_level, **kwargs)
79 self.progressbar = progressbar
80 self._col_data = _fetch_and_parse_file(esmcol_path)
---> 81 self.df = self._fetch_catalog()
82 self._entries = {}
83 self.urlpath = ''
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/intake_esm/core.py in _fetch_catalog(self)
127 """Get the catalog file and cache it.
128 """
--> 129 return pd.read_csv(self._col_data['catalog_file'])
130
131 def serialize(self, name, directory=None):
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
683 )
684
--> 685 return _read(filepath_or_buffer, kwds)
686
687 parser_f.__name__ = name
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
455
456 # Create the parser.
--> 457 parser = TextFileReader(fp_or_buf, **kwds)
458
459 if chunksize or iterator:
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options["has_index_names"] = kwds["has_index_names"]
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1133 def _make_engine(self, engine="c"):
1134 if engine == "c":
-> 1135 self._engine = CParserWrapper(self.f, **self.options)
1136 else:
1137 if engine == "python":
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1915 kwds["usecols"] = self.usecols
1916
-> 1917 self._reader = parsers.TextReader(src, **kwds)
1918 self.unnamed_cols = self._reader.unnamed_cols
1919
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
/ncar/usr/jupyterhub/envs/notebook-gallery-2019.12/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
161 mode += 'b'
162 if fileobj is None:
--> 163 fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
164 if filename is None:
165 filename = getattr(fileobj, 'name', '')
FileNotFoundError: [Errno 2] No such file or directory: 'glade-cesm2-le.csv.gz'
Try creating your own conda environment and using that as your kernel. I presume that the 2019 kernel might be a bit outdated, which could be leading to some issues with intake not recognizing/finding certain files.
Let me know if you need help setting it up and getting it working with JupyterHub! The two additional packages I install whenever setting up an environment on Casper/Cheyenne are ipykernel and nb_conda_kernels.
As you might have guessed, I have never created my own conda environment or kernel. How do I do it?
Does your project have an environment file you can start with?
Here is a useful Conda cheatsheet
If you do have a file, you can type in the terminal: conda env create --file environment.txt
(or whatever the name of your file is)
If not, type conda create --name myenv python
where myenv is the name of your new environment.
Then download any desired package with conda install packagename
This is great advice from Julia; I would just insert a step between her two suggestions:
Julia Kent said:
If not, type conda create --name myenv python where myenv is the name of your new environment.
Run conda activate myenv to switch from the (base) environment to your new environment.
Julia Kent said:
Then download any desired package with conda install packagename
And I wouldn't be surprised if the issue you were running into is that the version of intake-esm in Notebook Gallery 2019.12 expects a full pathname for catalog_file; I don't remember when Anderson dropped that requirement for how the json file is structured. So another possibility would be to copy /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json to your home directory or work directory, and then edit the third line to add the full path to the catalog:
glade-cesm2-le.json
{
- "catalog_file": "glade-cesm2-le.csv.gz",
+ "catalog_file": "/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.csv.gz",
With that change, you should be able to read that json file with the older environment.
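If you'd rather patch the file programmatically than edit it by hand, a rough sketch could look like the lines below; the copy under your home directory is just an example location, and the exact json layout may differ between intake-esm versions:
import json
import shutil
from pathlib import Path
import intake

# Original catalog json on GLADE, and an example copy in your home directory
src = Path('/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json')
dst = Path.home() / 'glade-cesm2-le.json'
shutil.copy(src, dst)

# Point "catalog_file" at the full path of the csv.gz that sits next to the original json
spec = json.loads(dst.read_text())
spec['catalog_file'] = str(src.parent / 'glade-cesm2-le.csv.gz')
dst.write_text(json.dumps(spec, indent=2))

catalog = intake.open_esm_datastore(str(dst))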
For some reason, conda create resulted in an unexpected error. But changing the catalog_file path in glade-cesm2-le.json worked and I got the catalog listings, thank you.
Still, there is a new problem:
dsets = catalog_subset.to_dataset_dict(storage_options={'anon':True})
gave FileNotFoundError: [Errno 2] No such file or directory: b'/glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc'
And indeed, the path seems incorrect; it feels like glade-cesm2-le.csv.gz is outdated. How can I solve it?
@Erko Jakobson the file exists, and it looks like it should be globally readable:
$ ls -l /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc
-rw-r--r--+ 1 strandwg cesm 461900521 Mar 25 2021 /glade/campaign/cgd/cesm/CESM2-LE/timeseries/atm/proc/tseries/day_1/TREFHT/b.e21.BSSP370cmip6.f09_g17.LE2-1181.010.cam.h1.TREFHT.20150101-20241231.nc
Are you running your notebook on Cheyenne or Casper? Cheyenne does not mount campaign storage, so you need to be on Casper to access this data. If you're having trouble on Casper, I can make sure permissions are set correctly on all the subdirectories in the path.
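Once you are on Casper, a minimal sketch of the subsetting and loading step could look like the lines below; the search values (component, variable, frequency) are only illustrative, and you can check catalog.df.columns for the actual column names. Note that storage_options={'anon': True} is only needed for the cloud-hosted copy of the data, not for files on GLADE:
# Illustrative subset; check catalog.df.columns for the real column names and values
catalog_subset = catalog.search(
    component='atm',
    variable='TREFHT',
    frequency='day_1',
)

# No storage_options needed on GLADE; the netCDF files are read straight from the filesystem
dsets = catalog_subset.to_dataset_dict()
print(list(dsets.keys()))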
This seems like it might be easier to solve during an office hours appointment. Erko, do you still have a UCAR login?
I just opened a PR to unhide office hours so, once merged, Erko will no longer need a UCAR login to be able to schedule an appointment
@Erko Jakobson You can make an appointment here: https://ncar.github.io/esds/office-hours/
I have a UCAR login. I worked on Cheyenne, since on Casper I get an error with "from ncar_jobqueue import NCARCluster". And I made an appointment for online help.
Erko Jakobson said:
I have a UCAR login. I worked on Cheyenne, since on Casper I get an error with "from ncar_jobqueue import NCARCluster". And I made an appointment for online help.
If you're using an environment from 2019, ncar_jobqueue is probably configured to use the SLURM queue manager for Casper, but CISL moved Casper over to PBS (the same queue manager that Cheyenne has always used). You could try using from dask_jobqueue import PBSCluster instead, but it would probably be best to walk through that change during your office hours appointment.
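For reference, a rough sketch of that on Casper could look like the lines below; the queue name, project code, and resource numbers are placeholders to adjust for your allocation, and older dask-jobqueue releases take project= instead of account=:
from dask_jobqueue import PBSCluster
from distributed import Client

# Placeholders: adjust the queue, project code, and resource requests for your allocation
cluster = PBSCluster(
    queue='casper',
    account='PROJECT_CODE',
    cores=4,
    memory='20GB',
    walltime='01:00:00',
)
cluster.scale(10)
client = Client(cluster)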
Hi all. Erko made an ESDS office hour appointment with me, and I also suggested using the PBSCluster. Unfortunately we ran into some errors with that approach. I'm not very familiar with working on the HPC systems, so I'm not of much help. I'm looking around online for a solution, but in the meantime if someone else wants to have a go at this during an office hour appointment, please do.