Stream: python-questions

Topic: intake to_dataset_dict error


Stephen Yeager (Oct 15 2021 at 22:42):

I'm using intake-esm to ingest CMIP6 OMIP data using:

import intake

catalog_file = '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json'
col = intake.open_esm_datastore(catalog_file)
cat_so = col.search(
    experiment_id=['omip1', 'omip2'],
    variable_id=['so'],
    table_id='Omon'
)
dset_dict_so = cat_so.to_dataset_dict(
    # options for opening each file: 12-step (one year of monthly) time chunks, cftime dates
    cdf_kwargs={'chunks': {'time': 12}, 'decode_times': True, 'use_cftime': True}
)

This returns an xarray.concat() error for key 'OMIP.IPSL.IPSL-CM6A-LR.omip1.Omon.gn'. The catalog for that key (Screen-Shot-2021-10-15-at-4.41.26-PM.png) shows duplicate entries (different versions) as well as missing data for one member_id. I don't know if this is related to ongoing updates to OMIP data holdings or if the json catalog needs updating. Who should I ask about this?
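For anyone hitting the same error: the key 'OMIP.IPSL.IPSL-CM6A-LR.omip1.Omon.gn' looks like the catalog's activity, institution, source, experiment, table and grid columns joined with dots. The sketch below is plain Python, not the intake-esm API. It groups catalog rows (plain dicts, e.g. from `cat_so.df.to_dict("records")`) that way and flags any member_id listed under more than one version, which is the kind of duplicate described above. The column names and their order are assumptions inferred from that key; check them against `cat_so.df.columns`.

```python
from collections import defaultdict

# Assumed catalog column names; the order is inferred from the key
# 'OMIP.IPSL.IPSL-CM6A-LR.omip1.Omon.gn' in the error message above.
KEY_COLUMNS = ("activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label")


def dataset_key(row):
    """Join the key columns of one catalog row with dots."""
    return ".".join(row[col] for col in KEY_COLUMNS)


def find_version_conflicts(rows):
    """Map each dataset key to the member_ids that appear under >1 version."""
    versions = defaultdict(set)  # (key, member_id) -> versions seen
    for row in rows:
        versions[(dataset_key(row), row["member_id"])].add(row["version"])
    conflicts = defaultdict(list)
    for (key, member), seen in versions.items():
        if len(seen) > 1:
            conflicts[key].append(member)
    return {key: sorted(members) for key, members in conflicts.items()}


# Toy rows with made-up version labels, for illustration only.
base = {"activity_id": "OMIP", "institution_id": "IPSL",
        "source_id": "IPSL-CM6A-LR", "experiment_id": "omip1",
        "table_id": "Omon", "grid_label": "gn"}
rows = [
    {**base, "member_id": "r1i1p1f1", "version": "v1"},
    {**base, "member_id": "r1i1p1f1", "version": "v2"},
    {**base, "member_id": "r2i1p1f1", "version": "v1"},
]
print(find_version_conflicts(rows))
# {'OMIP.IPSL.IPSL-CM6A-LR.omip1.Omon.gn': ['r1i1p1f1']}
```

A key that shows up here is a plausible culprit for the concat failure, since files from two versions of the same member would be combined into one dataset.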

Anderson Banihirwe (Oct 18 2021 at 13:48):

> This returns an xarray.concat() error for key 'OMIP.IPSL.IPSL-CM6A-LR.omip1.Omon.gn'. The catalog for that key (Screen-Shot-2021-10-15-at-4.41.26-PM.png) shows duplicate entries (different versions)

@Stephen Yeager, it appears that there's a bug in the catalog generation code: some of these files should be excluded from the catalog. I will look into it. As a temporary workaround, you can use the pandas .drop() method to remove the unwanted rows from the catalog's dataframe:

# index labels of the unwanted catalog rows (e.g. the duplicate versions)
indices_to_drop = [...]
cat_so.df = cat_so.df.drop(indices_to_drop)
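One way to fill in `indices_to_drop`, assuming the newest version is the one to keep (the thread does not say which files are the wrong ones). The catalog appears to have one row per file, so rather than keeping one row per member, this keeps every row from the newest version of each dataset and flags the rest. The helper `older_version_indices` and the column names are hypothetical, and the string comparison relies on version labels of the form `vYYYYMMDD`, like `v20191120` in the path below.

```python
import pandas as pd

# Assumed catalog columns that identify one dataset/member; compare with
# cat_so.df.columns before using this on the real catalog.
GROUP_COLUMNS = ["activity_id", "institution_id", "source_id", "experiment_id",
                 "member_id", "table_id", "variable_id", "grid_label"]


def older_version_indices(df, group_columns=GROUP_COLUMNS, version_col="version"):
    """Return index labels of rows that are not from their group's newest version.

    Version labels such as 'v20191120' sort correctly as strings, so the
    per-group maximum is the newest version. Every file of that version is kept.
    """
    newest = df.groupby(group_columns)[version_col].transform("max")
    # Rows whose group keys contain NaN get NaN here; keep them rather than drop.
    stale = newest.notna() & (df[version_col] != newest)
    return df.loc[stale].index.tolist()


# Toy catalog: r1i1p1f1 appears under two versions, r2i1p1f1 under one.
toy = pd.DataFrame({
    "member_id": ["r1i1p1f1", "r1i1p1f1", "r1i1p1f1", "r2i1p1f1"],
    "version": ["v1", "v2", "v2", "v2"],
})
for col in GROUP_COLUMNS:
    if col != "member_id":
        toy[col] = "same"  # every other key column identical

print(older_version_indices(toy))  # [0]
```

With the real catalog, the workaround would then read `cat_so.df = cat_so.df.drop(older_version_indices(cat_so.df))`.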

> as well as missing data for one member_id. I don't know if this is related to ongoing updates to OMIP data holdings

I just checked the directories on Glade, and the data are simply missing from the filesystem:

$ ls /glade/collections/cmip/CMIP6/OMIP/IPSL/IPSL-CM6A-LR/omip1/r2i1p1f1/Omon/so/gn/v20191120/so
so_Omon_IPSL-CM6A-LR_omip1_r2i1p1f1_gn_180001-189912.nc
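A quick way to spot gaps like this one is to parse the YYYYMM-YYYYMM time range at the end of each file name and compare it with the period you expect. Below is a minimal standard-library sketch, assuming monthly files named like the one listed above (daily files with YYYYMMDD ranges would not match the pattern). The function names are hypothetical, and the expected end month in the usage line is a placeholder, not the real length of the omip1 run.

```python
import re

# Monthly time-range suffix, e.g. '_180001-189912.nc' (YYYYMM-YYYYMM).
TIME_RANGE = re.compile(r"_(\d{6})-(\d{6})\.nc$")


def to_index(yyyymm):
    """Convert 'YYYYMM' to a running month count so ranges are integers."""
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:]) - 1


def to_yyyymm(index):
    """Inverse of to_index."""
    year, month0 = divmod(index, 12)
    return f"{year:04d}{month0 + 1:02d}"


def coverage_gaps(filenames, start, end):
    """List (first, last) YYYYMM spans in [start, end] covered by no file.

    File names without a monthly time-range suffix are ignored.
    """
    spans = sorted(
        (to_index(m.group(1)), to_index(m.group(2)))
        for m in map(TIME_RANGE.search, filenames)
        if m
    )
    gaps = []
    cursor, last = to_index(start), to_index(end)
    for lo, hi in spans:
        if cursor > last:
            break
        if lo > cursor:
            # Months between the current position and this file are missing.
            gaps.append((cursor, min(lo - 1, last)))
        cursor = max(cursor, hi + 1)
    if cursor <= last:
        gaps.append((cursor, last))
    return [(to_yyyymm(a), to_yyyymm(b)) for a, b in gaps]


files = ["so_Omon_IPSL-CM6A-LR_omip1_r2i1p1f1_gn_180001-189912.nc"]
# '194912' is a placeholder end month, not the real length of the run.
print(coverage_gaps(files, "180001", "194912"))  # [('190001', '194912')]
```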

Stephen Yeager (Oct 18 2021 at 17:18):

Thanks, @Anderson Banihirwe. Shiquan Su informed me that the update to the OMIP data holdings on Glade is still ongoing. Will the glade-cmip6.json file get updated automatically as data is added?

Anderson Banihirwe (Oct 18 2021 at 18:13):

> Will the glade-cmip6.json file get updated automatically as data is added?

Unfortunately not; I have to run the catalog-generation script manually.

Matt Long (Oct 18 2021 at 23:07):

@Eric Nienhouse, there is a growing community using intake-esm to access data on the CMIP Analysis Platform (CMIP AP). It would be great to discuss with @xdev how to automate the process of keeping catalogs up to date.

Stephen Yeager (Oct 22 2021 at 16:53):

I'd like to be part of this discussion. @Max Grover wrote a blog post discussing a problem I ran into when using intake-esm with CMIP6 OMIP data (https://ncar.github.io/esds/posts/2021/intake_cmip6_debug/). The root of the problem was missing data on the CMIP AP. I put in a request for a full update of the OMIP data on the CMIP AP in late September, and I am finding that this process takes a very long time, to the point that intake-esm may not be a viable solution for me.


Last updated: Jan 30 2022 at 12:01 UTC