Stream: python-questions

Topic: intake constructor error


Julia Kent (Aug 05 2021 at 16:34):

Hi all. I am trying to open an intake catalog from a YAML file with intake.open_catalog('test.yaml'), but I get a constructor error: ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:intake.catalog.local.LocalCatalogEntry'

Does anyone know why this could be? Am I using the wrong open command? I also tried intake.open_yaml_file_cat(). Or is my yaml file perhaps formatted incorrectly? The script and my yaml file are here.

Anderson Banihirwe (Aug 05 2021 at 16:56):

Does anyone know why this could be? Am I using the wrong open command?

intake.open_catalog('test.yaml') is the right command. The contents of the YAML file are the culprit. How was the test.yaml file produced?

Anderson Banihirwe (Aug 05 2021 at 17:02):

How was the test.yaml file produced?

Never mind... I see the notebook

Julia Kent (Aug 05 2021 at 17:04):

I generate it in the intake_serialize.ipynb notebook in that repository. I am trying to load the url of a catalog as a catalog, walk down to a few levels of depth, and save that new catalog as a yaml file that can also be loaded as a catalog.

Julia Kent (Aug 05 2021 at 17:04):

Thanks for helping!

Julia Kent (Aug 05 2021 at 17:16):

The meat of it is:

with open('test.yaml', 'w') as f:
    f.write(yaml.dump(stac_cat.walk(depth=10)))

Anderson Banihirwe (Aug 05 2021 at 17:43):

@jukent, since stac_cat.walk(...) returns a dictionary with Python objects that may or may not be serializable, serializing this dict results in an invalid YAML file... You will need to jump through some hoops to get a valid YAML file :frown:.

I don't know how to deal with these problematic Python objects (e.g. the satstac.item.Item, which I believe comes from https://github.com/sat-utils/sat-stac/blob/master/satstac/item.py)
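
The failure described above can be reproduced without intake or sat-stac. The following is a minimal sketch, assuming only PyYAML is installed; the `Entry` class is a hypothetical stand-in for objects such as LocalCatalogEntry and is not part of any library. It shows that yaml.dump() happily writes an arbitrary object using a python/object tag, and that yaml.safe_load() then refuses to read that tag back.

```python
import yaml


class Entry:
    """Hypothetical stand-in for a non-plain object inside the walked dict."""

    def __init__(self, name):
        self.name = name


# yaml.dump() uses the full Dumper, which writes unknown objects
# with a "!!python/object:<module>.<class>" tag.
dumped = yaml.dump({"item": Entry("scene-1")})

loaded_ok = True
error_message = None
try:
    # The safe loader has no constructor for python/object tags.
    yaml.safe_load(dumped)
except yaml.constructor.ConstructorError as err:
    loaded_ok = False
    error_message = str(err)

print(dumped)
print("safe_load succeeded:", loaded_ok)
print(error_message)
```

So the dump step succeeds and the problem only shows up on load, which matches the ConstructorError in the original question.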

Anderson Banihirwe (Aug 05 2021 at 17:46):

It's my understanding that satstac.item.Item isn't serializable. So, if your goal is to serialize the walked catalog, you may have to exclude some of the items
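
One way to picture "excluding some of the items" is to reduce the walked dictionary to plain data before dumping it. The sketch below is only an illustration under stated assumptions: `to_plain`, `Opaque`, and the sample `walked` dict are hypothetical names made up here, and the result is merely YAML that the safe loader can read, not necessarily a file that intake accepts as a catalog.

```python
import yaml

_DROP = object()  # sentinel marking values that cannot be kept
PLAIN_SCALARS = (str, int, float, bool, type(None))


def to_plain(obj):
    """Recursively keep only plain YAML-friendly data; drop everything else."""
    if isinstance(obj, PLAIN_SCALARS):
        return obj
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            plain = to_plain(value)
            if plain is not _DROP:
                cleaned[str(key)] = plain
        return cleaned
    if isinstance(obj, (list, tuple)):
        items = (to_plain(value) for value in obj)
        return [item for item in items if item is not _DROP]
    return _DROP  # arbitrary Python object: excluded


class Opaque:
    """Hypothetical non-serializable object, e.g. something like satstac.item.Item."""


walked = {
    "parent": {
        "description": "top level",
        "item": Opaque(),
        "bands": ["red", "nir"],
    }
}

plain = to_plain(walked)
text = yaml.safe_dump(plain)

# The round trip through the safe loader now works.
assert yaml.safe_load(text) == plain
```

Dropping entries silently loses information, so in practice it may be better to log what was excluded or to replace each object with a plain description of it.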

Anderson Banihirwe (Aug 05 2021 at 17:53):

An alternative would be to serialize just the top level (the parents). For example:

[Screenshot: Screen-Shot-2021-08-05-at-11.50.01-AM.png]

Julia Kent (Aug 05 2021 at 17:53):

Thanks @Anderson Banihirwe. Could you point me to some documentation on what makes a valid YAML that can be opened by intake? I thought any dictionary could be turned into a YAML.

Julia Kent (Aug 05 2021 at 17:56):

I think the purpose of the project is to find those hoops and figure out how to jump through them to get more than just the top-level in the YAML.

Anderson Banihirwe (Aug 05 2021 at 18:38):

Could you point me to some documentation on what makes a valid YAML that can be opened by intake?

There isn't documentation about this :frown:.

I thought any dictionary could be turned into a YAML.

That's right... It's the YAML loading part that creates all sorts of issues. intake uses the default yaml loader (which isn't aware of some of these custom objects).

The main test I use: if you can read the YAML file with the default pyyaml loader, intake should be able to do the same (https://github.com/intake/intake/blob/6959346c1db430547546627989875ce0cbdfb53f/intake/utils.py#L75)

import yaml
with open('test.yaml') as f:
    data = yaml.safe_load(f)
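
The same check can be wrapped in a small helper that reports the error instead of raising. This is a sketch, assuming only PyYAML; `can_safe_load` is a hypothetical name, not an intake or PyYAML function.

```python
import yaml


def can_safe_load(path):
    """Return (True, None) if the file parses with the safe loader,
    otherwise (False, error message)."""
    try:
        with open(path) as f:
            yaml.safe_load(f)
    except yaml.YAMLError as err:
        return False, str(err)
    return True, None
```

Calling `can_safe_load('test.yaml')` on a file that contains python/object tags should return False together with the "could not determine a constructor" message.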

Anderson Banihirwe (Aug 05 2021 at 18:40):

I don't know if creating custom YAML loaders is going to help for your use case, but here's an example in case you are interested: https://stackoverflow.com/questions/58924168/loading-custom-objects-with-pyyaml
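
For reference, a custom loader along those lines might look like the sketch below. It assumes only PyYAML; `LenientLoader` and `construct_as_dict` are hypothetical names, and the YAML text is a made-up sample. The loader reads every python/object tag as a plain dict instead of rebuilding the object. Note that intake would not use this loader when opening a catalog, so this is only useful for inspecting or converting a file that was already dumped.

```python
import yaml


class LenientLoader(yaml.SafeLoader):
    """Safe loader that reads python/object mappings as plain dicts."""


def construct_as_dict(loader, tag_suffix, node):
    # tag_suffix is the dotted class path, e.g. "some_module.Entry";
    # it is ignored here and only the mapping contents are kept.
    return loader.construct_mapping(node, deep=True)


# Register the handler for every tag that starts with this prefix.
LenientLoader.add_multi_constructor(
    "tag:yaml.org,2002:python/object:", construct_as_dict
)

# Made-up sample of what yaml.dump() writes for an arbitrary object.
text = "item: !!python/object:some_module.Entry\n  name: scene-1\n"

data = yaml.load(text, Loader=LenientLoader)
print(data)
```

This only handles objects that were dumped as mappings; objects dumped in other forms would need their own handling.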

Julia Kent (Aug 05 2021 at 18:48):

Thanks @Anderson Banihirwe. It seems the yaml.dump() method adds all sorts of things to the YAML that the intake.yaml() method excludes, and I need to figure out what those are so that I can save the YAML a few layers down in a form that is still readable by intake. I will reach out if I have any more specific questions.

Julia Kent (Aug 05 2021 at 18:49):

Even just knowing that I am reading the YAML correctly is a huge help in finding the error!

Anderson Banihirwe (Aug 05 2021 at 19:01):

I will reach out if I have any more specific questions.
Even just knowing that I am reading the YAML correctly is a huge help in finding the error!

:+1: sounds good...

By the way, there's a shortcut method, .save(), for saving an intake catalog object to a YAML file:

In [14]: url = 'https://raw.githubusercontent.com/sat-utils/sat-stac/master/test/catalog/catalog.json'

In [15]: cat = intake.open_stac_catalog(url)

In [16]: list(cat)
Out[16]: ['stac-catalog-eo']

In [17]: walked_cat_dict = cat.walk(depth=10)

In [18]: type(walked_cat_dict)
Out[18]: dict

In [19]: walked_cat = intake.catalog.Catalog.from_dict(walked_cat_dict)

In [21]: walked_cat.save('test.yaml')

Julia Kent (Aug 05 2021 at 19:15):

@Joe Hamman and I were looking at that method today as a possible avenue as well.


Last updated: Jan 30 2022 at 12:01 UTC