Stream: python-questions
Topic: intake constructor error
Julia Kent (Aug 05 2021 at 16:34):
Hi all. I am trying to open an intake catalog from a YAML file with intake.open_catalog('test.yaml'),
but I get a ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:intake.catalog.local.LocalCatalogEntry'
Does anyone know why this could be? Am I using the wrong open command? I also tried intake.open_yaml_file_cat(). Or is my YAML file perhaps formatted incorrectly? The script and my YAML file are here.
Anderson Banihirwe (Aug 05 2021 at 16:56):
Does anyone know why this could be? Am I using the wrong open command? I also tried
intake.open_catalog('test.yaml') is the right command. The contents of the YAML file are the culprit. How was the test.yaml file produced?
Anderson Banihirwe (Aug 05 2021 at 17:02):
How was the test.yaml file produced?
Never mind... I see the notebook
Julia Kent (Aug 05 2021 at 17:04):
I generate it in the intake_serialize.ipynb notebook in that repository. I am trying to load the URL of a catalog as a catalog, walk down a few levels of depth, and save that new catalog as a YAML file that can also be loaded as a catalog.
Julia Kent (Aug 05 2021 at 17:04):
Thanks for helping!
Julia Kent (Aug 05 2021 at 17:16):
The meat of it is:
with open('test.yaml', 'w') as f:
    f.write(yaml.dump(stac_cat.walk(depth=10)))
Anderson Banihirwe (Aug 05 2021 at 17:43):
@jukent, since stac_cat.walk(...) returns a dictionary with Python objects that may or may not be serializable, serializing this dict results in an invalid YAML file... You will need to jump through some hoops to get a valid YAML file :frown:.
I don't know how to deal with these problematic Python objects (e.g. the satstac.item.Item, which I believe comes from https://github.com/sat-utils/sat-stac/blob/master/satstac/item.py)
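To illustrate the point above, here is a minimal sketch of how the failure arises. The Item class is a hypothetical stand-in (not the real satstac.item.Item), and the dict is made up; only yaml.dump and yaml.safe_load are real PyYAML calls. The plain yaml.dump happily writes a python/object tag for an arbitrary object, and the safe loader then refuses to construct it.

```python
import yaml


class Item:
    """Hypothetical stand-in for a non-plain object inside the walked dict."""

    def __init__(self, name):
        self.name = name


# Assumed shape: a dict whose values are arbitrary Python objects.
walked = {"entry": Item("scene-1")}

# yaml.dump uses the full Dumper, which tags arbitrary objects
# with "!!python/object:<module>.<class>".
text = yaml.dump(walked)
print(text)

# The safe loader has no constructor registered for that tag.
try:
    yaml.safe_load(text)
    loaded_ok = True
except yaml.constructor.ConstructorError as err:
    loaded_ok = False
    print(err.problem)
```

So dumping succeeds, and the problem only shows up when the file is read back with a loader that does not know the tagged class.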
Anderson Banihirwe (Aug 05 2021 at 17:46):
It's my understanding that satstac.item.Item isn't serializable. So, if your goal is to serialize the walked catalog, you may have to exclude some of the items.
Anderson Banihirwe (Aug 05 2021 at 17:53):
An alternative would be to serialize just the top level (the parents). For example:
Screen-Shot-2021-08-05-at-11.50.01-AM.png
Julia Kent (Aug 05 2021 at 17:53):
Thanks @Anderson Banihirwe. Could you point me to some documentation on what makes a valid YAML that can be opened by intake? I thought any dictionary could be turned into a YAML.
Julia Kent (Aug 05 2021 at 17:56):
I think the purpose of the project is to find those hoops and figure out how to jump through them to get more than just the top-level in the YAML.
Anderson Banihirwe (Aug 05 2021 at 18:38):
Could you point me to some documentation on what makes a valid YAML that can be opened by intake?
There isn't documentation about this :frown:.
I thought any dictionary could be turned into a YAML.
That's right... It's the YAML loading part that creates all sorts of issues. intake uses the default yaml loader (which isn't aware of some of these custom objects).
The main test I use: if you can read the YAML file with the default pyyaml loader, intake should be able to do the same (https://github.com/intake/intake/blob/6959346c1db430547546627989875ce0cbdfb53f/intake/utils.py#L75)
import yaml
with open('test.yaml') as f:
    data = yaml.safe_load(f)
Anderson Banihirwe (Aug 05 2021 at 18:40):
I don't know if creating custom YAML loaders is going to help for your use case, but here's an example in case you are interested: https://stackoverflow.com/questions/58924168/loading-custom-objects-with-pyyaml
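For reference, a custom loader along those lines might look like the sketch below. LenientLoader and _object_as_dict are hypothetical names, the YAML text is a made-up sample, and the choice to turn each tagged object into a plain dict is an assumption for illustration. It only shows that PyYAML can be taught to read such tags; whether intake can be pointed at a loader like this is not established here.

```python
import yaml


class LenientLoader(yaml.SafeLoader):
    """SafeLoader subclass (hypothetical) that tolerates python/object tags."""


def _object_as_dict(loader, tag_suffix, node):
    # tag_suffix is the dotted class path that follows the tag prefix.
    if isinstance(node, yaml.MappingNode):
        data = loader.construct_mapping(node, deep=True)
        data["_python_class"] = tag_suffix  # keep a record of the original type
        return data
    return None  # non-mapping payloads are not handled in this sketch


# Register on the subclass only, so the stock SafeLoader is left untouched.
LenientLoader.add_multi_constructor(
    "tag:yaml.org,2002:python/object:", _object_as_dict
)

# Made-up sample resembling what yaml.dump writes for an arbitrary object.
text = "entry: !!python/object:some.module.Item\n  name: scene-1\n"
data = yaml.load(text, Loader=LenientLoader)
print(data)
```

The tagged objects come back as ordinary dicts rather than instances of their original classes, which avoids importing or executing anything named in the file.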
Julia Kent (Aug 05 2021 at 18:48):
Thanks @Anderson Banihirwe. It seems the yaml.dump() method adds all sorts of things to the YAML that the intake.yaml() method excludes, and I need to figure out what those are so that I can save the YAML a few layers down in a form that is still readable by intake. I will reach out if I have any more specific questions.
Julia Kent (Aug 05 2021 at 18:49):
Even just knowing that I am reading the YAML correctly is a huge help in finding the error!
Anderson Banihirwe (Aug 05 2021 at 19:01):
I will reach out if I have any more specific questions.
Even just knowing that I am reading the YAML correctly is a huge help in finding the error!
:+1: sounds good...
By the way, there's a shortcut method .save() for saving an intake catalog object to a YAML file:
In [14]: url = 'https://raw.githubusercontent.com/sat-utils/sat-stac/master/test/catalog/catalog.json'
In [15]: cat = intake.open_stac_catalog(url)
In [16]: list(cat)
Out[16]: ['stac-catalog-eo']
In [17]: walked_cat_dict = cat.walk(depth=10)
In [18]: type(walked_cat_dict)
Out[18]: dict
In [19]: walked_cat = intake.catalog.Catalog.from_dict(walked_cat_dict)
In [21]: walked_cat.save('test.yaml')
Julia Kent (Aug 05 2021 at 19:15):
@Joe Hamman and I were looking at that method today as a possible avenue as well.
Last updated: Jan 30 2022 at 12:01 UTC