Stream: DECS

Topic: s3-API


view this post on Zulip Riley Conroy (Mar 18 2020 at 17:30):

Welcome to S3 discussion

view this post on Zulip Tom Cram (Mar 25 2020 at 16:52):

@Riley Conroy is there anything we need to do to prepare for the S3 demo tomorrow afternoon?

view this post on Zulip Tom Cram (Mar 25 2020 at 16:52):

I will work on uploading the package to pypi.org today

view this post on Zulip Riley Conroy (Mar 25 2020 at 17:02):

I don't think so, but I'm not sure what's important to show. Just go through the functionality?

view this post on Zulip Tom Cram (Mar 25 2020 at 17:03):

Yes, I think that's all we need to do at this point

view this post on Zulip Riley Conroy (Mar 25 2020 at 17:04):

OK, sounds good. I'll maybe write a Jupyter notebook that shells out commands. Also, since Sage is big into Java, I might do some sort of Java use case. Might be overkill though.
We'll see what makes sense after the DECS meeting tomorrow.

view this post on Zulip Tom Cram (Mar 25 2020 at 17:24):

Sounds good. The package is now available from pypi.org at https://pypi.org/project/ncar-rda-s3/. I have installed it on casper and copied it to /glade/u/home/rdadata/lib/python/site-packages. You will need to include this in your python sys.path to import it.
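
A minimal sketch of the sys.path step mentioned above. The helper name add_site_packages is hypothetical; the path is the one quoted in the message, and the import is guarded because the package is only present where it has been installed.

```python
import sys

# Path quoted in the message above; adjust for other systems.
RDA_SITE_PACKAGES = "/glade/u/home/rdadata/lib/python/site-packages"


def add_site_packages(path=RDA_SITE_PACKAGES):
    """Append `path` to sys.path once so packages under it become importable."""
    if path not in sys.path:
        sys.path.append(path)
    return path


add_site_packages()

try:
    import rda_s3  # only succeeds where the package is actually installed
except ImportError:
    rda_s3 = None
```

Setting PYTHONPATH to the same directory before starting Python would have a similar effect without touching the code.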

view this post on Zulip Tom Cram (Mar 25 2020 at 17:29):

It is also available on the EIO VMs. Invoke python with 'python3', then 'import rda_s3'.

view this post on Zulip Riley Conroy (Mar 26 2020 at 15:45):

@Tom Cram I'm wrapping up a jupyter notebook. Did you want to talk about installation/requirements? It seems like that's the gap right now.

view this post on Zulip Tom Cram (Mar 26 2020 at 16:05):

Sure, no problem

view this post on Zulip Doug Schuster (Mar 26 2020 at 17:24):

@Tom Cram I added a symlink from /glade/u/home/rdadata/lib/python/site-packages/rda_s3/rda_s3.py to /glade/u/home/rdadata/bin. As we discussed in the group meeting, please send out the user environment configuration requirements to successfully run this on rda-data or casper. Thx!

view this post on Zulip Tom Cram (Mar 26 2020 at 17:27):

:+1:

view this post on Zulip Tom Cram (Mar 26 2020 at 17:53):

@Riley Conroy I'm going to change the Python interpreter in rda_s3.py to /usr/bin/env python3. It's currently failing on the VMs because python2 is the default on those machines.
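
A sketch of what the top of such a script could look like. The shebang is the one named in the message; the version guard and the helper name running_python3 are assumptions added for illustration, not something the message says is in rda_s3.py.

```python
#!/usr/bin/env python3
"""Resolve the interpreter through PATH instead of a hard-coded location."""
import sys


def running_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


if not running_python3():
    sys.exit("This script requires Python 3; try invoking it with python3.")
```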

view this post on Zulip Riley Conroy (Mar 26 2020 at 22:04):

Interesting about TDS accessing that NetCDF file just fine. Did anyone catch how Sean was able to do range requests on the netcdf file on his emulated object store? Did his object metadata include the byte ranges of data?

view this post on Zulip Tom Cram (Mar 26 2020 at 22:39):

I wasn't following closely enough to catch it. He did that very quickly.

view this post on Zulip Doug Schuster (Mar 27 2020 at 15:46):

I think he gets the byte offset info from the NetCDF file metadata, and might create a separate index file from that? I don’t think it’s in the object metadata, but I’d think that’d be a viable option. Maybe check in with Sean on the details of his workflow.
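
For reference, a byte-range read against an S3-style object store can be sketched as below. This is not Sean's workflow, which the thread does not describe; it only assumes the offset and length are already known (for example from a separate index) and that client behaves like a boto3 S3 client, whose get_object call accepts a Range argument. The helper names are hypothetical.

```python
def byte_range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return "bytes={}-{}".format(offset, offset + length - 1)


def read_range(client, bucket, key, offset, length):
    """Fetch only the requested bytes of an object.

    `client` is expected to behave like a boto3 S3 client.
    """
    response = client.get_object(
        Bucket=bucket, Key=key, Range=byte_range_header(offset, length)
    )
    return response["Body"].read()
```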

view this post on Zulip Doug Schuster (Mar 27 2020 at 18:25):

@Riley Conroy https://github.com/Unidata/netcdf-java/blob/master/cdm/s3/src/main/java/ucar/unidata/io/s3/S3RandomAccessFile.java
https://github.com/Unidata/netcdf-java/blob/master/cdm/s3/src/test/java/ucar/unidata/io/s3/TestS3Read.java

view this post on Zulip Tom Cram (Mar 27 2020 at 20:02):

@Riley Conroy I'm getting AttributeError when I try to use rda_s3 within a Python session:

import rda_s3 as rs
buckets = rs.list_buckets()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'rda_s3' has no attribute 'list_buckets'

view this post on Zulip Tom Cram (Mar 27 2020 at 20:22):

Found the problem. 'import rda_s3' only imports the package, not the module 'rda_s3.py'. So we currently need to import it as:
from rda_s3 import rda_s3
buckets = rda_s3.list_buckets()

view this post on Zulip Riley Conroy (Mar 27 2020 at 20:22):

Hmm, could rda_s3 be the directory instead of the file?

view this post on Zulip Tom Cram (Mar 27 2020 at 20:23):

I'll create an issue and get this sorted out. Might be as simple as adding a line to __init__.py

view this post on Zulip Riley Conroy (Mar 27 2020 at 20:23):

I tried something similar a while ago and it didn't work as expected

view this post on Zulip Riley Conroy (Mar 27 2020 at 20:24):

Actually, could putting a "from rda_s3 import *" work?

view this post on Zulip Tom Cram (Mar 27 2020 at 20:24):

I wonder how the import in your Jupyter notebook is working though. It shouldn't based on my error

view this post on Zulip Tom Cram (Mar 27 2020 at 20:24):

No, I tried that and it didn't work

view this post on Zulip Riley Conroy (Mar 27 2020 at 20:25):

Oh, I mean in the __init__.py. I was executing the notebook in the same directory as the file, so I think it was grabbing the script itself.

view this post on Zulip Riley Conroy (Mar 27 2020 at 20:28):

Never mind, putting 'from rda_s3 import *' in the __init__.py doesn't work

view this post on Zulip Tom Cram (Mar 27 2020 at 20:31):

I think I can add 'from . import rda_s3' to the __init__.py file. See https://stackoverflow.com/questions/47323411/attributeerror-module-object-has-no-attribute-xxxx
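
The package-versus-module behavior discussed above can be reproduced with throwaway packages. In this sketch the package names (pkg_bare, pkg_submodule, pkg_reexport) and the module name core are hypothetical stand-ins for rda_s3 and rda_s3.py; it compares an empty __init__.py, the 'from . import' form, and a relative re-export of the function.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = "def list_buckets():\n    return ['example-bucket']\n"


def make_package(root, name, init_source):
    """Create package `name` under `root` containing __init__.py and core.py."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / "core.py").write_text(MODULE_SOURCE)


root = tempfile.mkdtemp()
sys.path.insert(0, root)

make_package(root, "pkg_bare", "")  # empty __init__.py
make_package(root, "pkg_submodule", "from . import core\n")
make_package(root, "pkg_reexport", "from .core import list_buckets\n")
importlib.invalidate_caches()

bare = importlib.import_module("pkg_bare")
submodule = importlib.import_module("pkg_submodule")
reexport = importlib.import_module("pkg_reexport")

# Importing only the package does not load core.py, hence the AttributeError.
bare_sees_function = hasattr(bare, "list_buckets")

# "from . import core" makes the module reachable as an attribute of the package.
via_submodule = submodule.core.list_buckets()

# Re-exporting the function makes it callable directly on the package.
via_reexport = reexport.list_buckets()
```

Note that 'from . import core' alone still requires the pkg.core.list_buckets() spelling; only the re-export form allows pkg.list_buckets().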

view this post on Zulip Tom Cram (Apr 02 2020 at 20:47):

Github is slow this afternoon. Unresponsive at times.

view this post on Zulip Tom Cram (Apr 02 2020 at 20:50):

Screen-Shot-2020-04-02-at-2.49.48-PM.png

view this post on Zulip Tom Cram (Apr 02 2020 at 20:51):

the dreaded rainbow unicorn!

view this post on Zulip Tom Cram (Apr 02 2020 at 21:50):

pip install now working on isd-s3:
https://pypi.org/project/ncar-isd-s3/

view this post on Zulip Tom Cram (Apr 02 2020 at 21:51):

Module can be imported as:

from isd_s3 import isd_s3
isd_s3.list_buckets()

view this post on Zulip Tom Cram (Apr 02 2020 at 21:51):

Another method:

from isd_s3 import isd_s3 as rs
rs.list_buckets()

view this post on Zulip Tom Cram (Apr 02 2020 at 21:53):

Yet another method:

import isd_s3.isd_s3 as rs
rs.list_buckets()

view this post on Zulip Doug Schuster (Apr 06 2020 at 15:14):

Great!

view this post on Zulip Tom Cram (Apr 07 2020 at 17:36):

@riley I'll go ahead and create a branch for this issue and fix it: https://github.com/NCAR/isd-s3/issues/22

view this post on Zulip Tom Cram (Apr 07 2020 at 21:02):

Since we're generalizing this for ISD and broader CISL use, I think we should have a separate configuration file for RDA specific config info, e.g. S3_URL, default_bucket, AWS credentials. I'm running into this issue by configuring the logging handler ... can't just put this configuration inside the main program, it should be imported instead. I'll set something up for logging and test it out. We can work on moving the other config info after logging works. Simple config example: https://martin-thoma.com/configuration-files-in-python/#python-configuration-file

view this post on Zulip Tom Cram (Apr 07 2020 at 21:03):

And the *config.py file would be included in .gitignore, since it applies only to us
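
One possible shape for this, as a sketch only: the library looks for an optional, git-ignored config module and falls back to defaults when it is absent. The module name rda_config, the helper load_settings, and the setting names are assumptions for illustration (S3_URL and a default bucket are the items named above; LOG_LEVEL is added as an example).

```python
import importlib

DEFAULTS = {"S3_URL": None, "DEFAULT_BUCKET": None, "LOG_LEVEL": "INFO"}


def load_settings(module_name="rda_config"):
    """Return DEFAULTS overridden by matching names from an optional module."""
    settings = dict(DEFAULTS)
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return settings  # no site-specific file present; use the defaults
    for name in DEFAULTS:
        if hasattr(module, name):
            settings[name] = getattr(module, name)
    return settings
```

Because the site-specific module is optional, other groups could use the package without creating one.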

view this post on Zulip Tom Cram (Apr 09 2020 at 21:52):

Logging is now working. I ran a pull request and deleted the log-test branch. Log messages are written to /glade/u/home/rdadata/dssdb/log/isd-s3.log and debug mode, if specified, writes to isd-s3.dbg.

view this post on Zulip Tom Cram (Apr 09 2020 at 21:54):

We will need to pull the function configure_log() out of the main code and put it into a new python wrapper since it's specific to the RDA use case. This could be done in coordination with abstracting out the other RDA specific stuff.

view this post on Zulip Tom Cram (Apr 09 2020 at 22:14):

The logging object in the code is named 'logger'. Some example log messages that can be added to the code:

logger.info("This is an info message")
logger.debug("This is a debug message")
logger.warning("This is a warning message")
logger.error("This is an error message")
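
A minimal sketch of a configure_log()-style helper using the standard logging module. The file names come from the messages above; the signature, the logger name, and the choice to send debug-mode output to both files are assumptions, since the actual implementation is not shown in this thread.

```python
import logging
import os


def configure_log(log_dir, debug=False, name="isd_s3_sketch"):
    """Return a logger writing INFO and above to isd-s3.log and, in debug
    mode, all messages to isd-s3.dbg inside `log_dir`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.propagate = False

    # Drop handlers from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    info_handler = logging.FileHandler(os.path.join(log_dir, "isd-s3.log"))
    info_handler.setLevel(logging.INFO)
    info_handler.setFormatter(formatter)
    logger.addHandler(info_handler)

    if debug:
        debug_handler = logging.FileHandler(os.path.join(log_dir, "isd-s3.dbg"))
        debug_handler.setLevel(logging.DEBUG)
        debug_handler.setFormatter(formatter)
        logger.addHandler(debug_handler)

    return logger
```

Taking the log directory as an argument keeps the RDA-specific path out of the shared code, in line with the plan to move configure_log() into a wrapper.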

view this post on Zulip Tom Cram (Apr 21 2020 at 19:37):

@Riley Conroy FYI it appears the methods in isd_s3.py with leading underscores are not available outside the module (i.e. they are available only inside the module library):

from isd_s3.isd_s3 import *
client = _get_session()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_get_session' is not defined

I'm going to remove the leading underscore on the method names in the library and see if that fixes it.

view this post on Zulip Riley Conroy (Apr 21 2020 at 19:48):

That's really strange. Maybe I missed a PIP recently.

view this post on Zulip Tom Cram (Apr 21 2020 at 19:51):

I think this behavior can be overridden by declaring it in the __all__ variable in __init__.py, but that seems more work than necessary. Easier to remove the leading underscore.

view this post on Zulip Tom Cram (Apr 21 2020 at 19:55):

This explains it well: https://dbader.org/blog/meaning-of-underscores-in-python

view this post on Zulip Riley Conroy (Apr 21 2020 at 19:59):

ohh, I see. It's because the * in the import. I didn't know that was different.
You could still do:

from isd_s3 import isd_s3
isd_s3._get_session()

anyway, better to remove the leading _ if it's not going to be used internally anyway.
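
The wildcard-import behavior can be checked in isolation. This sketch builds a throwaway module in memory (underscore_demo, a hypothetical name) rather than using isd_s3 itself, and also shows the __all__ override; for a wildcard import, __all__ is read from the module being imported from.

```python
import sys
import types

SOURCE = """
def get_session():
    return "public"

def _get_session():
    return "private"
"""

demo = types.ModuleType("underscore_demo")
exec(SOURCE, demo.__dict__)
sys.modules["underscore_demo"] = demo


def star_import_names():
    """Return the non-dunder names that `from underscore_demo import *` binds."""
    namespace = {}
    exec("from underscore_demo import *", namespace)
    return sorted(n for n in namespace if not n.startswith("__"))


# Without __all__, names with a leading underscore are skipped.
without_all = star_import_names()

# Direct attribute access still reaches the underscore name.
direct = demo._get_session()

# Listing the name in the module's __all__ opts it into the wildcard import.
demo.__all__ = ["get_session", "_get_session"]
with_all = star_import_names()
```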

view this post on Zulip Tom Cram (Apr 21 2020 at 20:03):

Yeah, since I'm calling the functions from outside the module, it's better to remove the underscore. But I'll keep the underscore on variable names that remain internal (e.g. _is_imported). Also will keep underscore on function names in __main__.py

view this post on Zulip Tom Cram (Apr 22 2020 at 20:52):

@Riley Conroy do you know what the default endpoint_url is if it's undefined when creating a boto3 client? Documentation doesn't make this clear. I'm leaning toward passing the url either as a command line argument or environment variable, and it will be up to the user to set this correctly (we define it in our cli.py entry-point script). Tempted to allow for an undefined url as shown in the documentation, and wonder what the result of that would be.

view this post on Zulip Riley Conroy (Apr 22 2020 at 21:02):

@Tom Cram I tested it out and the default is "https://s3.amazonaws.com". I guess since boto3 was built by amazon, they assume you're using their s3.

view this post on Zulip Tom Cram (Apr 22 2020 at 21:06):

Roger. I think it's ok to allow for it to be undefined, since there's a possibility other users would be using an AWS endpoint.
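
A sketch of the optional-endpoint approach described above. The environment variable name S3_URL and both helper names are assumptions; boto3.client does accept endpoint_url, and leaving it as None lets boto3 choose its own default endpoint.

```python
import os


def resolve_endpoint(cli_value=None, env=None):
    """Pick the endpoint from the command line first, then the environment."""
    env = os.environ if env is None else env
    return cli_value or env.get("S3_URL") or None


def make_client(endpoint_url=None):
    """Create an S3 client; endpoint_url=None lets boto3 use its default."""
    import boto3  # imported lazily so resolve_endpoint works without boto3

    return boto3.client("s3", endpoint_url=endpoint_url)
```

A caller would combine them as make_client(resolve_endpoint(args_url)), where args_url is whatever the command line parser produced.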

view this post on Zulip Tom Cram (Apr 24 2020 at 16:50):

I'm done with the PyInstaller executable build, and it's located at /glade/u/home/rdadata/bin/isd_s3_test. I have only tested list_buckets(), so we'll need to test out the other functions soon. But I'm going to go ahead and start a pull request to the master branch.

view this post on Zulip Tom Cram (Apr 24 2020 at 16:57):

A comment about user configuration. I put the isd_s3.ini config file in the same directory as the rda AWS credentials configuration (/glade/u/home/rdadata/.aws), since this seems like the most logical place for it to reside. I think this type of configuration setup (reading from a config file) should be completely detached from the package repo. It should be up to other users to decide if they want to implement their own config file setup when they develop their own apps. Config options supported by the package can be 1. Environment variables or 2. Command line argument input
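
A sketch of how a site-specific wrapper might read an INI file and hand the values to the package through environment variables, keeping the file handling outside the package. The section name isd_s3, the option names, and both helper names are hypothetical; the layout of the real isd_s3.ini is not shown in this thread.

```python
import configparser


def read_site_config(path, section="isd_s3"):
    """Return the options of `section` in an INI file as a plain dict."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        return {}
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


def export_to_environment(options, environ, prefix=""):
    """Copy options into `environ` as upper-case variables, keeping existing ones."""
    for key, value in options.items():
        environ.setdefault(prefix + key.upper(), value)
    return environ
```

Using setdefault means a variable the user has already exported takes precedence over the file.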

view this post on Zulip Tom Cram (Apr 24 2020 at 17:14):

Some other minor issues:

  1. I renamed exit() to exit_session() in isd_s3.py. This was causing a conflict with the built-in function exit() when I imported the library with from isd_s3.isd_s3 import *

  2. In __main__.py, I renamed the boolean variable pretty_print to pp, since this was creating a conflict with the function pretty_print() in isd_s3.py
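
Both conflicts can be illustrated with a small sketch. The module shadow_demo and the function render are hypothetical; the first part shows a wildcard import replacing the name exit, and the second shows a flag named pp that leaves the pretty_print function reachable.

```python
import sys
import types

lib = types.ModuleType("shadow_demo")
exec("def exit():\n    return 'library exit'\n", lib.__dict__)
sys.modules["shadow_demo"] = lib

namespace = {}
exec("from shadow_demo import *", namespace)

# Name lookup checks module globals before builtins, so the wildcard import wins.
exit_is_shadowed = namespace["exit"] is lib.exit


def pretty_print(data):
    """Format a dict as one 'key: value' line per entry, sorted by key."""
    return "\n".join("{}: {}".format(k, v) for k, v in sorted(data.items()))


def render(data, pp=False):
    """`pp` is the flag; naming it pretty_print would hide the function above."""
    return pretty_print(data) if pp else str(data)
```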

view this post on Zulip Tom Cram (Apr 24 2020 at 17:36):

No conflicts in the pull request ... woo hoo!

view this post on Zulip Riley Conroy (Apr 24 2020 at 19:57):

@Tom Cram Are we still going to allow people to run the package directly assuming they have the correct environment? Basically, pyinstaller will be a suggestion to outside groups?

view this post on Zulip Tom Cram (Apr 24 2020 at 20:06):

They can run it directly by calling isd_s3.__main__.py. I don't think we should support the pyinstaller option since that gets into use-case specifics.

view this post on Zulip Tom Cram (Apr 24 2020 at 20:16):

But we nevertheless should put our cli script under Github control. Does it make sense to put it under the rda-object-storage repo?

view this post on Zulip Riley Conroy (Apr 24 2020 at 20:38):

I think we could probably put it in an examples/ directory. To keep it in one place.
We could separate out examples/pyinstaller/ and examples/notebooks/
What do you think?

view this post on Zulip Tom Cram (Apr 24 2020 at 20:38):

That sounds good

view this post on Zulip Tom Cram (Apr 24 2020 at 23:25):

@Riley Conroy Let me know when you've reviewed the pull request, and I'll merge it when you're ready.

view this post on Zulip Riley Conroy (Apr 25 2020 at 00:20):

Yeah, I looked over it and there are a few things that I've been changing. Anyway, maybe it would be best to merge now, and we can go over it next week.

view this post on Zulip Tom Cram (Apr 27 2020 at 15:19):

Ok, I'll merge it this morning

view this post on Zulip Tom Cram (Apr 27 2020 at 16:12):

Merge completed

view this post on Zulip Riley Conroy (Apr 27 2020 at 17:13):

@Tom Cram
What are your thoughts about changing isd_s3 to be a class that holds the functions that interact with s3? The __init__() could start the boto3 session, and the client would be a member variable. I see quite a bit of shared state that might be good to encapsulate.

Also, might be worth it to move some of this configuration to its own module so it can be reused.
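
A rough sketch of the class idea, to make the proposal concrete. The class name S3Session, its constructor arguments, and the injectable client are assumptions for illustration and not the design that was eventually adopted; boto3.Session().client and the "Buckets"/"Name" keys of the list_buckets() response are part of boto3.

```python
class S3Session:
    """Hold one S3 client and the settings shared by the calls that use it."""

    def __init__(self, endpoint_url=None, default_bucket=None, client=None):
        self.default_bucket = default_bucket
        if client is None:
            import boto3  # lazy import; tests can inject a stand-in client

            client = boto3.Session().client("s3", endpoint_url=endpoint_url)
        self.client = client

    def list_buckets(self):
        """Return bucket names from the client's list_buckets() response."""
        response = self.client.list_buckets()
        return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

Accepting a ready-made client keeps the shared state in one place and makes the methods easy to exercise without a live endpoint.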

view this post on Zulip Tom Cram (Apr 27 2020 at 18:28):

I was thinking the same, both for the isd_s3 class and a separate config module

view this post on Zulip Tom Cram (Apr 27 2020 at 18:36):

@Riley Conroy Let me know when you've completed the changes you were working on at the end of last week. At that point I can rebuild a new PyInstaller executable and push a new release to PyPI.org.

view this post on Zulip Tom Cram (Jun 04 2020 at 21:17):

@Riley Conroy Package has been uploaded to PyPI.org, and you can go ahead and rebuild the pyinstaller executable.


Last updated: May 16 2025 at 17:14 UTC