s3-API · DECS · Zulip Chat Archive

Tom Cram (Mar 25 2020 at 16:52):

@Riley Conroy is there anything we need to do to prepare for the S3 demo tomorrow afternoon?

Riley Conroy (Mar 25 2020 at 17:02):

I don't think so, but I'm not sure what's important to show. Just go through the functionality?

Tom Cram (Mar 25 2020 at 17:03):

Riley Conroy (Mar 25 2020 at 17:04):

OK, sounds good. I'll maybe write a jupyter notebook that shells out commands. Also, since Sage is big into java, I might do some sort of java use case. Might be overkill though.
We'll see what makes sense after the DECS meeting tomorrow.

Tom Cram (Mar 25 2020 at 17:24):

Sounds good. The package is now available from pypi.org at https://pypi.org/project/ncar-rda-s3/. I have installed it on casper and copied it to /glade/u/home/rdadata/lib/python/site-packages. You will need to include this in your python sys.path to import it.
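
A minimal sketch of the sys.path step described above, using the site-packages path quoted in the message. The helper name `add_to_sys_path` is hypothetical, and the import is left commented out because it only resolves on a machine where the package is installed.

```python
import sys

# Path quoted in the message above; adjust for your own system.
RDA_SITE_PACKAGES = "/glade/u/home/rdadata/lib/python/site-packages"


def add_to_sys_path(path):
    """Append path to sys.path once, so repeated calls do not duplicate it."""
    if path not in sys.path:
        sys.path.append(path)
    return sys.path


add_to_sys_path(RDA_SITE_PACKAGES)
# import rda_s3  # should now resolve, assuming the install described above
```

Setting PYTHONPATH in the shell environment would have the same effect without touching the script.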

Tom Cram (Mar 25 2020 at 17:29):

It is also available on the EIO VMs. Invoke python with 'python3', then 'import rda_s3'.

Riley Conroy (Mar 26 2020 at 15:45):

@Tom Cram I'm wrapping up a jupyter notebook. Did you want to talk about installation/requirements? It seems like that's the gap right now.

Tom Cram (Mar 26 2020 at 16:05):

Doug Schuster (Mar 26 2020 at 17:24):

@Tom Cram I added a symlink from /glade/u/home/rdadata/lib/python/site-packages/rda_s3/rda_s3.py to /glade/u/home/rdadata/bin. As we discussed in the group meeting, please send out the user environment configuration requirements to successfully run this on rda-data or casper. Thx!

Tom Cram (Mar 26 2020 at 17:27):

Tom Cram (Mar 26 2020 at 17:53):

@Riley Conroy I'm going to change the Python interpreter in rda_s3.py to /usr/bin/env python3. It's currently failing on the VMs because python2 is the default on those machines.

Riley Conroy (Mar 26 2020 at 22:04):

Interesting about TDS accessing that NetCDF file just fine. Did anyone catch how Sean was able to do range requests on the netcdf file on his emulated object store? Did his object metadata include the byte ranges of data?

Tom Cram (Mar 26 2020 at 22:39):

Doug Schuster (Mar 27 2020 at 15:46):

I think he gets the byte offset info from the NetCDF file metadata, and might create a separate index file from that? I don’t think it’s in the object metadata, but I’d think that’d be a viable option. Maybe check in with Sean on the details of his workflow.
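
For reference, a range request against an object store can be sketched as below. This is not Sean's workflow, which is not described in this thread; it only shows the general mechanism, assuming the byte offset and length are already known (for example from a separate index file). The helper names, bucket, and key are hypothetical; `get_object` with a `Range` argument is the standard boto3 S3 client call.

```python
def byte_range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def read_range(client, bucket, key, offset, length):
    """Fetch part of an object; `client` is expected to be a boto3 S3 client."""
    response = client.get_object(
        Bucket=bucket, Key=key, Range=byte_range_header(offset, length)
    )
    return response["Body"].read()


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("s3")
#   chunk = read_range(client, "my-bucket", "file.nc", offset=1024, length=4096)
```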

Doug Schuster (Mar 27 2020 at 18:25):

Riley Conroy (Mar 27 2020 at 19:24):

Tom Cram (Mar 27 2020 at 20:02):

@Riley Conroy I'm getting AttributeError when I try to use rda_s3 within a Python session:

Tom Cram (Mar 27 2020 at 20:22):

Found the problem. 'import rda_s3' only imports the package, not the module 'rda_s3.py'. So we currently need to import it as:
from rda_s3 import rda_s3
buckets = rda_s3.list_buckets()

Riley Conroy (Mar 27 2020 at 20:22):

Tom Cram (Mar 27 2020 at 20:23):

I'll create an issue and get this sorted out. Might be as simple as adding a line to __init__.py
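
The import behavior above can be reproduced with two throwaway packages, each containing a module with the same name as the package. This is only a sketch: the package names `demo_empty` and `demo_fixed` are hypothetical stand-ins, and the one-line `__init__.py` shown is one possible fix, not necessarily the one that was adopted.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = "def list_buckets():\n    return ['example-bucket']\n"


def make_package(root, name, init_source):
    """Create root/name/ with an __init__.py and a same-named module name.py."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / f"{name}.py").write_text(MODULE_SOURCE)


root = tempfile.mkdtemp()
sys.path.insert(0, root)

# Empty __init__.py: importing the package does not load the inner module.
make_package(root, "demo_empty", "")
# One-line __init__.py: re-export the function at package level.
make_package(root, "demo_fixed", "from .demo_fixed import list_buckets\n")
importlib.invalidate_caches()

import demo_empty
import demo_fixed

empty_has_function = hasattr(demo_empty, "list_buckets")  # attribute is missing
fixed_result = demo_fixed.list_buckets()  # works through the re-export
```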

Riley Conroy (Mar 27 2020 at 20:23):

Riley Conroy (Mar 27 2020 at 20:24):

Tom Cram (Mar 27 2020 at 20:24):

I wonder how the import in your Jupyter notebook is working though. It shouldn't based on my error

Tom Cram (Mar 27 2020 at 20:24):

Riley Conroy (Mar 27 2020 at 20:25):

Oh, I mean in the __init__.py. I was executing the notebook in the same directory as the file, so I think it was grabbing the script itself.

Riley Conroy (Mar 27 2020 at 20:28):

Tom Cram (Mar 27 2020 at 20:31):

Tom Cram (Apr 02 2020 at 20:47):

Tom Cram (Apr 02 2020 at 20:50):

Tom Cram (Apr 02 2020 at 20:51):

Tom Cram (Apr 02 2020 at 21:50):

Tom Cram (Apr 02 2020 at 21:51):

Tom Cram (Apr 02 2020 at 21:53):

Doug Schuster (Apr 06 2020 at 15:14):

Tom Cram (Apr 07 2020 at 17:36):

Tom Cram (Apr 07 2020 at 21:02):

Since we're generalizing this for ISD and broader CISL use, I think we should have a separate configuration file for RDA specific config info, e.g. S3_URL, default_bucket, AWS credentials. I'm running into this issue by configuring the logging handler ... can't just put this configuration inside the main program, it should be imported instead. I'll set something up for logging and test it out. We can work on moving the other config info after logging works. Simple config example: https://martin-thoma.com/configuration-files-in-python/#python-configuration-file

Tom Cram (Apr 07 2020 at 21:03):

And the *config.py file would be included in .gitignore, since it applies only to us

Tom Cram (Apr 09 2020 at 21:52):

Logging is now working. I ran a pull request and deleted the log-test branch. Log messages are written to /glade/u/home/rdadata/dssdb/log/isd-s3.log and debug mode, if specified, writes to isd-s3.dbg.

Tom Cram (Apr 09 2020 at 21:54):

We will need to pull the function configure_log() out of the main code and put it into a new python wrapper since it's specific to the RDA use case. This could be done in coordination with abstracting out the other RDA specific stuff.

Tom Cram (Apr 09 2020 at 22:14):

The logging object in the code is named 'logger'. Some example log messages that can be added to the code:

logger.info("This is an info message")
logger.debug("This is a debug message")
logger.warning("This is a warning message")
logger.error("This is an error message")
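
A rough sketch of what a standalone configure_log() wrapper could look like, assuming the two-file layout described above (isd-s3.log for normal messages, isd-s3.dbg when debug mode is on). The signature, logger name, and format string are assumptions, and a temporary directory stands in for the real log directory.

```python
import logging
import os
import tempfile


def configure_log(log_dir, debug=False, name="isd_s3"):
    """Attach file handlers to a named logger: .log for INFO+, .dbg for DEBUG+."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    info_handler = logging.FileHandler(os.path.join(log_dir, "isd-s3.log"))
    info_handler.setLevel(logging.INFO)
    info_handler.setFormatter(formatter)
    logger.addHandler(info_handler)

    if debug:
        # Debug file receives everything, including the INFO messages.
        debug_handler = logging.FileHandler(os.path.join(log_dir, "isd-s3.dbg"))
        debug_handler.setLevel(logging.DEBUG)
        debug_handler.setFormatter(formatter)
        logger.addHandler(debug_handler)
    return logger


log_dir = tempfile.mkdtemp()  # stand-in for the real log directory
logger = configure_log(log_dir, debug=True)
logger.info("This is an info message")
logger.debug("This is a debug message")
```

Keeping this function outside the library means the package itself only calls logging.getLogger() and leaves handler setup to each application.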

Tom Cram (Apr 21 2020 at 19:37):

@Riley Conroy FYI it appears the methods in isd_s3.py with leading underscores are not available outside the module (i.e. they are available only inside the module library):

from isd_s3.isd_s3 import *
client = _get_session()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_get_session' is not defined

I'm going to remove the leading underscore on the method names in the library and see if that fixes it.

Riley Conroy (Apr 21 2020 at 19:48):

Tom Cram (Apr 21 2020 at 19:51):

I think this behavior can be overridden by declaring it in the __all__ variable in __init__.py, but that seems more work than necessary. Easier to remove the leading underscore.

Tom Cram (Apr 21 2020 at 19:55):

Riley Conroy (Apr 21 2020 at 19:59):

Ohh, I see. It's because of the * in the import. I didn't know that was different.
You could still do:

from isd_s3 import isd_s3
isd_s3._get_session()

anyway, better to remove the leading _ if it's not going to be used internally anyway.

Tom Cram (Apr 21 2020 at 20:03):

Yeah, since I'm calling the functions from outside the module, it's better to remove the underscore. But I'll keep the underscore on variable names that remain internal (e.g. _is_imported). Also will keep underscore on function names in __main__.py
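
The star-import behavior discussed above can be checked with an in-memory module. This is a sketch: `demo_s3` is a hypothetical stand-in for isd_s3.py. One detail worth noting is that Python reads `__all__` from the module being star-imported, so for `from isd_s3.isd_s3 import *` the list would belong in isd_s3.py itself.

```python
import sys
import types

SOURCE = '''
def _get_session():
    return "session"

def list_buckets():
    return []
'''


def star_import_names(module_name):
    """Return the names that `from module_name import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return {name for name in namespace if name != "__builtins__"}


# Build a throwaway module in memory (stand-in for isd_s3.py).
demo = types.ModuleType("demo_s3")
exec(SOURCE, demo.__dict__)
sys.modules["demo_s3"] = demo

# Without __all__, names with a leading underscore are skipped.
without_all = star_import_names("demo_s3")

# Listing the underscore name in the module's own __all__ exports it.
demo.__all__ = ["_get_session", "list_buckets"]
with_all = star_import_names("demo_s3")
```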

Tom Cram (Apr 22 2020 at 20:52):

@Riley Conroy do you know what the default endpoint_url is if it's undefined when creating a boto3 client? Documentation doesn't make this clear. I'm leaning toward passing the url either as a command line argument or environment variable, and it will be up to the user to set this correctly (we define it in our cli.py entry-point script). Tempted to allow for an undefined url as shown in the documentation, and wonder what the result of that would be.

Riley Conroy (Apr 22 2020 at 21:02):

@Tom Cram I tested it out and the default is "https://s3.amazonaws.com". I guess since boto3 was built by amazon, they assume you're using their s3.

Tom Cram (Apr 22 2020 at 21:06):

Roger. I think it's ok to allow for it to be undefined, since there's a possibility other users would be using an AWS endpoint.

Tom Cram (Apr 24 2020 at 16:50):

I'm done with the PyInstaller executable build, and it's located at /glade/u/home/rdadata/bin/isd_s3_test. I have only tested list_buckets(), so we'll need to test out the other functions soon. But I'm going to go ahead and start a pull request to the master branch.

Tom Cram (Apr 24 2020 at 16:57):

A comment about user configuration. I put the isd_s3.ini config file in the same directory as the rda AWS credentials configuration (/glade/u/home/rdadata/.aws), since this seems like the most logical place for it to reside. I think this type of configuration setup (reading from a config file) should be completely detached from the package repo. It should be up to other users to decide if they want to implement their own config file setup when they develop their own apps. Config options supported by the package can be 1. Environment variables or 2. Command line argument input
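
The two configuration routes mentioned above (environment variables and command line arguments) could be combined roughly as follows. The flag names and the `S3_DEFAULT_BUCKET` variable are assumptions made for illustration; `S3_URL` is borrowed from the earlier config discussion. When no URL is supplied the value stays None, which boto3 accepts and treats as "use the default endpoint".

```python
import argparse
import os


def parse_config(argv=None, environ=None):
    """Resolve settings: command-line flag first, then environment, then None."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Hypothetical isd_s3 wrapper")
    parser.add_argument("--s3-url", default=environ.get("S3_URL"))
    parser.add_argument("--bucket", default=environ.get("S3_DEFAULT_BUCKET"))
    return parser.parse_args(argv)


args = parse_config(
    ["--bucket", "cli-bucket"],
    environ={"S3_URL": "https://example.org"},
)
# args.s3_url may be None; boto3 then falls back to its default endpoint:
#   import boto3
#   client = boto3.client("s3", endpoint_url=args.s3_url)
```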

Tom Cram (Apr 24 2020 at 17:14):

Tom Cram (Apr 24 2020 at 17:36):

Riley Conroy (Apr 24 2020 at 19:57):

@Tom Cram Are we still going to allow people to run the package directly assuming they have the correct environment? Basically, pyinstaller will be a suggestion to outside groups?

Tom Cram (Apr 24 2020 at 20:06):

They can run it directly by calling isd_s3.__main__.py. I don't think we should support the pyinstaller option since that gets into use case specifics.

Tom Cram (Apr 24 2020 at 20:16):

But we nevertheless should put our cli script under Github control. Does it make sense to put it under the rda-object-storage repo?

Riley Conroy (Apr 24 2020 at 20:38):

I think we could probably put it in an examples/ directory. To keep it in one place.
We could separate out examples/pyinstaller/ and examples/notebooks/
What do you think?

Tom Cram (Apr 24 2020 at 20:38):

Tom Cram (Apr 24 2020 at 23:25):

@Riley Conroy Let me know when you've reviewed the pull request, and I'll merge it when you're ready.

Riley Conroy (Apr 25 2020 at 00:20):

Yeah, I looked over it and there are a few things that I've been changing. Anyway, maybe it would be best to merge now, and we can go over it next week.

Tom Cram (Apr 27 2020 at 15:19):

Tom Cram (Apr 27 2020 at 16:12):

Riley Conroy (Apr 27 2020 at 17:13):

@Tom Cram
What are your thoughts about changing isd_s3 to be a class that holds the functions that interact with s3? The __init__() could start the boto3 session, and the client would be a member variable. I see quite a bit of shared state that might be good to encapsulate.

Also, might be worth it to move some of this configuration to its own module so it can be reused.
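
A sketch of the class-based layout suggested above, with the client held as a member variable. The class name `S3Session` and the constructor arguments are hypothetical; only `list_buckets()` and its `Buckets`/`Name` response fields come from the boto3 S3 client API.

```python
class S3Session:
    """Hold one S3 client and expose operations on it as methods."""

    def __init__(self, client=None, endpoint_url=None):
        if client is None:
            # Imported lazily so the class can be exercised with a stub client.
            import boto3

            client = boto3.Session().client("s3", endpoint_url=endpoint_url)
        self.client = client

    def list_buckets(self):
        """Return bucket names from the client's list_buckets() response."""
        response = self.client.list_buckets()
        return [bucket["Name"] for bucket in response.get("Buckets", [])]


# Usage, assuming boto3 is installed and credentials are configured:
#   session = S3Session(endpoint_url=None)
#   print(session.list_buckets())
```

Accepting an existing client in the constructor also makes the shared state easy to replace in tests.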

Tom Cram (Apr 27 2020 at 18:28):

Tom Cram (Apr 27 2020 at 18:36):

@Riley Conroy Let me know when you've completed the changes you were working on at the end of last week. At that point I can rebuild a new PyInstaller executable and push a new release to PyPI.org.

Tom Cram (Jun 04 2020 at 21:17):

@Riley Conroy Package has been uploaded to PyPI.org, and you can go ahead and rebuild the pyinstaller executable.

Stream: DECS

Topic: s3-API
