Welcome to S3 discussion
@Riley Conroy is there anything we need to do to prepare for the S3 demo tomorrow afternoon?
I will work on uploading the package to pypi.org today
I don't think so, but I'm not sure what's important to show. Just go through the functionality?
Yes, I think that's all we need to do at this point
OK, sounds good. I'll maybe write a Jupyter notebook that shells out commands. Also, since Sage is big into Java, I might do some sort of Java use case. Might be overkill though.
We'll see what makes sense after the DECS meeting tomorrow.
Sounds good. The package is now available from pypi.org at https://pypi.org/project/ncar-rda-s3/. I have installed it on casper and copied it to /glade/u/home/rdadata/lib/python/site-packages. You will need to include this in your python sys.path to import it.
It is also available on the EIO VMs. Invoke python with 'python3', then 'import rda_s3'.
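The sys.path step mentioned above can be sketched like this (the path is the shared install location from the message above; adjust if your site differs):

```python
import sys

# Shared install location mentioned above.
site_dir = "/glade/u/home/rdadata/lib/python/site-packages"
if site_dir not in sys.path:
    sys.path.insert(0, site_dir)
# `import rda_s3` will now resolve against the shared install.
```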
@Tom Cram I'm wrapping up a jupyter notebook. Did you want to talk about installation/requirements? It seems like that's the gap right now.
Sure, no problem
@Tom Cram I added a symlink from /glade/u/home/rdadata/lib/python/site-packages/rda_s3/rda_s3.py to /glade/u/home/rdadata/bin. As we discussed in the group meeting, please send out the user environment configuration requirements to successfully run this on rda-data or casper. Thx!
:+1:
@Riley Conroy I'm going to change the Python interpreter in rda_s3.py to /usr/bin/env python3. It's currently failing on the VMs because python2 is the default on those machines.
Interesting about TDS accessing that NetCDF file just fine. Did anyone catch how Sean was able to do range requests on the netcdf file on his emulated object store? Did his object metadata include the byte ranges of data?
I wasn't following closely enough to catch it. He did that very quickly.
I think he gets the byte offset info from the NetCDF file metadata, and might create a separate index file from that? I don’t think it’s in the object metadata, but I’d think that’d be a viable option. Maybe check in with Sean on the details of his workflow.
@Riley Conroy https://github.com/Unidata/netcdf-java/blob/master/cdm/s3/src/main/java/ucar/unidata/io/s3/S3RandomAccessFile.java
https://github.com/Unidata/netcdf-java/blob/master/cdm/s3/src/test/java/ucar/unidata/io/s3/TestS3Read.java
@Riley Conroy I'm getting AttributeError when I try to use rda_s3 within a Python session:
import rda_s3 as rs
buckets = rs.list_buckets()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'rda_s3' has no attribute 'list_buckets'
Found the problem. 'import rda_s3' only imports the package, not the module 'rda_s3.py'. So we currently need to import it as:
from rda_s3 import rda_s3
buckets = rda_s3.list_buckets()
Hmm, could rda_s3 be the directory instead of the file?
I'll create an issue and get this sorted out. Might be as simple as adding a line to __init__.py
I tried something similar a while ago and it didn't work as expected
actually, could putting a "from rda_s3 import *" work?
I wonder how the import in your Jupyter notebook is working though. It shouldn't based on my error
No, I tried that and it didn't work
Oh, I mean in the __init__.py. I was executing the notebook in the same directory as the file, so I think it was grabbing the script itself.
Nevermind, putting 'from rda_s3 import *' in the __init__.py doesn't work
I think I can add 'from . import rda_s3' to the __init__.py file. See https://stackoverflow.com/questions/47323411/attributeerror-module-object-has-no-attribute-xxxx
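A throwaway package mirroring the same layout shows that fix in action (stub contents here are made up for the demo):

```python
import pathlib
import sys
import tempfile

# Build a minimal package with the same layout as rda_s3 to demonstrate
# that `from . import rda_s3` in __init__.py exposes the submodule.
tmp = pathlib.Path(tempfile.mkdtemp())
pkg = tmp / "rda_s3"
pkg.mkdir()
(pkg / "rda_s3.py").write_text("def list_buckets():\n    return []\n")
(pkg / "__init__.py").write_text("from . import rda_s3\n")

sys.path.insert(0, str(tmp))
import rda_s3

print(rda_s3.rda_s3.list_buckets())  # the submodule is now reachable
```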
Github is slow this afternoon. Unresponsive at times.
Screen-Shot-2020-04-02-at-2.49.48-PM.png
the dreaded rainbow unicorn!
pip install now working on isd-s3:
https://pypi.org/project/ncar-isd-s3/
Module can be imported as:
from isd_s3 import isd_s3
isd_s3.list_buckets()
Another method:
from isd_s3 import isd_s3 as rs
rs.list_buckets()
Yet another method:
import isd_s3.isd_s3 as rs
rs.list_buckets()
Great!
@riley I'll go ahead and create a branch for this issue and fix it: https://github.com/NCAR/isd-s3/issues/22
Since we're generalizing this for ISD and broader CISL use, I think we should have a separate configuration file for RDA specific config info, e.g. S3_URL, default_bucket, AWS credentials. I'm running into this issue by configuring the logging handler ... can't just put this configuration inside the main program, it should be imported instead. I'll set something up for logging and test it out. We can work on moving the other config info after logging works. Simple config example: https://martin-thoma.com/configuration-files-in-python/#python-configuration-file
And the *config.py file would be included in .gitignore, since it applies only to us
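A minimal sketch of that kind of config file using the standard library's configparser (the section name, keys, and values here are hypothetical placeholders, not our real settings):

```python
import configparser

# Hypothetical contents of an RDA-specific config file kept out of the repo.
config = configparser.ConfigParser()
config.read_string("""
[rda]
s3_url = https://example-endpoint
default_bucket = example-bucket
""")

print(config["rda"]["default_bucket"])  # example-bucket
```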
Logging is now working. I merged a pull request and deleted the log-test branch. Log messages are written to /glade/u/home/rdadata/dssdb/log/isd-s3.log and debug mode, if specified, writes to isd-s3.dbg.
We will need to pull the function configure_log() out of the main code and put it into a new python wrapper since it's specific to the RDA use case. This could be done in coordination with abstracting out the other RDA specific stuff.
The logging object in the code is named 'logger'. Some example log messages that can be added to the code:
logger.info("This is an info message")
logger.debug("This is a debug message")
logger.warning("This is a warning message")
logger.error("This is an error message")
@Riley Conroy FYI it appears the methods in isd_s3.py with leading underscores are not available outside the module (i.e. they are available only inside the module library):
from isd_s3.isd_s3 import *
client = _get_session()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name '_get_session' is not defined
I'm going to remove the leading underscore on the method names in the library and see if that fixes it.
That's really strange. Maybe I missed a PIP recently.
I think this behavior can be overridden by declaring it in the __all__ variable in __init__.py, but that seems more work than necessary. Easier to remove the leading underscore.
This explains it well: https://dbader.org/blog/meaning-of-underscores-in-python
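A quick demonstration of that rule, using a synthetic module fabricated on the fly: with no __all__ defined, `import *` copies only the names that don't start with an underscore.

```python
import sys
import types

# Fabricate a module with one public and one underscore-prefixed function.
mod = types.ModuleType("demo_mod")
exec(
    "def _private():\n    return 'hidden'\n"
    "def public():\n    return 'ok'\n",
    mod.__dict__,
)
sys.modules["demo_mod"] = mod

ns = {}
exec("from demo_mod import *", ns)
print("public" in ns, "_private" in ns)  # True False
```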
ohh, I see. It's because the * in the import. I didn't know that was different.
You could still do:
from isd_s3 import isd_s3
isd_s3._get_session()
anyway, better to remove the leading _ if it's not going to be used internally anyway.
Yeah, since I'm calling the functions from outside the module, it's better to remove the underscore. But I'll keep the underscore on variable names that remain internal (e.g. _is_imported). Also will keep underscore on function names in __main__.py
@Riley Conroy do you know what the default endpoint_url is if it's undefined when creating a boto3 client? Documentation doesn't make this clear. I'm leaning toward passing the url either as a command line argument or environment variable, and it will be up to the user to set this correctly (we define it in our cli.py entry-point script). Tempted to allow for an undefined url as shown in the documentation, and wonder what the result of that would be.
@Tom Cram I tested it out and the default is "https://s3.amazonaws.com". I guess since boto3 was built by amazon, they assume you're using their s3.
Roger. I think it's ok to allow for it to be undefined, since there's a possibility other users would be using an AWS endpoint.
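One way to express "undefined means AWS default" when building the client kwargs (S3_URL is the env var name from our config discussion; the fallback to https://s3.amazonaws.com is boto3's own behavior, per above):

```python
import os

def client_kwargs():
    """Build boto3 client kwargs; omitting endpoint_url lets boto3 fall
    back to its default AWS endpoint (https://s3.amazonaws.com)."""
    url = os.environ.get("S3_URL")
    return {"endpoint_url": url} if url else {}
```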
I'm done with the PyInstaller executable build, and it's located at /glade/u/home/rdadata/bin/isd_s3_test. I have only tested list_buckets(), so we'll need to test out the other functions soon. But I'm going to go ahead and start a pull request to the master branch.
A comment about user configuration. I put the isd_s3.ini config file in the same directory as the rda AWS credentials configuration (/glade/u/home/rdadata/.aws), since this seems like the most logical place for it to reside. I think this type of configuration setup (reading from a config file) should be completely detached from the package repo. It should be up to other users to decide if they want to implement their own config file setup when they develop their own apps. Config options supported by the package can be 1. Environment variables or 2. Command line argument input
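For option 2, a sketch of command-line input with argparse (the flag name is hypothetical, not necessarily what we'd ship):

```python
import argparse

parser = argparse.ArgumentParser(prog="isd_s3")
# Hypothetical flag; leaving it unset means "let the env/library decide".
parser.add_argument("--endpoint-url", default=None)
args = parser.parse_args(["--endpoint-url", "https://example-endpoint"])
print(args.endpoint_url)
```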
Some other minor issues:
I renamed exit() to exit_session() in isd_s3.py. This was causing a conflict with the built-in system function exit() when I imported the library with from isd_s3.isd_s3 import *.
In __main__.py, I renamed the boolean variable pretty_print to pp, since this was creating a conflict with the function pretty_print() in isd_s3.py.
No conflicts in the pull request ... woo hoo!
@Tom Cram Are we still going to allow people to run the package directly assuming they have the correct environment? Basically, pyinstaller will be a suggestion to outside groups?
They can run it directly by calling isd_s3.__main__.py. I don't think we should support the pyinstaller option since that gets into use-case specifics.
But we nevertheless should put our cli script under Github control. Does it make sense to put it under the rda-object-storage repo?
I think we could probably put it in an examples/ directory, to keep it in one place.
We could separate out examples/pyinstaller/ and examples/notebooks/
What do you think?
That sounds good
@Riley Conroy Let me know when you've reviewed the pull request, and I'll merge it when you're ready.
Yeah, I looked over it and there are a few things that I've been changing. Anyway, maybe it would be best to merge now, and we can go over it next week.
Ok, I'll merge it this morning
Merge completed
@Tom Cram
What are your thoughts about changing isd_s3 to be a class that holds the functions that interact with s3? The __init__() could start the boto3 session, and the client would be a member variable. I see quite a bit of shared state that might be good to encapsulate.
Also, it might be worth moving some of this configuration to its own module so it can be reused.
I was thinking the same, both for the isd_s3 class and a separate config module
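A sketch of what that encapsulation could look like. The real __init__() would start the boto3 session and create the client; a factory is injected here only so the sketch runs without boto3, and the class/method names are placeholders:

```python
class ISDS3:
    """Hypothetical class wrapping the shared S3 session/client state."""

    def __init__(self, endpoint_url=None, client_factory=None):
        # Real version: boto3.Session().client("s3", endpoint_url=...)
        self.endpoint_url = endpoint_url
        factory = client_factory or self._default_client
        self._client = factory()

    def _default_client(self):
        raise NotImplementedError("needs boto3 and credentials")

    def list_buckets(self):
        # Shared state (the client) now lives on the instance.
        return self._client.list_buckets()
```

Usage with a stand-in client: `ISDS3(client_factory=FakeClient).list_buckets()` delegates to whatever client the factory returns.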
@Riley Conroy Let me know when you've completed the changes you were working on at the end of last week. At that point I can rebuild a new PyInstaller executable and push a new release to PyPI.org.
@Riley Conroy Package has been uploaded to PyPI.org, and you can go ahead and rebuild the pyinstaller executable.
Last updated: May 16 2025 at 17:14 UTC