Stream: python-questions

Topic: Async HTTP POST operation?


Brian Dobbins (Jul 27 2022 at 04:49):

I realize this isn't exactly a common scientific computing question, but I figured given the depth of Python experience here, maybe people have ideas. I'm trying to do a simple HTTP 'POST' operation to a public IP address of a REST API, and I'm interested in doing it in an asynchronous fashion so that when I'm on a compute node (or more generally behind a firewall) and it can't actually reach the IP, it doesn't pause for whatever the timeout duration is. I don't even need a return code; I'm just trying to find a better approach than a 'try/except' loop on a timeout. Any leads or ideas?

Anderson Banihirwe (Jul 27 2022 at 15:52):

@Brian Dobbins, if you haven't done so already, take a look at the backoff package: https://pypi.org/project/backoff/. This package provides a decorator that you may use to control the retry logic. For asynchronous calls, you will need an additional package, aiohttp: https://docs.aiohttp.org/en/stable/

import aiohttp
import backoff

# Retry with exponential backoff on client errors, for up to 60 seconds total.
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_time=60)
async def post_to_url(url):
    # raise_for_status=True turns HTTP error responses into exceptions,
    # so backoff sees them and retries.
    async with aiohttp.ClientSession(raise_for_status=True) as session:
        async with session.post(url) as response:
            return await response.text()
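If the rest of the program is synchronous, a coroutine like the one above can still be driven with a one-off asyncio.run(...) call, with an overall deadline so an unreachable endpoint can't stall the caller. Here's a stdlib-only sketch of that calling pattern; the post_metrics stand-in and the example URL are placeholders for the real aiohttp call:

```python
import asyncio
from typing import Optional

async def post_metrics(url: str) -> str:
    # Stand-in for the real POST; swap the sleep for the aiohttp
    # session.post(...) call from the snippet above.
    await asyncio.sleep(0.1)
    return "ok"

def fire_and_forget(url: str, timeout: float = 2.0) -> Optional[str]:
    # Drive the coroutine from ordinary synchronous code.  If it can't
    # finish within `timeout` seconds, give up quietly instead of hanging.
    try:
        return asyncio.run(asyncio.wait_for(post_metrics(url), timeout))
    except (asyncio.TimeoutError, OSError):
        return None
```

For the real call you'd widen the except clause to include aiohttp.ClientError, but the shape is the same: one blocking entry point with a hard deadline, no fully async main loop required.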

Brian Dobbins (Jul 27 2022 at 18:05):

@Anderson Banihirwe Thanks very, very much for your help, Anderson. I'm still a very novice Python coder, so bear with me on a simple follow-up question? It seems that to use the async I/O options, I need a fully async main loop, which complicates some of the synchronous logic I want. Another option I just tried, which seems to work, uses Python's 'multiprocessing' package to launch a second process that does the POST operation, and terminates that process when the primary logic ends, e.g.:

p = multiprocessing.Process(target=post_function, args=(url, body))
p.start()
...
p.terminate() # Later on - if the post worked, great.  If not, we don't care, so terminate.

Does this approach seem reasonable, too, or are there downsides to it that I'm missing in my inexperience? We don't care to retry the POST; if it works, great. If not, no big deal. So terminating the process is fine. The REST endpoint ensures we don't get corrupt / partial data from an interrupted transfer, too.
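Filled out, that fire-and-forget pattern might look like the following sketch. The URL, body, and post_function here are placeholders (the original post_function isn't shown), and the urllib-based worker is just one plausible way to do the POST:

```python
import multiprocessing
import urllib.request

def post_function(url, body):
    # Fire the POST and swallow any network error -- we never retry,
    # so a failure is simply ignored.
    try:
        req = urllib.request.Request(url, data=body, method="POST")
        urllib.request.urlopen(req, timeout=10)
    except OSError:
        pass

if __name__ == "__main__":
    # daemon=True guarantees the child can't outlive the main program,
    # even if terminate() is never reached.
    p = multiprocessing.Process(
        target=post_function,
        args=("https://example.com/metrics", b"{}"),  # placeholder URL/body
        daemon=True,
    )
    p.start()
    # ... primary (synchronous) logic runs here ...
    p.terminate()  # POST done or not, we don't care at this point
    p.join()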

Thanks again - I owe you a drink or lunch sometime. :-)

Anderson Banihirwe (Jul 27 2022 at 21:40):

Does this approach seem reasonable, too, or are there downsides to it that I'm missing in my inexperience?

Without seeing the rest of your code, I think the multiprocessing.Process(....) approach is reasonable. Supporting async I/O throughout your codebase may come at the cost of having to rewrite most of your code -- which is perhaps not worth the price.

Anderson Banihirwe (Jul 27 2022 at 21:48):

I'm curious... When the HTTP POST is synchronous, does it result in a significant performance penalty? I would love to see what you are building with this :smile:. It sounds interesting....

Thanks again - I owe you a drink or lunch sometime. :-)

I'm game whenever :sweat_smile:

Brian Dobbins (Jul 28 2022 at 01:28):

Thanks again, and I'll ping you in a week or so about lunch or drinks. It'd be fantastic to catch up!

As for the code, it's part of a dataset tool, with this bit enabling some very basic metrics -- what dataset is someone trying to access, and from where (their IP). The idea is that with that information, we can not only try to replicate data in areas of high use, but also do it in a way that's transparent to the user, so that notebooks are fully portable while using the closest / fastest copy. The POST operation sends those two data points to a REST API (which, in turn, writes them to a database), and the reason I don't want it synchronous is that if you're on a system without routing to the internet (like, say, a Cheyenne compute node), it's really annoying to have a near-instantaneous operation hang for 60+ seconds! But there's also no trivial way to know if you can route to the internet, so we try, and don't worry about it if we fail. After all, we allow it to be disabled too.

I'll share the code (it'll be on Github, really), but since recording an IP is potentially 'user information', we have to chat with the Office of General Counsel to see if it's fine as-is or needs changes before we make the package public.

Brian Bonnlander (Jul 28 2022 at 01:38):

Hi, just saw this "quick timeout" check that might also avoid a long timeout response:

https://stackoverflow.com/questions/3764291/how-can-i-see-if-theres-an-available-and-active-network-connection-in-python
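The commonly cited check in that thread is a short TCP connect with a small timeout, rather than a full HTTP request. A sketch, assuming Google's public DNS (8.8.8.8:53) as the probe endpoint:

```python
import socket

def network_available(host="8.8.8.8", port=53, timeout=1.0):
    # Try a TCP connect to a well-known endpoint and return quickly
    # either way, instead of waiting out a long HTTP timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

One caveat: on networks where packets are silently dropped rather than refused, the connect still waits out the full `timeout` before failing, so this is fast only when the failure is an active refusal.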

Brian Bonnlander (Jul 28 2022 at 01:41):

Ah, but I'm seeing this may no longer work... Anyway, I have heard of "fail fast" ways to check whether a connection is available. It's a common use case for applications that rely on network services.

Brian Dobbins (Jul 28 2022 at 04:15):

Interesting, thanks, Brian - lots of ideas in there, some of which I need to understand better. I think they tend to be focused more on multi-second timeout checks of connectivity, which typically can 'fail fast' in scenarios where links are down, but the firewalled compute nodes are a different beast, since packets are literally dropped. So in each of the cases I've tried, it still just falls back to the timeout. Normally a few seconds should be more than sufficient too, with cached responses taking mere fractions of a second, but I think an asynchronous approach is even better since it's effectively no time at all.

Anyway, I have lots more to play with now, so thanks again!


Last updated: May 16 2025 at 17:14 UTC