latency
Hello,
I am building an app that performs object detection using machine learning.
The machine learning inference server is hosted in AWS SageMaker (US-east), and I am using Railway to host a Node Express server as a sort of gateway between the client (US-east) and the SageMaker server. The client needs to send a single image with its request. I have noticed that invoking the SageMaker server directly takes about 300 milliseconds to get a response back. I know from local testing that the inference time is about 150 milliseconds, so the remaining 150 milliseconds is presumably spent sending the image data and getting a response back from SageMaker.
When invoking the Express server hosted on Railway (US-west), it takes about 900 milliseconds to 1 second to get a response back. I am slightly surprised by that, but I imagine most of it comes from passing the image data through an extra hop, i.e. client --> express --> sagemaker instead of just client --> sagemaker. It could also be that the Express server is in US-west and SageMaker is in US-east. There is also the fact that I need to do some authentication on the Express server before passing the request along to SageMaker, but I have tried to run those steps in parallel rather than sequentially.
I would like to reduce the latency as much as possible. Please forgive me, I am mostly a front-end dev diving into territory I know very little about, so any thoughts/ideas/suggestions are appreciated.
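For reference, a minimal sketch of what running the authentication steps in parallel rather than sequentially can look like. The names `verifyToken` and `checkQuota` are hypothetical stand-ins (the actual auth steps are not described in this thread), and the `sleep` calls only simulate a network or database round trip.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical auth steps; each sleep stands in for a network or database call.
async function verifyToken(token: string): Promise<boolean> {
  await sleep(100);
  return token.length > 0;
}

async function checkQuota(userId: string): Promise<boolean> {
  await sleep(100);
  return userId.length > 0;
}

// Sequential: the second check only starts after the first one finishes.
async function authSequential(token: string, userId: string): Promise<boolean> {
  const tokenOk = await verifyToken(token);
  const quotaOk = await checkQuota(userId);
  return tokenOk && quotaOk;
}

// Parallel: both checks start immediately and are awaited together.
async function authParallel(token: string, userId: string): Promise<boolean> {
  const [tokenOk, quotaOk] = await Promise.all([verifyToken(token), checkQuota(userId)]);
  return tokenOk && quotaOk;
}

// Measures how long an async function takes, in milliseconds.
async function elapsedMs(fn: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}
```

With independent checks, the parallel version waits roughly as long as the slowest check instead of the sum of both, which only helps if the checks do not depend on each other's results.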
12 Replies
why not run the express app in us-east as well to cut down on the rtt to your aws service?
the express app is running in us-west (oregon)
my bad
run sagemaker in us-west you mean? Yeah that could be an option
I fixed my question
Railway requires an upgrade of my plan afaik to get access to US-east
yes they do
are you surprised that it's triple the latency? do us-west and us-east differ that much?
I'm sure not all that latency is coming from the travel time
but you could also run your aws service in us-west if that's an option
though if this project will have a userbase or clients, then at some point you will need to upgrade to pro anyway
do you have some thoughts on what it could be or how to approach this? should I just atomically break it down and see what is causing the latency?
I think you should just run the two things in the same region, eliminate that variable completely first
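If the latency is still high once both services are in the same region, breaking the request down stage by stage is a reasonable next step. A minimal sketch of that idea, assuming a gateway handler with an auth step and a SageMaker call: `timeStage`, `fakeAuth`, `fakeInvoke`, and `handleRequest` are hypothetical names, and the sleeps are placeholders for the real work.

```typescript
// Hypothetical helper: records how long each named stage of a request takes.
type Timings = Record<string, number>;

async function timeStage<T>(
  timings: Timings,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    // Record the elapsed time even if the stage throws.
    timings[name] = Date.now() - start;
  }
}

// Placeholders for the real work; swap in the actual auth check and SageMaker call.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fakeAuth(): Promise<boolean> {
  await sleep(20);
  return true;
}

async function fakeInvoke(): Promise<string> {
  await sleep(50);
  return "detections";
}

// Runs the stages in order and returns the result with per-stage timings in ms.
async function handleRequest(): Promise<{ result: string; timings: Timings }> {
  const timings: Timings = {};
  const start = Date.now();
  await timeStage(timings, "auth", fakeAuth);
  const result = await timeStage(timings, "sagemaker", fakeInvoke);
  timings.total = Date.now() - start;
  return { result, timings };
}
```

Logging `timings` for a handful of real requests would show whether the time goes to auth, to the SageMaker round trip, or to something outside the handler, such as the client-to-gateway upload of the image.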