CeresMiller
Created by CeresMiller on 11/5/2023 in #✋|help
latency
Hello, I am building an app that performs object detection using machine learning. The inference server is hosted on AWS SageMaker (US-east), and I am using Railway to host a Node/Express server as a sort of gateway between the client (US-east) and the SageMaker endpoint. The client needs to send a single image with its request.

I have noticed that directly invoking the SageMaker endpoint takes about 300 milliseconds to get a response back. I know from local testing that the inference time itself is about 150 milliseconds, so presumably the other 150 milliseconds is spent sending the image data and receiving the response from SageMaker. When invoking the Express server hosted on Railway (US-west), it takes about 900 milliseconds to 1 second to get a response back. I am slightly surprised by that, but I imagine most of it comes from passing the image data through an extra hop, i.e. client --> Express --> SageMaker instead of just client --> SageMaker. It could also be that the Express server is in US-west while SageMaker is in US-east. There is also some authentication I need to do on the Express server before forwarding the request to SageMaker, but I have tried to run that in parallel with the SageMaker call rather than sequentially.

I would like to reduce the latency as much as possible. Please forgive me, I am mostly a front-end dev diving into territory I know very little about, so any thoughts/ideas/suggestions are appreciated.
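For context, here is a rough sketch of what the gateway route looks like. The endpoint name, region, body size limit, and the `verifyToken()` auth helper are placeholders for illustration, not my exact setup:

```js
const express = require('express');
const {
  SageMakerRuntimeClient,
  InvokeEndpointCommand,
} = require('@aws-sdk/client-sagemaker-runtime');

const app = express();

// SageMaker client pinned to the same region as the endpoint (US-east)
const sagemaker = new SageMakerRuntimeClient({ region: 'us-east-1' });

// Placeholder auth check, standing in for the real authentication logic
async function verifyToken(authHeader) {
  return { ok: Boolean(authHeader) };
}

// Accept the raw image bytes instead of JSON to avoid base64 overhead
app.post('/detect', express.raw({ type: 'image/*', limit: '10mb' }), async (req, res) => {
  try {
    // Run the auth check and the SageMaker call in parallel rather than sequentially
    const [auth, inference] = await Promise.all([
      verifyToken(req.headers.authorization),
      sagemaker.send(new InvokeEndpointCommand({
        EndpointName: 'object-detection-endpoint', // placeholder endpoint name
        ContentType: req.headers['content-type'],
        Body: req.body,                            // forward the image bytes as-is
      })),
    ]);

    if (!auth.ok) return res.status(401).json({ error: 'unauthorized' });

    // InvokeEndpointCommand returns Body as a Uint8Array; decode it before responding
    res.json(JSON.parse(Buffer.from(inference.Body).toString('utf8')));
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(process.env.PORT || 3000);
```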
16 replies