CPU-Bound Performance
We have a (mostly) read-only vector search service running on AWS Fargate, currently on 4 ARM instances with 4 vCPUs and 8 GB of RAM each (about $460/mo). This service serves 200 TPS at peak.
When I ported it to Railway, I noticed a few things:
1) The service registers only 16 logical CPUs in the container, but I'm on a team plan and was expecting 32. Is it possible something is configured incorrectly?
2) In benchmarks I'm only reaching 30 TPS vs. the 200 I get on AWS. I'm maxing out at around 1500% CPU (which seems comparable, given that I'm not saturating the 4 instances on AWS, though it's underutilized per (1)). Is there anything else I can configure to get comparable performance? This particular code is sensitive to SIMD instruction sets like SSE or AVX; is it possible that they're not enabled on the build machine but are on the prod hypervisors?
It's entirely possible that this workload is just more performant on AWS, but I figured it's worth reaching out to see if there's anything that can be adjusted.
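For anyone wanting to check points (1) and (2) from inside the container, here is a minimal diagnostic sketch. It assumes a Linux container with cgroup v2 mounted at /sys/fs/cgroup and a readable /proc/cpuinfo; neither is confirmed for Railway, so missing files are skipped. The helper names and the list of flags checked are illustrative choices, not anything Railway provides.

```python
import os
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo; "asimd" is the ARM64 NEON flag.
# This list is an illustrative assumption; adjust it to what the code needs.
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "avx512f", "asimd")


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content ("<quota> <period>" or "max <period>").

    Returns the limit as a number of CPUs, or None if there is no limit.
    """
    parts = text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None
    return int(parts[0]) / int(parts[1])


def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo content.

    x86 kernels list them under "flags"; ARM kernels use "Features".
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def read_text(path):
    """Return the file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def report():
    print("logical CPUs (os.cpu_count):", os.cpu_count())
    # sched_getaffinity exists on Linux only; it reflects CPU pinning.
    if hasattr(os, "sched_getaffinity"):
        print("CPUs usable by this process:", len(os.sched_getaffinity(0)))
    cpu_max = read_text("/sys/fs/cgroup/cpu.max")
    if cpu_max is not None:
        print("cgroup v2 CPU limit:", parse_cpu_max(cpu_max))
    cpuinfo = read_text("/proc/cpuinfo")
    if cpuinfo is not None:
        flags = parse_cpu_flags(cpuinfo)
        for name in SIMD_FLAGS:
            print(f"{name}: {'yes' if name in flags else 'no'}")


if __name__ == "__main__":
    report()
```

Running this in both the build environment and the deployed container would show whether the two expose different instruction sets, and whether a cgroup quota or CPU affinity mask is capping the container below the number of logical CPUs it reports.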
3 Replies
This sounds really interesting.
FYI, you should have a direct chat with the team if you want quicker responses.
thanks finn -- don't really need a quick response on this one
Oh nice, well I'll be interested in the outcome. Seems really interesting!