Workers not scaling well - I/O getting slower
Hi,
I'm working on an application that needs to process (usually) a few thousand files and wanted to use Workers combined with R2 for that purpose.
However I'm experiencing slow file downloads and wanted to ask for help.
My setup:
- WorkerA -> receives an array of N filenames (N=6 to stay within the open connections limit) and calls WorkerB N times
- WorkerB -> receives a filename, downloads the file from R2 (service binding) and sends timings to an external webhook.
- All files are roughly the same size, around 25 MB. For now the files are only downloaded (no additional processing).
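The setup above can be sketched roughly as follows. This is only an illustration, not the actual Worker code: `BucketLike`, `Timing`, `timeDownload` and `fanOut` are hypothetical names, the bucket interface is a minimal stand-in for the R2 binding's `get()` call, and `callWorkerB` stands for whatever performs the service binding call.

```typescript
// Minimal stand-in for the part of the R2 bucket binding used here.
interface BucketLike {
  get(key: string): Promise<{ arrayBuffer(): Promise<ArrayBuffer> } | null>;
}

interface Timing {
  filename: string;
  bytes: number;
  ms: number;
}

// WorkerB's job: download one object completely and measure how long it took.
async function timeDownload(bucket: BucketLike, filename: string): Promise<Timing> {
  const start = Date.now();
  const object = await bucket.get(filename);
  if (object === null) throw new Error(`object not found: ${filename}`);
  const body = await object.arrayBuffer(); // reads the whole body into memory
  return { filename, bytes: body.byteLength, ms: Date.now() - start };
}

// WorkerA's job: start one WorkerB call per filename and wait for all of them.
async function fanOut(
  filenames: string[],
  callWorkerB: (filename: string) => Promise<Timing>,
): Promise<Timing[]> {
  return Promise.all(filenames.map((name) => callWorkerB(name)));
}
```

In a real Worker, `timeDownload` would run inside WorkerB's `fetch` handler and `fanOut` inside WorkerA's, with N=6 filenames per request.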
Testing:
I'm using 600 files for testing (about 15 GB in total). My local script sends 100/200/300 requests to WorkerA, passing 6 filenames in each request. All requests are sent within 2 seconds.
The webhook receives the download time for each file from WorkerB, and I can see performance degrade as the number of files grows:
- on 600 files 89% of all requests complete under 5s
- on 1200 files (each file requested 2 times, so 200 requests to WorkerA) 62% of all requests complete under 5s
- on 1800 files (each file requested 3 times, so 300 requests to WorkerA) 51% of all requests complete under 5s
The requests that take longer than 5 seconds often take 8 to 15 seconds, and in some cases 20+ seconds.
The same happens when I download the same files with the Fetch API from a different provider, so this doesn't look like an R2 issue.
I'm on the Workers Paid plan and the Workers are in Unbound mode. Smart Placement doesn't change anything.
The actions will be triggered by the end user so for obvious reasons this needs to be as fast as possible - I was hoping to download each file in less than 4 seconds.
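To put that target in numbers, here is a rough back-of-the-envelope sketch. It assumes 1 MB = 10^6 bytes and that all downloads run at the same time, which is a simplification; `requiredMbps` is just an illustrative helper.

```typescript
// Bandwidth needed to finish `count` downloads of `sizeMB` each within
// `seconds`, assuming they all run concurrently. Result is in Mbit/s.
function requiredMbps(sizeMB: number, seconds: number, count: number): number {
  return (sizeMB * 8 * count) / seconds;
}

console.log(requiredMbps(25, 4, 1));    // one file
console.log(requiredMbps(25, 4, 6));    // one WorkerA request (6 files)
console.log(requiredMbps(25, 4, 1800)); // largest test run (1800 files)
```

Under these assumptions that is 50 Mbit/s per file, 300 Mbit/s per WorkerA request, and 90,000 Mbit/s (90 Gbit/s) in aggregate for the 1800-file run.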
Can you explain why this happens? Is it a known limitation?
Is bandwidth throttled, or is there some per-account limit?
Is there anything I can do to get consistent performance that doesn't depend on the number of files I want to process?
Kind regards
5 Replies
Service bindings all execute within the same isolate, which means they share a thread. So, the more files you attempt to download, the slower it will be.
Could I then use the Fetch API instead of a service binding to call WorkerB, and pass up to 1000 filenames at once to WorkerA? That should spawn 1000 isolated WorkerB instances, right?
This seems like a use case for Queues as well.
If requests come from the same place, they can be coalesced into a single isolate. Same with Queues.
Unless you set your concurrency very high
Actually, they spawn in the same isolate, so even that might not help
I had something similar in mind. So is there any way to actually scale this? Perhaps with different queue consumer Workers?
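For reference, the queue-based variant discussed above could look roughly like the sketch below. The interfaces are minimal stand-ins for the Queues producer and consumer objects (`sendBatch`, `messages`, `ack`, `retry`); `enqueueFilenames`, `consumeBatch`, `handle` and the chunk size of 100 messages per `sendBatch` call are assumptions made for illustration, so check the current limits. Whether this actually spreads the work over more isolates is exactly the open question in this thread.

```typescript
// Minimal stand-ins for the Queues producer and consumer objects used here.
interface QueueLike<T> {
  sendBatch(messages: { body: T }[]): Promise<void>;
}
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface BatchLike<T> {
  messages: readonly MessageLike<T>[];
}

// Producer side (WorkerA): enqueue filenames in chunks.
// maxPerCall = 100 is an assumed per-call limit, not a verified one.
async function enqueueFilenames(
  queue: QueueLike<string>,
  filenames: string[],
  maxPerCall = 100,
): Promise<void> {
  for (let i = 0; i < filenames.length; i += maxPerCall) {
    const chunk = filenames.slice(i, i + maxPerCall);
    await queue.sendBatch(chunk.map((name) => ({ body: name })));
  }
}

// Consumer side (WorkerB): process every message in the batch,
// acknowledging successes and asking for a retry on failures.
async function consumeBatch(
  batch: BatchLike<string>,
  handle: (filename: string) => Promise<void>,
): Promise<void> {
  await Promise.all(
    batch.messages.map(async (msg) => {
      try {
        await handle(msg.body);
        msg.ack();
      } catch {
        msg.retry();
      }
    }),
  );
}
```

In a real consumer Worker, `consumeBatch` would be called from the `queue(batch, env)` handler, with `handle` doing the R2 download. Batch size and consumer concurrency are configured on the queue consumer, not in this code.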