vLLM serverless throws 502 errors
I'm getting these errors out of the blue. Does anyone know why?
2024-06-28 00:44:12.053 [71ncv12913w751] [error] Failed to get job, status code: 502
2024-06-28 00:41:33.874 [71ncv12913w751] [info] Finished.
2024-06-28 00:41:33.844 [71ncv12913w751] [info] Finished running generator.
2024-06-28 00:41:08.658 [71ncv12913w751] [error] Failed to get job, status code: 502
2024-06-28 00:40:40.032 [71ncv12913w751] [error] Failed to get job, status code: 502
....
2024-06-28 00:16:05.919 [71ncv12913w751] [error] Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py", line 55, in get_job
    async with session.get(_job_get_url()) as response:
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/client.py", line 1194, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/client.py", line 605, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py", line 966, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py", line 622, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2024-06-28 00:16:05.919 [71ncv12913w751] [error] Failed to get job. | Error Type: ServerDisconnectedError | Error Message: Server disconnected
9 Replies
@Madiator2011 Is this because of RunPod?
The error logs you are seeing indicate that the worker is hitting network-related issues when it tries to fetch a job from the server.
Thanks, I will report this to support.
btw, did your job result come out?
I've experienced this too, but it seemed to work just fine.
Unsure, still investigating. RunPod Serverless is extraordinary for what it does, but it still has quite a few bugs.
Yeah, mostly for vLLM.
And it doesn't handle networking issues gracefully either.
What networking issues?
I'm getting this error too for vLLM. Did anyone find a solution? About 5% of requests end up failing with this error.
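No fix was posted in this thread. The logged errors come from the worker's own job polling inside the runpod package (`rp_job.get_job`), which a caller cannot change. If a share of requests also fails on the calling side, one common mitigation for transient failures such as a 502 or a dropped connection is to retry with exponential backoff and jitter. The following is only a minimal sketch under that assumption, not a confirmed fix and not part of any RunPod API: `TransientError`, `retry_with_backoff`, and `flaky_request` are hypothetical names made up for illustration, and retrying is only safe if resubmitting the same request is harmless.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 502, dropped connection)."""


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run call() and retry on TransientError, waiting longer after each failure.

    `sleep` is injectable so the helper can be exercised without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Upper bound doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random wait in [0, upper] so retries do not synchronize.
            sleep(random.uniform(0, upper))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request():
        # Stand-in for a real request: fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TransientError("status code: 502")
        return "ok"

    print(retry_with_backoff(flaky_request, base_delay=0.01))
    print("attempts:", attempts["count"])
```

In real use, the wrapped callable would send the request and raise `TransientError` only for failures that look temporary, so that permanent errors (bad input, authentication) still surface immediately.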