RunPod · 3mo ago
ribbit

Connection reset by peer

Hi, I am encountering this error while pulling a new docker image.
failed to pull image: read tcp 192.168.23.16:59484->104.16.98.215:443: read: connection reset by peer
I initially hit the error while running my endpoint: it would randomly get stuck in 'in queue' status with the same error, connection reset by peer. So I tried downgrading my docker image version, but the pull failed. Is there any way to fix this? Thanks
Solution:
now it's stable, i think they fixed it
23 Replies
digigoblin · 3mo ago
Which container registry are you using?
ribbit · 3mo ago
wait, I need to confirm this. It might be a private repository of my company, but docker pull and push to that repo work fine on other machines
digigoblin · 3mo ago
Well, the registry is disconnecting the connection from RunPod, so maybe you have a self-hosted registry with a firewall that is blocking RunPod but allowing internal traffic.
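One way to test the firewall idea above is to attempt a plain TLS handshake to the registry from the affected machine and from a machine where pulls work, then compare the outcomes. The following is a minimal sketch using only the Python standard library; the function names `classify_failure` and `probe_tls` and the host in the usage note are hypothetical, and a successful handshake does not prove that a full image pull would succeed.

```python
import socket
import ssl


def classify_failure(exc):
    """Map a connection exception to a short label for comparison across machines."""
    if isinstance(exc, ConnectionResetError):
        return "reset"  # the peer (or something in between) sent a TCP RST
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"  # no answer at all, e.g. packets silently dropped
    if isinstance(exc, ConnectionRefusedError):
        return "refused"  # nothing is listening on that port
    return f"error: {exc}"


def probe_tls(host, port=443, timeout=5.0):
    """Try one TCP connect plus TLS handshake and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            context = ssl.create_default_context()
            with context.wrap_socket(raw, server_hostname=host):
                return "ok"
    except OSError as exc:  # ssl.SSLError and socket errors are OSError subclasses
        return classify_failure(exc)
```

Running something like `probe_tls("registry.example.com")` (placeholder host) from both environments would show whether only one side sees "reset".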
ribbit · 3mo ago
It might not be a problem with the registry. I tried switching back to the last image that is cached in the system. The endpoint works, but sometimes it yields the same error, like this:
Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer
and also a long traceback
Traceback: Traceback (most recent call last):
  File "/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py", line 55, in get_job
    async with session.get(_job_get_url()) as response:
  File "/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client.py", line 1194, in __aenter__
    self._resp = await self._coro
  File "/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client.py", line 605, in _request
    await resp.start(conn)
  File "/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 966, in start
    message, payload = await protocol.read() # type: ignore[union-attr]
  File "/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/streams.py", line 622, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer
{
  "dt": "2024-04-17 07:00:26.619019",
  "endpointid": "s42u5cq5ywrn55",
  "level": "error",
  "message": "Traceback: Traceback (most recent call last): File \"/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 55, in get_job async with session.get(_job_get_url()) as response: File \"/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client.py\", line 1194, in __aenter__ self._resp = await self._coro File \"/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client.py\", line 605, in _request await resp.start(conn) File \"/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/client_reqrep.py\", line 966, in start message, payload = await protocol.read() # type: ignore[union-attr] File \"/opt/.cache/virtualenvs/llm-api-9TtSrW0h-py3.10/lib/python3.10/site-packages/aiohttp/streams.py\", line 622, in read await self._waiter aiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer ",
  "workerId": "j0lijso11z3hz8"
}
is this error something on my end?
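For intermittent resets like the one in the traceback above, client code can at least tolerate a few failures by retrying with exponential backoff. This is a minimal, generic sketch and not how the runpod library itself handles the error; `retry_on_reset` and its parameters are hypothetical names. It assumes the raised exception is an `OSError` whose `errno` is `ECONNRESET` (104 on Linux), which matches the `[Errno 104]` in the log, and it does nothing about the underlying network problem.

```python
import asyncio
import errno
import random


async def retry_on_reset(make_call, attempts=5, base_delay=0.5, max_delay=8.0):
    """Await make_call(), retrying with backoff when the connection is reset by the peer."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except OSError as exc:
            # Re-raise anything that is not a reset, and the final failed attempt.
            if exc.errno != errno.ECONNRESET or attempt == attempts:
                raise
            # Delay doubles each attempt (capped), plus jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

`make_call` is any zero-argument function returning an awaitable, so the request is rebuilt on every attempt rather than reusing a broken connection.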
digigoblin · 3mo ago
Which region is the endpoint in? There might be a networking issue in that region.
ribbit · 3mo ago
the traceback points to code in the runpod handler library
ribbit · 3mo ago
it's global
digigoblin · 3mo ago
Oh, that makes it very difficult to debug. I guess someone from the RunPod team will have to look into it for you. This is the endpoint id, right? s42u5cq5ywrn55
ribbit · 3mo ago
true, I have also contacted support via the website, and they're on it as well
digigoblin · 3mo ago
I would also try terminating that worker (j0lijso11z3hz8) and see if you get a new one in a different region.
Oh, that's good 👍
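Before terminating a worker, it can help to check whether the resets are concentrated on one worker or spread across all of them. A minimal sketch, assuming the endpoint logs have been exported as a list of dicts with the same `level`, `message`, and `workerId` fields as the log entry posted earlier in this thread; the function name `resets_per_worker` is hypothetical.

```python
from collections import Counter


def resets_per_worker(entries):
    """Count error log entries mentioning a connection reset, grouped by worker ID."""
    counts = Counter()
    for entry in entries:
        is_error = entry.get("level") == "error"
        is_reset = "Connection reset by peer" in entry.get("message", "")
        if is_error and is_reset:
            counts[entry.get("workerId", "unknown")] += 1
    return counts
```

If one worker ID dominates the counts, replacing that worker is a reasonable first step; if every worker shows resets, the cause is more likely on the network or platform side.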
ribbit · 3mo ago
ah ok I see, will try that, thank you. prod and dev services are down so I'm looking everywhere hahah
nerdylive · 3mo ago
have you fixed this problem yet?
ribbit · 3mo ago
Not yet. Sometimes it works, but other times it yields that error
digigoblin · 3mo ago
Seems @pazanchick has the same issue
ribbit · 3mo ago
is availability the problem tho?
digigoblin · 3mo ago
Maybe. I wouldn't choose the ones that are unavailable to be honest.
ribbit · 3mo ago
kk
juanrubio5576 · 3mo ago
I got the same issue during the day. I deployed in Europe. At first I thought it could be an availability problem (I was using a network volume). I dropped the network volume and got the same error on 2 different endpoints that I have deployed. I have been talking via chat with a RunPod team member. It seems they are on it.
Anubhav · 3mo ago
We are facing it too at the moment. Any updates on it?
Solution
ribbit · 3mo ago
now it's stable, i think they fixed it
nerdylive · 3mo ago
great to hear that hahah
ribbit · 3mo ago
yeah hahahah, things are ok up till now thanks all