pazanchick Posts - Answer Overflow

pazanchick

RunPod

•Created by pazanchick on 4/22/2024 in #⚡｜serverless

No active workers after deploying New Release

34 replies

RunPod

•Created by pazanchick on 4/17/2024 in #⚡｜serverless

'Connection reset by peer' after job finishes.

The preceding logs indicate that the handler itself works correctly. This has happened multiple times now, and each time the request returns a failure response. Any input on this issue? @Papa Madiator

2024-04-17T07:48:57.719183260Z {"requestId": "92b3176b-81d0-4dbb-9307-9cbe812dd8f0-u1", "message": "Finished.", "level": "INFO"}
2024-04-17T07:49:03.090390819Z {"requestId": null, "message": "Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer", "level": "ERROR"}

2024-04-17T07:49:03.090448408Z {"requestId": null, "message": "Traceback: Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 55, in get_job\n    async with session.get(_job_get_url()) as response:\n  File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client.py\", line 1194, in __aenter__\n    self._resp = await self._coro\n  File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client.py\", line 605, in _request\n    await resp.start(conn)\n  File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py\", line 966, in start\n    message, payload = await protocol.read()  # type: ignore[union-attr]\n  File \"/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py\", line 622, in read\n    await self._waiter\naiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer\n", "level": "ERROR"}

5 replies

RunPod

•Created by pazanchick on 2/17/2024 in #⚡｜serverless

llama.cpp serverless endpoint

https://github.com/ggerganov/llama.cpp
llama.cpp is, AFAIK, the only setup that supports quantized llava-1.6, which is why I use it. On some workers the Docker image works; on others it crashes with an "illegal instruction" error. https://github.com/ggerganov/llama.cpp/issues/537 I wonder whether someone has already tried this and whether there is a better fix than building multiple binaries with the correct instruction sets and stuffing them into one image that works anywhere. (I already tried building with LLAMA_NATIVE=0.) Appreciate any insights, thanks!
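For reference, the "multiple binaries in one image" workaround mentioned above could be driven by a small launcher that checks the CPU flags at container start. This is only a rough sketch: the binary names and the flag sets in `VARIANTS` are hypothetical placeholders, not actual llama.cpp build outputs, and it assumes a Linux x86 worker where `/proc/cpuinfo` has a `flags` line.

```python
from pathlib import Path

# Hypothetical build variants, ordered from most to least demanding.
# Each entry: (binary name, CPU flags that this build is assumed to need).
VARIANTS = [
    ("llama-avx512", {"avx512f", "avx2", "fma", "f16c"}),
    ("llama-avx2", {"avx2", "fma", "f16c"}),
    ("llama-avx", {"avx"}),
    ("llama-generic", set()),  # fallback with no special requirements
]


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_binary(flags, variants=VARIANTS):
    """Return the first variant whose required flags are all present."""
    for name, required in variants:
        if required <= flags:
            return name
    raise RuntimeError("no compatible build found")


if __name__ == "__main__":
    flags = parse_cpu_flags(Path("/proc/cpuinfo").read_text())
    print(pick_binary(flags))
```

The entrypoint script would then exec whichever binary name is printed. It does not avoid building several variants; it only makes the choice automatic per worker.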

8 replies

RunPod

•Created by pazanchick on 2/7/2024 in #⚡｜serverless

GraphQL: How to get the runtime of a serverless pod through the api stateless?

3 replies

RunPod

•Created by pazanchick on 2/6/2024 in #⛅｜pods-clusters

GraphQL: Query specific Endpoints and getting running worker amount

My goal is to adjust the number of Active Workers for serverless endpoints dynamically.
1. Is there a way to query specific endpoints instead of all of them? https://docs.runpod.io/graphql/manage-endpoints
2. Is checking PodTelemetry.state for all endpoint pods the most reliable way to count how many pods are running? https://graphql-spec.runpod.io/#definition-Endpoint
I found that a health check with https://api.runpod.ai/v2/<id>/health seems to be more delayed. Appreciate any insights, and thanks for your time!
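To make the question concrete, here is a rough sketch of the fallback I have in mind if there is no server-side filter: fetch all endpoints and pick one out client-side. The query shape and the `desiredStatus` field with a `"RUNNING"` value are assumptions modeled on the manage-endpoints docs page linked above, not confirmed against the spec, so both are left as parameters. The function only works on an already-decoded response and sends no request.

```python
# Assumed query shape; verify the field names against the GraphQL spec
# before relying on them.
ENDPOINTS_QUERY = """
query Endpoints {
  myself {
    endpoints {
      id
      name
      pods { desiredStatus }
    }
  }
}
"""


def count_running(response, endpoint_id,
                  status_key="desiredStatus", running="RUNNING"):
    """Count pods of one endpoint whose status field equals `running`.

    `response` is the decoded JSON body of the query above.
    Returns None if the endpoint id is not in the response.
    """
    data = response.get("data") or {}
    endpoints = (data.get("myself") or {}).get("endpoints") or []
    for endpoint in endpoints:
        if endpoint.get("id") == endpoint_id:
            pods = endpoint.get("pods") or []
            return sum(1 for pod in pods if pod.get(status_key) == running)
    return None
```

If PodTelemetry.state turns out to be the better signal, only `status_key` and `running` would need to change, plus the matching field in the query.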

4 replies
