pazanchick
RunPod
Created by pazanchick on 4/22/2024 in #⚡|serverless
No active workers after deploying New Release
No description
34 replies
RunPod
Created by pazanchick on 4/17/2024 in #⚡|serverless
'Connection reset by peer' after job finishes.
Earlier logs indicate that the handler itself works correctly. This has happened multiple times now, and each time the request returns a failure response. Any input on this issue? @Papa Madiator
2024-04-17T07:48:57.719183260Z {"requestId": "92b3176b-81d0-4dbb-9307-9cbe812dd8f0-u1", "message": "Finished.", "level": "INFO"}
2024-04-17T07:49:03.090390819Z {"requestId": null, "message": "Failed to get job. | Error Type: ClientOSError | Error Message: [Errno 104] Connection reset by peer", "level": "ERROR"}

2024-04-17T07:49:03.090448408Z {"requestId": null, "message": "Traceback: Traceback (most recent call last):\n File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 55, in get_job\n async with session.get(_job_get_url()) as response:\n File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client.py\", line 1194, in __aenter__\n self._resp = await self._coro\n File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client.py\", line 605, in _request\n await resp.start(conn)\n File \"/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py\", line 966, in start\n message, payload = await protocol.read() # type: ignore[union-attr]\n File \"/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py\", line 622, in read\n await self._waiter\naiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer\n", "level": "ERROR"}
5 replies
RunPod
Created by pazanchick on 2/17/2024 in #⚡|serverless
llama.cpp serverless endpoint
https://github.com/ggerganov/llama.cpp
llama.cpp is, AFAIK, the only setup that supports quantized llava-1.6, which is why I use it. The Docker image works on some workers, but on others it crashes with an "illegal instruction" error. https://github.com/ggerganov/llama.cpp/issues/537 I wonder whether anyone has already tried this, and whether there is a better fix than building multiple binaries, one per instruction set, and bundling them into a single image that works anywhere. (I already tried building with LLAMA_NATIVE=0.) Appreciate any insights, thanks!
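For reference, the multi-binary workaround mentioned above could be sketched roughly like this: read the CPU feature flags at container start and pick the most capable build the worker can run. The binary names are hypothetical, and the flag sets per variant (for example, that an AVX2 build also needs FMA and F16C) are an assumption that depends on how each binary was actually compiled.

```python
from pathlib import Path

# Hypothetical binary names, ordered from most to least demanding.
# The required flags per variant are an assumption; match them to the
# options each binary was really built with.
VARIANTS = [
    (frozenset({"avx512f", "avx2", "fma", "f16c"}), "llava-server-avx512"),
    (frozenset({"avx2", "fma", "f16c"}), "llava-server-avx2"),
    (frozenset({"avx"}), "llava-server-avx"),
]
FALLBACK = "llava-server-generic"


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from /proc/cpuinfo-style text (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_binary(flags: set[str]) -> str:
    """Pick the first variant whose required flags are all present."""
    for required, binary in VARIANTS:
        if required <= flags:
            return binary
    return FALLBACK


def detect_binary(cpuinfo_path: str = "/proc/cpuinfo") -> str:
    """Convenience wrapper for use in a container entrypoint."""
    return pick_binary(parse_cpu_flags(Path(cpuinfo_path).read_text()))
```

An entrypoint script could call `detect_binary()` and exec the returned binary. This does not avoid building several binaries; it only automates choosing one per worker.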
8 replies
RunPod
Created by pazanchick on 2/7/2024 in #⚡|serverless
GraphQL: How to get the runtime of a serverless pod through the api stateless?
No description
3 replies
RunPod
Created by pazanchick on 2/6/2024 in #⛅|pods
GraphQL: Querying specific endpoints and getting the number of running workers
My goal is to adjust the number of Active Workers for serverless endpoints dynamically. 1. Is there a way to query specific endpoints instead of all of them? https://docs.runpod.io/graphql/manage-endpoints 2. Is checking PodTelemetry.state for all endpoint pods the most reliable way to count how many pods are running? https://graphql-spec.runpod.io/#definition-Endpoint I found that a health check with https://api.runpod.ai/v2/<id>/health seems to lag behind. Appreciate any insights and thanks for your time!
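For comparison, polling the health route mentioned above might look like the minimal sketch below. The bearer-token header and the response shape (a `workers` object with a `running` count) are assumptions, not something confirmed in this thread, and the function names are made up for illustration.

```python
import json
import urllib.request

# URL pattern taken from the post; endpoint_id fills in the <id> part.
HEALTH_URL = "https://api.runpod.ai/v2/{endpoint_id}/health"


def fetch_health(endpoint_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch the health document for one endpoint.

    Assumption: the API accepts the key as a bearer token.
    """
    request = urllib.request.Request(
        HEALTH_URL.format(endpoint_id=endpoint_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def running_workers(health: dict) -> int:
    """Extract the running-worker count, defaulting to 0 if it is absent.

    Assumed response shape: {"workers": {"running": <int>, ...}, ...}
    """
    return int(health.get("workers", {}).get("running", 0))
```

Usage would be along the lines of `running_workers(fetch_health("<id>", api_key))`. Since the health route appeared delayed in practice, this is only a baseline to compare against the PodTelemetry approach.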
4 replies