Finley
RunPod
•Created by jax on 3/15/2024 in #⚡|serverless
Delay Time is too long
Hi @jax - long delay times like that usually come from the worker needing to download a model during its cold start. If you're using a custom model, you'll want to save it to a network volume to avoid this.
14 replies
Question about graphql API
@ChD That's correct - likely the A100 SXMs were all in use at the time you ran the query. Globally, out of all the GPU specs, they're the most likely to go completely unavailable at any given moment.
There are some now and it's showing the price for me: `{"id":"NVIDIA A100-SXM4-80GB","securePrice":2.29}`
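A hedged sketch of pulling that field out yourself - the endpoint URL and field names below are my assumptions about the public GraphQL schema, so verify them against the API docs:

```python
import json

# Assumed RunPod GraphQL endpoint and query shape - check the API docs.
API_URL = "https://api.runpod.io/graphql"
QUERY = """
query GpuTypes {
  gpuTypes {
    id
    securePrice
  }
}
"""

def find_gpu(response_text: str, gpu_id: str):
    """Pick one GPU type out of a raw JSON response body."""
    data = json.loads(response_text)
    for gpu in data["data"]["gpuTypes"]:
        if gpu["id"] == gpu_id:
            return gpu
    return None  # e.g. every unit rented out at query time

# Example response shaped like the fragment quoted above:
sample = '{"data":{"gpuTypes":[{"id":"NVIDIA A100-SXM4-80GB","securePrice":2.29}]}}'
```

When a spec is fully rented out, the entry can be missing or priceless, which is why the lookup returns `None` rather than assuming it exists.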
23 replies
RunPod
•Created by n8tzto on 3/14/2024 in #⚡|serverless
Unstable Internet Connection in the Workers
Hi @n8tzto - do you happen to have the endpoint IDs and the time of day that this occurs?
10 replies
Pod Downsized, with Pictures
Hi @MushyPotato - when you rent a pod on spot and it gets stopped (really, when a pod gets stopped for any reason) the GPUs in that machine are made available for other customers, and by the time you can start it again there may be fewer GPUs available.
Setting up a network volume will allow you to deploy to any GPU within that data center, so you will not be limited by whatever the GPU rental status in your specific machine is.
7 replies
GPU usage when pod initialized. Not able to clear.
@zkreutzjanz I've noted the pod/machine ID so we can take a look internally. The best thing to do in the meantime would be to rent another set of GPUs, if one is available. Sorry for the inconvenience
5 replies
Chat History, Memory and Messages
@UltimateTobi I'm assuming you're using text-generation-webui?
As long as the messages appear in the window under Chat, they will be considered by the AI, up to the context limit (i.e. the "truncate the prompt up to this length" slider - if it shows 2048, the AI will consider the most recent 2048 tokens in the chat window, minus any tokens used by the character tab, the system prompt, etc.)
For stuff that you want the AI to consider at all times, that would need to be in the Character tab as well.
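A rough sketch of the truncation arithmetic described above - the function name and numbers are illustrative, and real UIs count tokens with the model's tokenizer:

```python
# How many of the most recent chat tokens the model actually sees, given
# the truncate-length slider and the fixed overhead (character tab, system
# prompt) that is always included in the prompt.
def visible_chat_tokens(truncate_length: int, character_tokens: int,
                        system_prompt_tokens: int, chat_tokens: int) -> int:
    budget = truncate_length - character_tokens - system_prompt_tokens
    return max(0, min(chat_tokens, budget))
```

So with a 2048 slider, a 300-token character card, and a 100-token system prompt, only the most recent 1648 tokens of chat history make it into the prompt.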
If you're using AI chat I would actually recommend installing SillyTavern locally, as it gives you way more options for interacting with the AI: https://sillytavernai.com/. You can just use the oobabooga connection method and connect to the API at https://5xyhvtuv700jfl-5000.proxy.runpod.net/ (just swap in your own pod ID in the URL)
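Going by the URL quoted above, the proxy hostname follows a `{pod_id}-{port}.proxy.runpod.net` pattern; a tiny helper to build it (the pod ID is just the placeholder from the example):

```python
# Build the RunPod proxy URL for a pod's exposed port, following the
# pattern in the example URL above; pod_id here is a placeholder.
def api_url(pod_id: str, port: int = 5000) -> str:
    return f"https://{pod_id}-{port}.proxy.runpod.net/"
```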
You can only edit the most recent message in oobabooga, and it's clunky: you need to type out what you want the message to be and click "Replace last reply". ST is much more intuitive and lets you edit any reply at will
2 replies
Keeping reverse proxy hostname between destroy/start
At this time there's no way to keep the same pod ID after you terminate a network volume pod - there's no stop state for network volume pods (the stop state is really designed for machine-based storage, so the storage can be kept while the GPU is freed up)
I can definitely see how this would be a feature gap though so I will bring it up to the team
6 replies
Does Runpod Support Kubernetes?
Hi there - can you advise how you got there? We definitely need to do something about that page, it's incredibly outdated 😅
As far as Kubernetes goes - no, we don't support it. But if you're willing to rent an entire machine of GPUs with a minimum time commitment of at least a few months, we can offer a bare-metal setup instead
6 replies