Serverless capability check
I want to add RunPod into a tier of load-balanced LLM models behind an app like OpenRouter.ai, but the routing decision will happen in our own infrastructure. When I invoke a serverless instance from my app and a task completes, how am I billed for idle time if the container unloads the model from GPU memory?
In other words, I want to reduce costs and improve performance by only loading the model after an idle timeout, paying only for the small app footprint in storage/memory.
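For context, this is roughly how the routing tier would call out to RunPod. A minimal sketch, assuming the serverless /runsync API, with a hypothetical endpoint ID and payload shape (the worker's handler decides what "input" actually contains):

```python
import os
import requests

RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]  # assumed to be set in our infra
ENDPOINT_ID = "my-llm-endpoint"                 # hypothetical endpoint ID

def invoke_runpod(prompt: str, timeout_s: int = 120) -> dict:
    """Synchronously run a job on a RunPod serverless endpoint and return its output."""
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        json={"input": {"prompt": prompt}},
        timeout=timeout_s,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(invoke_runpod("Hello from the router tier"))
```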
Solution
You are charged for the entire time the container is running, including cold start time, execution time, and idle timeout.
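To make that concrete, a rough per-request cost sketch under the billing rule above; the per-second price and the timings are illustrative assumptions, not RunPod's actual rates:

```python
# Billed time = cold start + execution + idle timeout (all assumed values).
PRICE_PER_SECOND = 0.00044   # assumed $/s for the chosen GPU tier
COLD_START_S = 20            # loading the model into GPU memory
EXECUTION_S = 5              # actual inference
IDLE_TIMEOUT_S = 60          # worker kept warm after the job

billed_seconds = COLD_START_S + EXECUTION_S + IDLE_TIMEOUT_S
print(f"Billed: {billed_seconds}s -> ${billed_seconds * PRICE_PER_SECOND:.4f} per request")
```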
I thought so. Do the containers have the Docker capabilities needed to create a WireGuard interface?
You can't access the underlying Docker stuff on the host machine, if that's what you're asking.
I don't mean the Docker socket. I mean I want to create a VPN tunnel to my AWS tenant rather than dealing with PKI in the container.
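If it helps to verify this empirically from inside a worker, here is a minimal sketch (assuming a Linux container with /proc mounted) that checks for the CAP_NET_ADMIN capability and the /dev/net/tun device that a WireGuard or userspace-tun setup would need. It only checks preconditions; it does not configure a tunnel:

```python
import os

def has_cap_net_admin() -> bool:
    """Parse the effective capability mask from /proc/self/status (CAP_NET_ADMIN is bit 12)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                cap_eff = int(line.split()[1], 16)
                return bool((cap_eff >> 12) & 1)
    return False

print("CAP_NET_ADMIN:", has_cap_net_admin())
print("/dev/net/tun present:", os.path.exists("/dev/net/tun"))
```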