yhlong00000
RunPod
Created by blabbercrab on 7/7/2024 in #⚡|serverless
Trying to load a huge model into serverless
Also, 4x48GB = 192GB won't work either; out of memory 😂
16 replies
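For context, a quick back-of-the-envelope check (an illustration, not from the thread; the model sizes used below are assumptions): weights alone take roughly parameter count × bytes per parameter, before KV cache, activations, and framework overhead are added on top.

```python
# Illustration only (the thread doesn't name the model): rough weights-only
# VRAM estimate. 2 bytes/param corresponds to fp16/bf16 weights.
def weight_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"{weight_gb(70):.0f} GiB")   # ~130 GiB: a 70B model's weights fit in 4x48 GB
print(f"{weight_gb(180):.0f} GiB")  # ~335 GiB: far beyond 4x48 GB = 192 GB
```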
RunPod
Created by BadNoise on 7/5/2024 in #⚡|serverless
Pipeline is not using gpu on serverless
I don't see anything wrong with that 😂. I'm still wondering what Patrick changed that made it start using the GPU.
70 replies
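For what it's worth, a common cause of that symptom (an assumption on my part, not confirmed in the thread) is that transformers pipelines default to CPU unless a device is passed explicitly:

```python
# Illustration only: the thread never says what Patrick actually changed.
# transformers pipelines run on CPU unless told otherwise.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="gpt2",  # placeholder model
    device=0 if torch.cuda.is_available() else -1,  # 0 = first GPU, -1 = CPU
)
print(pipe.device)  # should report cuda:0 on a GPU worker
```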
RunPod
Created by BadNoise on 7/5/2024 in #⚡|serverless
Pipeline is not using gpu on serverless
I think he's trying to use cache_model.py to cache the model locally when building the Docker image. He set local_files_only=True just to make sure it never downloads from the internet.
70 replies
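A minimal sketch of that approach (cache_model.py is the script named in the thread; the model id is a placeholder):

```python
# cache_model.py: run at build time, e.g. `RUN python cache_model.py` in the
# Dockerfile, so the weights are baked into the image.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "bert-base-uncased"  # placeholder model id

# Build time: download once, populating the Hugging Face cache inside the image.
AutoTokenizer.from_pretrained(MODEL_ID)
AutoModel.from_pretrained(MODEL_ID)

# Runtime (in the handler): load strictly from the baked-in cache.
# local_files_only=True guarantees the worker never downloads from the internet.
model = AutoModel.from_pretrained(MODEL_ID, local_files_only=True)
```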
RunPod
Created by houmie on 6/28/2024 in #⚡|serverless
vLLM serverless throws 502 errors
The error logs you are seeing indicate the worker is experiencing network-related issues when trying to fetch jobs from the server.
10 replies
RunPod
Created by falk on 6/27/2024 in #⚡|serverless
Prevent Extra Workers from appearing
Extra workers are pre-provisioned but do not run unless necessary. They are designed to handle spikes in load by being available to start quickly if all max workers are busy. If you have set a limit of 3 max workers and have 2 extra workers:
• Normal operation: only the 3 max workers handle requests.
• During throttling: if the load exceeds the capacity of the 3 max workers and they are all handling requests, the extra workers can be activated to manage the additional load.
This setup ensures that the system can handle sudden increases in demand without immediate throttling, improving responsiveness and stability. Extra workers do not incur costs when they are idle; you are only charged for the workers that are actively handling requests. This gives you a buffer for handling spikes without incurring extra costs when demand is low.
@nerdylive @digigoblin is the above explanation correct?
12 replies
RunPod
Created by Bitman on 6/18/2024 in #⚡|serverless
best architecture opinion
I agree that it adds extra complexity. The advantage is that your backend only manages the workflow, while your serverless function has a single responsibility for performing the inference. By separating these tasks, the backend can scale or be modified independently as future needs arise.
6 replies
RunPod
Created by Bitman on 6/18/2024 in #⚡|serverless
best architecture opinion
I'd suggest creating a backend that takes user input and calls a serverless endpoint to generate 10 prompts. Then you can make parallel calls to the serverless endpoint to process these prompts simultaneously. Finally, call the serverless endpoint again to aggregate the responses. It's important to handle all possible cases, such as failed calls where only some responses are returned, or when the aggregation call itself fails.
6 replies
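A rough sketch of that fan-out/fan-in flow against a RunPod serverless endpoint's /runsync API (ENDPOINT_ID, API_KEY, and the "task" input schema are all placeholders, not from the thread):

```python
import concurrent.futures
import requests

API_KEY = "YOUR_API_KEY"          # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def call_endpoint(payload: dict) -> dict:
    resp = requests.post(URL, headers=HEADERS, json={"input": payload}, timeout=300)
    resp.raise_for_status()
    return resp.json()

def run_workflow(user_input: str) -> dict:
    # Step 1: one call to generate the 10 prompts.
    prompts = call_endpoint({"task": "generate_prompts", "text": user_input})["output"]

    # Step 2: fan out, processing all prompts in parallel; collect failures
    # instead of letting one bad call sink the whole batch.
    results, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        futures = {pool.submit(call_endpoint, {"task": "process", "prompt": p}): p
                   for p in prompts}
        for fut in concurrent.futures.as_completed(futures):
            try:
                results.append(fut.result()["output"])
            except Exception:
                failed.append(futures[fut])  # retry or surface these separately

    # Step 3: aggregate the partial results (guard this call too; it can fail).
    return call_endpoint({"task": "aggregate", "results": results, "failed": failed})
```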