柠檬板烧鸡
RunPod
•Created by 柠檬板烧鸡 on 3/12/2025 in #⚡|serverless
How to optimize batch processing performance?
I am using serverless to deploy the Qwen/Qwen2-7B model.
GPU: NVIDIA A40 48 GB
Environment variables:
MODEL_NAME=Qwen/Qwen2-7B
HF_TOKEN=xxx
ENABLE_LORA=True
LORA_MODULES={"name": "cn_writer", "path": "{huggingface_model_name}", "base_model_name": "Qwen/Qwen2-7B"}
MAX_LORA_RANK=64
MIN_BATCH_SIZE=384
ENABLE_PREFIX_CACHING=1
My problem:
Batch processing takes too long: 3-4 times as long as a single request. How can I reduce the batch processing time?
My code is in the attachment.
Observed behavior:
A batch of 64 requests takes 4 times as long as a single request.
What I want is for a batch of 64 requests to take close to the time of a single request.
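One way to read these numbers, as a rough sketch: if 64 requests finish in about 4 times the latency of one request, the batch still completes about 16 times as many requests per unit of time. The helper below is hypothetical and only does that arithmetic with the figures quoted in the post; it does not call the endpoint or measure anything.

```python
def throughput_gain(batch_size: int, latency_ratio: float) -> float:
    """Return how many times more requests per second a batch completes
    compared with sending the same requests one at a time.

    latency_ratio is the batch wall time divided by the wall time of a
    single request (hypothetical parameter name, for illustration only).
    """
    if batch_size <= 0 or latency_ratio <= 0:
        raise ValueError("batch_size and latency_ratio must be positive")
    return batch_size / latency_ratio


# Figures from the post: 64 requests, about 4x the single-request time.
print(throughput_gain(64, 4.0))
```

Under these assumptions, per-request latency gets worse while overall throughput improves, so which number to optimize depends on the workload.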
64 replies
RunPod
•Created by 柠檬板烧鸡 on 3/5/2025 in #⚡|serverless
How can I check the logs to see if my request uses the LoRA model?

12 replies
Why is there still a daily charge after purchasing an A40 48 GB pod with a one-time payment?
I purchased a pod with 1x A40 48 GB GPU in Secure Cloud on February 17.
Volume Disk: 60 GB
Container Disk: 30 GB
Monthly purchase at $0.29/hour, totaling $262.08 per month.
My questions are as follows:
1. The first one-time deduction was $280.
2. On the 18th and 19th, there was a daily deduction of $5-7.
How is this fee calculated?
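As a quick sanity check of the quoted figures, the sketch below multiplies the hourly rate out to a day and to a month. It assumes a 30-day month and that the $0.29/hour GPU rate is the only charge, which may not match the actual billing; the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def cost_for_days(hourly_rate: float, days: int) -> float:
    """Cost in dollars of running continuously for `days` days at
    `hourly_rate` dollars per hour, rounded to cents."""
    return round(hourly_rate * HOURS_PER_DAY * days, 2)


# Rate quoted in the post; the 30-day month is an assumption.
daily = cost_for_days(0.29, 1)
monthly = cost_for_days(0.29, 30)
print(f"per day: ${daily:.2f}")       # per day: $6.96
print(f"per 30 days: ${monthly:.2f}")  # per 30 days: $208.80
```

A full day at $0.29/hour comes to $6.96, which falls inside the reported $5-7 daily deduction, while 30 days at that rate comes to $208.80 rather than the quoted $262.08. This only checks the arithmetic; it does not explain how the charges were actually applied.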
20 replies