柠檬板烧鸡 Posts - Answer Overflow

柠檬板烧鸡

•Created by 柠檬板烧鸡 on 3/12/2025 in #⚡｜serverless

How to optimize batch processing performance?

I am using serverless to deploy the Qwen/Qwen2-7B model.

GPU: NVIDIA A40 48G

Environment variables:
MODEL_NAME=Qwen/Qwen2-7B
HF_TOKEN=xxx
ENABLE_LORA=True
LORA_MODULES={"name": "cn_writer", "path": "{huggingface_model_name}", "base_model_name": "Qwen/Qwen2-7B"}
MAX_LORA_RANK=64
MIN_BATCH_SIZE=384
ENABLE_PREFIX_CACHING=1

My problem: batch processing takes too long, 3-4 times as long as a single request. How can I reduce the time it takes? My code is in the attachment.

What I observe: a batch of 64 requests takes 4 times as long as a single request.

What I expect: the time for a batch of 64 requests should be close to the time for a single request.
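Since LORA_MODULES above is a JSON string, one quick sanity check is to parse it locally before deploying. The following is only a minimal sketch: the helper name `parse_lora_modules` and the choice of `name` and `path` as required keys are assumptions for illustration, not part of RunPod's or vLLM's API, and the `{huggingface_model_name}` placeholder is kept exactly as posted.

```python
import json

# Value copied from the environment variables in the post.
# "{huggingface_model_name}" is the poster's placeholder and is left as-is.
RAW_LORA_MODULES = (
    '{"name": "cn_writer", "path": "{huggingface_model_name}", '
    '"base_model_name": "Qwen/Qwen2-7B"}'
)

# Assumption: these are the keys we want to insist on for this check.
REQUIRED_KEYS = {"name", "path"}


def parse_lora_modules(raw: str) -> list[dict]:
    """Parse a LORA_MODULES string and return a list of module dicts.

    Hypothetical helper: accepts either one JSON object or a JSON list.
    Raises ValueError if the JSON is invalid or a required key is missing.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LORA_MODULES is not valid JSON: {exc}") from exc

    modules = value if isinstance(value, list) else [value]
    for module in modules:
        missing = REQUIRED_KEYS - module.keys()
        if missing:
            raise ValueError(f"LoRA module is missing keys: {sorted(missing)}")
    return modules


modules = parse_lora_modules(RAW_LORA_MODULES)
print([m["name"] for m in modules])
```

This only confirms that the string is well-formed JSON with the expected keys; it says nothing about whether the adapter is actually loaded or used by a request.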

64 replies

RunPod

•Created by 柠檬板烧鸡 on 3/5/2025 in #⚡｜serverless

How can I check the logs to see if my request uses the LoRA model?

12 replies

RunPod

•Created by 柠檬板烧鸡 on 2/20/2025 in #⛅｜pods

Why is there still a daily charge after purchasing pod A40-48G with a one-time payment?

I purchased a pod with 1 x A40 48G GPU in Secure Cloud mode on February 17 (Volume Disk: 60G, Container Disk: 30G). It is a monthly purchase at $0.29/hour, totaling $262.08 per month.

My questions are as follows:
1. The first one-time deduction was $280.
2. On the 18th and 19th, there was a daily deduction of $5-7.

I want to know how this fee is calculated.
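The figures quoted above can be cross-checked with a little arithmetic. This is only a sketch under stated assumptions: a 30-day month, and the helper name `monthly_cost` is made up for illustration. It does not describe how RunPod actually bills; it only shows what the posted numbers imply.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month

GPU_HOURLY_RATE = 0.29   # $/hour, as quoted in the post
QUOTED_MONTHLY = 262.08  # $/month, as quoted in the post


def monthly_cost(hourly_rate: float, days: int = DAYS_PER_MONTH) -> float:
    """Cost of running continuously for `days` days at `hourly_rate` $/hour."""
    return hourly_rate * HOURS_PER_DAY * days


gpu_only_monthly = monthly_cost(GPU_HOURLY_RATE)
implied_hourly = QUOTED_MONTHLY / (HOURS_PER_DAY * DAYS_PER_MONTH)
gpu_daily = GPU_HOURLY_RATE * HOURS_PER_DAY

print(f"GPU only, 30 days:     ${gpu_only_monthly:.2f}")  # $208.80
print(f"Implied hourly rate:   ${implied_hourly:.3f}")    # $0.364
print(f"GPU rate over one day: ${gpu_daily:.2f}")         # $6.96
```

Under these assumptions, $0.29/hour alone comes to $208.80 for 30 days, so the quoted $262.08 corresponds to an effective rate of about $0.364/hour. One day at $0.29/hour is $6.96, which falls inside the $5-7 daily deduction the post mentions.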

20 replies
