Nelson
RunPod
Created by Nelson on 1/2/2025 in #⚡|serverless
Serverless SGLang - 128 max token limit problem.
I'm trying to use the serverless SGLang template mentioned in the title. I always hit the same problem: the answer is limited to 128 tokens, and I don't know how to change the configuration. I've tried Llama 3.2 3B and Mistral 7B, and the same thing happens with both.

I've tried setting the following environment variables to values higher than 128, with no luck:

CONTEXT_LENGTH
MAX_TOTAL_TOKENS
MAX_PREFILL_TOKENS
CHUNKED_PREFILL_SIZE
STREAM_INTERVAL
BLOCK_SIZE
MAX_TOKENS
COMPLETION_MAX_TOKENS
MAX_OUTPUT_TOKENS
OUTPUT_TOKENS
LLAMA_MAX_OUTPUT_TOKENS
MAX_LENGTH
COMPLETION_TOKENS
COMPLETION_TOKENS_WO_JUMP_FORWARD
LENGTH

Request:

{
  "input": {
    "text": "Give me list of the US States names."
  }
}

Answer:

{
  "delayTime": 721,
  "executionTime": 3503,
  "id": "fa5e8637-5636-4e74-a1ad-de63f8b20301-u1",
  "output": [
    {
      "meta_info": {
        "cached_tokens": 1,
        "completion_tokens": 128,
        "completion_tokens_wo_jump_forward": 128,
        "finish_reason": {
          "length": 128,
          "type": "length"
        },
        "id": "c8ab53d4847a4e8687dfbe9abbefd90c",
        "prompt_tokens": 10
      },
      "text": "\n\nWhy would anyone want a random list of all 50 states names?\n\nYou may want a randomized list as an example of various techniques you can use:\n\n- a list of sets of randomly selected items that the “random list” has in common with the list from which the “random list” was generated (I have collected some of mine in the articles How Many Holes in Swiss Cheese and How Close You Are to Finding a Unicorn, but not exclusively used in the same way)\n- a list of sets of randomly selected items that the “random list” has in common with"
    }
  ],
  "status": "COMPLETED",
  "workerId": "0ifvvn3bcyfzii"
}

Any suggestions?
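One thing that may be worth checking: the request above sends only "text", and the response stops with finish_reason "length" at exactly 128 tokens, which looks like a per-request generation default rather than a server-level limit. SGLang's native generate API takes per-request sampling parameters, including max_new_tokens. The sketch below builds a request body that sets it explicitly. It assumes (not confirmed in this thread) that the RunPod worker forwards a "sampling_params" object inside "input" to SGLang unchanged. The helper name build_request and the value 512 are illustrative only.

```python
import json


def build_request(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build a serverless request body with an explicit output-token limit.

    Assumption: the worker passes "sampling_params" through to SGLang's
    generate call. Verify this against the worker's documentation.
    """
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    return {
        "input": {
            "text": prompt,
            # Per-request limit on generated tokens (hypothetical value).
            "sampling_params": {"max_new_tokens": max_new_tokens},
        }
    }


if __name__ == "__main__":
    body = build_request("Give me list of the US States names.")
    # Print the JSON body to paste into the endpoint's request panel.
    print(json.dumps(body, indent=2))
```

If the assumption holds, the response's completion_tokens should be able to exceed 128 and finish_reason should change once the model stops on its own.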
19 replies