Concept
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
It's from a competitor, so I'm going to hold off from posting.
I'll dm you the tutorial
Took vllm completely out of the equation.
I feel like the devs should put out a tutorial for common Mixtral loading use cases. Lots of people seem to be having trouble with it.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Can we add minimum GPU configs required for running the popular models like Mistral, Mixtral?
The 5-bit variant uses about 33 GB of VRAM.
Yes, vLLM is still super buggy with quantizations, and there's no cost-effective way of running the full Mixtral model.
I would recommend using exllama2 for loading up mixtral
15 replies
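To illustrate the exllama2 recommendation above, here is a minimal sketch of loading an EXL2-quantized Mixtral with the ExLlamaV2 Python API. The model directory is a placeholder, and the 5.0 bpw quant is only an assumption chosen to line up with the ~33 GB figure mentioned above; check the library's own examples for the current API.

    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    config = ExLlamaV2Config()
    config.model_dir = "/models/Mixtral-8x7B-exl2-5.0bpw"  # placeholder path to an EXL2 quant
    config.prepare()

    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # allocate the cache lazily so autosplit can plan memory
    model.load_autosplit(cache)                # split layers across the available GPUs

    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.7

    print(generator.generate_simple("Mixtral is", settings, 64))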
RunPod
•Created by Concept on 2/1/2024 in #⚡|serverless
VLLM Worker Error that doesn't time out.
Using the RunPod vLLM worker.
Existing worker on the newest SDK. I believe it was a JSON serialization error, which would be an error on my side, but it shouldn't keep running like that after erroring.
Is there a way to kill workers when they error?
10 replies
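For the question above about killing workers when they error: a minimal sketch of a handler using the runpod Python SDK, assuming the SDK's documented behavior that returning an "error" key marks the job as failed and that "refresh_worker": True asks the platform to recycle the worker after the job; worth verifying against the current RunPod docs.

    import json
    import runpod

    def handler(job):
        try:
            # json.dumps raises on non-serializable output, the kind of error described above
            return {"output": json.dumps(job["input"])}
        except Exception as e:
            # Fail the job cleanly instead of letting the worker spin,
            # and ask RunPod to recycle this worker once the job finishes.
            return {"error": str(e), "refresh_worker": True}

    runpod.serverless.start({"handler": handler})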
RunPod
•Created by Concept on 1/15/2024 in #⚡|serverless
Request Format Runpod VLLM Worker
I’m not too sure if there’s a difference
const requestBody = {
  input: {
    prompt: chatHistory,
    sampling_params: {
      max_tokens: 2000,
    },
    apply_chat_template: true,
    stream: true,
  },
};
This worked for me
11 replies
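The same input shape can also be sent from Python; a sketch assuming the standard serverless endpoint URL, with the endpoint ID and API key as placeholders (the /runsync route returns the whole result in one response, so stream is left off here):

    import requests

    ENDPOINT_ID = "your-endpoint-id"   # placeholder
    API_KEY = "your-runpod-api-key"    # placeholder

    payload = {
        "input": {
            "prompt": "Hello!",
            "sampling_params": {"max_tokens": 2000},
            "apply_chat_template": True,
        }
    }

    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=600,
    )
    print(resp.json())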
RunPod
•Created by Concept on 1/20/2024 in #⚡|serverless
Empty Tokens Using Mixtral AWQ
@Alpay Ariyak
6 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
Will look into it, thank you.
The reason I'm trying to use Mixtral is its mixture-of-experts architecture and its context window.
I'm open to using OpenChat; would it be possible to increase the context size beyond 8k, or is that fixed?
@Justin
2024-01-19T20:28:00.200082421Z INFO 01-19 20:28:00 llm_engine.py:70] Initializing an LLM engine with config: model='TheBloke/mixtral-8x7b-v0.1-AWQ', tokenizer='TheBloke/mixtral-8x7b-v0.1-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir='/models', load_format=auto, tensor_parallel_size=1, quantization=awq, enforce_eager=False, seed=0)
This is the step that takes the most time; I'm stuck here for about 2-3 minutes.
69 replies
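A rough way to check how much of that wait is just vLLM engine startup (weight loading, plus CUDA graph capture since enforce_eager=False in the log) is to time the same configuration outside the worker; a sketch assuming vLLM is installed and the AWQ weights are already cached under /models:

    import time
    from vllm import LLM

    start = time.time()
    # Mirrors the engine config shown in the log above
    llm = LLM(
        model="TheBloke/mixtral-8x7b-v0.1-AWQ",
        quantization="awq",
        dtype="float16",
        download_dir="/models",
        max_model_len=32768,
    )
    print(f"Engine init took {time.time() - start:.1f}s")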