How to Run Text Generation Inference on Serverless?
Hello, newbie here. I want to run Hugging Face's Text Generation Inference on serverless. I used this repo: https://github.com/runpod-workers/worker-tgi. I built my own Docker image according to the README and deployed it on RunPod serverless, but when I hit my API I get this error:
Can anyone help me?
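For context, a RunPod serverless endpoint is normally called through RunPod's HTTP API. Here is a minimal sketch in Python; the endpoint ID, API key, and prompt are placeholders, not values from this thread:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# /runsync blocks until the job finishes; /run returns a job ID immediately.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {"input": {"prompt": "Hello, world!"}}

response = requests.post(url, headers=headers, json=payload, timeout=120)
print(response.status_code, response.json())
```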
Most people use this one:
https://github.com/runpod-workers/worker-vllm
Does it also support text generation inference?
Yes
Hello, sorry for the late response. I tried the prebuilt Docker image from this repo; the config looks like this, but there's still no response after hitting my API.
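For reference, worker-vllm wraps requests in RunPod's standard {"input": ...} envelope, with generation options under "sampling_params" (mirroring vLLM's SamplingParams fields). A minimal sketch based on the worker-vllm README; the endpoint ID, API key, prompt, and parameter values are all illustrative:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

# "sampling_params" follows vLLM's SamplingParams (max_tokens, temperature, ...).
payload = {
    "input": {
        "prompt": "What is the capital of France?",
        "sampling_params": {"max_tokens": 100, "temperature": 0.7},
    }
}

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
# /run is asynchronous: it returns a job ID and a status such as IN_QUEUE.
print(response.json())
```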
What response do you get when calling your endpoint?
@Alpay Ariyak may be able to advise.
@Oryza sativa Can you share the worker logs?
I'm sorry, I already got the response. I think it's because I hit the endpoint while it was still in initializing status and not ready yet; I finally got the response. Thank you @ashleyk
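The takeaway here is that a request sent while the endpoint is still initializing just sits in the queue, so it's worth polling the job status instead of assuming the worker is broken. A minimal polling sketch; the job ID would come from an earlier /run call, and the endpoint ID and API key are placeholders:

```python
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
JOB_ID = "job-id-from-run-call"    # returned by the /run request

status_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Poll until the job leaves the queue; cold starts can take a while
# because the worker has to pull the image and load the model first.
while True:
    job = requests.get(status_url, headers=headers, timeout=30).json()
    if job["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(5)

print(job)  # a COMPLETED job includes an "output" field
```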
But I'm just curious: it's using vLLM, right? So does RunPod now support using TGI for deploying models on serverless?
https://github.com/huggingface/text-generation-inference