RunPod•10mo ago
Hashset

Text-generation-inference on serverless endpoints

Hi, I don't have much experience with LLMs or with Python, so I always just use the image 'ghcr.io/huggingface/text-generation-inference:latest' and run my models on Pods. Now I want to try serverless endpoints, but I don't know how to launch text-generation-inference on them. Can someone give me some tips, or maybe there are some docs which could help me?
10 Replies
ashleyk
ashleyk•10mo ago
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
Hashset
HashsetOP•10mo ago
Thanks @ashleyk, I think this can help. I'll take a look at it 🙂
Hashset
HashsetOP•10mo ago
For now, everything works well! I managed to deploy llama-2-7b, but I have a few more questions:
Hashset
HashsetOP•10mo ago
1. How can I set the temperature or other fields when sending a request?
(screenshot attached)
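For reference, with the worker-vllm image the sampling options (temperature, max tokens, etc.) go inside the request's input payload rather than as top-level fields. A minimal sketch of building such a payload, assuming the field names from the worker-vllm README (`prompt` and `sampling_params`); the endpoint URL and API key in the comment are placeholders:

```python
# Sketch of a request payload for a worker-vllm serverless endpoint.
# Field names are assumed from the worker-vllm README; verify against
# the version of the worker you deploy.

def build_payload(prompt: str, temperature: float = 0.7, max_tokens: int = 128) -> dict:
    """Wrap a prompt and vLLM sampling options in the worker-vllm input format."""
    return {
        "input": {
            "prompt": prompt,
            "sampling_params": {
                "temperature": temperature,
                "max_tokens": max_tokens,
            },
        }
    }

payload = build_payload("Explain serverless endpoints in one sentence.", temperature=0.2)

# The payload would then be POSTed to the endpoint, e.g. (placeholders):
#   requests.post(
#       "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync",
#       headers={"Authorization": "Bearer <API_KEY>"},
#       json=payload,
#   )
print(payload["input"]["sampling_params"]["temperature"])
```

The key point is that anything vLLM treats as a sampling parameter is nested under `sampling_params`, not mixed in with the prompt.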
Hashset
HashsetOP•10mo ago
2. Why am I seeing this deprecation notification? Am I doing something wrong?
(screenshot attached)
Hashset
HashsetOP•10mo ago
==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

**
DEPRECATION NOTICE!
**
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
ashleyk
ashleyk•10mo ago
Why are you using the CUDA 12.1.0 base image and not 12.1.1? Use 12.1.1 instead.
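That fix amounts to bumping the base image tag in the Dockerfile. A hedged sketch of the change, assuming an NVIDIA CUDA base image; the exact variant (base/runtime/devel and OS suffix) should match whatever the original image used:

```dockerfile
# Before (deprecated tag, triggers the NGC deprecation notice):
# FROM nvidia/cuda:12.1.0-base-ubuntu22.04

# After (supported patch release of the same CUDA minor version):
FROM nvidia/cuda:12.1.1-base-ubuntu22.04
```

Since 12.1.1 is a patch release of the same CUDA minor version, no other changes to the image should be needed.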
Alpay Ariyak
Alpay Ariyak•10mo ago
Please refer to the Worker vLLM documentation; it goes into a lot of detail on usage. That's the base image vLLM uses in their Docker image.
Hashset
HashsetOP•10mo ago
Thank you Alpay, yes, I've found my answers in the documentation. Sorry, I should've read it to the end.
Alpay Ariyak
Alpay Ariyak•10mo ago
No worries at all, let me know if anything else comes up