RunPod•2mo ago
houmie

How to deploy Llama3 on Aphrodite Engine (RunPod)

I have set up the following settings for a pod with 48 GB RAM. 1) I'm not sure how to enable the Q4 cache; without it, the 5.0bpw model won't fit. Any advice, please? (See attached.) 2) I get an error that config.json can't be found. It seems the REVISION variable has not been taken into account. The docs say: "REVISION: The HuggingFace branch name, it defaults to the main branch." I think that's a bug.
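For question 1, a rough back-of-the-envelope estimate shows why the cache precision matters here. This is only a sketch: it assumes the 48 GB refers to GPU memory, roughly 70e9 parameters, the published Llama 3 70B shape (80 layers, 8 KV heads, head dimension 128), and an 8192-token context. It ignores activations and runtime overhead, and the helper names are made up for illustration.

```python
# Back-of-the-envelope memory estimate; every figure here is an approximation.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bits_per_value: int) -> float:
    """Keys plus values for every layer, KV head, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bits_per_value / 8


# Assumed model shape: ~70e9 parameters at 5.0 bits per weight.
weights = weight_bytes(70e9, 5.0)

for label, bits in [("fp16", 16), ("fp8", 8), ("4-bit", 4)]:
    cache = kv_cache_bytes(80, 8, 128, 8192, bits)
    print(f"{label:>5} cache: weights {weights / GIB:.1f} GiB "
          f"+ cache {cache / GIB:.2f} GiB per 8192-token sequence")
```

Under these assumptions the weights alone take most of the card, so each halving of the cache precision frees a noticeable share of what is left.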
2024-05-06T19:18:00.996748225Z Starting Aphrodite Engine API server...
2024-05-06T19:18:00.996849854Z + exec python3 -m aphrodite.endpoints.openai.api_server --host 0.0.0.0 --port 7860 --download-dir /tmp/hub --model turboderp/Llama-3-70B-Instruct-exl2 --revision 5.0bpw --kv-cache-dtype fp8_e5m2 --gpu-memory-utilization 1.0 --enforce-eager --max-log-len 0
2024-05-06T19:18:03.870671479Z Traceback (most recent call last):
2024-05-06T19:18:03.870689139Z File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
2024-05-06T19:18:03.870694559Z response.raise_for_status()
2024-05-06T19:18:03.870698449Z File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
2024-05-06T19:18:03.870701649Z raise HTTPError(http_error_msg, response=self)
2024-05-06T19:18:03.870705949Z requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/turboderp/Llama-3-70B-Instruct-exl2/resolve/main/config.json
2024-05-06T19:18:03.870710549Z
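The failing URL in the traceback contains /resolve/main/ even though --revision 5.0bpw was passed on the command line. The sketch below reproduces that URL shape with a small helper of my own (resolve_url is not a huggingface_hub function) to show what the request looks like with and without the revision.

```python
from urllib.parse import quote

HF_ENDPOINT = "https://huggingface.co"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hub-style download URL; the revision falls back to "main"."""
    return f"{HF_ENDPOINT}/{repo_id}/resolve/{quote(revision, safe='')}/{filename}"


repo = "turboderp/Llama-3-70B-Instruct-exl2"

# What the traceback shows: the revision was dropped, so "main" is requested.
print(resolve_url(repo, "config.json"))

# What the command line asked for: the 5.0bpw branch.
print(resolve_url(repo, "config.json", revision="5.0bpw"))
```

If config.json only exists on the per-quantization branches, as the 404 suggests, the first URL fails while the second is the one that should have been requested.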
[Attached image, no description]
18 Replies
nerdylive
nerdylive•2mo ago
Or, if it's a bug, I suggest just opening an issue on the GitHub repo of the template.
houmie
houmie•2mo ago
Ah yes, there is already an issue about this and a patch has been proposed: https://github.com/PygmalionAI/aphrodite-engine/issues/318#issuecomment-2088895545 Do you see how simple the fix is? 😄 They just forgot to pass this parameter: revision=self.model_config.revision. The patch can be applied to the latest release, v0.5.2. Do you think you could do that? That would be amazing.
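The kind of bug described above, where a configured revision never reaches the download call, can be sketched as follows. Everything in this example is hypothetical (ModelConfig, fetch_config, and both loader classes are stand-ins, not Aphrodite Engine code); only the keyword argument revision=self.model_config.revision comes from the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Hypothetical config object holding the repo name and branch."""
    model: str
    revision: Optional[str] = None


def fetch_config(model: str, revision: Optional[str] = None) -> str:
    """Stand-in for a Hub download; returns the branch it would read from."""
    return f"{model}@{revision or 'main'}"


class BuggyLoader:
    def __init__(self, model_config: ModelConfig) -> None:
        self.model_config = model_config

    def load(self) -> str:
        # The revision is never forwarded, so the default branch is used.
        return fetch_config(self.model_config.model)


class FixedLoader(BuggyLoader):
    def load(self) -> str:
        # The one-argument fix: forward the configured revision.
        return fetch_config(self.model_config.model,
                            revision=self.model_config.revision)


cfg = ModelConfig("turboderp/Llama-3-70B-Instruct-exl2", revision="5.0bpw")
print(BuggyLoader(cfg).load())
print(FixedLoader(cfg).load())
```

The buggy loader silently falls back to the default branch, which matches the symptom in the log: the request goes to main and the file is not found.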
nerdylive
nerdylive•2mo ago
Do a quick PR if you're sure about that. Sorry, I don't want to do that myself.
1. Create a fork of the GitHub repo.
2. Edit your own fork (using the web editor).
3. Then create a PR.
Solution
houmie
houmie•2mo ago
Sure, I just made a PR. Please have a look: https://github.com/PygmalionAI/aphrodite-engine/pull/455 Do you think you could cherry-pick this fix for RunPod?
houmie
houmie•2mo ago
@nerdylive Thanks
nerdylive
nerdylive•2mo ago
[Attached image, no description]
houmie
houmie•2mo ago
Ah, hold on, it seems someone replied asking me to rebase it onto dev.
nerdylive
nerdylive•2mo ago
Try to do that and they will merge your PR.
houmie
houmie•2mo ago
Ah is that you? LOL
nerdylive
nerdylive•2mo ago
No, no. That's not me.
houmie
houmie•2mo ago
Sure, what timing!
nerdylive
nerdylive•2mo ago
Yeah I just happened to open that just now
houmie
houmie•2mo ago
cool. I'll do it now. Thanks
nerdylive
nerdylive•2mo ago
Yep, np. Look at the cool green button on my screen; that doesn't look like I just replied, hahah.
houmie
houmie•2mo ago
yes 🙂 I was joking
nerdylive
nerdylive•2mo ago
Same. All good now on the engine?
houmie
houmie•2mo ago
OK, he just approved and merged it into the dev branch. Now we need to wait until he makes a release. 😦 It's OK. Hopefully he can do it in the next 2 weeks, before we go live.
nerdylive
nerdylive•2mo ago
nice