Limitation questions
Hi, I had a question about LLM token limits. I saw in the docs that certain models like @cf/baai/bge-base-en-v1.5 have a max input of 512 tokens and a 768-dimension output, but I didn't see anything for the @cf/meta/llama-2-7b-chat-fp16 or @cf/meta/llama-2-7b-chat-int8 models. Is there anywhere I can see that info for those models?
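For reference, this is the kind of call I mean — a minimal sketch, assuming an `AI` binding configured in wrangler.toml and the ambient types from `@cloudflare/workers-types` (the character cutoff is just an illustrative guard, not an exact token count):
```ts
// Minimal sketch: embed text with bge-base-en-v1.5 on Workers AI.
// Assumes an `AI` binding in wrangler.toml and ambient `Ai` /
// `ExportedHandler` types from @cloudflare/workers-types.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = await request.text();

    // The model accepts up to 512 input tokens; tokens != characters,
    // so this character-based cutoff is only a rough approximation.
    const input = body.slice(0, 2000);

    const result = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [input],
    });

    // Each embedding is a 768-number vector — the "768" is the output
    // dimension, not an output token count.
    return Response.json({
      dimensions: result.data[0].length,
      embedding: result.data[0],
    });
  },
} satisfies ExportedHandler<Env>;
```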
Please ping me when you reply 🙏
Hi :meowwave:, on the Text Generation page https://developers.cloudflare.com/workers-ai/models/text-generation/ you should see the limits for -fp16:
- Default max (sequence) tokens (stream): 2500
- Default max (sequence) tokens: 256
- Context tokens limit: 3072
- Sequence tokens limit: 2500

and for -int8:
- Default max (sequence) tokens (stream): 1800
- Default max (sequence) tokens: 256
- Context tokens limit: 2048
- Sequence tokens limit: 1800
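If you want to stay under those caps in code, here's a minimal sketch (same assumptions as above: an `AI` binding plus the ambient types from `@cloudflare/workers-types`; `max_tokens` and `stream` are the documented request options):
```ts
// Minimal sketch: chat completion with llama-2-7b-chat-fp16 on Workers AI,
// keeping the request within the limits quoted above.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run("@cf/meta/llama-2-7b-chat-fp16", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Explain what a context token limit is." },
      ],
      // Stay within the non-streaming default of 256 sequence tokens;
      // setting stream: true raises the cap toward 2500 for -fp16.
      max_tokens: 256,
    });

    return Response.json(result);
  },
} satisfies ExportedHandler<Env>;
```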
Ohh, it's on there?
The "terms" link on the model catalogue links you to the link I sent lul
ty very much