Limitation questions
Hi, I had a question about LLM token limits. I saw in the docs that certain models like @cf/baai/bge-base-en-v1.5 have a max input of 512 tokens and a 768-dimension output, but I didn't see anything for the @cf/meta/llama-2-7b-chat-fp16 or @cf/meta/llama-2-7b-chat-int8 models. Is there anywhere I can see that info for those models?
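For reference, this is the kind of call I mean — a minimal sketch, assuming an `AI` binding configured in wrangler.toml and the ambient types from `@cloudflare/workers-types` (the character cutoff is just an illustrative guard, not an exact token count):
```ts
// Minimal sketch: embed text with bge-base-en-v1.5 on Workers AI.
// Assumes an `AI` binding in wrangler.toml and ambient `Ai` /
// `ExportedHandler` types from @cloudflare/workers-types.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = await request.text();

    // The model accepts up to 512 input tokens; tokens != characters,
    // so this character-based cutoff is only a rough approximation.
    const input = body.slice(0, 2000);

    const result = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [input],
    });

    // Each embedding is a 768-number vector — the "768" is the output
    // dimension, not an output token count.
    return Response.json({
      dimensions: result.data[0].length,
      embedding: result.data[0],
    });
  },
} satisfies ExportedHandler<Env>;
```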
Please ping me when you reply 🙏
Hi :meowwave:, on the Text Generation page https://developers.cloudflare.com/workers-ai/models/text-generation/ you should see the limits for -fp16:
- Default max (sequence) tokens (stream): 2500
- Default max (sequence) tokens: 256
- Context tokens limit: 3072
- Sequence tokens limit: 2500

and for -int8:
- Default max (sequence) tokens (stream): 1800
- Default max (sequence) tokens: 256
- Context tokens limit: 2048
- Sequence tokens limit: 1800
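If you want to stay under those caps in code, here's a minimal sketch (same assumptions as above: an `AI` binding plus the ambient types from `@cloudflare/workers-types`; `max_tokens` and `stream` are the documented request options):
```ts
// Minimal sketch: chat completion with llama-2-7b-chat-fp16 on Workers AI,
// keeping the request within the limits quoted above.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run("@cf/meta/llama-2-7b-chat-fp16", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Explain what a context token limit is." },
      ],
      // Stay within the non-streaming default of 256 sequence tokens;
      // setting stream: true raises the cap toward 2500 for -fp16.
      max_tokens: 256,
    });

    return Response.json(result);
  },
} satisfies ExportedHandler<Env>;
```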
Ohh, it's on there?
The "terms" link on the model catalogue links you to the link I sent lul
ty very much