Batch API Access for `@cf/meta/llama-3.3-70b-instruct-batch`
Hello,
I'm currently testing the Workers AI Batch API functionality with LLM models, and I've encountered some issues that I’d like clarification on.
Attempt 1: `@cf/meta/llama-3.3-70b-instruct-fp8-fast`
I tried to batch multiple requests to this model using the following code:
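Roughly, the call looked like this (paraphrased from my Worker; the `requests` array shape and the `queueRequest` option are my reading of the batch-API examples, and `max_tokens` is just an illustrative per-request option, so treat the exact field names as my assumption):

```javascript
// Build a batch payload: one entry per prompt.
// NOTE: this payload shape is my assumption based on the batch-API examples,
// not something confirmed for this specific model.
function buildBatchPayload(prompts) {
  return {
    requests: prompts.map((prompt) => ({
      prompt,
      max_tokens: 256, // illustrative per-request option
    })),
  };
}

// Inside the Worker's fetch handler, the attempt was roughly:
//
//   const result = await env.AI.run(
//     "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
//     buildBatchPayload(["Summarize X", "Translate Y"]),
//     { queueRequest: true } // ask Workers AI to queue this as a batch job
//   );
```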
However, I received this error:
This seems to indicate that batching is not supported for this model, even though it's one of the newest.
Attempt 2: `@cf/meta/llama-3.3-70b-instruct-batch`
After reviewing your blog post, I attempted to use this alternative model for batching. However, I received the following error:
This makes me wonder:
1. Is `@cf/meta/llama-3.3-70b-instruct-batch` still in alpha or limited-access only?
2. When will batching be supported for `@cf/meta/llama-3.3-70b-instruct-fp8-fast`, if at all?
3. Is there a recommended LLM model that supports batching for general instruction-following tasks?

Referenced blog post: "Workers AI gets a speed boost, batch workload support, more LoRAs, ..." (The Cloudflare Blog)
> We just made Workers AI inference faster with speculative decoding & prefix caching. Use our new batch inference for handling large request volumes seamlessly. Build tailored AI apps with more LoRA options. Lastly, new models and a refreshed dashboard round out this Developer Week update for Workers AI.
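For context, the queued flow I understood from that blog post looks roughly like the sketch below. The `status`/`request_id` fields and the polling call are my reading of the post, so they may differ from what the API actually returns:

```javascript
// Helper to recognize the queued response described in the blog post.
// NOTE: the `status: "queued"` and `request_id` fields are my assumption
// from the post, not verified against the live API.
function isQueued(response) {
  return Boolean(
    response &&
    response.status === "queued" &&
    typeof response.request_id === "string"
  );
}

// Inside a Worker, the flow I expected:
//
//   const queued = await env.AI.run(
//     "@cf/meta/llama-3.3-70b-instruct-batch",
//     { requests: [{ prompt: "Hello" }] },
//     { queueRequest: true }
//   );
//   // expected shape (assumption): { status: "queued", request_id: "..." }
//
//   // Later, poll for the finished batch using the returned request_id:
//   const result = await env.AI.run(
//     "@cf/meta/llama-3.3-70b-instruct-batch",
//     { request_id: queued.request_id }
//   );
```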