Batch API Access for `@cf/meta/llama-3.3-70b-instruct-batch`

Hello, I'm currently testing the Workers AI Batch API with LLM models, and I've encountered some issues that I'd like clarification on.

Attempt 1: @cf/meta/llama-3.3-70b-instruct-fp8-fast

I tried to batch multiple requests to this model using the following code:
const batchResult = await AI.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', {
  requests: batchMessages,
}, {
  queueRequest: true,
});
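For context, batchMessages is built roughly like this. The per-request shape (one object per inference input, mirroring a normal single-shot call) is my reading of the batch docs, and the prompts are just placeholders:

// Sketch only: each entry in the requests array is assumed to take the
// same shape as a normal single-request input for this model.
const batchMessages = [
  { prompt: 'Summarize the Developer Week Workers AI announcement.' },
  { prompt: 'List three use cases for batch inference.' },
  { prompt: 'Explain prefix caching in one paragraph.' },
];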
However, I received this error:
Error: 8007: This model does not support request queuing
This seems to indicate that batching is not supported for this model, even though it's one of the newest.

Attempt 2: @cf/meta/llama-3.3-70b-instruct-batch

After reviewing your blog post, I attempted to use this alternative model for batching. However, I received the following error:
Error: 5018: This account is not allowed to access this model
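The call itself was identical apart from the model name, roughly:

// Same invocation as Attempt 1, pointed at the dedicated batch model.
const batchResult = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  requests: batchMessages,
}, {
  queueRequest: true,
});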
This makes me wonder:
1. Is @cf/meta/llama-3.3-70b-instruct-batch still in alpha or limited-access only?
2. When will batching be supported for @cf/meta/llama-3.3-70b-instruct-fp8-fast, if at all?
3. Is there a recommended LLM model that supports batching for general instruction-following tasks?
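For reference, the end-to-end flow I was expecting, based on my reading of the blog post, is queue-then-poll. The exact response fields (status, request_id) are assumptions on my part, not confirmed against the docs:

// Sketch of the queue-then-poll flow I was aiming for; the response
// shape (status / request_id) is assumed here, not confirmed.
const queued = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  requests: batchMessages,
}, {
  queueRequest: true,
});

// Later, poll for completed results using the returned request_id.
const result = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  request_id: queued.request_id,
});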
Referenced blog post: The Cloudflare Blog, "Workers AI gets a speed boost, batch workload support, more LoRAs, ..." (faster inference via speculative decoding and prefix caching, batch inference for large request volumes, more LoRA options, new models, and a refreshed dashboard for Developer Week).
