Batch API Access for `@cf/meta/llama-3.3-70b-instruct-batch`

Hello, I'm currently testing the Workers AI Batch API with LLM models, and I've encountered some issues that I'd like clarification on.

Attempt 1: @cf/meta/llama-3.3-70b-instruct-fp8-fast

I tried to batch multiple requests to this model using the following code:
const batchResult = await AI.run('@cf/meta/llama-3.3-70b-instruct-fp8-fast', {
  requests: batchMessages,
}, {
  queueRequest: true,
});
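For context, batchMessages is built roughly like this. The per-request shape (one object per inference input, mirroring a normal single-shot call) is my reading of the batch docs, and the prompts are just placeholders:

// Sketch only: each entry in the requests array is assumed to take the
// same shape as a normal single-request input for this model.
const batchMessages = [
  { prompt: 'Summarize the Developer Week Workers AI announcement.' },
  { prompt: 'List three use cases for batch inference.' },
  { prompt: 'Explain prefix caching in one paragraph.' },
];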
However, I received this error:
Error: 8007: This model does not support request queuing
This seems to indicate that batching is not supported for this model, even though it's one of the newest.

Attempt 2: @cf/meta/llama-3.3-70b-instruct-batch

After reviewing your blog post, I attempted to use this alternative model for batching. However, I received the following error:
Error: 5018: This account is not allowed to access this model
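The call itself was identical apart from the model name, roughly:

// Same invocation as Attempt 1, pointed at the dedicated batch model.
const batchResult = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  requests: batchMessages,
}, {
  queueRequest: true,
});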
This makes me wonder:
1. Is @cf/meta/llama-3.3-70b-instruct-batch still in alpha or limited-access only?
2. When will batching be supported for @cf/meta/llama-3.3-70b-instruct-fp8-fast, if at all?
3. Is there a recommended LLM model that supports batching for general instruction-following tasks?
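For reference, the end-to-end flow I was expecting, based on my reading of the blog post, is queue-then-poll. The exact response fields (status, request_id) are assumptions on my part, not confirmed against the docs:

// Sketch of the queue-then-poll flow I was aiming for; the response
// shape (status / request_id) is assumed here, not confirmed.
const queued = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  requests: batchMessages,
}, {
  queueRequest: true,
});

// Later, poll for completed results using the returned request_id.
const result = await AI.run('@cf/meta/llama-3.3-70b-instruct-batch', {
  request_id: queued.request_id,
});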
Referenced blog post: The Cloudflare Blog, "Workers AI gets a speed boost, batch workload support, more LoRAs, ..." (faster inference via speculative decoding and prefix caching, batch inference for large request volumes, more LoRA options, new models, and a refreshed dashboard for Developer Week).
