Image to Text with Workers AI.

Hello! I'm testing out some options for creating image alt text using AI. Most API endpoints I've trialed take an image URL string as the input parameter. The Cloudflare API JSON schema is unfamiliar to me: where am I passing the image here? https://developers.cloudflare.com/workers-ai/models/uform-gen2-qwen-500m/

The Cloudflare API docs also don't seem to list image-to-text as an available endpoint: https://developers.cloudflare.com/api/operations/workers-ai-post-run-cf-microsoft-resnet-50

Finally, the API offers a GET endpoint for model search: https://developers.cloudflare.com/api/operations/workers-ai-search-model

The URL listed there doesn't work for me: https://api.cloudflare.com/client/v4/apiv4/accounts/{account_id}/models/search

But this one did return a bunch of models, including the image-to-text model I want to trial (@cf/unum/uform-gen2-qwen-500m): https://api.cloudflare.com/client/v4/accounts/{{account_id}}/ai/models/search

Can someone help me figure out whether an image-to-text endpoint is available and how to send that request? The docs are really confusing me here.
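For the REST route, here is a minimal sketch of building the request. It assumes the `POST https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_name}` endpoint accepts the same JSON fields this model takes from a Worker (`image` as an array of byte values, `prompt`, optional `max_tokens`). `buildRunRequest` and its parameters are hypothetical names. Because no URL field is assumed, the image still has to be downloaded first.

```typescript
// Sketch: build a Workers AI REST request for an image-to-text model.
// Assumption: the JSON body mirrors the inputs the Worker binding accepts for this model.
const MODEL = "@cf/unum/uform-gen2-qwen-500m";

interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildRunRequest(
  accountId: string, // hypothetical: your Cloudflare account ID
  apiToken: string, // hypothetical: an API token allowed to use Workers AI
  imageBytes: Uint8Array,
  prompt: string,
  maxTokens = 256,
): RunRequest {
  // The model ID (with its slashes) goes straight into the path after /ai/run/.
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`;
  const body = {
    image: Array.from(imageBytes), // raw bytes as plain numbers 0-255, not a URL
    prompt,
    max_tokens: maxTokens,
  };
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (needs network access and real credentials):
// const img = new Uint8Array(await (await fetch("https://example.com/cat.jpg")).arrayBuffer());
// const { url, init } = buildRunRequest(ACCOUNT_ID, API_TOKEN, img, "Describe this image.");
// const json = await (await fetch(url, init)).json();
// console.log(json.result?.description); // the v4 API wraps output in `result`
```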
8 Replies
Web Bae · 5mo ago
I've also experimented with the AI package directly in the Worker (instead of hitting the API endpoint). It seems the expected input is an ArrayBuffer... is there a way to give it a hostedUrl string instead?
rayberra · 5mo ago
In case you haven't figured it out yet: as you've noticed, @cf/unum/uform-gen2-qwen-500m is somewhat undocumented at the moment. Expected inputs from a Worker look like this:
const inputs = {
image: [...new Uint8Array(await inputImage.arrayBuffer())],
prompt: "Describe this image in three sentences.",
max_tokens: 512, // optional
};
You can't give it a URL, so it's up to your Worker to fetch or supply the image as you see fit, e.g. const inputImage = await fetch("https://example.com/cat.jpg");
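Putting that together, here is a minimal Worker sketch that takes an image URL, downloads it, and passes the bytes to the model through the binding's `env.AI.run()` method. It assumes an AI binding named `AI` (e.g. `[ai] binding = "AI"` in wrangler.toml) and a hypothetical `?url=` query parameter. `AiBinding` is a local stand-in that declares only the part of the binding's type used here.

```typescript
// Minimal stand-in for the Workers AI binding type (the full type ships in
// @cloudflare/workers-types); only the run() call used below is declared.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumed binding name from wrangler.toml
}

// Convert raw bytes into the plain number array the model's `image` field expects.
function toByteArray(buffer: ArrayBufferLike): number[] {
  return Array.from(new Uint8Array(buffer));
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Hypothetical query parameter carrying the image URL: ?url=https://...
    const imageUrl = new URL(request.url).searchParams.get("url");
    if (!imageUrl) {
      return new Response("missing ?url= parameter", { status: 400 });
    }
    // The model cannot take a URL, so the Worker downloads the image itself.
    const imageResponse = await fetch(imageUrl);
    if (!imageResponse.ok) {
      return new Response(`image fetch failed: ${imageResponse.status}`, { status: 502 });
    }
    const inputs = {
      image: toByteArray(await imageResponse.arrayBuffer()),
      prompt: "Describe this image in three sentences.",
      max_tokens: 512, // optional
    };
    const result = await env.AI.run("@cf/unum/uform-gen2-qwen-500m", inputs);
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Keeping the byte conversion in a small function makes it easy to reuse for other models that take images the same way.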
Web Bae · 5mo ago
Thanks @Raylight! I got it working.
another_User · 5mo ago
@Web Bae could you please help me? How did you solve this, and what does your code look like? I'm also stuck here.
Web Bae · 5mo ago
@another_User here you go
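import { Ai } from '@cloudflare/ai';

// Assumed context: inside a route handler where `c` is the request context
// (Hono-style) and `data.hostedUrl` is the URL of the already-hosted image.
const imgUrl = data.hostedUrl;
const response = await fetch(imgUrl);
if (!response.ok) {
  throw new Error(`Failed to fetch image: ${response.status}`);
}
const arrayBuffer = await response.arrayBuffer();
// The model expects raw image bytes as a plain array of numbers (0-255), not base64
const arr = Array.from(new Uint8Array(arrayBuffer));

// Create an AI request with the image bytes
const ai = new Ai(c.env.AI);
const input = {
  image: arr, // byte array built above
  prompt: 'Provide a one sentence description of the image to be used as website alt text',
  max_tokens: 256,
};

const aiResponse = await ai.run<'@cf/unum/uform-gen2-qwen-500m'>('@cf/unum/uform-gen2-qwen-500m', input);
const { description } = aiResponse;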
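Since the goal here is website alt text, it can help to tidy the model's `description` before putting it into markup. The following is a minimal sketch. `toAltText` is a hypothetical helper, and the 125-character default cap is an arbitrary choice for illustration, not a Workers AI requirement.

```typescript
// Hypothetical helper: turn a free-form model description into a value that is
// safe to place inside a double-quoted HTML alt="..." attribute.
function toAltText(description: unknown, maxLength = 125): string {
  // Guard against an unexpected response shape.
  if (typeof description !== "string") return "";
  // Collapse newlines and repeated whitespace the model may emit.
  let text = description.replace(/\s+/g, " ").trim();
  if (text.length > maxLength) {
    // Cut at the last space at or before the limit so words are not split,
    // then mark the truncation with an ellipsis.
    const cut = text.lastIndexOf(" ", maxLength);
    text = text.slice(0, cut > 0 ? cut : maxLength).trimEnd() + "…";
  }
  // Escape characters that could break out of the attribute ("&" first).
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

With the code above, usage would be along the lines of `const alt = toAltText(description);`. The length cap counts visible characters before escaping.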
another_User · 5mo ago
Thanks, this really helped. Were you able to get the SD 1.5 inpainting model to work? It currently gives me: InferenceUpstreamError: unknown internal error
Web Bae · 5mo ago
I don't think I've tried that one. Can you link me to it?
another_User · 5mo ago
I got it to work, thanks. For anyone wondering: I got inpainting to work by deploying it as a Worker function and, inside it, passing the mask and image as new Uint8Array byte arrays, similar to what @Web Bae did. It worked perfectly.
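For the inpainting case, here is a sketch along the same lines. It assumes the model ID `@cf/runwayml/stable-diffusion-v1-5-inpainting`, input fields named `prompt`, `image` and `mask` (each image as a plain byte array), and PNG output. Confirm these against the model's page in the Workers AI catalog. `buildInpaintInputs`, `inpaint`, the mask URL and the `AiBinding` stand-in type are hypothetical.

```typescript
// Minimal stand-in for the Workers AI binding type; only run() is declared.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Assumed model ID; confirm against the Workers AI model catalog.
const INPAINT_MODEL = "@cf/runwayml/stable-diffusion-v1-5-inpainting";

// Both the source image and the mask go in as plain byte arrays, the same
// encoding used for the image-to-text model earlier in this thread.
function buildInpaintInputs(image: ArrayBufferLike, mask: ArrayBufferLike, prompt: string) {
  return {
    prompt,
    image: Array.from(new Uint8Array(image)),
    mask: Array.from(new Uint8Array(mask)),
  };
}

// Fetch the image and mask, run the model, and return its output as a PNG response.
async function inpaint(ai: AiBinding, imageUrl: string, maskUrl: string, prompt: string): Promise<Response> {
  const [imageRes, maskRes] = await Promise.all([fetch(imageUrl), fetch(maskUrl)]);
  if (!imageRes.ok || !maskRes.ok) {
    return new Response("could not fetch image or mask", { status: 502 });
  }
  const inputs = buildInpaintInputs(await imageRes.arrayBuffer(), await maskRes.arrayBuffer(), prompt);
  // Assumed: the model returns PNG image data that can be passed straight through.
  const png = await ai.run(INPAINT_MODEL, inputs);
  return new Response(png as ConstructorParameters<typeof Response>[0], {
    headers: { "content-type": "image/png" },
  });
}
```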