Cloudflare Developers•13mo ago

Image to Text with Workers AI.

Hello! I'm testing out some options for creating image alt text using AI. Most API endpoints I've trialed take an image URL string as the input parameter. The Cloudflare API JSON schema is unfamiliar to me: where am I passing the image here? https://developers.cloudflare.com/workers-ai/models/uform-gen2-qwen-500m/

The Cloudflare API docs also don't seem to list Image to Text as an available endpoint. https://developers.cloudflare.com/api/operations/workers-ai-post-run-cf-microsoft-resnet-50

Finally, the API offers GET for model search. https://developers.cloudflare.com/api/operations/workers-ai-search-model I found that the URL listed there doesn't work: https://api.cloudflare.com/client/v4/apiv4/accounts/{account_id}/models/search But this one did return a bunch of models, including the image-to-text model I want to trial (@cf/unum/uform-gen2-qwen-500m): https://api.cloudflare.com/client/v4/accounts/{{account_id}}/ai/models/search

Can someone help me work out whether an image-to-text endpoint is available and how to send that request? The docs are really confusing me here.
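For the REST route asked about here, a request could be sketched roughly as follows. This is a minimal sketch, not a confirmed recipe: it assumes the model is reachable at the account-scoped `/ai/run/{model}` path (mirroring the `/ai/models/search` path that worked) and that the JSON body carries the image as an array of byte values alongside `prompt` and `max_tokens`. The helper name `buildImageToTextRequest` and its return shape are hypothetical, chosen only for illustration.

```typescript
// Hypothetical helper: builds (but does not send) a Workers AI REST request.
// Assumption: the endpoint is /accounts/{account_id}/ai/run/{model} and the
// body takes the image as an array of integers (0-255), not a URL.
interface ImageToTextRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildImageToTextRequest(
  accountId: string,
  apiToken: string,
  imageBytes: Uint8Array,
  prompt: string,
  maxTokens = 256,
): ImageToTextRequest {
  const model = "@cf/unum/uform-gen2-qwen-500m";
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // Convert the typed array to a plain number[] so it serializes as JSON.
      body: JSON.stringify({
        image: Array.from(imageBytes),
        prompt,
        max_tokens: maxTokens,
      }),
    },
  };
}

// Usage (not executed here): const req = buildImageToTextRequest(...);
// const res = await fetch(req.url, req.init);
```

Building the request separately from sending it keeps the sketch easy to inspect before any network call is made.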

Cloudflare Docs

uform-gen2-qwen-500m · Cloudflare Workers AI docs

Run AI models in Workers, Pages, or via API.

Cloudflare API Documentation

Interact with Cloudflare's products and services via the Cloudflare API

8 Replies

Web BaeOP•13mo ago

I've also experimented with the ai package directly in the Worker (instead of hitting the API endpoint). It seems the expected input is an ArrayBuffer... is there a way to give it a hostedUrl string?

rayberra•13mo ago

In case you haven't figured it out yet: as you've noticed, @cf/unum/uform-gen2-qwen-500m is somewhat undocumented at the moment. The expected inputs from a Worker look like this:

const inputs = {
  image: [...new Uint8Array(await inputImage.arrayBuffer())],
  prompt: "Describe this image in three sentences.",
  max_tokens: 512, // optional
};

You can't give it a URL, so it's up to your Worker to fetch or supply the image as you see fit, e.g. const inputImage = await fetch("https://example.com/cat.jpg");
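The conversion described above (raw bytes in, plain array of integers out) can be isolated in a small helper. This is only a sketch: the name `toImageToTextInputs` and the `ImageToTextInputs` interface are hypothetical, and the field names simply mirror the inputs object shown in this reply.

```typescript
// Hypothetical input shape, mirroring the fields used in this thread.
interface ImageToTextInputs {
  image: number[];
  prompt: string;
  max_tokens?: number;
}

// Turn an ArrayBuffer (e.g. from `await response.arrayBuffer()`) into the
// inputs object. Array.from yields the same number[] as the spread form
// `[...new Uint8Array(buffer)]`.
function toImageToTextInputs(
  buffer: ArrayBuffer,
  prompt: string,
  maxTokens?: number,
): ImageToTextInputs {
  const image = Array.from(new Uint8Array(buffer));
  // Only include max_tokens when the caller supplied one, since it is optional.
  return maxTokens === undefined
    ? { image, prompt }
    : { image, prompt, max_tokens: maxTokens };
}

// Usage inside a Worker (not executed here):
// const inputImage = await fetch("https://example.com/cat.jpg");
// const inputs = toImageToTextInputs(
//   await inputImage.arrayBuffer(),
//   "Describe this image in three sentences.",
//   512,
// );
```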

Web BaeOP•13mo ago

Thanks @Raylight! I got it working.

another_User•13mo ago

@Web Bae could you please help me? How did you solve this, and what does your code look like? I'm also stuck here.

Web BaeOP•12mo ago

@another_User here you go

// Assumes `import { Ai } from '@cloudflare/ai';` at the top of the file.
// Fetch the hosted image and convert its bytes to a plain array of integers.
const imgUrl = data.hostedUrl;
const response = await fetch(imgUrl);
const arrayBuffer = await response.arrayBuffer();
const arr = Array.from(new Uint8Array(arrayBuffer));

// Create an AI request with the image bytes (a number array, not a base64 string).
const ai = new Ai(c.env.AI);
const input = {
    image: arr,
    prompt: 'Provide a one sentence description of the image to be used as website alt text',
    max_tokens: 256,
};

// The response carries the generated text in `description`.
const aiResponse = await ai.run<'@cf/unum/uform-gen2-qwen-500m'>('@cf/unum/uform-gen2-qwen-500m', input);
const { description } = aiResponse;

another_User•12mo ago

Thanks. This really helped. Were you able to get the inference SD 1.5 model to work? It currently gives me: InferenceUpstreamError: unknown internal error

Web BaeOP•12mo ago

I don't think I've tried that one. Can you link me to it?

another_User•12mo ago

I got it to work, thanks. For anyone wondering, I got the inpainting to work by deploying the Worker function and, inside it, passing the mask and the image as byte arrays built from new Uint8Array, similar to what @Web Bae did. It worked perfectly.
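A rough sketch of what that might look like, under assumptions the thread does not confirm: that the model in question is `@cf/runwayml/stable-diffusion-v1-5-inpainting`, and that it takes `prompt`, `image`, and `mask` fields with both images given as arrays of byte values. The helper name `buildInpaintingInputs` is hypothetical.

```typescript
// Hypothetical input shape; the field names are an assumption based on the
// description in this thread (image and mask passed as byte arrays).
interface InpaintingInputs {
  prompt: string;
  image: number[];
  mask: number[];
}

// Convert the source image and its mask from ArrayBuffers to number arrays,
// the same way the image-to-text snippets in this thread do.
function buildInpaintingInputs(
  imageBuffer: ArrayBuffer,
  maskBuffer: ArrayBuffer,
  prompt: string,
): InpaintingInputs {
  return {
    prompt,
    image: Array.from(new Uint8Array(imageBuffer)),
    mask: Array.from(new Uint8Array(maskBuffer)),
  };
}

// Usage inside a deployed Worker (not executed here; model ID is an assumption):
// const inputs = buildInpaintingInputs(imageBuf, maskBuf, "Change to a lion");
// const result = await env.AI.run(
//   "@cf/runwayml/stable-diffusion-v1-5-inpainting",
//   inputs,
// );
```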
