Way to respond with multiple images from text-to-image models

I want to respond with two or more images in a single response. I tried using encodeBase64 from Hono, but it just returns an empty string. Here is my route:
avatarRouter.get("/", async (c) => {
  const body = await c.req.json()
  const ai = new Ai(c.env.AI)
  const prompt = body.prompt

  const anime = `anime style ${prompt}`
  const realistic = `photo realistic style ${prompt}`

  try {
    const animeImg = await ai.run("@cf/bytedance/stable-diffusion-xl-lightning", { prompt: anime })
    const realisticImg = await ai.run("@cf/bytedance/stable-diffusion-xl-lightning", { prompt: realistic })

    console.log({
      data: encodeBase64(animeImg)
      // returns an empty string
    })
    // this works for a single image:
    // return c.body(animeImg)

    // this just returns empty strings
    return c.json({
      anime: encodeBase64(animeImg),
      realistic: encodeBase64(realisticImg)
    })
  } catch (e) {
    c.status(400)
    return c.json({
      message: "error while fetching image"
    })
  }
})
I just want to be able to respond with two or more images (without compositing them onto a canvas and responding with that single image), so I am open to different approaches.
6 Replies
rayberra, 10mo ago
If you're using SDK version 1.1.0 or the new native binding, the response from run() is a ReadableStream instead of Uint8Array. I reckon that's why your encodeBase64(animeImg) returns an empty string. Beware that encoding and returning json will be quite slow though. Also, you'd want to run the inference requests in parallel, i.e. using Promise.all().
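If run() does hand back a ReadableStream, as suggested above, the stream has to be read to completion before its bytes can be base64-encoded. A minimal sketch of that step, assuming a runtime with the standard Response and btoa globals (Workers has both); streamToBase64 is a hypothetical helper name, not part of Hono or the AI SDK:

```typescript
// Hypothetical helper: drain a ReadableStream, then base64-encode its bytes.
async function streamToBase64(stream: ReadableStream<Uint8Array>): Promise<string> {
  // Wrapping the stream in a Response is a compact way to read it fully.
  const bytes = new Uint8Array(await new Response(stream).arrayBuffer());

  // btoa expects a "binary string" with one character per byte.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

In the route this could be used as `const [anime, realistic] = await Promise.all([animeImg, realisticImg].map(streamToBase64))`. Base64 inflates the payload by roughly a third, which is part of why the JSON approach is slow.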
Nozomu (OP), 10mo ago
But when I hover over the responses, the type still says Uint8Array. Also, is there any other way you'd recommend to format the images so that I can send several in a single response, or to send multiple images directly? Should I downgrade the version? Anyway, as you said, encoding will take a lot of time.
rayberra, 10mo ago
The type data is probably not up to date. The best (simple) solution I've come up with so far - if you really need two images in a single response - is something like this:
// run inference requests in parallel
const responses = await Promise.all([
env.AI.run(model1, inputs1),
env.AI.run(model2, inputs2)]);

// read the streams to completion
const blobs = await Promise.all([
new Response(responses[0]).blob(),
new Response(responses[1]).blob()]);

// create a multipart/form-data response
const formData = new FormData();
formData.set("output1", blobs[0]);
formData.set("output2", blobs[1]);
return new Response(formData);
Not sure I'd recommend it, but I've tested it with 2 x @cf/bytedance/stable-diffusion-xl-lightning and got acceptable results on latency and CPU. Websockets could be another option.
Nozomu (OP), 10mo ago
I tried this out, and I think it works well enough for what I am building. This is roughly what the response looks like in Postman. I don't know how to handle it yet, but I am sure there is an easy way to parse it on the client side:
--8aedf57a88e783599c5d76329ad6425a
Content-Disposition: form-data; name="output!"; filename="output!"
Content-Type: application/octet-stream

�PNG
I have an intuition for how to make it work with websockets; I'll try that out if this becomes too inefficient. Thank you for the help, really appreciate it 😁!
rayberra, 10mo ago
Made a tiny complete example of using formdata/multi-part, including client-side js and "UI": https://gist.github.com/Raylight-PWL/495d9ecf835602914dd35e7edadac0e2. Looks like CPU time is somewhere around 5ms. Latency is theoretically close to that of generating a single image. In practice there's a lot of variance, and you'll naturally always be stalling on the slowest of the two inference requests.
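For readers who only need the client-side step, here is a minimal sketch of parsing such a multipart/form-data response. It is not the code from the gist; imagesFromMultipart is a hypothetical helper name, and the part names ("output1", "output2") are assumed to match whatever the server passed to formData.set():

```typescript
// Hypothetical helper: turn each file part of a multipart/form-data
// response into an object URL, keyed by the part's field name.
async function imagesFromMultipart(res: Response): Promise<Record<string, string>> {
  // Response.formData() parses the multipart body, so no manual
  // boundary handling is needed.
  const form = await res.formData();
  const urls: Record<string, string> = {};
  form.forEach((value, name) => {
    // File parts arrive as File/Blob objects; plain text fields are skipped.
    if (typeof value !== "string") {
      urls[name] = URL.createObjectURL(value);
    }
  });
  return urls;
}
```

In a browser, the result could be used as `img.src = urls["output1"]` after `const urls = await imagesFromMultipart(await fetch(endpoint))`, where endpoint is the URL of your route. Calling URL.revokeObjectURL once an image is no longer needed releases the underlying blob.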
Nozomu (OP), 10mo ago
"obligatory css hell" had me wheezing. I did a similar approach with createObjectURL. Again, thank you for the help, Ray!
