Way to respond with multiple images from text-to-image models
I want to respond with two or more images in a single response. I tried to use encodeBase64 from Hono, but that just returns an empty string. Here is my route.
I just want to be able to respond with two or more images (without, say, compositing the images onto a canvas and responding with that single image), so I am open to different approaches.
6 Replies
If you're using SDK version 1.1.0 or the new native binding, the response from run() is a ReadableStream instead of Uint8Array. I reckon that's why your encodeBase64(animeImg) returns an empty string. Beware that encoding and returning json will be quite slow though. Also, you'd want to run the inference requests in parallel, i.e. using Promise.all().
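A minimal sketch of the two points in the reply above: normalising a result that may be either a ReadableStream or a Uint8Array, and firing the inference requests in parallel with Promise.all(). The names `toBytes`, `generateAll` and `RunModel` are hypothetical; `run` stands in for whatever actually calls the model (for example a small wrapper around the AI binding's run()), which is not shown here.

```typescript
// Hypothetical stand-in for the call that runs the model for one prompt.
type RunModel = (
  prompt: string,
) => Promise<ReadableStream<Uint8Array> | Uint8Array>;

// Accept either shape and always hand back raw bytes.
async function toBytes(
  result: ReadableStream<Uint8Array> | Uint8Array,
): Promise<Uint8Array> {
  if (result instanceof Uint8Array) return result;
  // Wrapping the stream in a Response is a simple way to drain it fully.
  return new Uint8Array(await new Response(result).arrayBuffer());
}

// Start every request at once; total wait is roughly the slowest request.
async function generateAll(
  run: RunModel,
  prompts: string[],
): Promise<Uint8Array[]> {
  return Promise.all(prompts.map(async (p) => toBytes(await run(p))));
}
```

With the bytes in hand, base64 encoding should no longer see an unread stream, though the cost of encoding and shipping JSON mentioned above still applies.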
But when I hover over the responses it still says Uint8Array. Also, is there any other way you'd recommend to format the images and send multiple images in a single response, or to send multiple images directly?
Should I downgrade the version? Anyway, as you said, encoding will take a lot of time.
The type data is probably not up to date. The best (simple) solution I've come up with so far - if you really need two images in a single response - is something like this:
Not sure I'd recommend it, but I've tested it with 2 x @cf/bytedance/stable-diffusion-xl-lightning and got acceptable results on latency and CPU. Websockets could be another option.
I tried this out, I think it works decent enough for what I am building.
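For readers following along, here is a rough sketch of one way to return several images in a single multipart/form-data response, which is the shape of the output shown in this thread. It is not the code from the reply above; the function name `imagesToResponse`, the field names `image0`, `image1`, ... and the `image/png` type are all assumptions made for illustration.

```typescript
// Pack any number of image blobs into one multipart/form-data response.
// Field names and file names are hypothetical; pick whatever the client expects.
function imagesToResponse(images: Blob[]): Response {
  const form = new FormData();
  images.forEach((image, i) => {
    // Re-wrap so every part carries an explicit content type (assumed PNG).
    const png = new Blob([image], { type: "image/png" });
    form.append(`image${i}`, png, `image${i}.png`);
  });
  // Passing FormData as the body sets the multipart Content-Type
  // header, including the boundary, automatically.
  return new Response(form);
}
```

A route handler could return this Response directly. No base64 step is involved, so the binary image data is sent as-is.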
This is roughly what the response looks like in Postman. I don't know how to handle it yet, but I am sure there is an easy way to parse it on the client side:
--8aedf57a88e783599c5d76329ad6425a
Content-Disposition: form-data; name="output!"; filename="output!"
Content-Type: application/octet-stream
�PNG
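A multipart body like the one above can usually be unpacked on the client with the standard `Response.formData()` method, so no hand-written boundary parsing should be needed. The helper name `extractImages` is hypothetical, and this sketch assumes every non-string part is an image.

```typescript
// Pull every file part out of a multipart/form-data response.
async function extractImages(res: Response): Promise<Blob[]> {
  const form = await res.formData();
  const blobs: Blob[] = [];
  form.forEach((value) => {
    // String entries are plain text fields; file parts arrive as File (a Blob).
    if (typeof value !== "string") blobs.push(value);
  });
  return blobs;
}
```

In a browser, each returned blob can then be shown with something like `img.src = URL.createObjectURL(blob)`, calling `URL.revokeObjectURL` once the image is no longer needed.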
I have an intuition for how to make it work with websockets; I'll try that out if this becomes too inefficient.
Thank you for the help, really appreciate it 😁!
Made a tiny complete example of using FormData/multipart, including client-side JS and "UI": https://gist.github.com/Raylight-PWL/495d9ecf835602914dd35e7edadac0e2. Looks like CPU time is somewhere around 5ms. Latency is theoretically close to that of generating a single image. In practice there's a lot of variance, and you'll naturally always be stalling on the slowest of the two inference requests.
"obligatory css hell" had me wheezing. I did a similar approach with createObjectURL. Again, thank you for the help, Ray!