Format of video input for vLLM model LLaVA-NeXT-Video-7B-hf
Dear Discord members,
I have a question about using the vLLM template with the HuggingFace LLaVA-NeXT-Video-7B-hf model on combined text+video multi-modal input. Video input is a fairly new feature in vLLM, and I cannot find definitive documentation on how to encode the input video so that the running model instance decodes it into a format it understands.
The online vLLM AI chatbot suggested a vector of JPEG-encoded video frames, but that did not work. The vLLM GitHub repository gave me the impression that a NumPy array is the right approach, but that does not work either.
Has anyone had success in using this (or a similar) setup?
Thank you in advance,
Ferenc
Maybe try the OpenAI-compatible client?
Or: how exactly did you try it, and how did you generate responses?
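If you're passing the frames directly to the offline `LLM` API, something like this might be worth a shot. This is an untested sketch: the frame count, resolution, and the `USER: <video>\n... ASSISTANT:` prompt template are assumptions based on vLLM's multi-modal examples, not verified against your setup. The key idea is that the video goes in as a single `uint8` NumPy array of shape `(num_frames, height, width, 3)` under the `"video"` key of `multi_modal_data`:

```python
import numpy as np

# Build a dummy clip: 8 RGB frames of 336x336, dtype uint8.
# The (num_frames, height, width, 3) layout is what vLLM's video
# input appears to expect (assumption from the vLLM repo examples);
# in practice you'd fill this from a real decoder (OpenCV, decord, ...).
video = np.random.randint(0, 256, size=(8, 336, 336, 3), dtype=np.uint8)
print(video.shape, video.dtype)

# The vLLM call itself needs a GPU, so it is only sketched here
# (model name is real; prompt template is an assumption):
# from vllm import LLM, SamplingParams
# llm = LLM(model="llava-hf/LLaVA-NeXT-Video-7B-hf")
# prompt = "USER: <video>\nDescribe this clip. ASSISTANT:"
# outputs = llm.generate(
#     {"prompt": prompt, "multi_modal_data": {"video": video}},
#     SamplingParams(max_tokens=128),
# )
# print(outputs[0].outputs[0].text)
```

If that shape/dtype combination still fails, posting the exact traceback here would help narrow it down.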