400 Errors with allenai-olmocr on Serverless SGLang - Need Payload Help!

I'm trying to deploy the allenai/olmOCR-7B-0225-preview model (a fine-tuned Qwen/Qwen2-VL-7B model, https://huggingface.co/allenai/olmOCR-7B-0225-preview) on RunPod using the Serverless SGLang endpoint template, running on an L40S, but I consistently get 400 Bad Request errors when sending requests. I'm trying to send PDF documents for OCR, and I suspect the issue is with the input payload.

I've tried various common input formats based on the RunPod documentation and examples, with no luck so far. I've tried sending a PDF file plus a page number, as well as what I originally tried (PDF anchor text plus a page image). I've mainly sent the image as base64; I also experimented with an S3 URL, but it didn't seem to make a difference. The code below uses the PDF retrieved from https://molmo.allenai.org/paper.pdf.

I deployed it the lazy way, providing the Hugging Face handle and mostly default settings. Do I need to set up a handler and deploy using Docker to get this to work?

I've checked the RunPod documentation for Serverless requests (https://docs.runpod.io/serverless/overview) and the olmOCR documentation and examples (https://github.com/allenai/olmocr), but I'm still struggling to get the input payload right. Could someone please help me understand the exact JSON payload format expected by RunPod Serverless SGLang for a multimodal model like allenai/olmocr? Specifically, I'm unsure about the correct structure for including both text (anchor text) and image data in the request, and I can't find any clear answers.

I was able to get this working on Replicate using an existing setup (https://github.com/lucataco/cog-olmocr), but that was very much a point-and-click setup, and Replicate was pretty expensive and slow.

The RunPod logs indicate a "Bad Request", which points to an issue with the input data format. Any guidance or example payloads would be greatly appreciated!
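To make the text-plus-image structure concrete, here is a minimal sketch that builds the request body with a small helper and checks it locally before anything is sent. It assumes the endpoint accepts an OpenAI-style chat-completions body (a `messages` list whose `content` mixes `text` and `image_url` parts), which is the shape the script in the reply below already sends. That assumption, the helper names `build_chat_body` and `wrap_for_runpod`, and the `"input"` wrapper are illustrative and not confirmed against the RunPod SGLang worker. One thing worth noting: `max_new_tokens`, `num_return_sequences`, and `do_sample` are Hugging Face `generate()` names, while the OpenAI chat schema uses `max_tokens` and `n`. Whether that mismatch contributes to the 400 is untested.

```python
import base64
import json


def build_chat_body(anchor_text, image_base64, model, max_tokens=50, temperature=0.8):
    """Build an OpenAI-style chat body with one text part and one image part.

    Assumption: the serving layer understands OpenAI chat-completions fields.
    """
    # Fail early (binascii.Error) if the image string is not valid base64.
    base64.b64decode(image_base64, validate=True)
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": anchor_text},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def wrap_for_runpod(body):
    """Wrap the body in the "input" key, as the script in this thread does."""
    return {"input": body}


if __name__ == "__main__":
    # Placeholder bytes stand in for a rendered PDF page.
    fake_png = base64.b64encode(b"not a real image").decode("ascii")
    body = build_chat_body("Page text", fake_png, "allenai/olmOCR-7B-0225-preview")
    payload = wrap_for_runpod(body)
    print(len(json.dumps(payload)), "bytes of JSON")
```

Printing the serialized size is a cheap way to see how large the base64 page image makes the request, which is worth ruling out when a request is rejected.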
1 Reply
danomatic0117 (OP), 2mo ago
import base64
import json
import os
import time
import urllib.request
from io import BytesIO

import dotenv
import requests
from olmocr.data.renderpdf import render_pdf_to_base64png
from olmocr.prompts.anchor import get_anchor_text
from PIL import Image

dotenv.load_dotenv()

RUNPOD_API_KEY = os.getenv('RUNPOD_API_KEY')
RUNPOD_ENDPOINT_ID = os.getenv('RUNPOD_ENDPOINT_ID')

if not RUNPOD_API_KEY or not RUNPOD_ENDPOINT_ID:
    print("Error: RUNPOD_API_KEY or RUNPOD_ENDPOINT_ID not found in .env file.")
    exit()

headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {RUNPOD_API_KEY}'
}

# Replace with your PDF file or keep 'paper.pdf' in the same directory
pdf_file_path = 'paper.pdf'
if not os.path.exists(pdf_file_path):
    urllib.request.urlretrieve("https://molmo.allenai.org/paper.pdf", pdf_file_path)

# Render page 1 as a base64 PNG and extract its anchor text
image_base64 = render_pdf_to_base64png(pdf_file_path, 1, target_longest_image_dim=1024)
anchor_text = get_anchor_text(pdf_file_path, 1, pdf_engine="pdfreport", target_length=4000)
print("Anchor Text:", anchor_text)

data = {
    "input": {  # Keep the "input" wrapper as per RunPod documentation
        "messages": [  # Use "messages" as expected by SGLang Chat Completion API
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": anchor_text},  # Anchor text as text content
                    {
                        "type": "image_url",  # Use "image_url" type for image data
                        # Data URL format for base64 image
                        "image_url": {"url": f"data:image/png;base64,{image_base64}"}
                    },
                ],
            }
        ],
        "model": "allenai/olmocr",  # Model name - important to include this for SGLang
        "temperature": 0.8,
        "max_new_tokens": 50,
        "num_return_sequences": 1,
        "do_sample": True,
    }
}

# Use /runsync for synchronous endpoint
endpoint_url = f'https://api.runpod.ai/v2/{RUNPOD_ENDPOINT_ID}/runsync'

response = requests.post(endpoint_url, headers=headers, json=data)

print("Status Code:", response.status_code)
print("Response Body:", response.json())

if response.status_code != 200:
    print("\nRequest submission failed. Check the 'error' in the response body and RunPod logs for more details.")
else:
    print("\nRequest submitted successfully! Output:")
    # Directly print the output in readable JSON format
    print(json.dumps(response.json(), indent=2))
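A note on reading the 400 itself: `response.json()` raises an exception when the body is not valid JSON, so an error page in plain text or HTML would crash the script above before the message is shown. The sketch below is one way to print whatever came back; `describe_response` is a hypothetical helper name, and the 500-character cutoff is an arbitrary choice.

```python
import json


def describe_response(status_code, body_text):
    """Return a readable summary of an HTTP response without assuming JSON."""
    try:
        parsed = json.loads(body_text)
    except json.JSONDecodeError:
        # Error bodies are sometimes plain text or HTML; keep the raw text.
        return f"HTTP {status_code} (non-JSON body): {body_text[:500]}"
    return f"HTTP {status_code}:\n{json.dumps(parsed, indent=2)}"


if __name__ == "__main__":
    # Made-up bodies, only to show both branches.
    print(describe_response(400, '{"error": "example message"}'))
    print(describe_response(400, "Bad Request"))
```

With the script above it would be called as `describe_response(response.status_code, response.text)`, in place of the two `print` calls that follow the request.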
