Multimodal Messages
Blackgeorge supports multimodal messages, which let you send images, videos, audio, and documents to models that accept those input types.
Overview
Multimodal messages use the LiteLLM/OpenAI message format, in which content is either a plain string or a list of content objects, each tagged with a type such as text, image_url, video_url, or file.
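To make the two content shapes concrete, here is a minimal sketch that normalizes either form into a list of content objects. The helper name `as_parts` is hypothetical and not part of Blackgeorge; whether the library performs a similar normalization internally is an assumption, not something this page states.

```python
def as_parts(content):
    """Return message content as a list of content objects.

    Hypothetical helper for illustration only (not a Blackgeorge API).
    A plain string becomes a single text part; a list is copied as-is.
    """
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


# A plain string and its list-form equivalent describe the same message.
simple = as_parts("What's in this image?")
mixed = as_parts([
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
])
```

Both `simple` and `mixed` can then be handled uniformly as lists of typed parts.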
Sending Images
Using URLs
from blackgeorge import Desk, Job, Worker
desk = Desk(model="openrouter/google/gemini-3-flash-preview")
worker = Worker(name="VisionAnalyst")
job = Job(input=[
{"type": "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
])
report = desk.run(worker, job)
print(report.content)
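Using Local Files
from blackgeorge import Desk, Job, Worker, encode_file
desk = Desk(model="openrouter/google/gemini-3-flash-preview")
worker = Worker(name="VisionAnalyst")
# encode_file returns a base64 data URL, which goes in the same "url" field as a remote URL
image_data = encode_file("./my_photo.jpg")
job = Job(input=[
{"type": "text", "text": "Describe this image"},
{"type": "image_url", "image_url": {"url": image_data}}
])
report = desk.run(worker, job)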
Multiple Images
job = Job(input=[
{"type": "text", "text": "Compare these two images"},
{"type": "image_url", "image_url": {"url": "https://example.com/before.jpg"}},
{"type": "image_url", "image_url": {"url": "https://example.com/after.jpg"}}
])
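When the number of images varies, the content list can be built programmatically. The sketch below assumes nothing beyond the content-object shape shown above; `image_part` and `prompt_with_images` are hypothetical helper names, not Blackgeorge functions.

```python
def image_part(url):
    """Build one image content object from a remote URL or a data URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def prompt_with_images(text, urls):
    """Return a content list: one text part followed by one part per image."""
    return [{"type": "text", "text": text}] + [image_part(u) for u in urls]


content = prompt_with_images(
    "Compare these two images",
    ["https://example.com/before.jpg", "https://example.com/after.jpg"],
)
```

The resulting `content` list has the same structure as the literal list above and could be passed as `Job(input=content)`.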
Sending Videos
job = Job(input=[
{"type": "text", "text": "Summarize this video"},
{"type": "video_url", "video_url": {"url": "https://youtube.com/watch?v=..."}}
])
# For local video files
video_data = encode_file("./demo.mp4")
job = Job(input=[
{"type": "text", "text": "What happens in this video?"},
{"type": "video_url", "video_url": {"url": video_data}}
])
Sending Audio
audio_data = encode_file("./recording.mp3")
job = Job(input=[
{"type": "text", "text": "Transcribe this audio"},
{"type": "file", "file": {"file_data": audio_data}}
])
Document Understanding (PDF, DOCX, etc.)
pdf_data = encode_file("./contract.pdf")
job = Job(input=[
{"type": "text", "text": "Extract key terms from this contract"},
{"type": "file", "file": {
"file_data": pdf_data,
"filename": "contract.pdf"
}}
])
Supported document types:
- PDF (.pdf)
- Word (.doc, .docx)
- Excel (.xls, .xlsx)
- CSV (.csv)
- HTML (.html)
- Markdown (.md)
- Text (.txt)
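The extension list above can be checked before a document is encoded and sent. This is a minimal sketch: `SUPPORTED_DOC_EXTENSIONS` and `document_part` are hypothetical names introduced for illustration, and the check only mirrors the list on this page rather than any validation Blackgeorge itself performs.

```python
from pathlib import Path

# Extensions copied from the list of supported document types above.
SUPPORTED_DOC_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".csv", ".html", ".md", ".txt",
}


def document_part(path, file_data):
    """Build a "file" content object, rejecting unlisted extensions.

    file_data is expected to be a data URL such as the one returned by
    encode_file(path).
    """
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_DOC_EXTENSIONS:
        raise ValueError(f"Unsupported document type: {p.suffix or '(none)'}")
    return {"type": "file", "file": {"file_data": file_data, "filename": p.name}}
```

In practice it would be called as `document_part("./contract.pdf", encode_file("./contract.pdf"))`.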
Generating Images
Blackgeorge includes tools for generating images with OpenRouter, Gemini, and OpenAI-compatible providers:
from blackgeorge import Desk, Job, Worker, generate_image
desk = Desk(model="openrouter/google/gemini-3-flash-preview")
worker = Worker(name="Assistant", tools=[generate_image])
job = Job(input="Please generate an image of a sunset over mountains")
report = desk.run(worker, job)
The generate_image tool accepts:
- prompt (required): Text description of the image
- model (optional): Model to use (default: openrouter/google/gemini-3-pro-image-preview)
- size (optional): Image size (default: 1024x1024)
- quality (optional): Image quality (default: standard)
Returns a dict with:
- url: URL of the generated image
- b64_json: Base64-encoded image data when returned by the provider
- revised_prompt: The model's revised prompt when supported
For models that generate images through chat completions, Blackgeorge automatically falls back to
completion(..., modalities=["image", "text"]) if the image endpoint returns no image data.
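Because the returned dict may carry the image as a URL, as base64 data, or both, code that consumes it has to check which fields are populated. The sketch below assumes you have access to that dict directly and that missing fields are absent or None; `extract_image` is a hypothetical helper, not part of Blackgeorge.

```python
import base64


def extract_image(result):
    """Return ("bytes", data), ("url", url), or (None, None).

    Prefers inline base64 data when present, since it needs no further
    download; falls back to the URL otherwise.
    """
    b64 = result.get("b64_json")
    if b64:
        return "bytes", base64.b64decode(b64)
    url = result.get("url")
    if url:
        return "url", url
    return None, None


# Example with a tiny stand-in payload rather than a real image.
kind, value = extract_image({"b64_json": base64.b64encode(b"fake").decode("ascii")})
```

When `kind` is `"bytes"`, `value` can be written straight to a file; when it is `"url"`, the image still has to be fetched.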
The encode_file() Utility
encode_file(file_path, mime_type=None) converts local files to base64 data URLs:
from blackgeorge import encode_file
# Auto-detect MIME type from extension
image_data = encode_file("photo.jpg") # data:image/jpeg;base64,...
pdf_data = encode_file("doc.pdf") # data:application/pdf;base64,...
# Explicit MIME type
data = encode_file("file.bin", mime_type="application/octet-stream")
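For readers unfamiliar with data URLs, the following sketch shows how one is assembled with the standard library. It is not Blackgeorge's implementation of encode_file(); `to_data_url` is a hypothetical function that takes raw bytes and a filename so the example stays self-contained, and the fallback to application/octet-stream for unknown extensions is an assumption.

```python
import base64
import mimetypes


def to_data_url(data, filename, mime_type=None):
    """Encode raw bytes as a base64 data URL.

    If mime_type is not given, guess it from the filename extension and
    fall back to application/octet-stream when the guess fails.
    """
    if mime_type is None:
        guessed, _ = mimetypes.guess_type(filename)
        mime_type = guessed or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


url = to_data_url(b"hi", "photo.jpg")
```

The result has the `data:<mime>;base64,<payload>` shape shown in the comments above.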
Supported Models
Multimodal input support varies by model:
| Feature | Models |
|---|---|
| Images | openrouter/google/gemini-3-flash-preview, openrouter/google/gemini-3-pro-preview |
| Videos | openrouter/google/gemini-3-flash-preview, openrouter/google/gemini-3-pro-preview |
| Audio | openrouter/google/gemini-3-flash-preview, openrouter/google/gemini-3-pro-preview |
| Documents | openrouter/google/gemini-3-flash-preview, openrouter/google/gemini-3-pro-preview |
| Image Generation | openrouter/google/gemini-3-pro-image-preview |
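The table can be mirrored as a lookup to fail fast before sending a request to a model that does not list the feature. This is a sketch only: `SUPPORTED_MODELS` and `supports` are hypothetical names, the feature keys are illustrative, and the model lists are copied from the table rather than queried from Blackgeorge or the provider.

```python
_GEMINI_3_INPUT = {
    "openrouter/google/gemini-3-flash-preview",
    "openrouter/google/gemini-3-pro-preview",
}

# Feature -> models, transcribed from the table above.
SUPPORTED_MODELS = {
    "image": _GEMINI_3_INPUT,
    "video": _GEMINI_3_INPUT,
    "audio": _GEMINI_3_INPUT,
    "document": _GEMINI_3_INPUT,
    "image_generation": {"openrouter/google/gemini-3-pro-image-preview"},
}


def supports(model, feature):
    """Return True if the table lists the model for the given feature."""
    return model in SUPPORTED_MODELS.get(feature, set())
```

A model missing from this lookup is not necessarily unsupported; it only means the table does not list it.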
Examples
Vision Analysis with Gemini 3 Flash Preview
from blackgeorge import Desk, Job, Worker, encode_file
desk = Desk(model="openrouter/google/gemini-3-flash-preview")
worker = Worker(name="VisionBot", instructions="You analyze images in detail")
photo = encode_file("./product.jpg")
job = Job(input=[
{"type": "text", "text": "Describe this product in detail for a catalog"},
{"type": "image_url", "image_url": {"url": photo}}
])
report = desk.run(worker, job)
print(report.content)
Document Q&A with Gemini 3 Pro Preview
from blackgeorge import Desk, Job, Worker, encode_file
desk = Desk(model="openrouter/google/gemini-3-pro-preview")
worker = Worker(name="DocumentAnalyst")
pdf = encode_file("./research_paper.pdf")
job = Job(input=[
{"type": "text", "text": "What are the main findings of this research?"},
{"type": "file", "file": {"file_data": pdf, "filename": "research_paper.pdf"}}
])
report = desk.run(worker, job)
Best Practices
- File Size: Large files (over 10 MB) may cause timeouts or errors
- Model Limits: Check the model's documentation for limits on image size and video duration
- Cost: Multimodal requests typically cost more than text-only requests
- URLs vs Base64: Pass URLs for files that are already hosted remotely; use base64 (encode_file()) for local files
- Error Handling: Verify that your chosen model supports the media type before sending the request
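One detail behind the file-size guidance is that base64 encoding inflates a payload by roughly a third, so a local file somewhat under 10 MB can exceed that size once encoded. The sketch below computes the encoded length; comparing the encoded size (rather than the raw size) against the 10 MB figure is an assumption, and `base64_length` and `is_large` are hypothetical helpers.

```python
SIZE_WARNING_BYTES = 10 * 1024 * 1024  # the 10 MB figure from the list above


def base64_length(raw_size):
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def is_large(raw_size, limit=SIZE_WARNING_BYTES):
    """Return True if the base64-encoded payload would exceed the limit."""
    return base64_length(raw_size) > limit
```

For a local file, `raw_size` can be obtained with `os.path.getsize(path)` before calling encode_file().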