Adapters

Adapters define how Blackgeorge talks to a model provider.

BaseModelAdapter

BaseModelAdapter defines the interface for model calls.

  • complete(...): synchronous completion
  • acomplete(...): async completion
  • structured_complete(...): structured output completion
  • astructured_complete(...): async structured output completion

All four methods accept OpenAI-style message payloads and optional tool schemas. Blackgeorge uses the async adapter methods internally for both sync and async runs, so custom adapters should implement acomplete and astructured_complete.

LiteLLMAdapter

LiteLLMAdapter is the default adapter. It uses LiteLLM to call models with OpenAI-compatible inputs.

Key behaviors:

  • calls litellm.completion(...) for synchronous requests
  • calls litellm.acompletion(...) for async requests
  • passes messages and optional model parameters (temperature, max_tokens, thinking, extra_body)
  • only sends tools and tool_choice when tools are present
  • supports streaming when requested
  • enables parallel_tool_calls when model metadata indicates supports_parallel_function_calling
  • for streaming calls, emits llm.completed on stream exhaustion/close and llm.failed if stream iteration raises
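
The conditional request assembly described above can be sketched like this. This is an assumed illustration, not LiteLLMAdapter's actual code: tools and tool_choice are only included when tools exist, and parallel_tool_calls is gated on the model's capability metadata.

```python
def build_request_kwargs(model, messages, tools=None, temperature=None,
                         max_tokens=None, supports_parallel=False):
    """Assemble OpenAI-compatible completion kwargs (illustrative sketch)."""
    kwargs = {"model": model, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    if tools:
        # tools/tool_choice are only sent when tools are actually present.
        kwargs["tools"] = tools
        kwargs["tool_choice"] = "auto"
        if supports_parallel:
            # Gated on supports_parallel_function_calling in model metadata.
            kwargs["parallel_tool_calls"] = True
    return kwargs
```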

Runtime lifecycle hardening

Blackgeorge configures LiteLLM runtime lifecycle once when LiteLLMAdapter is initialized:

  • it applies deterministic shutdown cleanup for async LiteLLM clients
  • it patches LiteLLM logging-worker enqueue behavior to close dropped coroutines safely
  • it avoids registering LiteLLM lazy async cleanup in a way that can emit shutdown warnings

This hardening was added to prevent process-exit warnings observed in real integrations, including DeepSeek (deepseek/deepseek-chat):

  • RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
  • RuntimeWarning: coroutine 'Logging.async_success_handler' was never awaited

Tool calls are parsed from the response and mapped into ToolCall objects. Malformed tool payloads are preserved with ToolCall.error metadata instead of crashing adapter parsing.

Structured output pipeline

Structured output uses LiteLLM JSON schema response formats when possible and falls back to Instructor with LiteLLM. Blackgeorge initializes Instructor clients with:

  • instructor.from_provider("litellm/<model>")
  • instructor.from_provider("litellm/<model>", async_client=True)

If the LiteLLM structured response fails or is unavailable, the worker calls chat.completions.create(..., response_model=YourModel) and returns the validated Pydantic object as Report.data. Structured output retries are clamped to a minimum of 3 for resilience: even with retries=0, up to 3 retry attempts are made after the first failure.
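
The retry clamp reduces to a one-line rule, sketched here as an assumed standalone function (the real logic lives inside the worker):

```python
def effective_structured_retries(configured: int) -> int:
    """Retry attempts after the first failure are never below 3 (sketch)."""
    return max(configured, 3)
```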

Adapter hooks for structured output

If your adapter implements structured_complete/astructured_complete, the worker will call those hooks for response-schema jobs. This lets you route structured output through non-LiteLLM providers or custom pipelines. If the hooks are not implemented, the worker falls back to the LiteLLM + Instructor path.
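
The worker's hook dispatch can be sketched as follows. The names and the fallback are stand-ins (the real dispatch and the LiteLLM + Instructor fallback live inside Blackgeorge's worker); the point is the presence check on the adapter.

```python
import asyncio

async def litellm_fallback(messages, schema):
    # Stand-in for the LiteLLM + Instructor fallback path.
    return "fallback"

async def run_structured(adapter, messages, schema):
    hook = getattr(adapter, "astructured_complete", None)
    if callable(hook):
        # Adapter provides its own structured-output pipeline.
        return await hook(messages, schema)
    return await litellm_fallback(messages, schema)

class HookAdapter:
    async def astructured_complete(self, messages, schema):
        return "hook"

class PlainAdapter:
    pass
```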

Cost tracking

The LiteLLM adapter provides cost tracking through callback events. When using LiteLLMAdapter, the following events are automatically emitted:

LLM events

Subscribe to these events to track costs and usage:

from blackgeorge import Desk, Worker, Job

desk = Desk(model="openai/gpt-5-nano")

# Track LLM costs
def on_llm_completed(event):
    payload = event.payload
    print(f"Model: {payload['model']}")
    print(f"Cost: ${payload.get('cost', 0):.6f}")
    print(f"Tokens: {payload['total_tokens']}")
    print(f"Latency: {payload['latency_ms']}ms")

desk.event_bus.subscribe("llm.completed", on_llm_completed)

worker = Worker(name="assistant")
report = desk.run(worker, Job(input="Hello"))

Event payloads

llm.started:

  • model: Model name being called
  • messages_count: Number of messages in the request
  • tools_count: Number of tools available

llm.completed:

  • model: Model name
  • latency_ms: Request latency in milliseconds
  • prompt_tokens: Number of prompt tokens
  • completion_tokens: Number of completion tokens
  • total_tokens: Total token count
  • cost: Estimated cost in USD (if available from LiteLLM)

llm.failed:

  • model: Model name
  • latency_ms: Request latency before failure
  • error_type: Exception class name
  • error_message: Exception message
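
A subscriber can aggregate these payloads across a run. The payload keys below (cost, total_tokens) come from the llm.completed payload documented above; the _Event stand-in is hypothetical and only carries a payload attribute like the real event object.

```python
class CostTracker:
    """Accumulate cost and token usage from llm.completed events (sketch)."""

    def __init__(self):
        self.total_cost = 0.0
        self.total_tokens = 0

    def on_llm_completed(self, event):
        payload = event.payload
        # cost may be absent or None when LiteLLM has no pricing data.
        self.total_cost += payload.get("cost") or 0.0
        self.total_tokens += payload.get("total_tokens", 0)

class _Event:
    """Minimal stand-in carrying a payload attribute."""
    def __init__(self, payload):
        self.payload = payload

tracker = CostTracker()
tracker.on_llm_completed(_Event({"cost": 0.000123, "total_tokens": 42}))
tracker.on_llm_completed(_Event({"cost": None, "total_tokens": 10}))
```

In real use you would register the method with desk.event_bus.subscribe("llm.completed", tracker.on_llm_completed).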

Cost utilities

The blackgeorge.adapters.cost module provides utilities for cost calculation:

from blackgeorge.adapters.cost import calculate_cost, get_model_pricing

# Calculate cost from a response
cost = calculate_cost(response)

# Get pricing info for a model
pricing = get_model_pricing("openai/gpt-5-nano")

Custom adapters

To use another provider, implement BaseModelAdapter and pass it to Desk(adapter=...).
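
A minimal custom adapter can be sketched as below. The method names follow the interface described above, but the exact BaseModelAdapter signatures may differ; in real use you would subclass blackgeorge's BaseModelAdapter and pass an instance via Desk(adapter=...). The provider call here is stubbed.

```python
import asyncio

class MyProviderAdapter:
    """Hedged sketch of a custom adapter; subclass BaseModelAdapter in real use."""

    async def acomplete(self, messages, tools=None, **kwargs):
        # Call your provider's SDK here; return an assistant-style message.
        return {"role": "assistant", "content": "stub reply"}

    async def astructured_complete(self, messages, response_schema, **kwargs):
        # Return a validated instance of the requested schema; stubbed here.
        return response_schema(text="stub")

# In application code (not runnable here without blackgeorge installed):
# desk = Desk(model="my-provider/model", adapter=MyProviderAdapter())
result = asyncio.run(MyProviderAdapter().acomplete([{"role": "user", "content": "hi"}]))
```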