Public API#
Top-level exports are defined in sourcery/__init__.py.
Core Functions#
extract(request, engine=None) -> ExtractResult
aextract(request, engine=None) -> ExtractResult
extract_from_sources(sources, *, task, runtime, options=None, engine=None) -> ExtractResult
aextract_from_sources(sources, *, task, runtime, options=None, engine=None) -> ExtractResult
Runtime Entry Points#
SourceryEngine.extract(request) -> ExtractResult
SourceryEngine.aextract(request) -> ExtractResult
SourceryEngine.replay_run(request, raw_run_id) -> tuple[dict[str, object] | None, list[EventRecord]]
Top-Level Contract Exports (import sourcery)#
EntitySpec,EntitySchemaSetExtractionTask,ExtractionExample,ExampleExtractionExtractRequest,ExtractOptions,ExtractResultRuntimeConfig,RetryPolicy,SessionRefinementConfig,ReconciliationConfigAlignedExtraction,CanonicalClaim,DocumentResult,DocumentReconciliationReportExtractionRunTrace,RunMetrics,SourceDocument
Ingestion Exports#
From sourcery.ingest:
load_source_document(...)load_source_documents(...)load_pdf_document(...)load_html_document(...)load_url_document(...)load_ocr_image_document(...)
Top-level shortcut (import sourcery) includes only:
load_source_document(...)load_source_documents(...)
IO Exports#
From sourcery.io:
save_extract_result_jsonl(...)iter_document_rows(...)load_document_results_jsonl(...)render_document_html(...)write_document_html(...)render_reviewer_html(...)write_reviewer_html(...)visualize(...)
Top-level shortcut (import sourcery) includes:
write_reviewer_html(...)
Convenience Example#
import sourcery
from sourcery.contracts import ExtractionTask, RuntimeConfig
result = sourcery.extract_from_sources(
["sample.pdf", "https://example.com/article"],
task=ExtractionTask(...),
runtime=RuntimeConfig(model="deepseek/deepseek-chat"),
)