System Overview#
Sourcery is structured as replaceable black boxes around typed contracts.
Primary Primitives#
Defined in sourcery/contracts/models.py:
SourceDocumentTextChunkExtractionCandidateAlignedExtractionDocumentResultCanonicalClaimExtractRequestExtractResult
These primitives are the stable interface. Internal implementation can change without breaking user code if these contracts remain consistent.
Module Boundaries#
sourcery/contracts: request/result/runtime contracts.sourcery/pipeline: deterministic chunking, prompt compilation, alignment, merge.sourcery/runtime: model invocation orchestration and retries.sourcery/ingest: source normalization intoSourceDocument.sourcery/io: JSONL persistence and HTML review surfaces.sourcery/observability: run trace and event collection.sourcery/benchmarks: benchmark CLI and comparative tooling.
Execution Flow#
- Validate
ExtractRequestand task examples. - Normalize input documents.
- Plan chunks per extraction pass.
- Execute runtime batch for chunks.
- Align candidates to source spans.
- Merge non-overlapping resolved extractions.
- Optionally reconcile canonical claims.
- Emit
ExtractResultwith metrics, warnings, and run trace.
Determinism Notes#
Determinism is strongest in pipeline logic (chunking, aligner, merger).
Runtime behavior may vary with provider/model behavior, but deterministic options plus strict examples reduce drift.