Execution Graph
How INDW resolves, dispatches, and executes the canonical merge stage graph.
The execution graph is INDW's single canonical representation of pipeline stages. Every merge run resolves the same graph regardless of backend — workers execute stage pools in parallel, and the apply coordinator merges results in document sequence order.
Graph structure
Documents enter the graph as rows from interleaved source streams. Each document carries a MergeDocumentContext that accumulates stage marks, rejection reasons, quality scores, and dedup hashes through the pipeline.
Stage marks
Each document context tracks which stages have processed it via stage marks:
| Mark | Stage | Purpose |
|---|---|---|
STAGE_FAST_FILTER | Language gate | Early language detection and rejection |
STAGE_DOC_DEDUP | Exact doc dedup | Content hash against seen set |
STAGE_STRUCTURAL_FILTER | Stage0 content filters | Length, entropy, structural checks |
STAGE_METADATA | Ingest enrichment | URL, domain, provenance metadata |
STAGE_ADMISSION | PCI admission | Tier assignment (TIER0, TIER1) |
Stage0 processing in process_stage0_batch runs these marks in order. A rejection at any mark terminates the document with a tier assignment.
Dispatch model
The scheduler dispatches documents in chunks (default 500 per batch) to worker pools:
Worker count and chunk size are resolved from CLI flags, environment, and hardware probes:
MergeDocumentContext
The document context is the carrier object through the graph:
Rejected documents carry _reject_tier and rejection reason. Survivors carry stage0_cleared, quality scores, and dedup hashes for downstream pools.
PipelineRunner
PipelineRunner (indw.schedule.stages.runner) orchestrates stage pool execution. It resolves the active backend, allocates worker pools, and coordinates batch handoff between stage pools without duplicating stage logic per backend.
Parity guarantees
The execution graph is designed for backend parity. Integration tests verify:
- Local vs multiprocess output hash match
- Workers=1 vs workers=N hash match
- Tier admission consistency across backends
Run indw validate to execute the parity suite.