Build with InstantAI
Everything you need to integrate, extend, and ship with InstantAI's open-source data infrastructure.
Now open source
Data Workflow — the engine behind InstantAI pipelines
A production-grade, distributed data processing framework built for scale, with a single canonical execution graph, PCI-tier admission, semantic deduplication, and low-latency output. Now available to everyone under Apache 2.0.
Quickstart
Install the SDK, set up your environment, and ship your first pipeline in under five minutes.
Architecture
Learn how the execution graph, scheduler, and storage layers coordinate every pipeline run from start to finish.
Guides
Practical walkthroughs for data cleaning, deduplication, semantic filtering, and distributed jobs.
Configuration
Control datasets, output targets, environment overrides, and advanced pipeline settings.
CLI
Run pipelines, validate configs, inspect stage outputs, and manage jobs from the terminal.
API Reference
Full SDK documentation — every class, method, and type you need to extend or embed the framework.