How It Works
The Cognitive Pipeline
Seven deterministic stages that transform raw documents into validated, evidence-backed, structured intelligence.
Ingestion
Any source. Any format.
Recora accepts PDFs, Word documents, audio transcripts, video captions, web content, and raw text. The pipeline begins the moment a document enters the system, with no preprocessing required from the user.
Normalization
Clean. Structured. Consistent.
Raw input is normalized into a clean, consistent text layer. Formatting artifacts, OCR noise, and irrelevant metadata are stripped. What remains is a reliable foundation for downstream reasoning.
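As an illustration of what a normalization pass can look like, here is a minimal Python sketch. The function name `normalize_text` and the specific rules (Unicode folding, control-character removal, rejoining hyphenated line breaks, whitespace collapsing) are assumptions chosen for the example; the page does not specify Recora's actual rules.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Hypothetical cleanup pass: returns a consistent plain-text layer."""
    # Fold compatibility characters (ligatures, full-width forms) into canonical ones.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control characters such as form feeds, keeping newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Rejoin words split by a hyphen at a line break. This heuristic also
    # joins genuine compounds that happen to break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, trim spaces around newlines,
    # and limit consecutive blank lines to one.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A production pipeline would likely need format-specific rules (for example, for OCR output versus captions); the sketch only shows the shape of the step.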
Chunking & Indexing
Semantic segmentation at scale.
The normalized text is segmented into semantically coherent chunks rather than cut at arbitrary character limits. Each chunk retains its position in the document hierarchy, so context is never severed from meaning.
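One way to picture hierarchy-preserving chunking is the sketch below. It assumes, purely for illustration, that the normalized text marks headings with leading `#` characters and separates blocks with blank lines; `Chunk` and `chunk_by_structure` are hypothetical names, not Recora's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: tuple[str, ...]  # heading trail from the document root to this chunk
    index: int             # position of the chunk in reading order
    text: str


def chunk_by_structure(text: str) -> list[Chunk]:
    """Split on blank lines and record each chunk's heading trail."""
    chunks: list[Chunk] = []
    path: list[str] = []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A heading updates the current trail instead of producing a chunk.
            level = len(block) - len(block.lstrip("#"))
            path = path[: level - 1] + [block.lstrip("#").strip()]
            continue
        chunks.append(Chunk(tuple(path), len(chunks), block))
    return chunks
```

Here each paragraph under "Payment" would carry the trail `("Agreement", "Payment")`, so a later stage can tell which section a sentence came from. A real segmenter would use semantic boundaries as well as layout; this sketch covers only the hierarchy bookkeeping.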
Evidence Extraction
Every claim. Every source.
Recora identifies and extracts direct evidence (quotes, numerical data, clauses, obligations, risks) and anchors each piece to its exact location in the source document. Nothing is inferred at this stage.
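The idea of anchoring evidence to an exact location can be sketched with character offsets. The example below is an assumption-laden illustration: it covers only two kinds of numerical evidence, uses simple regular expressions, and the names `Evidence`, `PATTERNS`, and `extract_evidence` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str
    quote: str  # verbatim text copied from the source
    start: int  # character offset where the quote begins
    end: int    # character offset just past the quote


# Illustrative patterns only; real extraction would cover far more cases.
PATTERNS = {
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "percentage": re.compile(r"\d+(?:\.\d+)?%"),
}


def extract_evidence(source: str) -> list[Evidence]:
    """Return verbatim matches with their offsets; nothing is inferred."""
    found = [
        Evidence(kind, match.group(), match.start(), match.end())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(source)
    ]
    return sorted(found, key=lambda item: item.start)
```

Because every item stores its offsets, `source[item.start:item.end] == item.quote` holds for each result, which is what makes the evidence checkable against the original document.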
Structured IR
Schema-first intermediate representation.
All extracted evidence is mapped to a schema-first Intermediate Representation (IR): a typed data model where every fact has a defined structure. This is the core of what makes Recora different from any chat-based AI tool.
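To make "typed data model" concrete, here is a minimal sketch of one possible IR record. The schema is invented for illustration: `SourceAnchor`, `Obligation`, and their fields are assumptions, not Recora's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceAnchor:
    document_id: str
    start: int  # character offset where the supporting text begins
    end: int    # character offset just past the supporting text

    def __post_init__(self) -> None:
        if not (0 <= self.start < self.end):
            raise ValueError("anchor must cover a non-empty span")


@dataclass(frozen=True)
class Obligation:
    party: str
    action: str
    due: date
    anchor: SourceAnchor  # every fact carries a pointer back to its source

    def __post_init__(self) -> None:
        if not self.party or not self.action:
            raise ValueError("party and action are required")
        if not isinstance(self.due, date):
            raise TypeError("due must be a datetime.date")
```

The point of the design is that a malformed fact fails at construction time, before any later stage can consume it.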
LLM Transformation
LLM as engine, not oracle.
The LLM is applied only to the structured IR, never to raw text. It acts as a transformation engine that reshapes, summarizes, or analyzes structured data. Because it sees only the IR, it has no raw text to misread, and every statement it produces can be checked against the facts the IR actually contains.
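The pattern described here, model input restricted to the IR and output checked against it, can be sketched as follows. Everything in the example is hypothetical: `transform`, the `fact_ids` field, and the `model` callable standing in for an LLM client are assumptions made for illustration.

```python
from __future__ import annotations

import json
from typing import Callable


def transform(
    ir_facts: dict[str, dict],
    instruction: str,
    model: Callable[[str], str],
) -> list[dict]:
    """Run a model over IR facts only and reject output citing unknown facts."""
    # The prompt holds the instruction and the serialized IR; raw text never enters it.
    prompt = instruction + "\n" + json.dumps(ir_facts, sort_keys=True)
    statements = json.loads(model(prompt))
    for statement in statements:
        cited = set(statement["fact_ids"])
        unknown = cited - set(ir_facts)
        # A statement must cite at least one fact, and only facts present in the IR.
        if not cited or unknown:
            raise ValueError(
                f"statement not grounded in the IR: {statement['text']!r}"
            )
    return statements
```

Note that this check verifies citations, not wording: it confirms that each statement points at real IR facts, and a fuller system would also compare the statement's content against those facts.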
Artifact Generation
Outputs that can be trusted.
The final output is a structured artifact: a versioned, JSON-backed document with full citation chains linking every assertion back to its source. Artifacts are queryable, diffable, and integrable with your existing systems.
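A versioned, diffable JSON artifact might look something like the sketch below. The layout (`version`, `assertions`, `citations`) and the function names `build_artifact` and `diff_artifacts` are assumptions for illustration; the page does not publish Recora's artifact format.

```python
from __future__ import annotations

import json


def build_artifact(version: int, assertions: list[dict]) -> str:
    """Serialize assertions to JSON, refusing any assertion without a citation."""
    for assertion in assertions:
        if not assertion.get("citations"):
            raise ValueError(f"assertion {assertion.get('id')!r} has no citation")
    # Sorted keys give a stable serialization, which keeps text diffs readable.
    return json.dumps(
        {"version": version, "assertions": assertions}, sort_keys=True, indent=2
    )


def diff_artifacts(old_json: str, new_json: str) -> dict[str, list[str]]:
    """Compare two artifact versions by assertion id."""
    old = {a["id"]: a for a in json.loads(old_json)["assertions"]}
    new = {a["id"]: a for a in json.loads(new_json)["assertions"]}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Because the artifact is plain JSON, it can be stored, queried, and compared with ordinary tooling; the diff function only shows how two versions could be reconciled by assertion id.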
Why the IR Layer Changes Everything
Most AI tools pass raw text directly to an LLM and ask it to reason. Recora never does this. Because of the Intermediate Representation (IR) layer, the LLM only ever sees structured, validated data. It cannot invent facts, because the only facts available to it are those already extracted and verified from your source documents.
This is the architectural difference between probabilistic AI and deterministic reasoning infrastructure.