Contracts that read themselves.
Blockdata turns your documents — and the databases around them — into blocks: atomic, cited, queryable units that your agents, your analysts, and your SQL all read the same way.
Documents and databases land in one stack of blocks — each with its source span, its confidence, and a re-runnable record of every decision the pipeline made. Spin out vectors, graphs, schemas, or agents from the same foundation.
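Here is one way a block could be modeled, with its source span, its confidence, and a replayable decision record. This is a sketch under assumptions. The `Block` and `SourceSpan` classes and their fields are hypothetical illustrations, not Blockdata's published schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical provenance record; not Blockdata's actual schema.
@dataclass(frozen=True)
class SourceSpan:
    """Where a block came from: document, page, and character offsets."""
    document: str
    page: int
    start: int
    end: int


# Hypothetical block model; not Blockdata's actual schema.
@dataclass
class Block:
    """One atomic unit of the stack: text plus provenance and confidence."""
    block_id: str
    kind: str  # e.g. "paragraph", "table", "heading"
    text: str
    span: SourceSpan
    confidence: float
    decisions: List[Dict] = field(default_factory=list)

    def record(self, step: str, **params) -> None:
        """Append one pipeline decision so the run can be replayed later."""
        self.decisions.append({"step": step, **params})


block = Block(
    block_id="b-0001",
    kind="paragraph",
    text="This Agreement is effective as of March 1, 2024.",
    span=SourceSpan(document="msa.pdf", page=1, start=120, end=168),
    confidence=0.97,
)
block.record("parse", parser="layout-aware")
block.record("extract", schema="msa.v3", field="effective_date")
print(block.span.page, [d["step"] for d in block.decisions])
```

Because the decision list travels with the block, any later consumer (an agent, an analyst, a SQL query) can see which steps and settings produced it.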
PDFs, scans, DOCX, slides, spreadsheets, emails. Layout-aware parsing with tables, headings, and figures preserved. Page-accurate provenance on every block.
Define the columns you care about. We fill them, cite the source, and flag low-confidence values for review.
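Column filling with citations and a review flag can be sketched as follows. The sketch assumes a model has already proposed a (value, page, confidence) candidate for each column. `ExtractedField`, `fill_row`, and the 0.80 threshold are hypothetical choices for illustration, not Blockdata's API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off below which a value is routed to human review.
REVIEW_THRESHOLD = 0.80


@dataclass
class ExtractedField:
    """One filled column: the value, its page citation, and a confidence."""
    name: str
    value: Optional[str]
    page: Optional[int]
    confidence: float

    @property
    def needs_review(self) -> bool:
        # Empty or low-confidence values are flagged instead of silently accepted.
        return self.value is None or self.confidence < REVIEW_THRESHOLD


def fill_row(columns, candidates):
    """Fill each requested column from (value, page, confidence) candidates.

    A column with no candidate is kept with value None, so it shows up as
    flagged rather than disappearing from the table.
    """
    return [
        ExtractedField(name, *candidates.get(name, (None, None, 0.0)))
        for name in columns
    ]


row = fill_row(
    ["counterparty", "effective_date", "governing_law"],
    {
        "counterparty": ("Acme Corp.", 1, 0.98),
        "effective_date": ("2024-03-01", 1, 0.95),
        "governing_law": ("Delaware", 14, 0.62),
    },
)
for fld in row:
    flag = "REVIEW" if fld.needs_review else "ok"
    print(f"{fld.name:15} {fld.value!s:12} p.{fld.page} {fld.confidence:.2f} {flag}")
```

Keeping a missing column as an empty, flagged field, rather than dropping it, means reviewers see every gap and not just the weak answers.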
Custom taxonomies. Few-shot or zero-shot. Confidence-ranked labels.
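Confidence ranking can be illustrated with a softmax over per-label scores. The sketch assumes that some classifier, few-shot or zero-shot, has already produced a raw score for each label in your taxonomy. `rank_labels` and the scores below are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_labels(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Turn raw per-label scores into probabilities, highest first."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    ranked = [(label, e / total) for label, e in exps.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical raw scores from a classifier for one document.
ranked = rank_labels({"MSA": 2.1, "NDA": 0.3, "SOW": -1.0})
for label, p in ranked:
    print(f"{label}: {p:.2f}")
```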
A thousand documents, five million blocks, one consolidated stack. Spin out a vector store, a knowledge graph, a Postgres schema, or a Mongo collection — from the same source.
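The sketch below shows what "a Postgres schema from the same source" might look like in miniature: a few blocks are projected into a relational table and queried with their citations intact. It uses Python's built-in `sqlite3` as a stand-in for Postgres. The table layout and block fields are assumptions, not the schema Blockdata generates.

```python
import sqlite3

# A few blocks as they might come out of the stack (hypothetical fields).
blocks = [
    ("b-1", "msa-acme.pdf", 1, "heading", "Master Services Agreement", 0.99),
    ("b-2", "msa-acme.pdf", 1, "paragraph", "Governed by the laws of Delaware.", 0.93),
    ("b-3", "nda-beta.pdf", 2, "paragraph", "Governed by the laws of New York.", 0.71),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE blocks (
        block_id   TEXT PRIMARY KEY,
        document   TEXT NOT NULL,
        page       INTEGER NOT NULL,
        kind       TEXT NOT NULL,
        text       TEXT NOT NULL,
        confidence REAL NOT NULL
    )"""
)
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?, ?, ?)", blocks)

# The same blocks, now queryable with plain SQL, with document and page kept.
rows = conn.execute(
    """SELECT document, page, text FROM blocks
       WHERE kind = 'paragraph' AND confidence >= 0.9
       ORDER BY document, page"""
).fetchall()
print(rows)
```

Carrying `document` and `page` as ordinary columns means a SQL answer can always be traced back to the page it came from.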
Hand any slice of the stack to an agent that knows it cold. Orchestrated on Kai, our companion platform for agentic work — same auth, same audit trail.
Every job in Blockdata follows the same six-step shape. Stop at any step, inspect outputs, retry with a tweaked prompt or schema, then re-run downstream — without losing what already worked.
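One way to get "tweak a step, re-run downstream, keep what already worked" is content-addressed caching. The following is a minimal sketch that assumes nothing about Blockdata's internals and assumes each step is deterministic. `StepPipeline` and the three toy steps are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

# A step takes the previous step's output plus its own config.
Step = Callable[[Any, dict], Any]


def _digest(obj: Any) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class StepPipeline:
    """Hypothetical step runner with content-addressed caching.

    Each step's cache key covers its name, its config, and the key of the
    step before it, so editing one step's config re-runs that step and
    everything downstream while earlier outputs are reused.
    """

    def __init__(self, steps: List[Tuple[str, Step]]):
        self.steps = steps
        self.cache: Dict[str, Any] = {}
        self.executed: List[str] = []  # steps that actually ran, for inspection

    def run(self, data: Any, configs: Dict[str, dict]) -> Any:
        upstream = _digest(data)  # the raw input anchors the chain
        for name, fn in self.steps:
            config = configs.get(name, {})
            key = _digest([name, config, upstream])
            if key not in self.cache:
                self.cache[key] = fn(data, config)
                self.executed.append(name)
            data = self.cache[key]
            upstream = key
        return data


def parse(text, cfg):
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract(blocks, cfg):
    return [b for b in blocks if cfg["keyword"] in b.lower()]


def summarize(hits, cfg):
    return {"hits": len(hits), "first": hits[0] if hits else None}


doc = "Master Services Agreement\nGoverned by the laws of Delaware.\nTerm: two years."
pipe = StepPipeline([("parse", parse), ("extract", extract), ("summarize", summarize)])

first = pipe.run(doc, {"extract": {"keyword": "governed"}})
pipe.executed.clear()
# Tweak only the extract config: parse is served from cache.
second = pipe.run(doc, {"extract": {"keyword": "term"}})
print(first, second, pipe.executed)
```

Only `extract` and `summarize` run the second time. Chaining each key to its upstream key, rather than to wall-clock order, is what lets an unchanged prefix of the pipeline be reused safely.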
Pre-built schemas, pipelines, and review surfaces tuned for the four document worlds we hear about every week.
MSAs, NDAs, vendor agreements. Pull effective dates, parties, governing law, termination, renewal — into a clause-level table you can query.
SEC filings, prospectuses, transcripts. Numerical extraction with page citations. Diff documents across quarters.
Patient intake, prior auth, lab reports. HIPAA-aligned pipelines. Map to FHIR, drop into your EHR.
Adjuster notes, scans, policy docs. Triage at intake. Flag the 4% of cases that need a human.
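The finance entry's "diff documents across quarters" can be pictured with a sketch that assumes each quarter's filing has already been extracted into line items of (value, page). `Figure`, `diff_quarters`, and the numbers below are hypothetical and not Blockdata's API.

```python
from typing import Dict, List, NamedTuple


class Figure(NamedTuple):
    """One extracted number and the page it was cited from."""
    value: float
    page: int


def diff_quarters(prev: Dict[str, Figure], curr: Dict[str, Figure]) -> List[str]:
    """List added (+), removed (-), and changed (~) line items, with citations."""
    lines = []
    for item in sorted(prev.keys() | curr.keys()):
        old, new = prev.get(item), curr.get(item)
        if old is None:
            lines.append(f"+ {item}: {new.value:,.0f} (p.{new.page})")
        elif new is None:
            lines.append(f"- {item}: {old.value:,.0f} (p.{old.page})")
        elif old.value != new.value:
            # Percent change relative to the prior quarter; undefined from zero.
            pct = (
                f"{(new.value - old.value) / abs(old.value) * 100:+.1f}%"
                if old.value
                else "n/a"
            )
            lines.append(
                f"~ {item}: {old.value:,.0f} (p.{old.page}) -> "
                f"{new.value:,.0f} (p.{new.page}), {pct}"
            )
    return lines


# Hypothetical line items extracted from two quarterly filings.
q2 = {"revenue": Figure(1_200_000, 4), "operating_income": Figure(300_000, 5)}
q3 = {
    "revenue": Figure(1_350_000, 4),
    "operating_income": Figure(300_000, 6),
    "rnd_expense": Figure(90_000, 7),
}
for line in diff_quarters(q2, q3):
    print(line)
```

Items whose value is unchanged are skipped even if they moved to a different page. Every reported change still carries both page citations, so a reviewer can check each side.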
We had four years of document AI that almost worked. Blockdata is the first tool where our partners actually trust the output, because every field is cited and every run is auditable. We migrated 11 pipelines off our in-house stack in a quarter.
Python, TypeScript, and a REST surface that mirrors the workbench one-to-one. The same primitives your analysts click through, your engineers ship.
Every workbench action — parse, extract, classify, index, query — maps to a single API call. Runs are addressable. Outputs are versioned. Re-running is a one-liner.
from blockdata import Workbench

wb = Workbench(project="contracts-q3")

# upload, parse, and extract in one run
run = wb.pipeline(
    assets="./msas/*.pdf",
    schema="msa.v3",
    classify=["MSA", "NDA"],
    on_low_confidence="review",
)

for doc in run.results:
    print(doc.fields.counterparty, doc.fields.effective_date, doc.confidence)

# every output is cited & re-runnable
run.rerun(step="extract", schema="msa.v4")
Your documents stay in your tenant. Your model calls stay on the path you choose. No training on your data, ever.
Start free on the workbench. Hit our API the same day. Bring your enterprise documents when you're ready.