New Workbench v6.7 — pipelines for the agentic era

Turn every document into trusted, structured data.

Blockdata turns your documents — and the databases around them — into blocks: atomic, cited, queryable units that your agents, your analysts, and your SQL all read the same way.

SOC 2 Type II · HIPAA · EU data residency
Blockdata/contracts-q3/Assets
35 ASSETS
Type  Name                            Status   Pages  Conf.
PDF   msa-acme-2026-final.pdf         Indexed  24     0.98
DOC   redline-vendor-agreement.docx   Running  18
PDF   insurance-claim-7842.pdf        Indexed  9      0.94
IMG   patient-intake-scan-04.tiff     Indexed  3      0.91
PDF   sec-10k-q3-2025.pdf             Queued   142
MD    deal-memo-blockdata.md          Indexed  6      0.99
TXT   earnings-transcript-q2.txt      Indexed  11     0.96
PDF   compliance-policy-v3.pdf        Indexed  42     0.95
1 selected · Parse · Extract · Classify · Index · ~78 cr
Trusted by document-heavy teams
Lattice Legal
Northwind
Helvetia
Caplan Bank
Orion Health
Meridian
Platform

A workbench, not a black box. Every block inspectable.

Documents and databases land in one stack of blocks — each with its source span, its confidence, and a re-runnable record of every decision the pipeline made. Spin out vectors, graphs, schemas, or agents from the same foundation.
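As a rough illustration of what a block could carry, here is a minimal sketch in plain Python. The class names, field names, page number, and character offsets are all assumptions made for the example; they are not the Blockdata SDK's actual types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceSpan:
    """Where a block came from (hypothetical shape)."""
    asset: str   # file name of the source document
    page: int    # 1-based page number
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)


@dataclass
class Block:
    """One atomic unit: text plus its citation, confidence, and history."""
    text: str
    span: SourceSpan
    confidence: float
    decisions: list[str] = field(default_factory=list)  # ordered pipeline log

    def cite(self) -> str:
        """Render a human-readable citation for this block."""
        s = self.span
        return f"{s.asset} p.{s.page} [{s.start}:{s.end}]"


# Illustrative values only; the page and offsets are made up.
block = Block(
    text="Either party may terminate with 90 days notice.",
    span=SourceSpan(asset="msa-acme-2026-final.pdf", page=12, start=340, end=387),
    confidence=0.98,
    decisions=["parse:layout", "extract:msa.v3"],
)
print(block.cite())
```

Keeping the span and the decision log on the block itself is what would let a vector store, a graph, or a SQL table all point back to the same citation.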

01 Parse

From any document to the same clean schema.

PDFs, scans, DOCX, slides, spreadsheets, emails. Layout-aware parsing with tables, headings, and figures preserved. Page-accurate provenance on every block.

parse
{ "title": "MSA", "effective": "2026-03-14", "parties": [ "Acme Inc.", "Blockdata" ], "pages": 24 }
02 Extract

Structured fields, on your schema.

Define the columns you care about. We fill them, cite the source, and flag low confidence for review.

effective_date 2026-03-14
counterparty Acme Inc.
termination 90 days
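The "flag low confidence for review" step can be pictured as a simple threshold split. This is a sketch under stated assumptions: the `Field` type, the `split_for_review` helper, the 0.90 cutoff, and the per-field page numbers and confidences are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    value: str
    page: int          # cited source page (illustrative)
    confidence: float


def split_for_review(fields, threshold=0.90):
    """Partition fields into (accepted, flagged) by a confidence cutoff."""
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


# Field names and values come from the example above; scores are made up.
fields = [
    Field("effective_date", "2026-03-14", page=1, confidence=0.98),
    Field("counterparty", "Acme Inc.", page=1, confidence=0.97),
    Field("termination", "90 days", page=12, confidence=0.82),
]
accepted, flagged = split_for_review(fields)
print([f.name for f in flagged])
```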
03 Classify

Route each document to the right pipeline.

Custom taxonomies. Few-shot or zero-shot. Confidence ranked.

MSA 0.96
SOW
NDA
Insurance
Filing
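Confidence-ranked routing might look like the sketch below. The `route` function, the pipeline names, the 0.80 floor, and every score other than the MSA 0.96 shown above are assumptions for illustration, not product behavior.

```python
def route(scores, pipelines, min_confidence=0.80):
    """Pick the top-ranked label; send uncertain documents to review."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    label, confidence = ranked[0]
    if confidence < min_confidence or label not in pipelines:
        return "review", ranked
    return pipelines[label], ranked


# Hypothetical mapping from taxonomy label to pipeline name.
pipelines = {"MSA": "msa-pipeline", "NDA": "nda-pipeline"}

target, ranked = route({"MSA": 0.96, "SOW": 0.02, "NDA": 0.01}, pipelines)
unsure_target, _ = route({"MSA": 0.55, "NDA": 0.40}, pipelines)
print(target, unsure_target)
```

Returning the full ranking alongside the target keeps the runner-up labels available to a reviewer when a document lands in the review queue.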
04 Stack

One stack of blocks. Every shape you need.

A thousand documents, five million blocks, one consolidated stack. Spin out a vector store, a knowledge graph, a Postgres schema, or a Mongo collection — from the same source.

stack · 5.2M blocks
VECTOR pgvector · pinecone
GRAPH knowledge graph
SQL postgres schema
DOC mongodb
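To make "every shape from the same source" concrete, the sketch below projects one list of blocks into a SQL table and into a nested, document-style record. It uses Python's built-in sqlite3 as a stand-in for Postgres; the table layout, the block dictionaries, and the confidence values are assumptions for the example.

```python
import sqlite3

# One source of truth: a flat list of blocks (illustrative values).
blocks = [
    {"doc": "msa-acme-2026-final.pdf", "page": 1, "field": "effective_date",
     "value": "2026-03-14", "confidence": 0.98},
    {"doc": "msa-acme-2026-final.pdf", "page": 1, "field": "counterparty",
     "value": "Acme Inc.", "confidence": 0.97},
    {"doc": "msa-acme-2026-final.pdf", "page": 12, "field": "termination",
     "value": "90 days", "confidence": 0.82},
]

# Shape 1: relational rows (sqlite3 here, standing in for a Postgres schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks (doc TEXT, page INTEGER, field TEXT, value TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO blocks VALUES (:doc, :page, :field, :value, :confidence)", blocks
)
rows = conn.execute(
    "SELECT field, value FROM blocks WHERE doc = ? ORDER BY field",
    ("msa-acme-2026-final.pdf",),
).fetchall()

# Shape 2: one nested record per document, as a document store might hold it.
documents = {}
for b in blocks:
    documents.setdefault(b["doc"], {"_id": b["doc"], "fields": {}})
    documents[b["doc"]]["fields"][b["field"]] = {"value": b["value"], "page": b["page"]}

print(rows)
print(documents["msa-acme-2026-final.pdf"]["fields"]["termination"])
```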
Pipeline

One canvas. Six honest steps.

Every job in Blockdata follows the same six-step shape. Stop at any step, inspect outputs, retry with a tweaked prompt or schema, then re-run downstream — without losing what already worked.

01
Ingest
Folders, buckets, SharePoint, S3, GCS — plus Postgres, Mongo, and warehouse connectors.
+ 1,284 files queued · + postgres://prod/contracts · + gcs://contracts/2026
02
Parse
Layout-aware. Tables, figures, footnotes, signatures.
blocks: 1,284 → 18,712 · tables: 412 · avg p99: 4.2s
03
Classify
Route to MSA / NDA / claim / filing pipelines.
MSA: 412 · NDA: 188 · claim: 642 · filing: 42 · unsure: 12 → review
04
Extract
Your schema. Cited fields. Confidence on every value.
fields: 24 · cited: 100% · avg conf: 0.94 · flagged: 38 → review
05
Stack
5.2M blocks consolidated. Output to vector, graph, SQL, or Mongo.
→ pgvector: 312k · → kg edges: 89k · → postgres: msa.v3
06
Agents
Ship answers via API, dashboard, or a specialist agent on Kai.
/v1/query · /v1/stack · kai://agents/contract · p50 latency: 380ms
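The "stop, inspect, re-run downstream" behavior can be sketched as a runner that caches each step's output and resumes from any step. The `Run` class and the toy handlers are hypothetical and exist only to show the idea; only the six step names and the schema names `msa.v3` and `msa.v4` come from this page.

```python
STEPS = ["ingest", "parse", "classify", "extract", "stack", "agents"]


class Run:
    """Keeps each step's output so later steps can be redone in isolation."""

    def __init__(self, handlers):
        self.handlers = handlers  # step name -> function(previous_output, **opts)
        self.outputs = {}         # step name -> cached output
        self.calls = []           # execution log, for inspection

    def execute(self, start="ingest", stop="agents", **opts):
        first, last = STEPS.index(start), STEPS.index(stop)
        # The first step to run reads the cached output of the step before it.
        previous = self.outputs[STEPS[first - 1]] if first > 0 else None
        for step in STEPS[first:last + 1]:
            previous = self.handlers[step](previous, **opts)
            self.outputs[step] = previous
            self.calls.append(step)
        return previous


# Toy handlers: each appends its own name to a copy of the trail so far.
handlers = {s: (lambda prev, _s=s, **opts: (prev or []) + [_s]) for s in STEPS}
handlers["extract"] = lambda prev, schema="msa.v3", **opts: prev + [f"extract:{schema}"]

run = Run(handlers)
run.execute(stop="extract")                             # stop early and inspect
result = run.execute(start="extract", schema="msa.v4")  # redo extract onward
print(result)
print(run.calls)
```

In this sketch ingest, parse, and classify each run once; only extract and the steps after it are repeated when the schema changes.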
Solutions

Built for teams where documents are the work.

Pre-built schemas, pipelines, and review surfaces tuned for the four document worlds we hear about every week.

Legal

Contracts that read themselves.

MSAs, NDAs, vendor agreements. Pull effective dates, parties, governing law, termination, renewal — into a clause-level table you can query.

MSA · NDA · redlines
Finance

10-Ks, term sheets, every footnote.

SEC filings, prospectuses, transcripts. Numerical extraction with page citations. Diff documents across quarters.

10-K · memos · term sheets
Healthcare

Records to chart, with provenance.

Patient intake, prior auth, lab reports. HIPAA-aligned pipelines. Map to FHIR, drop into your EHR.

HIPAA · FHIR · EHR
Insurance

Claims, faster — without the misses.

Adjuster notes, scans, policy docs. Triage at intake. Flag the 4% of cases that need a human.

claims · policy · triage
218M
documents parsed by Blockdata customers since launch.
0.96
median extraction confidence across legal, finance, healthcare.
11×
faster than the average in-house parse + extract stack.
380ms
p50 retrieval latency for agentic Q&A at scale.
Eleanor Chen
VP Engineering, Lattice Legal
$2.4M
saved in first-year review hours, across 84 attorneys.
We had four years of document AI that almost worked. Blockdata is the first tool where our partners actually trust the output, because every field is cited and every run is auditable. We migrated 11 pipelines off our in-house stack in a quarter.
Developers

Two SDKs, one truthful API. No surprises.

Python, TypeScript, and a REST surface that mirrors the workbench one-to-one. The same primitives your analysts click through, your engineers ship.

Build on the same primitives your team clicks.

Every workbench action — parse, extract, classify, index, query — maps to a single API call. Runs are addressable. Outputs are versioned. Re-running is a one-liner.

Read the docs · View on GitHub
python
from blockdata import Workbench

wb = Workbench(project="contracts-q3")

# upload, parse, and extract in one run
run = wb.pipeline(
    assets="./msas/*.pdf",
    schema="msa.v3",
    classify=["MSA", "NDA"],
    on_low_confidence="review",
)

for doc in run.results:
    print(doc.fields.counterparty,
          doc.fields.effective_date,
          doc.confidence)

# every output is cited & re-runnable
run.rerun(step="extract", schema="msa.v4")
Security

The boring questions, answered up front.

Your documents stay in your tenant. Your model calls stay on the path you choose. No training on your data, ever.

Compliance: SOC 2 II
Compliance: HIPAA
Compliance: ISO 27001
Residency: US · EU · UK
Encryption: AES-256
Deploy: Self-host
Models: BYO keys
Audit: Full run log
Ready when you are

Ship document work you can defend.

Start free on the workbench. Hit our API the same day. Bring your enterprise documents when you're ready.
