/ Lens Document AI
Turn unstructured PDFs into structured data.
Lens is an AI document-intelligence platform, currently in development, that extracts structured data from contracts, invoices, KYC forms, and scanned PDFs, with every field cited back to its source page and bounding box.
/ Project overview
Lens is the studio's document AI product, currently in active development. It combines layout-aware OCR, vision-language models, and a structured-output orchestration layer to turn messy real-world documents (scanned contracts, multi-page invoices, KYC forms, financial statements) into clean JSON your systems can actually use. Every extracted field links back to its source page and bounding box for human verification, and the system flags low-confidence fields for review instead of silently making things up. We are targeting an alpha rollout in Q3 2026.
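To make the idea of a cited, reviewable field concrete, here is a minimal sketch of what one extracted field could look like in Python. The class name `ExtractedField`, the field names, the bounding-box convention, and the `REVIEW_THRESHOLD` value are all illustrative assumptions, not the actual Lens schema or API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical cut-off; the real threshold is not stated anywhere in this page.
REVIEW_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    name: str
    value: str
    page: int                                # 1-based source page
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) in page units
    confidence: float                        # 0.0 to 1.0

    @property
    def needs_review(self) -> bool:
        # Low-confidence fields are flagged instead of being passed through silently.
        return self.confidence < REVIEW_THRESHOLD


def to_payload(fields):
    """Serialise fields to JSON-ready dicts, each tagged with its review flag."""
    return [{**asdict(f), "needs_review": f.needs_review} for f in fields]


fields = [
    ExtractedField("invoice_number", "INV-0042", 1, (72.0, 90.5, 210.0, 104.0), 0.97),
    ExtractedField("due_date", "2026-03-01", 2, (300.0, 640.0, 380.0, 655.0), 0.61),
]
payload = to_payload(fields)
print(json.dumps(payload, indent=2))
```

In this sketch the second field falls below the assumed threshold, so a reviewer would be pointed to page 2 and the given box rather than receiving the value unmarked.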
/ Case study
The story.
Start to ship.
The problem.
Most document-extraction tools either give you raw OCR text (and you write the parsing yourself) or hand you a black-box JSON with no way to verify it. For high-stakes documents — contracts, KYC, financial statements — neither is acceptable. You need extracted fields that link back to where they came from, and you need the system to admit when it's not sure.
How we built it.
Lens is being built as a verification-first extractor. Every field returned carries a page number, a bounding box, and a confidence score. Any field whose confidence falls below a set threshold is flagged for human review rather than emitted silently. The pipeline runs in three stages: layout-aware OCR first, then a vision-language model with structured-output prompting, then a validator that cross-checks extracted values against schema constraints (date formats, totals matching line items, etc.).
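The validator stage is the easiest part to illustrate. The following is a rough sketch of the two constraints named above, a date-format check and a totals-versus-line-items check, under the assumption that values arrive as strings. The function names, the document keys, the ISO date format, and the rounding tolerance are hypothetical and are not taken from the Lens codebase.

```python
from datetime import datetime
from decimal import Decimal


def check_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """True if value parses under the expected date format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def check_total(line_items, total, tolerance=Decimal("0.01")) -> bool:
    """True if the line items sum to the stated total within a rounding tolerance."""
    return abs(sum(line_items, Decimal("0")) - total) <= tolerance


def validate_invoice(doc: dict) -> list[str]:
    """Return the names of fields that break a schema constraint."""
    failures = []
    if not check_date(doc["invoice_date"]):
        failures.append("invoice_date")
    items = [Decimal(x) for x in doc["line_items"]]
    if not check_total(items, Decimal(doc["total"])):
        failures.append("total")
    return failures


good = {"invoice_date": "2026-01-15", "line_items": ["100.00", "49.50"], "total": "149.50"}
bad = {"invoice_date": "15/01/2026", "line_items": ["100.00", "49.50"], "total": "159.50"}
print(validate_invoice(good))  # []
print(validate_invoice(bad))   # ['invoice_date', 'total']
```

Using `Decimal` rather than floats keeps the totals check free of binary rounding noise. Fields that fail a check could then be routed to the same human-review queue as low-confidence fields.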
Where it stands.
Lens is currently in active build. We are using internal benchmarks on a 200-document contract corpus to set the bar before alpha. We're targeting Q3 2026 for a closed alpha with two design partners; expressions of interest are welcome.
/ The work
What we built.
Key features
- Layout-aware OCR for scans and digital PDFs
- Vision-language model extraction with structured output
- Source citation: every field links to its page and bounding box
- Confidence scoring with human-review flagging
- Schema templates for contracts, invoices, KYC, financials
- Webhook-driven pipeline for high-volume processing
Tech stack
Let’s build
You have an idea.
We’ll have it shipping by next month.
Book a free 30-minute call. No pitch, no pressure. You’ll walk away with a scope, a timeline, and a clear number — or an honest referral.
Reply time: under 12 hours · New Delhi, India · Serving globally