Scanned bank statements

Scanned statement to Excel — verified, not silently OCR'd.

Drop a printout-of-a-printout PDF, a phone-camera shot of a statement, or a 200-page scan of a discovery bundle. Get back Excel, CSV, or QBO with a Verified badge — meaning Σ(transactions) equals (ending − beginning) within a penny. If the OCR misread a digit, we tell you which row to look at, instead of shipping a spreadsheet that quietly doesn't add up.

Free for 15 pages a month, no card required.

Why scanned statements are the hard case

A digital PDF has a text layer that pdfplumber reads directly: no model needed, fast and free. A scanned PDF is just an image. To extract transactions from it you have to run OCR, and OCR is where silent errors slip in: a smudged 8 reads as a 3, a column boundary drifts by a pixel, a row is missed entirely on a page break. The output spreadsheet looks fine until someone tries to reconcile it against the statement summary three weeks later and the totals are off by $4,271.
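The digital-versus-scanned distinction can be sketched in a few lines of Python. This is an illustration, not the product's actual code: the function name `is_scanned` is hypothetical, the ~50 characters-per-page cutoff is the figure this page cites in its pipeline description, and averaging across pages (rather than checking each page separately) is an assumption.

```python
from typing import List, Optional

# Assumed cutoff: this page cites roughly 50 characters per page
# as the line between a digital PDF and a scanned one.
MIN_CHARS_PER_PAGE = 50


def is_scanned(page_texts: List[Optional[str]],
               min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when the average extracted text per page is below the cutoff."""
    if not page_texts:
        return True  # nothing extractable, so treat the file as image-only
    total_chars = sum(len((text or "").strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars


# With pdfplumber installed, the per-page text could be collected like this:
#   import pdfplumber
#   with pdfplumber.open("statement.pdf") as pdf:
#       page_texts = [page.extract_text() or "" for page in pdf.pages]

digital_pages = ["01/03 COFFEE SHOP 4.50 1,204.11\n" * 30]
scanned_pages = ["", "  "]
print(is_scanned(digital_pages))  # False
print(is_scanned(scanned_pages))  # True
```

A file that fails this check has no usable text layer, so every number in the output has to come from OCR.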

Most converters ship that spreadsheet anyway, with no indication anything went wrong. ChatGPT and other generic vision tools have no architectural check that forces the rows to sum to the statement's printed delta — they confidently output whatever they happened to read. The only way to catch an OCR mis-read at extraction time is to verify the arithmetic against the source-of-truth numbers the statement already prints on page one.

How pdftoexcel handles a scan

  1. Detect. On upload we check whether the PDF has a text layer (digital) or is image-only (scanned). A file with fewer than ~50 characters of extractable text per page is treated as scanned.
  2. Vision OCR. Scanned pages route through Gemini 2.5 Flash with a structured-output prompt that returns transactions as JSON. Multi-page batching keeps the cost down.
  3. Reconcile every row. The extracted rows are summed and compared to the difference between the statement's printed beginning and ending balances. The running-balance column, where present, is independently recomputed and matched against the printed values.
  4. Escalate or flag. If reconciliation fails on Gemini, the same prompt re-runs on Claude Haiku 4.5 with a hint about the previous mismatch. If it still fails, Claude Sonnet 4.5 takes a final pass. If even that can't close the math, the export ships with an Unverified flag pointing at the first divergent row, so you can fix it in the in-browser preview and re-export.
  5. Export. XLSX, CSV, or QBO: the same outputs and the same reconciliation guarantee as digital PDFs.
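The escalate-or-flag step can be pictured as a loop over a ladder of extractors. The sketch below is a minimal illustration under stated assumptions, not the production code: `extract_with_escalation`, `reconciles`, and the two stub extractors are hypothetical names, and the real pipeline calls vision models where the stubs return hard-coded amounts.

```python
from decimal import Decimal
from typing import Callable, List, Optional, Tuple

Rows = List[Decimal]
# Hypothetical extractor signature: page images plus an optional hint in, amounts out.
Extractor = Callable[[List[bytes], Optional[str]], Rows]


def reconciles(rows: Rows, beginning: Decimal, ending: Decimal) -> bool:
    """True when the rows sum to (ending - beginning) within one cent."""
    return abs(sum(rows, Decimal("0")) - (ending - beginning)) <= Decimal("0.01")


def extract_with_escalation(
    pages: List[bytes],
    beginning: Decimal,
    ending: Decimal,
    ladder: List[Tuple[str, Extractor]],
) -> Tuple[str, str, Rows]:
    """Try each extractor in order and stop at the first one that reconciles."""
    hint: Optional[str] = None
    rows: Rows = []
    name = ""
    for name, extract in ladder:
        rows = extract(pages, hint)
        if reconciles(rows, beginning, ending):
            return "verified", name, rows
        # Pass the size of the mismatch to the next, stronger extractor.
        off_by = sum(rows, Decimal("0")) - (ending - beginning)
        hint = f"previous pass was off by {off_by}"
    return "unverified", name, rows


def fast_model(pages, hint):  # stand-in that misreads 80.00 as 30.00
    return [Decimal("-30.00"), Decimal("200.00")]


def stronger_model(pages, hint):  # stand-in that reads the page correctly
    return [Decimal("-80.00"), Decimal("200.00")]


status, model, rows = extract_with_escalation(
    [b"page-1"], Decimal("1000.00"), Decimal("1120.00"),
    [("fast", fast_model), ("stronger", stronger_model)],
)
print(status, model)  # verified stronger
```

The cheaper extractor runs first, and the more expensive ones are only paid for when the arithmetic fails to close.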

What a Verified scan tells you (and what it doesn't)

Verified means one specific arithmetic check: the sum of the extracted transaction amounts equals the difference between the statement's printed beginning and ending balance, to within one cent. That is a strong signal that the OCR didn't silently lose a row, misread a digit in an amount, or hallucinate a transaction. It is not a certification of the statement's authenticity, and it doesn't guarantee that every description string is letter-perfect (a stray character in a description is harmless in a way that a wrong digit in an amount is not, and you typically edit descriptions in your downstream tool anyway).
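As a rough sketch of that arithmetic, assuming amounts are signed (debits negative) and held as `Decimal` so that pennies compare exactly: the function names `check_totals` and `first_divergent_row` are hypothetical, and the sample figures are made up for illustration.

```python
from decimal import Decimal
from typing import List, Optional

PENNY = Decimal("0.01")


def check_totals(amounts: List[Decimal],
                 beginning: Optional[Decimal],
                 ending: Optional[Decimal]) -> str:
    """Compare the sum of the rows with the printed balance delta."""
    if beginning is None or ending is None:
        return "skipped"  # no printed balances, so nothing to compare against
    diff = sum(amounts, Decimal("0")) - (ending - beginning)
    return "verified" if abs(diff) <= PENNY else "unverified"


def first_divergent_row(amounts: List[Decimal],
                        printed_balances: List[Decimal],
                        beginning: Decimal) -> Optional[int]:
    """Index of the first row whose recomputed running balance disagrees
    with the printed running balance, or None if every row matches."""
    running = beginning
    for i, (amount, printed) in enumerate(zip(amounts, printed_balances)):
        running += amount
        if abs(running - printed) > PENNY:
            return i
    return None


# Made-up example: the second row was printed as -18.00 but read as -13.00.
amounts = [Decimal("-42.10"), Decimal("-13.00"), Decimal("250.00")]
printed = [Decimal("457.90"), Decimal("439.90"), Decimal("689.90")]

print(check_totals(amounts, Decimal("500.00"), Decimal("689.90")))  # unverified
print(first_divergent_row(amounts, printed, Decimal("500.00")))     # 1
```

Using `Decimal` rather than floats avoids binary rounding noise that could otherwise push a correct statement past the one-cent tolerance.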

Unverified means we are telling you, honestly, which row the arithmetic broke at. That is more useful than a converter that ships a quietly wrong spreadsheet.

We wrote this distinction up at length in Scanned bank statements to Excel without the silent OCR errors. The full reconciliation logic is on the security page.

What works, what to be careful with

Works well

  • Office-scanner output at 200–300 DPI in black-and-white.
  • Photographed pages from a phone, as long as the page is reasonably flat and the lighting is uniform.
  • Multi-page scans, including discovery-style bundles where each statement is concatenated.
  • Password-protected scans — the uploader prompts for a password.
  • Single-account, multi-month statements where the running balance carries across pages.

Be careful with

  • Heavily skewed scans (more than ~5° rotation): straighten in Preview / Acrobat first if the result comes back Unverified.
  • Highlighter ink directly over amount columns: the OCR is robust to colored highlight in description columns but can lose digits when the amount column itself is overprinted.
  • Combined statements where multiple accounts share one PDF: extraction works, but you may want to split by account before importing into QuickBooks.
  • Scans of statements that don't print a beginning and ending balance (rare — almost every bank does). Without the header summary, the reconciliation check has nothing to compare against, and the export ships with status = skipped.

Confidentiality and retention

Scanned PDFs go through the same pipeline as digital ones: uploaded over HTTPS to a private Supabase Storage bucket, downloaded into the parser worker over a 5-minute signed URL, held in memory only for the parse, deleted on successful conversion. Page images are sent to the LLM provider only when the deterministic fast-path can't handle the file. Both Anthropic (Claude) and Google (Gemini paid tier) contractually exclude API inputs from model training under their commercial terms. Full lifecycle on the security page; sub-processor list at /legal/subprocessors.

FAQ

What DPI / quality do I need?
200 DPI black-and-white is a reasonable floor; 300 DPI gives more headroom on small fonts. Phone-camera shots work as long as the whole page is in frame and the lighting is uniform. Below ~150 DPI the reconciliation pass-rate drops noticeably.
Can I convert a screenshot of a statement?
If the screenshot is a single image of a single page — yes, save as PDF and upload. Multi-page screenshots stitched into a PDF also work. Cropped screenshots that don't include the beginning and ending balance can still be parsed but won't earn a Verified badge, because the reconciliation check has no reference numbers.
What about handwritten annotations on the scan?
Handwritten notes in the margins are usually ignored. If a handwritten mark crosses an amount column, that row may fail the reconciliation check and be flagged for inline edit.
How is this different from running OCR myself in Acrobat?
Acrobat OCR turns the image into text but does not structure transactions and does not verify the arithmetic. You'd still need a parser plus a manual reconciliation against the statement summary. That's the entire workflow we replace.

Convert a scanned statement now.

Free for 15 pages a month. Drop your hardest scanned PDF and see whether it earns a Verified badge — or, more usefully, where it gets flagged.