April 24, 2026 · 7 min read

Scanned bank statements to Excel — without the silent OCR errors

If the statement you need to convert came from a scanner instead of a bank portal, the failure mode isn't the OCR — it's the missing reconciliation check afterward. Here's how we handle scans, and why the running-balance check is the thing that makes it safe.

If you're a forensic accountant, a divorce attorney, or a bookkeeper with a retail client, a big chunk of the bank statements that cross your desk don't come from the bank's customer portal. They come from a fax machine, a disclosure packet, a manila folder of old records, or — most commonly — the opposing side's discovery response, in which someone printed every statement, scanned them, and emailed you a PDF. That PDF has no text layer. You can't copy the numbers. Your usual converter rejects it. You re-type 180 rows by hand into Excel and hope you didn't flip a sign.

This has been the single most common piece of feedback we've received on pdftoexcel since launch: "What about scans?" The short answer is that scans are a supported path, not a rejected one — they route through a vision-model OCR pass and get the same running-balance reconciliation that digital PDFs get. The longer answer is more interesting, because the thing that makes OCR on bank statements safe isn't the OCR itself.

OCR on bank statements has a specific failure mode

Generic OCR — Tesseract, Textract, Google Document AI — is very good at turning an image of text into a string of text. It's less good at preserving layout: which column a number belongs to, which amount is a debit vs. a credit, which row continues onto the next line. On a bank statement those layout facts are the data. An OCR that correctly reads every character but places the amount "1,234.56" in the withdrawal column instead of the deposit column has given you a spreadsheet in which every total is wrong in a way no amount of proofreading will catch.

Vision-capable LLMs — Gemini 2.5 Flash, Claude Sonnet 4, GPT-4o — handle the layout problem natively because they see the whole page as an image and reason about it structurally. "This number is under the Withdrawals header, so it's a withdrawal" is the kind of inference a vision LLM gets right out of the box where a column-based OCR pipeline gets it wrong half the time on split-block layouts like Bank of America's. Our pipeline uses Gemini Flash for scans specifically because the model preserves the semantic structure the OCR would have flattened.
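The key to getting that structural inference out of the model is asking for sign-bearing amounts rather than raw column text. Here's a sketch of the kind of structured-output prompt involved — the schema field names are illustrative, not our actual contract:

```python
import json

# Illustrative output contract for the vision pass. The sign convention is
# stated explicitly so the model resolves "which column is this number in?"
# into a signed amount we can sum later.
SCHEMA = {
    "transactions": [
        {"date": "MM/DD", "description": "str",
         "amount": "signed decimal: withdrawals negative, deposits positive"}
    ],
    "beginning_balance": "decimal",
    "ending_balance": "decimal",
}

PROMPT = (
    "Extract every transaction from this statement page. "
    "Use the column headers on the page to decide sign: amounts under "
    "Withdrawals are negative, amounts under Deposits are positive. "
    "Reply with JSON matching this schema:\n" + json.dumps(SCHEMA, indent=2)
)
```

Pushing the sign decision into the contract means a mis-assigned column shows up later as a reconciliation failure instead of a silently wrong total.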

But that's just a better OCR; the problem isn't solved. Vision LLMs still hallucinate rows. They still misread amounts that are clipped at the page margin, or read a "1" as an "l" and turn a $100 withdrawal into an unparseable string. The new failure mode swaps column-assignment errors for occasional transcription errors, and in both cases the output is a plausible-looking spreadsheet that's wrong.
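Some of those transcription errors are mechanically repairable before they reach the reconciliation step. A minimal sketch — `parse_amount` and its confusion table are illustrative, not our production cleaner:

```python
from decimal import Decimal, InvalidOperation

# Common OCR character confusions seen in amount fields (illustrative set).
_DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0", "S": "5"})

def parse_amount(raw):
    """Parse an OCR'd amount string, repairing common digit confusions.

    Returns a Decimal, or None if the string still isn't a number —
    None is the honest answer; never guess a digit silently.
    """
    cleaned = raw.translate(_DIGIT_FIXES).replace(",", "").replace("$", "").strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    cleaned = cleaned.strip("()")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return -value if negative else value
```

Note the asymmetry: a repair like "lOO.00" → 100.00 is safe because the fix either parses or it doesn't, but a *wrong digit* parses fine — which is exactly why the balance check in the next section is the real safety net.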

The running balance is what makes scans safe

The reconciliation check we run on every export — sum of transaction amounts must equal ending balance minus beginning balance, to within a penny — is the one integrity check that survives any OCR pipeline. It doesn't care whether the rows came from pdfplumber, Tesseract, Gemini, or a summer intern's keyboard. If they sum to the right number, the extraction is arithmetically consistent with what the bank printed on the first page; if they don't, something is off by exactly that much and we show you which row is closest to the delta.
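The check is small enough to sketch in full. `reconcile` and its suspect-row heuristic are illustrative names, not our actual implementation:

```python
from decimal import Decimal

def reconcile(amounts, beginning, ending, tolerance=Decimal("0.01")):
    """Check that signed transaction amounts explain the balance change.

    Returns (ok, delta, suspect_index). delta is the unexplained difference;
    suspect_index points at the row whose magnitude is closest to the delta —
    a useful first place to look when reconciliation fails.
    """
    delta = (ending - beginning) - sum(amounts)
    if abs(delta) <= tolerance:
        return True, delta, None
    if not amounts:
        return False, delta, None
    suspect = min(range(len(amounts)),
                  key=lambda i: abs(abs(amounts[i]) - abs(delta)))
    return False, delta, suspect
```

Decimal, not float: a float sum can drift past a penny on a few hundred rows, and a check this load-bearing shouldn't have its own rounding failure mode.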

This is why we can ship scan support without adding a second risk surface for our customers. The running-balance check is the same invariant we enforce on digital PDFs, and it's the thing that separates "good OCR" from "OCR you'd stand behind in court." We verified this end-to-end on a real 4-page Chase scan the day before launch: 62 transactions extracted, beginning $310.45, ending $12,274.35, reconciled to the penny, Verified badge, twenty-five-second round-trip. The scan path and the digital path converge on the same guarantee.

What we won't do

Vision OCR costs roughly twice what text-mode extraction does — the input token count for a page-as-image is higher than for a page of already-extracted text. We don't route digital PDFs through vision "just to be safe," because that doubles the cost with no accuracy gain (digital PDFs have the exact text right there, no OCR pass required). We detect whether a PDF has a usable text layer and pick the cheapest path that produces a reconcilable result. Scans go to vision; digital PDFs go to pdfplumber first, then to a text-mode LLM if the fast path doesn't recognise the bank. You always see which path was used.

We also don't ship scan support for files that exceed the inline-data limit of the underlying models — around 20 MB for Gemini, 24 MB for Claude. A high-resolution multi-hundred-page scan can blow past that cap quickly. When it does, the file lands in our manual review queue: a human on our side converts it by hand, runs the same reconciliation, and emails you the result. Sign up for a free account, re-upload, and you'll get an email within a few hours with the converted file. Same Verified badge, same guarantee, slower turn-around.
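The routing described in the last two sections reduces to a few comparisons. A sketch under stated assumptions — `pick_path`, the 50-chars-per-page heuristic, and the path names are all illustrative; the 20 MB cap is the Gemini inline-data figure from above:

```python
def pick_path(page_char_counts, size_bytes,
              max_inline_bytes=20_000_000, min_chars_per_page=50):
    """Route a statement PDF to the cheapest path that can reconcile.

    page_char_counts: extractable characters per sampled page (a digital
    PDF yields hundreds per page; a pure scan yields roughly zero).
    """
    avg = sum(page_char_counts) / max(len(page_char_counts), 1)
    if avg >= min_chars_per_page:
        return "text_extraction"   # digital PDF: pdfplumber / text-mode LLM
    if size_bytes > max_inline_bytes:
        return "manual_review"     # scan too big for the model's inline cap
    return "vision_ocr"            # scan that fits: vision-model OCR pass
```

Whichever branch fires, the output still has to pass the same running-balance check — the routing only decides cost, never the guarantee.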

The shape of the check, not the shape of the file

The thing to internalise, whether you're building this yourself or evaluating a converter to buy, is that the file format is not where the safety comes from. A digital PDF parsed with pdfplumber can produce a garbage spreadsheet; a scanned PDF run through a vision model can produce a perfect one. The difference is whether there's a check at the end that says the numbers add up. Ship the check. Make the check cheap. Make the check loud when it fails. Scans will take care of themselves.

Convert a scanned statement — 15 free pages, no card
