Security

You upload bank statements that belong to your clients. Here is exactly what happens to them, who can see them, and how long they live.

TL;DR

  • HTTPS in transit, AES-256 at rest, in private object storage.
  • Source PDFs are discarded immediately after a successful automatic conversion. Extracted rows are kept 24 hours by default; longer if you pin them on a paid plan.
  • No LLM provider trains on your data. Both Anthropic (Claude API, Commercial Terms) and Google (Gemini API, paid tier) contractually exclude API customer data from model training.
  • Sub-processor list is published at /legal/subprocessors. A Data Processing Addendum is available at /legal/dpa.
  • SOC 2 Type II is on the roadmap for once MRR reaches $5K. We do not have the badge yet, and we say so.

What happens to a file you upload

  1. The file is uploaded over HTTPS to a private Supabase Storage bucket. The bucket is closed to the public internet — access requires a signed URL with a TTL of 5 minutes or less.
  2. The Python parser worker (Fly.io) downloads the PDF via that short-lived signed URL into memory.
  3. The fast-path parser (pdfplumber) attempts to extract rows locally — no third-party network call, no LLM. About 70% of uploads finish here.
  4. If the fast path can't reconcile the result, page images are sent to Google Gemini 2.5 Flash for vision extraction. If Gemini also can't reconcile, the request escalates to Anthropic Claude (Haiku 4.5, then Sonnet 4.5 as a last resort). Both providers contractually exclude API inputs and outputs from model training; see "LLM providers" below.
  5. The extracted JSON is stored in our database. The source PDF is deleted from storage as soon as the extraction is reconciled and the result file is generated.
  6. The reconciled rows are retained for 24 hours by default so you can re-download. Paid plans extend this to 30–90 days. You can delete any document manually from the dashboard at any time.
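As an illustration only (not the production code), the escalation in steps 3 and 4 can be sketched as an ordered list of extraction stages, where the first stage whose result reconciles wins and later stages are never called. The Extraction type, the run_pipeline function, and the stand-in stages below are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Extraction:
    rows: List[dict]    # extracted transaction rows
    reconciled: bool    # True when the rows pass the balance check


# A stage is a label plus a function that turns PDF bytes into an Extraction.
Stage = Tuple[str, Callable[[bytes], Extraction]]


def run_pipeline(
    pdf: bytes, stages: Sequence[Stage]
) -> Tuple[Optional[str], Optional[Extraction]]:
    """Try each stage in order and stop at the first one that reconciles."""
    for name, extract in stages:
        result = extract(pdf)
        if result.reconciled:
            return name, result
    # No stage reconciled: a caller would route the file to manual review.
    return None, None


# Stand-in stages for demonstration; real stages would call the local
# parser or a model provider instead of returning canned results.
demo_stages: List[Stage] = [
    ("fast-path", lambda pdf: Extraction(rows=[], reconciled=False)),
    ("vision-fallback", lambda pdf: Extraction(rows=[{"amount": "-12.50"}], reconciled=True)),
    ("last-resort", lambda pdf: Extraction(rows=[], reconciled=False)),
]

winner, extraction = run_pipeline(b"%PDF-1.7", demo_stages)
```

Ordering the stages from local to remote means a third-party call happens only when every earlier stage has failed to reconcile.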

Encryption

All traffic between your browser, our application, the Python worker, and the LLM providers is encrypted in transit using TLS 1.2 or higher.
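For readers who want to see what a TLS 1.2 floor looks like in practice, here is a minimal sketch using Python's standard ssl module. It shows how a Python client could refuse any connection below TLS 1.2 while keeping certificate and hostname verification on. The function name make_client_context is hypothetical, and this is not a description of how the service itself is configured.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects anything below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
```

A context like this can be passed to most Python HTTP clients that accept an ssl.SSLContext.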

Files in Supabase Storage and database rows in Supabase Postgres are encrypted at rest with AES-256. Backups are encrypted at rest under the same key management.

Access control

  • Every user-scoped table is gated by Postgres Row-Level Security. The policy is auth.uid() = user_id — your account can only read its own rows, regardless of which endpoint asks.
  • The storage bucket containing source PDFs is private. Access is gated by signed URLs with a 5-minute TTL, scoped to a single path.
  • Production database and infrastructure access is limited to Matrizexplícita Lda staff with a documented operational reason. Today that is one founder. SSO with hardware-backed MFA is required for every console.
  • Application logs do not contain PDF content, transaction descriptions, account numbers, or routing numbers. Only metadata (document id, page count, parse status, parse cost) is logged.
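One common way to enforce a metadata-only logging rule like the one above is an allowlist applied before anything reaches the logger. The sketch below is an illustration under assumptions: the field names mirror the metadata listed above, but ALLOWED_LOG_FIELDS, safe_log_record, and log_parse_event are hypothetical names, not the service's actual code.

```python
import json
import logging

# Only these metadata keys may appear in a log line (assumed field names).
ALLOWED_LOG_FIELDS = {"document_id", "page_count", "parse_status", "parse_cost"}


def safe_log_record(event: dict) -> dict:
    """Keep allowlisted metadata and silently drop every other key."""
    return {k: v for k, v in event.items() if k in ALLOWED_LOG_FIELDS}


def log_parse_event(logger: logging.Logger, event: dict) -> None:
    """Log one parse event as JSON, after filtering through the allowlist."""
    logger.info(json.dumps(safe_log_record(event), sort_keys=True))


record = safe_log_record({
    "document_id": "doc_123",
    "page_count": 4,
    "parse_status": "reconciled",
    "description": "ACME PAYROLL",   # dropped: transaction content
    "account_number": "000123456",   # dropped: account identifier
})
```

An allowlist fails safe: a new field added to the event is excluded from logs until someone deliberately permits it.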

Manual review queue

When the automated pipeline can't reconcile a statement on its own (rate-limited model, unusual layout, unsupported language), the file is routed to a manual-review queue so an authorized engineer of Matrizexplícita Lda can reconcile it by hand. In that case the source PDF is held in the same private encrypted bucket so the reviewer can open it. You always receive an email when your upload enters the queue and another when it's delivered or declined. The PDF and any admin-produced result are deleted 30 days after delivery or decline. Reviewers operate under a written confidentiality obligation; the file is never used for training, demos, or analytics.

LLM providers — no training, no retention beyond abuse review

Two model providers handle the fallback path. Both contractually exclude API inputs and outputs from model training under the commercial terms we operate under:

  • Anthropic (Claude API). Under the Commercial Terms, Anthropic does not use prompts or completions submitted via the API to train Claude. Inputs and outputs may be retained for a limited window for safety and abuse review, then deleted. See Anthropic's published policy.
  • Google (Gemini API, paid tier). pdftoexcel is on the paid tier. Google does not use prompts, system instructions, cached content, files, or responses from paid-tier Gemini API customers to improve Google products. Logs are retained for a limited window for abuse and policy enforcement. See Google's Gemini API Terms.

Provider policies can change. We record a dated snapshot of these terms at each quarterly review and update this page if the posture changes.

Sub-processors

The full list of sub-processors — what each one does, what data it sees, where it runs, and which compliance posture it carries — is published at /legal/subprocessors. We notify customers of new sub-processors at least 30 days before the new processor is engaged, with a right to object.

Data location and international transfers

Application hosting (Vercel) and the Python worker (Fly.io) run in US regions. The Supabase Postgres database and storage bucket run in the EU (Frankfurt). Product analytics (PostHog) runs in the EU cloud. Some sub-processors process data in the US. Those transfers rely on the EU Standard Contractual Clauses (Commission Implementing Decision (EU) 2021/914), supplemented by transfer impact assessments, and on the EU–US Data Privacy Framework where the sub-processor is certified. The transfer mechanism per sub-processor is listed at /legal/subprocessors.

Retention and deletion

  • Source PDF: deleted immediately on a successful automatic conversion. Held up to 30 days only when the file enters the manual-review queue.
  • Extracted rows: 24 hours on the free tier; 30 days on Starter; 90 days on Professional and Business; configurable on Enterprise.
  • Account metadata: kept for the life of the account. Deleted within 30 days of account closure, except where we must retain billing records for tax-law compliance (typically 10 years under Portuguese law, limited to invoice line items — never PDF content).
  • On-demand deletion: you can delete any document manually from the dashboard. Email hello@bankpdftoxls.com for full account erasure or for a deletion certificate.
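The per-plan retention windows for extracted rows can be expressed as a small lookup, sketched below as an illustration. The plan keys, ROW_RETENTION, and rows_expire_at are hypothetical names; the durations come from the list above, and Enterprise is modelled as an explicit argument because its window is configurable.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows for extracted rows, keyed by (assumed) plan identifiers.
ROW_RETENTION = {
    "free": timedelta(hours=24),
    "starter": timedelta(days=30),
    "professional": timedelta(days=90),
    "business": timedelta(days=90),
}


def rows_expire_at(
    plan: str,
    created_at: datetime,
    enterprise_retention: Optional[timedelta] = None,
) -> datetime:
    """Return the moment after which extracted rows are due for deletion."""
    if plan == "enterprise":
        if enterprise_retention is None:
            raise ValueError("enterprise retention must be configured explicitly")
        return created_at + enterprise_retention
    return created_at + ROW_RETENTION[plan]


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
free_expiry = rows_expire_at("free", created)
```

A scheduled cleanup job could then delete any rows whose expiry is earlier than the current time.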

Verified ≠ legal or accounting certification

The Verified badge means one specific thing: the sum of extracted transaction amounts equals the difference between the statement's printed beginning and ending balance, to within one cent. It is an arithmetic reconciliation, not a legal, accounting, or forensic certification of the statement's authenticity, of the extracted descriptions, or of the categorization. You remain responsible for reviewing the output before relying on it for audit, court, tax, regulatory, or underwriting purposes. This is also stated in our Terms of Service.
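The check described above is simple enough to sketch in a few lines. This is an illustration, not the production implementation: it assumes credits are positive and debits negative, treats "within one cent" as inclusive, and uses Decimal to avoid floating-point rounding. The name is_verified is hypothetical.

```python
from decimal import Decimal
from typing import Iterable

ONE_CENT = Decimal("0.01")


def is_verified(opening: Decimal, closing: Decimal, amounts: Iterable[Decimal]) -> bool:
    """True when the transaction total matches the balance change within one cent."""
    expected_change = closing - opening
    actual_change = sum(amounts, Decimal("0"))
    return abs(actual_change - expected_change) <= ONE_CENT


ok = is_verified(
    Decimal("1000.00"),
    Decimal("1250.25"),
    [Decimal("500.25"), Decimal("-250.00")],
)
```

Note that the check says nothing about whether individual descriptions or dates were read correctly, only that the amounts add up.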

Vulnerability disclosure

If you believe you have found a security vulnerability, email security@bankpdftoxls.com with reproduction steps. We acknowledge reports within two business days. Please do not disclose publicly until we have had a reasonable window (90 days, or sooner by agreement) to remediate. We do not currently run a paid bug bounty, but we credit researchers who follow responsible-disclosure norms.

Breach notification

In the event of a personal-data breach affecting your data, we will notify the affected customers without undue delay and in any event within 72 hours of becoming aware, in line with GDPR Article 33. The notice will describe the nature of the breach, the categories and approximate number of data subjects and records concerned, the likely consequences, and the measures taken or proposed.

Compliance roadmap (honest)

  • Today: documented information-security policy, RLS-enforced data isolation, short-lived signed-URL storage access, encrypted transport and storage, dependency scanning, SSO + hardware MFA on every production console, principle-of-least-privilege production access.
  • SOC 2 Type II: audit to be kicked off with a recognised auditor once MRR reaches $5K. We are targeting an initial Type II report 6 months after kickoff.
  • Annual penetration test: starts once we have more than 100 paying customers. Reports will be available under NDA.
  • ISO 27001: not on the near-term roadmap. Most of our infrastructure providers (Vercel, Stripe, Sentry) are certified, and that certification covers their own handling of data.

We do not list certifications we don't have. If you need a control we don't yet operate, email and ask — we will tell you whether it's on the roadmap or not.

GDPR and your rights

For most of our B2B customers, the relationship is you (the controller) → Matrizexplícita Lda (the processor) for the bank-statement data of your end clients. The Data Processing Addendum at /legal/dpa governs that relationship and incorporates the EU Standard Contractual Clauses for any onward transfer outside the EEA.

For data we collect about you directly (account email, billing data, usage metadata) we are the controller. The full list of controller-side rights — access, rectification, erasure, restriction, portability, objection — is in the privacy policy. The competent supervisory authority is the Comissão Nacional de Proteção de Dados (CNPD) in Portugal — you can lodge a complaint at cnpd.pt.

Contact

Security questions, DPA requests, deletion requests, or compliance questionnaires — email hello@bankpdftoxls.com. Vulnerability reports — security@bankpdftoxls.com. Postal: Matrizexplícita Lda, Portugal.

Need a signed DPA or a security questionnaire filled in?

Read the standard DPA at /legal/dpa and email a counter-signature request to hello@bankpdftoxls.com. Standard turnaround is one business day for an unmodified DPA; longer if you need bespoke clauses or a vendor questionnaire.