When to read this
You're uploading a file that got rejected, or you're about to send a batch and want to know what's accepted, or you're integrating with a vendor's export and need to confirm the file shape works for us. This article is the canonical reference for the formats and limits across all upload paths (web, mobile, email forwarding).
For the upload flow itself, see upload via the web, scan with the mobile app, or set up email forwarding. For the email-specific attachment rules, see email attachment size and format limits — same rules, but with a few email-specific quirks (inline images, .msg files).
What works
| Format | Notes |
|---|---|
| PDF | Single- and multi-page. Most common: exported invoices, scanned receipts, bank statements. |
| JPG / JPEG | Phone photos, scanned receipts, screenshots saved as JPEG. EXIF orientation is respected. |
| PNG | Screenshots (e.g. SaaS invoice from a web app), scans saved as PNG. |
| HEIC | iPhone's default photo format. We convert internally; you don't need to do anything. |
That's the entire allow-list. Other formats are rejected at upload time with a specific reason ("invalid file type", "encrypted PDF", "archive — extract first").
The 20 MB size limit
20 MB per file. This covers virtually all real-world invoices, receipts, and short bank statements. Things that bump up against it:
- High-resolution scans. A 600-DPI colour scan of a multi-page document can easily hit 50 MB. Re-export at 300 DPI (still readable, looks identical to your eyes) and the file drops to under 20 MB.
- Long PDFs. A 50-page bank statement at decent resolution is typically 5–15 MB; a 200-page one might exceed the limit. Split it.
- Phone photos. Almost always under 5 MB. If yours is much larger, your camera is probably set to "max resolution"; the default setting works fine.
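Why does dropping from 600 to 300 DPI help so much? Pixel count scales with the square of the resolution, so halving the DPI leaves roughly a quarter of the pixels. A back-of-the-envelope sketch, assuming an A4 page (8.27 × 11.69 in) and 24-bit colour before compression. Real file sizes depend heavily on compression and page content, so treat the ratio, not the absolute numbers, as the takeaway.

```python
# Back-of-the-envelope: how scan resolution drives raw image size.
# Assumes an A4 page and 3 bytes per pixel (24-bit colour), uncompressed.
A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69
BYTES_PER_PIXEL = 3


def raw_page_bytes(dpi: int) -> int:
    """Uncompressed size in bytes of one scanned A4 page at the given DPI."""
    width_px = round(A4_WIDTH_IN * dpi)
    height_px = round(A4_HEIGHT_IN * dpi)
    return width_px * height_px * BYTES_PER_PIXEL


def shrink_factor(old_dpi: int, new_dpi: int) -> float:
    """How many times smaller the raw pixel data gets at the new DPI."""
    return raw_page_bytes(old_dpi) / raw_page_bytes(new_dpi)


if __name__ == "__main__":
    for dpi in (600, 300):
        print(f"{dpi} DPI: {raw_page_bytes(dpi) / 1_000_000:.1f} MB raw per page")
    print(f"600 -> 300 DPI shrinks raw data about {shrink_factor(600, 300):.1f}x")
```

Compression narrows or widens the absolute gap, but the fourfold drop in pixel data is why a 300-DPI re-export usually lands comfortably under the limit.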
If you hit the limit, three workarounds:
- Re-export at lower resolution in any PDF viewer (Preview on macOS: File → Export → Quartz Filter → Reduce File Size; Adobe Acrobat: File → Save As Other → Reduced Size PDF).
- Split multi-page documents. Most PDF viewers can extract selected pages to a new file. Each part then uploads under 20 MB.
- Ask the vendor for a smaller version. Vendors sometimes send extreme-resolution invoices that don't need to be that big. A polite "could you send a normal-resolution PDF" usually works.
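If you script the split, the only real decision is where to cut. A minimal sketch, assuming you already know (or have estimated) how many bytes each page contributes; `plan_split` is a hypothetical helper, not part of any tool we ship. Per-page sizes are only approximate because PDFs share fonts and images across pages, so the sketch targets 90% of the limit. Reading 20 MB as 20 × 1024 × 1024 bytes is also an assumption; the 90% margin keeps parts under 20,000,000 bytes either way.

```python
from __future__ import annotations

# Assumption: 20 MB interpreted as 20 MiB; the default target adds headroom.
LIMIT_BYTES = 20 * 1024 * 1024


def plan_split(page_sizes: list[int], target: int = int(LIMIT_BYTES * 0.9)) -> list[range]:
    """Greedily group consecutive pages into 0-based ranges under `target` bytes."""
    parts: list[range] = []
    start = 0
    running = 0
    for i, size in enumerate(page_sizes):
        if size > target:
            raise ValueError(f"page {i + 1} alone is larger than the target")
        if running + size > target:
            # Close the current part before this page would overflow it.
            parts.append(range(start, i))
            start, running = i, 0
        running += size
    if start < len(page_sizes):
        parts.append(range(start, len(page_sizes)))
    return parts
```

For example, `plan_split([8, 8, 8, 8], target=20)` groups pages 1–2 and 3–4. Writing each range out to its own file is then a job for your PDF viewer or a PDF library.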
The 20 MB limit isn't arbitrary — it's the point where extraction latency and reliability start to degrade noticeably, and where storage costs per document become disproportionate to a typical invoice's value. Raising it would push all users' costs higher to accommodate the few documents that need it.
What doesn't work and what to do instead
- Office formats (.docx, .xlsx, .odt, .pptx): convert to PDF first. Both Word/Excel and Google Docs/Sheets export to PDF in one click. Once it's a PDF, we accept it.
- Email files (.eml, .msg): forward the email to your u-…@in.taxiteasy.org address instead. See set up email forwarding. Direct .eml upload would mean dropping the email body anyway (we don't persist email bodies), so the forwarding path is the right shape.
- Archives (.zip, .rar, .7z, .tar): extract first, then upload individual documents. We deliberately don't unpack archives server-side; that's a footgun for malware delivery, and unpacking on your side is a 10-second task.
- Encrypted PDFs: remove the password first and save a decrypted copy. We can't extract from encrypted PDFs (and we wouldn't want to store decrypted versions of password-protected documents on your behalf, since that defeats the password's purpose).
- TIFF, BMP, GIF, WebP: convert to JPG or PNG first. Any image editor does this in 10 seconds. TIFF in particular often shows up from old scanners; converting drops the file size as a side effect.
- SVG, EPS, AI: vector formats with no consistent text layer for extraction. Convert to PDF first.
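Several of these rejections can be recognised from the file bytes alone. A rough sketch of that idea, assuming only the well-known signatures listed in the comments; it is illustrative, not our production check. It also makes two caveats visible: .docx, .xlsx and .pptx are themselves ZIP containers, so their first bytes look exactly like a .zip, and spotting an encrypted PDF by searching for the `/Encrypt` key is a heuristic rather than a full PDF parse.

```python
from __future__ import annotations

# Illustrative signature table for some formats that get rejected.
ARCHIVE_SIGNATURES = {
    b"PK\x03\x04": "zip",           # also .docx/.xlsx/.pptx (Office Open XML)
    b"Rar!\x1a\x07": "rar",         # prefix shared by RAR 4 and RAR 5
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def looks_like_archive(data: bytes) -> str | None:
    """Return the archive kind if the leading bytes match a known signature."""
    for signature, kind in ARCHIVE_SIGNATURES.items():
        if data.startswith(signature):
            return kind
    # tar has no leading signature; POSIX/GNU tar store "ustar" at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    return None


def looks_like_encrypted_pdf(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer."""
    return data.startswith(b"%PDF-") and b"/Encrypt" in data
```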
Magic-byte verification (security)
We check the actual file bytes, not just the filename: a content-sniffing check using "magic numbers" (the few bytes at the start of every file that identify its format). Renaming notes.txt to invoice.pdf won't slip through; the magic-byte check sees a text file disguised as a PDF and rejects the upload.
This is a security control with two purposes:
- Pipeline safety. The OCR pipeline expects to see actual PDFs / images. Feeding it a misnamed text file would cause unpredictable errors. The check fails fast at upload time.
- Anti-exfiltration. It stops arbitrary files (logs, secrets, source code) from being slipped into our processing chain, accidentally or otherwise, just by renaming them. Nothing gets stored except files that pass content verification.
The verification runs on the raw bytes before any decoding — there's no way to spoof it by adjusting the extension, the MIME type the browser claims, or any other client-side hint.
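The idea behind the check, as a minimal Python sketch. It assumes the standard published signatures for the four accepted formats and reads 20 MB as 20 MiB; `sniff_format` and `validate_upload` are illustrative names, not our actual code, and the HEIC brand list is deliberately simplified. The filename and browser-claimed MIME type are accepted as arguments only to show that they are never consulted.

```python
from __future__ import annotations

MAX_BYTES = 20 * 1024 * 1024       # assumption: 20 MB read as 20 MiB
HEIC_BRANDS = {b"heic", b"heix"}   # simplified; HEIF defines more brands


def sniff_format(data: bytes) -> str | None:
    """Identify an allowed format from its leading bytes, or return None."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # HEIC is an ISO-BMFF file: 4-byte box size, then "ftyp", then the brand.
    if data[4:8] == b"ftyp" and data[8:12] in HEIC_BRANDS:
        return "heic"
    return None


def validate_upload(data: bytes, filename: str, claimed_mime: str) -> str:
    """Return the detected format or raise ValueError with a reason.

    filename and claimed_mime are deliberately ignored: both are
    client-side hints that anyone can change.
    """
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds 20 MB")
    kind = sniff_format(data)
    if kind is None:
        raise ValueError("invalid file type")
    return kind
```

So `validate_upload(b"buy milk\n", "invoice.pdf", "application/pdf")` raises "invalid file type" despite the name and MIME type, while a genuine PDF passes whatever it is called.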
Multi-page PDFs
Multi-page PDFs are treated as one document, not one document per page. The AI reads all pages, extracts line items across pages (an invoice with line items on page 1 and totals on page 3 is parsed correctly), and produces a single document record.
This matters for:
- Quota — one PDF = one document, regardless of page count.
- Display — the document detail page shows all pages with thumbnails.
- Export — the document is one entry in the JSON export; the PDF itself is linked as one file.
If you want each page treated as a separate document (rare — typically when a single PDF was generated by stapling several invoices together), split the PDF first and upload each part.
Multi-currency, multi-language documents
We don't impose a language or currency restriction on the file itself. Confidence is highest for the languages we explicitly train on (EN, DE, FR, IT, ES, NL, PL); other languages still work but with more manual review. See what the AI reads for the language-confidence breakdown.
For currency, the extraction returns whatever currency the document uses; the conversion to your base currency happens after extraction. See multi-currency and live ECB rates.
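To make "conversion happens after extraction" concrete: the extracted amount keeps its original currency, and a rate is applied afterwards. A minimal sketch, assuming rates quoted the way the ECB publishes its reference rates (units of a currency per 1 EUR), `Decimal` arithmetic, and rounding to cents half-up; the rate values and the rounding rule are illustrative and not necessarily what the product uses.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal

# Illustrative EUR-based reference rates: 1 EUR = rate units of the currency.
RATES_PER_EUR = {"EUR": Decimal("1"), "USD": Decimal("1.10"), "GBP": Decimal("0.85")}


def convert(amount: Decimal, from_ccy: str, to_ccy: str) -> Decimal:
    """Convert an extracted amount via EUR, then round to two decimals."""
    in_eur = amount / RATES_PER_EUR[from_ccy]
    result = in_eur * RATES_PER_EUR[to_ccy]
    return result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With these sample rates, a 110.00 USD invoice converts to 100.00 EUR, and a GBP amount reaches a USD base currency by crossing through EUR.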
Troubleshooting
Password-protected PDF. Decrypt and re-save without the password. We can't extract from encrypted PDFs (the PDF's encryption hides the text from the OCR engine entirely), and we wouldn't want to store decrypted versions of password-protected docs on your behalf, since that defeats the purpose of the password. To decrypt: open the file in Preview or Adobe Acrobat, enter the password, then use File → Save As / Export and untick or omit the password.
100-page bank statement. Upload it. Page count alone isn't the issue; file size is. A well-compressed multi-page PDF can be 100+ pages and stay under 20 MB. The extraction can take 2–3 minutes for a long statement (vs 30 seconds for a typical invoice), but it works.
My .HEIC file uploads but shows the wrong orientation. Open it on your iPhone first (which auto-rotates based on EXIF), share/save as JPG, then upload the JPG. The EXIF rotation tag is sometimes ignored by our viewer for HEICs whose rotation is encoded ambiguously. Saving as JPG bakes the rotation into pixels and removes the ambiguity.
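What "baking the rotation into pixels" means: the EXIF Orientation tag (values 1 to 8) tells a viewer how to rotate or mirror the stored pixels for display, and re-saving applies that transform to the pixels themselves so no viewer has to interpret the tag. A toy sketch on a 2D grid of pixel values, assuming the standard EXIF meaning of each value; real image tools (for example Pillow's `ImageOps.exif_transpose`) do the same on actual images.

```python
Grid = list[list[int]]


def mirror_h(g: Grid) -> Grid:
    """Flip left-to-right."""
    return [row[::-1] for row in g]


def rotate_cw(g: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def bake_orientation(g: Grid, orientation: int) -> Grid:
    """Return pixels transformed so they display correctly with orientation 1."""
    # Values 2, 4, 5 and 7 include a horizontal mirror, applied first.
    if orientation in (2, 4, 5, 7):
        g = mirror_h(g)
    # Clockwise quarter-turns per EXIF value (3 turns = 270 degrees clockwise).
    turns = {1: 0, 2: 0, 3: 2, 4: 2, 5: 3, 6: 1, 7: 1, 8: 3}[orientation]
    for _ in range(turns):
        g = rotate_cw(g)
    return g
```

After this step the image is stored upright and its orientation can be reset to 1, which is why a re-saved JPG no longer depends on how a viewer reads the tag.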
GIF / WebP / TIFF. Convert to JPG or PNG first. Any image editor does this; on macOS, Preview's File → Export with "Format: JPEG" handles all three. On Windows, the built-in Photos app or Paint will too.
My PDF is 25 MB and got rejected. Three options: (1) re-export at lower DPI from the source, (2) re-save through Preview / Acrobat with the "reduce file size" option, (3) split into multiple PDFs. For invoices from vendors who chronically send huge files (some bank PDFs, some specialised tax forms), ask the vendor for a normal-resolution version — they almost always have one.
The file passed the size check but extraction failed. That's a different problem from this article — see upload failed — troubleshooting and the OCR extracted the wrong fields for extraction issues.
Why no .docx support? My vendor sends Word. Word documents lack a consistent text/layout structure for invoice extraction — different invoice templates render completely differently when parsed as Word XML. PDF is the standard for invoices precisely because it locks the rendering. Converting Word to PDF (built into Word / LibreOffice / any office suite) takes 5 seconds and produces the right shape for our pipeline.
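If a vendor sends Word files regularly, the conversion can be scripted. A sketch assuming LibreOffice is installed and its `soffice` binary is on your PATH (`--headless`, `--convert-to` and `--outdir` are LibreOffice's own command-line options); `build_convert_command` and `docx_to_pdf` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path) -> list[str]:
    """Build the LibreOffice command that writes <stem>.pdf into out_dir."""
    return [
        "soffice",
        "--headless",
        "--convert-to",
        "pdf",
        "--outdir",
        str(out_dir),
        str(source),
    ]


def docx_to_pdf(source: Path, out_dir: Path) -> Path:
    """Convert a Word document to PDF and return the expected output path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(source, out_dir), check=True)
    return out_dir / (source.stem + ".pdf")
```

The resulting PDF then goes through the normal upload path like any other invoice.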
Related
- Upload via the web — the desktop upload paths
- Scan with the mobile app — phone camera path
- Email attachment size and format limits — same rules for email forwarding
- Upload failed — troubleshooting — when something else breaks