When to read this article
You uploaded a document and the AI returned something you weren't sure about — a missing field, a low-confidence yellow highlight, or just wrong data. This article explains exactly what the AI extracts, how to read the confidence scores, how the correction flow works, and how your corrections train future extractions for the same vendor.
If you're trying to understand why a particular score is low (or why a confident-looking field is wrong), see understanding confidence scores, which goes deeper. For the specific vendor-rename flow, see how to fix a misread vendor name.
What the AI extracts
Every uploaded or forwarded document goes through two stages. The first decides what kind of document it is; the second extracts structured fields.
Stage 1 — classification
The classifier returns one of four labels:
- invoice — an inbound bill (most common)
- receipt — a smaller-format proof of purchase
- statement — a multi-line summary (bank, credit-card, vendor account)
- irrelevant — newsletters, marketing emails, internal forwards, blank pages
Classification is fast (sub-second) and runs before extraction. If a document is classified irrelevant, no extraction runs and no document is created — the file is dropped from the pipeline and not counted against your quota. If the classifier picked the wrong label, you can override it from the document detail page.
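A minimal sketch of the two-stage flow, assuming the classifier and extractor are plain functions passed in. `DocType`, `Document`, `process_upload`, `classify` and `extract` are hypothetical names, not TaxItEasy's API. What it shows is the early exit: an irrelevant label returns nothing, so extraction never runs and no document is created.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DocType(Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    STATEMENT = "statement"
    IRRELEVANT = "irrelevant"


@dataclass
class Document:
    doc_type: DocType
    fields: dict


def process_upload(
    file_bytes: bytes,
    classify: Callable[[bytes], DocType],
    extract: Callable[[bytes, DocType], dict],
) -> Optional[Document]:
    """Stage 1 classifies; Stage 2 extracts only for invoice/receipt/statement."""
    doc_type = classify(file_bytes)
    if doc_type is DocType.IRRELEVANT:
        # Dropped from the pipeline: no extraction, no document created.
        return None
    return Document(doc_type=doc_type, fields=extract(file_bytes, doc_type))
```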
Stage 2 — field extraction
For documents classified as invoice, receipt, or statement, the AI returns:
- Vendor name — the supplier's legal or trading name
- Vendor address — full postal address as printed on the invoice
- Vendor VAT ID — when visible (DE..., IE..., CYxxx, GB...)
- Invoice number — exactly as printed
- Invoice date — parsed in the document's own date format and normalised to ISO
- Due date — if the invoice has one
- Currency — three-letter ISO code (EUR, USD, GBP, CHF, …)
- Net total — pre-tax sum
- VAT rate — single rate when the whole invoice is one rate; line-item rates when mixed
- VAT amount — the actual VAT figure printed
- Gross total — what you actually paid / owe
- Line items — quantity, unit, unit price, line total (for itemised invoices)
- Suggested category — Office supplies, Travel, Software, Hosting, Telecom, etc.
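The date step can be sketched with Python's standard `datetime.strptime`. This assumes the document's date format is already known and passed in as a format string; how TaxItEasy detects that format isn't described here, and `normalise_invoice_date` is a hypothetical helper. The same printed string normalises to different ISO 8601 dates depending on the format, which is why the document's own convention matters.

```python
from datetime import datetime


def normalise_invoice_date(raw: str, document_format: str) -> str:
    """Parse a printed date in the document's own format and return ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw.strip(), document_format).date().isoformat()


# The same printed string, two conventions:
print(normalise_invoice_date("03/04/2024", "%d/%m/%Y"))  # 2024-04-03 (day first)
print(normalise_invoice_date("03/04/2024", "%m/%d/%Y"))  # 2024-03-04 (month first)
```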
For multi-currency users, the gross total is auto-converted to your base currency at the European Central Bank's reference rate for the invoice date. See multi-currency and live ECB rates for the conversion details.
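ECB reference rates are published as units of each currency per 1 EUR, so converting between two non-euro currencies goes through the euro. A minimal sketch, assuming the rates for the invoice date are supplied as a dict and the result is rounded half-up to cents (the rounding rule is an assumption). `convert_via_ecb` is a hypothetical name, and the rates below are illustrative, not real ECB figures.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert_via_ecb(amount: Decimal, from_ccy: str, to_ccy: str,
                    eur_rates: dict[str, Decimal]) -> Decimal:
    """Convert using reference rates quoted as units of currency per 1 EUR."""
    rates = {"EUR": Decimal(1), **eur_rates}
    in_eur = amount / rates[from_ccy]      # step 1: into euros
    converted = in_eur * rates[to_ccy]     # step 2: out to the base currency
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Illustrative rates only:
rates = {"USD": Decimal("1.10"), "GBP": Decimal("0.85")}
print(convert_via_ecb(Decimal("110"), "USD", "GBP", rates))  # 85.00
```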
Confidence scores — what 70% means
Every extracted field carries a confidence score from 0 to 100. The score reflects how sure the model is that the extracted value matches what's actually on the document.
- Above 70% — auto-applied. The field shows in the normal text colour. No action needed.
- 40–70% — auto-applied but highlighted yellow. The model picked a value but wants you to glance at it. One click confirms; one click edits.
- Below 40% — left blank with a hint icon. The model couldn't pick a confident value. You enter it manually.
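The three bands map a score from 0 to 100 onto a display state. A sketch, assuming a score of exactly 70 falls in the highlighted band and exactly 40 is still highlighted; the ranges as written overlap at 70, so the boundary handling here is a guess. `FieldDisplay` and `display_for` are hypothetical names.

```python
from enum import Enum


class FieldDisplay(Enum):
    AUTO_APPLIED = "auto-applied"
    HIGHLIGHTED = "auto-applied, highlighted yellow"
    BLANK_WITH_HINT = "left blank with hint icon"


def display_for(confidence: float) -> FieldDisplay:
    """Map a field's confidence score to how the detail page shows it."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence > 70:
        return FieldDisplay.AUTO_APPLIED
    if confidence >= 40:  # assumption: 40 and 70 both count as "highlighted"
        return FieldDisplay.HIGHLIGHTED
    return FieldDisplay.BLANK_WITH_HINT
```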
The 70% threshold is calibrated empirically across our extraction corpus — it's the point where auto-applying does more good than harm. Lowering it produces more silent errors; raising it makes every field a manual one. We don't expose a user-adjustable threshold; we'd rather tune the model itself.
A high score doesn't guarantee correctness, just confidence. The model can be confidently wrong when an invoice has a layout it has never seen, when text is overlapping, or when there are two plausible candidates for the same field (two dates, two totals on a credit note). Glance through high-confidence fields too, especially on the first invoice from a new vendor.
For the deeper score breakdown, see understanding confidence scores.
How to correct a field
- Open any document on the Documents page (or click the in-app notification when extraction finishes).
- The detail page shows the original (left) and the extracted fields (right).
- Click any field to edit. Type the correct value.
- Hit Tab or click outside the field — the edit is saved immediately. The yellow highlight clears (if there was one), and the confidence display switches to Manual for that field.
- Hit Save & Approve at the top when you're done. The document moves from Pending review to Approved.
You don't have to approve a document for it to count against your quota — quota is consumed on upload, not on approval. Nor does a document need to be approved to appear in an exported year-end CSV. Approval is a workflow signal for tax-advisor review queues; see the advisor articles if you're working with one.
Vendor learning — your corrections feed forward
This is the most powerful feature people miss. When you correct a vendor name, address, or category, TaxItEasy remembers the mapping. The next invoice from the same vendor (matched by domain, logo, header layout, or VAT ID) gets your correction applied automatically, with a vendor memory badge so you know it was inferred from prior data, not a fresh extraction.
This means:
- Your second invoice from a vendor takes less correction time than your first
- Your tenth invoice from a vendor is usually approved in one click
- Renaming a vendor once (e.g. fixing "Hetzner GmbH" → "Hetzner Online GmbH") fixes it permanently going forward
For the specific vendor-rename mechanics, see how to fix a misread vendor name.
Vendor memory is per-company, not global. Two different companies in your account learn vendors independently. Vendor memory also propagates to your tax advisor's read-only view when they're invited — they see the corrected names, not the original raw extraction.
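To make the feed-forward idea concrete, here is a hypothetical in-memory sketch. Corrections to vendor name, address and category are stored under each match key of the corrected invoice (for example a VAT ID or sender domain), scoped to one company, and replayed onto the next extraction that shares any key, which also sets a badge flag. All class, field and key names are illustrative; real matching on logo and header layout would need fuzzier comparison than exact string keys.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    fields: dict[str, str]        # extracted values, e.g. {"vendor_name": "Hetzner GmbH"}
    match_keys: dict[str, str]    # hypothetical keys, e.g. {"vat_id": ..., "domain": ...}
    vendor_memory_badge: bool = False


class VendorMemory:
    """Hypothetical per-company store of vendor corrections."""

    LEARNED_FIELDS = ("vendor_name", "vendor_address", "category")

    def __init__(self) -> None:
        # (company_id, key_kind, key_value) -> remembered field values
        self._store: dict[tuple[str, str, str], dict[str, str]] = {}

    def learn(self, company_id: str, extraction: Extraction,
              corrections: dict[str, str]) -> None:
        """Remember vendor-level corrections under every match key of this invoice."""
        learned = {k: v for k, v in corrections.items() if k in self.LEARNED_FIELDS}
        if not learned:
            return
        for kind, value in extraction.match_keys.items():
            self._store.setdefault((company_id, kind, value), {}).update(learned)

    def apply(self, company_id: str, extraction: Extraction) -> Extraction:
        """Replay remembered corrections onto a new extraction for the same company."""
        for kind, value in extraction.match_keys.items():
            remembered = self._store.get((company_id, kind, value))
            if remembered:
                extraction.fields.update(remembered)
                extraction.vendor_memory_badge = True
                break
        return extraction
```

Because the store key includes the company ID, a correction made in one company never leaks into another, matching the per-company behaviour described above.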
What the AI cannot do (well)
- Handwritten text. A note written next to a receipt total? The AI ignores it. Put notes in the Notes field on the document detail page instead.
- Severely blurred or shadowed photos. If a human struggles to read it, the AI struggles too. Re-shoot under better light, then re-scan from the detail page.
- Languages and scripts outside its training set. EN, DE, FR, IT, ES, NL, PL: confident. Cyrillic, Greek and Asian scripts: usable, but expect more manual review. For invoices in unusual scripts, the safest path is to type the fields manually rather than rely on extraction.
- Very long statements. A 40-page bank statement extracts cleanly but line items are aggregated; for line-by-line transaction matching, see how the matching pipeline works.
- Hand-altered documents. Invoices with handwritten amendments (cross-outs, scribbled discounts, signatures over numbers) lose accuracy. Treat the printed value as authoritative and use the Notes field to record the amendment.
Troubleshooting
The AI got the vendor wrong on every invoice from the same supplier. Fix it once on the latest invoice and the vendor-memory layer takes over. If the misread is consistent across many already-processed invoices, you can bulk-rename: on the Documents page, filter by the wrong vendor name, select all, and use the Bulk edit → Set vendor action to retroactively rename.
The AI extracted the wrong VAT rate. Click the rate, change it. The VAT amount recomputes from the new rate and the visible gross/net. If multiple line items have different rates (e.g. mixed 19% / 7% on a German invoice), open the line-items view and adjust per line; the totals recompute automatically.
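For mixed rates, the recomputation comes down to VAT per line = net × rate, then summing. A sketch using `Decimal` for money, assuming the line net amounts stay fixed when you change a rate and that VAT is rounded half-up to the cent on each line; whether TaxItEasy rounds per line or per invoice isn't stated, so that part is an assumption. `LineItem` and `invoice_totals` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass
class LineItem:
    net: Decimal        # line net amount, treated as fixed when the rate changes
    vat_rate: Decimal   # percent, e.g. Decimal("19")

    @property
    def vat(self) -> Decimal:
        # Assumption: VAT rounded half-up to the cent on each line.
        return (self.net * self.vat_rate / 100).quantize(CENT, rounding=ROUND_HALF_UP)


def invoice_totals(lines: list[LineItem]) -> tuple[Decimal, Decimal, Decimal]:
    """Return (net, VAT, gross) recomputed from the line items."""
    net = sum((item.net for item in lines), Decimal(0))
    vat = sum((item.vat for item in lines), Decimal(0))
    return net, vat, net + vat


# Mixed 19% / 7% German invoice:
lines = [LineItem(Decimal("100.00"), Decimal("19")), LineItem(Decimal("50.00"), Decimal("7"))]
print(invoice_totals(lines))  # net 150.00, VAT 22.50, gross 172.50
```

Changing one line's rate and calling `invoice_totals` again is the code equivalent of editing a rate in the line-items view.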
Confidence score is 30% on every field. Almost always a blurry photo or a tilted scan. Re-upload a cleaner version from the document detail page (Re-scan button) — you get up to 3 retries per document without burning quota.
Same document was extracted twice with different values. You uploaded it twice and the duplicate detector didn't catch it (a re-export with a different filename produces a different hash). Delete one of them. Quota is consumed only once when duplicate detection catches a re-upload; this one slipped through, so both uploads counted.
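The article doesn't spell out what goes into the duplicate detector's hash. As a hypothetical illustration of why a near-identical file can slip through, here is an exact SHA-256 fingerprint over the file bytes using Python's `hashlib`: change a single byte, as a re-export can (for example by writing a new creation timestamp), and the digest no longer matches. `fingerprint` and `is_duplicate` are illustrative names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 over the raw file bytes: any byte-level change yields a new digest."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data: bytes, seen: set[str]) -> bool:
    """Return True if these exact bytes were seen before; otherwise record them."""
    digest = fingerprint(data)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```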
I want to re-run extraction without re-uploading. On the detail page, click Re-scan. Up to 3 re-scans per document — useful when you've just renamed a vendor and want the system to re-apply its memory, or when you've corrected the original photo and want a fresh read.
The extracted total doesn't match what the invoice prints. First check: are you looking at net or gross? Most invoices show gross most prominently, but the system labels the two separately. If the value is genuinely wrong (e.g. the invoice prints 1.270,00 EUR with a European decimal comma, but it was read with an en-US decimal point and extracted as 1.27 EUR), fix the field; the model adapts to your locale over time.
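Locale matters because the same characters mean different numbers. A minimal parser, assuming the decimal separator is known in advance; `parse_amount` is a hypothetical helper, not the product's code. Reading 1.270,00 with a decimal comma gives 1270.00, while treating the period as the decimal point collapses it to 1.27.

```python
from decimal import Decimal


def parse_amount(printed: str, decimal_sep: str) -> Decimal:
    """Parse a printed amount, given which character is its decimal separator."""
    if decimal_sep not in {",", "."}:
        raise ValueError("decimal_sep must be ',' or '.'")
    group_sep = "." if decimal_sep == "," else ","
    # Drop thousands separators and spaces (including non-breaking spaces).
    cleaned = printed.strip().replace(group_sep, "").replace(" ", "").replace("\u00a0", "")
    return Decimal(cleaned.replace(decimal_sep, "."))


print(parse_amount("1.270,00", ","))  # 1270.00 (European convention)
print(parse_amount("1.270,00", "."))  # 1.27000 (misread with en-US convention)
```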
Currency conversion is using a stale rate. Click the currency field, then Re-fetch rate next to it. The ECB rate for the invoice date is re-pulled. If the invoice date itself is wrong, fix it first — currency conversion uses the invoice date, not the upload date.
Related
- Understanding confidence scores — the full score-calibration story
- How to fix a misread vendor name — vendor memory in detail
- Multi-currency and live ECB rates — the conversion details
- Upload via the web — getting documents in
- Export your records for year-end — what corrected data feeds into