What does the AI read, and how do I correct it?

Updated 13 July 2026

Each document is classified (invoice, receipt, credit note, bank statement, tax document, contract) and then 25+ data points are extracted — vendor details, invoice number, dates, net / VAT / gross totals, VAT rate, currency, suggested category, and every line item. Each document gets one overall confidence score: at 70 or above the record is created automatically; below 70 the document is re-read with a stronger model, and if it's still unsure, nothing is booked — the document is stored for your manual review. You can edit any field at any time; corrections feed back into vendor learning for next time.

When to read this article

You uploaded a document and the AI returned something you weren't sure about — a missing field, a low confidence score, or just wrong data. This article explains exactly what the AI extracts, how to read the confidence score, how the correction flow works, and how your corrections train future extractions for the same vendor.

If you're trying to understand why a score is low (or why a confident-looking extraction is wrong), see the more detailed article Understanding confidence scores. For the vendor-rename flow, see How to fix a misread vendor name.

What the AI extracts

Every uploaded or forwarded document goes through two stages. The first decides what kind of document it is; the second extracts structured fields.

Stage 1 — classification

The classifier decides what kind of document it is — invoice, receipt, credit note, bank statement, tax document, contract, or other — along with its language and country of origin. That decision drives everything after it: bank statements go through their own transaction extraction, and the detected country selects the right VAT rates and expense categories for stage 2. You can override the classification from the document detail page if it got it wrong. (On the email-forwarding path, irrelevant mail — newsletters, marketing — is filtered out before a document is ever created.)

Stage 2 — field extraction

For invoices and receipts, the AI extracts 25+ data points, including:

Vendor name — the supplier's legal or trading name
Vendor address — full postal address as printed on the invoice
Vendor VAT ID — when visible (DE..., IE..., CYxxx, GB...)
Invoice number — exactly as printed
Invoice date — parsed in the document's own date format and normalised to ISO
Due date — if the invoice has one
Currency — three-letter ISO code (EUR, USD, GBP, CHF, …)
Net total — pre-tax sum
VAT rate — single rate when the whole invoice is one rate; line-item rates when mixed
VAT amount — the actual VAT figure printed
Gross total — what you actually paid / owe
Line items — quantity, unit, unit price, line total (for itemised invoices)
Suggested category — Office supplies, Travel, Software, Hosting, Telecom, etc.
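
To make the field list concrete, here is a minimal sketch of how one extracted invoice could be represented. The class and attribute names (ExtractedInvoice, LineItem, confidence) are hypothetical and chosen only for illustration; they are not TaxItEasy's actual schema or API, and the sample values are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    # Hypothetical shape for one row of an itemised invoice.
    quantity: Decimal
    unit: str
    unit_price: Decimal
    line_total: Decimal
    vat_rate: Optional[Decimal] = None  # set per line only when rates are mixed


@dataclass
class ExtractedInvoice:
    # Hypothetical record; attribute names mirror the field list above.
    vendor_name: str
    invoice_number: str                  # kept exactly as printed, so a string
    invoice_date: date                   # normalised to ISO (YYYY-MM-DD)
    currency: str                        # three-letter ISO code, e.g. "EUR"
    net_total: Decimal
    vat_amount: Decimal
    gross_total: Decimal
    vat_rate: Optional[Decimal] = None   # None when line items carry mixed rates
    vendor_address: Optional[str] = None
    vendor_vat_id: Optional[str] = None  # only when visible on the document
    due_date: Optional[date] = None      # only if the invoice has one
    suggested_category: Optional[str] = None
    line_items: list[LineItem] = field(default_factory=list)
    confidence: int = 0                  # one overall score, 0 to 100


invoice = ExtractedInvoice(
    vendor_name="Example Vendor GmbH",   # made-up sample values
    invoice_number="INV-0001",
    invoice_date=date(2026, 1, 15),
    currency="EUR",
    net_total=Decimal("100.00"),
    vat_amount=Decimal("19.00"),
    gross_total=Decimal("119.00"),
    vat_rate=Decimal("19"),
)
print(invoice.invoice_date.isoformat())
```

Amounts are held as Decimal rather than float in this sketch so that cent-level comparisons stay exact.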

Bank statements get their own extraction — bank name, account holder, statement period, opening and closing balances, and every transaction on the page, imported as bank transactions.

For multi-currency users, the gross total is auto-converted to your base currency at the European Central Bank's reference rate for the invoice date. See multi-currency and official ECB rates for the conversion details.
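
As a rough sketch of the conversion arithmetic: ECB reference rates are quoted as units of foreign currency per 1 EUR, so converting a foreign gross total to EUR means dividing by the rate for the invoice date. The example assumes EUR as the base currency. The function name, the rate table, and the rounding mode are assumptions for illustration, and the rate shown is a placeholder, not a real ECB figure.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Placeholder table: (currency, date) -> units of that currency per 1 EUR.
# The value below is illustrative only and is NOT a real ECB reference rate.
SAMPLE_RATES = {("USD", date(2026, 1, 15)): Decimal("1.10")}


def to_base_eur(gross: Decimal, currency: str, invoice_date: date) -> Decimal:
    """Convert a gross total to EUR using the rate for the invoice date."""
    if currency == "EUR":
        return gross
    rate = SAMPLE_RATES[(currency, invoice_date)]  # KeyError if no rate is known
    # Rates are quoted per 1 EUR, so divide to get the EUR amount.
    # Rounding half-up to the cent is an assumption of this sketch.
    return (gross / rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(to_base_eur(Decimal("110.00"), "USD", date(2026, 1, 15)))  # 100.00
```

Because the lookup is keyed on the invoice date, a wrong invoice date selects a different rate.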

The confidence score — what 70 means

Every document gets one overall confidence score from 0 to 100 — a single score for the whole document, not one per field. It reflects how sure the model is that what it read matches what's actually on the document.

70 or above — the record is created automatically from the scan. No action needed.
Below 70 — the document is automatically re-read with a stronger model. If the re-read reaches 70, the record is created as usual. If it's still below 70, nothing is booked: the document stays stored, searchable, and readable, and you review the extracted data and create the record yourself.
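
The two rules above can be sketched as a small routing function. This is an illustration of the described behaviour, not the product's code: the function name, the return labels, and the reread callback (standing in for the stronger model) are hypothetical.

```python
from typing import Callable

THRESHOLD = 70  # from the article; not user-adjustable


def route(first_score: int, reread: Callable[[], int]) -> str:
    """Return 'auto_created' or 'manual_review' for one document."""
    if first_score >= THRESHOLD:
        return "auto_created"
    # Below the threshold: one re-read with the stronger model.
    second_score = reread()
    if second_score >= THRESHOLD:
        return "auto_created"
    # Still unsure: nothing is booked, the document stays stored for review.
    return "manual_review"


print(route(85, lambda: 0))   # auto_created (the re-read is never called)
print(route(60, lambda: 74))  # auto_created
print(route(60, lambda: 65))  # manual_review
```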

One safety net works independently of the score: if the extracted net + VAT don't add up to the total (to the cent), the total you actually paid is treated as the anchor, the net is re-derived from it, and the document is flagged for your review instead of booking contradictory numbers.
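
A minimal sketch of that check, assuming the net is re-derived as gross minus the printed VAT amount (the article doesn't spell out the exact formula, so treat that step as an assumption). The function name and return shape are hypothetical.

```python
from decimal import Decimal


def reconcile(net: Decimal, vat: Decimal, gross: Decimal):
    """Return (net, vat, gross, needs_review)."""
    if net + vat == gross:  # exact comparison, to the cent
        return net, vat, gross, False
    # Mismatch: keep the gross as the anchor, re-derive the net from it
    # (assumed here to be gross - vat), and flag the document for review.
    return gross - vat, vat, gross, True


print(reconcile(Decimal("100.00"), Decimal("19.00"), Decimal("119.00")))
print(reconcile(Decimal("100.00"), Decimal("19.00"), Decimal("120.00")))
```

In the second call the figures disagree by one euro, so the sketch returns a net of 101.00 and sets the review flag instead of keeping three numbers that contradict each other.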

The 70 threshold isn't user-adjustable: lowering it would let more silent errors through, and raising it would push more scans into manual review. We'd rather tune the model itself.

A high score doesn't guarantee correctness, just confidence. The model can be confidently wrong when an invoice has a layout it has never seen, when text is overlapping, or when there are two plausible candidates for the same value (two dates, two totals on a credit note). Glance through the extracted data on the first invoice from a new vendor even when the score is high.

For the deeper score breakdown, see understanding confidence scores.

How to correct a field

Open any document on the Documents page (or click the in-app notification when extraction finishes).
The detail page shows the original (left) and the extracted fields (right).
Click any field to edit. Type the correct value.
Hit Tab or click outside the field — the edit is saved immediately.
Hit Save & Approve at the top when you're done. The document moves from Pending review to Approved.

Approval doesn't affect your quota: quota is consumed on upload, not on approval. An exported year-end CSV also includes documents you haven't approved. Approval is a workflow signal for tax-advisor review queues; see the advisor articles if you're working with one.

Vendor learning — your corrections feed forward

This is the most powerful feature people miss. When you correct a scanned invoice, TaxItEasy learns vendor-specific patterns — category, currency, VAT rate, net-vs-gross layout. After three confirmed corrections a pattern becomes active and is fed into future scans of that vendor, so the same mistake stops repeating. Patterns are learned per country too — a vendor's German invoices never contaminate its UK ones.

This means:

Once a pattern is active (after three confirmed corrections), later invoices from that vendor need less correction than the first ones
If a vendor's values genuinely vary, the system learns not to assume — rather than guessing wrong confidently
Renaming a vendor (e.g. fixing "Hetzner GmbH" → "Hetzner Online GmbH") fixes it going forward

For the specific vendor-rename mechanics, see how to fix a misread vendor name.

Vendor learning is per-company, not global. Two different companies in your account learn vendors independently. The corrections also carry through to your tax advisor's view when they're invited — they see the corrected names, not the original raw extraction.
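
One way to picture the learning rule is a counter keyed by company, vendor, country, and field, where a value only becomes active after three confirmed corrections. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, and the rule that several competing values cancel each other out is a guess at how "learning not to assume" could work, not the actual implementation.

```python
from collections import Counter

ACTIVATION_COUNT = 3  # "three confirmed corrections" per the article


class VendorMemory:
    """Hypothetical in-memory store; keys keep companies and countries apart."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record_correction(self, company, vendor, country, field, value) -> None:
        self._counts[(company, vendor, country, field, value)] += 1

    def active_value(self, company, vendor, country, field):
        """Return the learned value if exactly one pattern is active, else None."""
        active = [
            key[4]
            for key, n in self._counts.items()
            if key[:4] == (company, vendor, country, field) and n >= ACTIVATION_COUNT
        ]
        # No active pattern, or several competing ones: assume nothing.
        return active[0] if len(active) == 1 else None


memory = VendorMemory()
for _ in range(3):
    memory.record_correction("acme", "Hetzner Online GmbH", "DE", "category", "Hosting")

print(memory.active_value("acme", "Hetzner Online GmbH", "DE", "category"))   # Hosting
print(memory.active_value("acme", "Hetzner Online GmbH", "GB", "category"))   # None
print(memory.active_value("other", "Hetzner Online GmbH", "DE", "category"))  # None
```

Including the company and country in the key is what keeps two companies, or a vendor's German and UK invoices, from influencing each other in this sketch.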

What the AI cannot do (well)

Handwritten text. Note next to a receipt total? The AI ignores it. Notes go on the document detail page in the Notes field instead.
Severely blurred or shadowed photos. If a human struggles to read it, the AI struggles too. Re-shoot under better light, then re-scan from the detail page.
Languages outside its training set. EN, DE, FR, IT, ES, NL, PL: confident. Cyrillic, Greek, Asian scripts: usable but more manual review. For invoices in unusual scripts, the safest path is to type fields manually rather than rely on extraction.
Very long statements. A 40-page bank statement extracts cleanly but line items are aggregated; for line-by-line transaction matching, see how the matching pipeline works.
Hand-altered documents. Invoices with handwritten amendments (cross-outs, scribbled discounts, signatures over numbers) lose accuracy. Treat the printed value as authoritative and use the Notes field for the amendment.

Troubleshooting

The AI got the vendor wrong on every invoice from the same supplier. Fix it on the latest invoice and the vendor-learning layer takes over for future scans. If the misread is consistent across many already-processed invoices, you can bulk-rename: on the Documents page, filter by the wrong vendor name, select all, and use the Bulk edit → Set vendor action to retroactively rename.

The AI extracted the wrong VAT rate. Click the rate, change it. The VAT amount recomputes from the new rate and the visible gross/net. If multiple line items have different rates (e.g. mixed 19% / 7% on a German invoice), open the line-items view and adjust per line; the totals recompute automatically.
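
For a mixed-rate invoice, the recomputation amounts to summing per-line VAT. The sketch below assumes VAT is rounded to the cent per line (half-up) before summing; the product may round differently, and the function name and sample amounts are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def totals(lines):
    """lines: list of (net_amount, vat_rate_percent). Returns (net, vat, gross)."""
    net = sum((n for n, _ in lines), Decimal("0"))
    vat = sum(
        ((n * r / 100).quantize(CENT, rounding=ROUND_HALF_UP) for n, r in lines),
        Decimal("0"),
    )
    return net, vat, net + vat


# Made-up example: one line at 19%, one at 7%.
lines = [(Decimal("100.00"), Decimal("19")), (Decimal("50.00"), Decimal("7"))]
net, vat, gross = totals(lines)
print(net, vat, gross)  # 150.00 22.50 172.50
```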

The confidence score is low on everything I upload. Almost always a blurry photo or a tilted scan. Re-upload a cleaner version from the document detail page (Re-scan button) — you get up to 3 retries per document without burning quota.

Same document was extracted twice with different values. You uploaded it twice and the duplicate detector didn't catch it (a re-export with a different filename produces a different hash). Delete one of the copies. Quota is consumed once when duplicate detection catches the second upload and twice when it doesn't, as in this case.
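
To see why a re-export can slip past hash-based duplicate detection, consider a detector that hashes the file's bytes. The article only says "hash"; SHA-256 over the raw bytes is an assumption here, and the byte strings are stand-ins for real files.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the file bytes (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


original = b"sample invoice bytes"
same_again = b"sample invoice bytes"
re_export = b"sample invoice bytes, re-exported"  # same invoice, different bytes

seen = {content_hash(original)}
print(content_hash(same_again) in seen)  # True: byte-identical upload is caught
print(content_hash(re_export) in seen)   # False: the re-export looks like a new file
```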

I want to re-run extraction without re-uploading. On the detail page, click Re-scan. Up to 3 re-scans per document — useful when you've just renamed a vendor and want the system to re-apply its memory, or when you've corrected the original photo and want a fresh read.

The extracted total doesn't match what the invoice prints. First check whether you're looking at net or gross: most invoices show gross most prominently, but the system labels the two separately. If the value is genuinely wrong (e.g. extracted 1.27 EUR for an invoice that shows 1.270,00 EUR, a mix-up between the European decimal comma and the en-US decimal point), fix the field; the model adapts to your locale over time.
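
The separator problem is easy to reproduce. The sketch below parses a printed amount once you know which character is the decimal separator; the function name is hypothetical and this is not how the extraction model itself works, only an illustration of why the same digits can be read two ways.

```python
from decimal import Decimal


def parse_amount(text: str, decimal_sep: str) -> Decimal:
    """Parse a printed amount given which character is the decimal separator."""
    if decimal_sep not in (",", "."):
        raise ValueError("decimal_sep must be ',' or '.'")
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


print(parse_amount("1.270,00", ","))  # 1270.00 (European style)
print(parse_amount("1,270.00", "."))  # 1270.00 (en-US style)
```

Reading "1.270,00" with the wrong convention is what turns one thousand two hundred seventy into a much smaller number.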

Currency conversion is using a stale rate. Click the currency field, then Re-fetch rate next to it. The ECB rate for the invoice date is re-pulled. If the invoice date itself is wrong, fix it first — currency conversion uses the invoice date, not the upload date.

Understanding confidence scores — the full score-calibration story
How to fix a misread vendor name — vendor memory in detail
Multi-currency and official ECB rates — the conversion details
Upload via the web — getting documents in
Export your records for year-end — what corrected data feeds into