When this matters
You set up email forwarding (a Gmail rule that auto-forwards Stripe receipts, or you forward invoices manually) and you're wondering: what happens when the rule accidentally forwards a newsletter? Will it create a fake document and burn quota? Will it spam your audit log?
Short version: no. The classifier eats it silently, logs the decision in the Activity log so you can audit, and moves on. This article explains exactly how the classifier makes its call, what each classification does, and how to override when it gets one wrong.
What the classifier checks
Every inbound email is scored against four labels. The classifier looks at:
- Sender domain: [email protected], [email protected], [email protected] are strong invoice signals; mailing-list senders (newsletter@, info@, marketing@, news@) are strong newsletter signals.
- Subject line keywords: "Rechnung", "Invoice", "Facture", "Receipt", "Beleg", "Quittung", "Statement", "Bill" all point to invoice-shaped content. "Newsletter", "Update", "Notice", "Deals" point to newsletter-shaped content.
- Body keywords: phrases like "amount due", "VAT", "thank you for your payment", and invoice numbers in standard formats. The body is read once for classification and then dropped; we don't keep it.
- Attachment names and types: invoice-123.pdf or receipt-2026-05.pdf are strong positive signals; newsletter.html or no attachment is a negative signal.
- A small set of light heuristics: e.g. emails with an unsubscribe link in the body are heavily down-weighted as invoice candidates; emails with both an invoice-shaped subject and a PDF attachment are heavily up-weighted.
The classifier returns one of the four labels plus a confidence score (0–100). Below a score of roughly 60, the email is usually still classified, but it takes a lower-confidence path that extracts less aggressively.
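As a rough illustration of how signals like these could combine into a label and a confidence score, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Email` type, the keyword sets, the weights, and the collapse of the four labels into two are hypothetical, and the real classifier's features and weights are not described in this article. Only the threshold of roughly 60 comes from the text above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword sets and weights, chosen only to mirror the signals listed above.
INVOICE_WORDS = {"rechnung", "invoice", "facture", "receipt", "beleg", "quittung", "statement", "bill"}
NEWSLETTER_WORDS = {"newsletter", "update", "notice", "deals"}
MAILING_LIST_PREFIXES = ("newsletter@", "info@", "marketing@", "news@")
LOW_CONFIDENCE = 60  # below this, the email takes the cautious path


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    attachment_names: list = field(default_factory=list)


def invoice_score(email: Email) -> int:
    """Combine the signals into a 0-100 'looks like an invoice' score."""
    words = set(re.findall(r"\w+", email.subject.lower()))
    invoice_subject = bool(words & INVOICE_WORDS)
    has_pdf = any(n.lower().endswith(".pdf") for n in email.attachment_names)

    score = 50  # neutral starting point
    score += 15 if invoice_subject else 0
    score -= 15 if words & NEWSLETTER_WORDS else 0
    score -= 20 if email.sender.lower().startswith(MAILING_LIST_PREFIXES) else 0
    score -= 25 if "unsubscribe" in email.body.lower() else 0
    score += 10 if has_pdf else 0
    score += 20 if invoice_subject and has_pdf else 0  # combined heuristic
    return max(0, min(100, score))


def classify(email: Email):
    """Return (label, confidence, cautious). Two labels only, for brevity."""
    score = invoice_score(email)
    label = "invoice" if score > 50 else "irrelevant"
    confidence = score if label == "invoice" else 100 - score
    return label, confidence, confidence < LOW_CONFIDENCE
```

In this sketch a bare "Your order" email with no attachment and no keywords stays at the neutral score, so it is labelled irrelevant with low confidence, which is the kind of case the Activity log lets you review.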
What happens with each classification
| Class | Action | Counts against quota? |
|---|---|---|
| invoice | Attachments extracted, OCR runs, document created | Yes — one per attachment |
| receipt | Same as invoice | Yes |
| statement | Same as invoice (but flagged as multi-line summary) | Yes |
| irrelevant | Logged in Activity log, no document created | No |
The "no document created, no quota use" path is the important guarantee: forwarded newsletters do not silently eat your monthly quota. If you set up an aggressive Gmail rule that forwards a wide net of senders, the false-positive newsletters land on irrelevant and disappear from the pipeline cleanly.
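The table can be read as a simple mapping from label to pipeline actions. The sketch below encodes it that way; `Label` and `plan_actions` are hypothetical names, and it assumes one document (and one quota unit) per attachment for receipts and statements as well, since the table lists them as "same as invoice". Receipts created from an email body with no attachment are not modelled here.

```python
from enum import Enum


class Label(Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    STATEMENT = "statement"
    IRRELEVANT = "irrelevant"


def plan_actions(label: Label, attachment_count: int) -> dict:
    """Translate a classification into pipeline actions, following the table."""
    creates_documents = label is not Label.IRRELEVANT
    documents = attachment_count if creates_documents else 0
    return {
        "log_in_activity_log": True,   # every inbound email is listed in the log
        "run_ocr": creates_documents,
        "documents_created": documents,
        "quota_used": documents,       # assumed: one quota unit per document
        "flag_multi_line_summary": label is Label.STATEMENT,
    }
```

For example, `plan_actions(Label.IRRELEVANT, 3)` reports zero documents and zero quota even though the email carried three attachments.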
Where to see the classification
Settings → Email integration → Activity log. Every inbound email is listed with:
- Sender, date, subject
- Classification (invoice / receipt / statement / irrelevant) and confidence score
- Number of attachments extracted (0 for irrelevant)
- Link to the resulting document (for invoice / receipt / statement)
- Override actions (Mark as invoice + extract, Delete entry)
The log is searchable and filterable. The most useful filter is "irrelevant only" — if you suspect the classifier is missing real invoices, scroll the irrelevant log and look for senders that surprise you.
When the classifier gets it wrong
The classifier is calibrated for high precision on invoice (don't create fake documents) at the cost of some recall on edge cases (occasional missed invoice). When that happens, you have two recovery paths.
The invoice got classified as irrelevant
This usually happens with very minimal "Your receipt is attached" emails — short subject, generic body, no clear invoice keywords, attachment with a generic name like document.pdf. The classifier doesn't have enough signal.
Recovery: open the Activity log, find the email, click Mark as invoice + extract. This:
- Forces extraction of the attachments.
- Creates the document(s) as normal — quota is consumed now (which is fair; you confirmed it's a real invoice).
- Trains the classifier (slightly) for that sender. Next time the same sender sends a similar email, the classifier sees a positive example and is more likely to classify as invoice. Over a few corrections per sender, the classifier becomes accurate for your specific senders.
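One way to picture the per-sender training step is as a small, capped adjustment that accumulates with each correction. The sketch below is only a mental model under that assumption: `SenderBias`, the step size, and the cap are hypothetical, and the article does not say how the real classifier stores or applies corrections.

```python
from collections import defaultdict


class SenderBias:
    """Hypothetical per-sender adjustment that accumulates user corrections."""

    def __init__(self, step: int = 10, cap: int = 30):
        self.step = step          # how far one correction moves the score
        self.cap = cap            # upper bound on the total adjustment
        self._bias = defaultdict(int)

    def record_correction(self, sender: str, is_invoice: bool) -> None:
        """Nudge the sender's bias up (real invoice) or down (not an invoice)."""
        key = sender.lower()
        delta = self.step if is_invoice else -self.step
        self._bias[key] = max(-self.cap, min(self.cap, self._bias[key] + delta))

    def adjust(self, sender: str, base_score: int) -> int:
        """Apply the learned bias to a 0-100 score, keeping it in range."""
        return max(0, min(100, base_score + self._bias.get(sender.lower(), 0)))
```

With these made-up numbers, a sender whose emails score 45 would cross a threshold of 50 after one "Mark as invoice + extract" correction, while other senders are unaffected.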
The newsletter got classified as invoice
Rarer (the precision tuning prevents most of these) but can happen with marketing emails that include a PDF attachment (e.g. "Read our latest whitepaper.pdf"). The classifier saw a PDF + an invoice-shaped subject and over-fired.
Recovery: open the resulting document, click Delete. The document is removed and quota is not refunded (quota was already committed when the document was created). For repeat offenders (the same newsletter keeps tripping the classifier), open the Activity log entry and click Mark as irrelevant + train, which signals the classifier to be more cautious on that sender.
Quota is consumed on commit, not on classification
A subtle point worth knowing: quota is consumed when the document is created (after successful classification + extraction), not when the email arrives. This means:
- Irrelevant emails consume no quota (correct).
- Failed extractions (file too large, virus scan failed) consume no quota (correct — there's no document).
- Successful extractions where you later delete the document do consume quota (the deletion doesn't refund). This is deliberate: it would be too easy to bypass quota by uploading, exporting, deleting, repeating.
See understanding your document counter for the full quota accounting.
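The three rules above amount to a counter that only moves when a document is committed. A minimal sketch, assuming a hypothetical `QuotaLedger` class (the name and methods are illustrative, not the product's API):

```python
class QuotaLedger:
    """Hypothetical counter: quota moves only when a document is committed."""

    def __init__(self) -> None:
        self.used = 0

    def email_received(self) -> None:
        """Arrival and classification alone never touch quota."""

    def extraction_failed(self) -> None:
        """No document was created, so nothing is consumed."""

    def document_created(self) -> None:
        """Each committed document consumes one unit."""
        self.used += 1

    def document_deleted(self) -> None:
        """Deleting a document does not refund its unit."""


ledger = QuotaLedger()
ledger.email_received()      # newsletter, classified irrelevant
ledger.email_received()      # invoice whose attachment fails the virus scan
ledger.extraction_failed()
ledger.email_received()      # invoice that extracts successfully
ledger.document_created()
ledger.document_deleted()    # deleted later; the unit stays consumed
```

After this sequence `ledger.used` is 1: only the successfully created document counted, and deleting it changed nothing.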
How accurate is the classifier?
Across our user base, classifier precision on invoice / receipt / statement is roughly 95–98% — meaning a small fraction of classifications need correction. Recall on actual invoices is roughly 90–95% — meaning a small fraction of real invoices get classified as irrelevant on the first try. Both numbers improve over time as you correct miscalls (the per-sender training adapts the classifier to your specific senders).
The numbers vary by language and by vendor mix. EU vendors with templated invoice subjects (Stripe, Hetzner, AWS) classify nearly perfectly. Smaller vendors with idiosyncratic email styles produce more misclassifications. If you have a high-volume specific vendor that the classifier consistently misses, write to [email protected] with a few sample (de-identified) subject lines and we can hand-tune for that sender.
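To make the precision and recall figures concrete, here is the arithmetic on a made-up month of forwarded mail. The counts are invented for illustration and are not measurements.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of created documents that were real invoices."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real invoices that became documents on the first try."""
    return true_pos / (true_pos + false_neg)


# Made-up month: 100 real invoices forwarded, plus a batch of newsletters.
true_pos = 93   # real invoices that became documents
false_neg = 7   # real invoices labelled irrelevant
false_pos = 4   # newsletters that became documents

print(f"precision = {precision(true_pos, false_pos):.1%}")
print(f"recall    = {recall(true_pos, false_neg):.1%}")
```

In this example you would delete 4 stray documents and rescue 7 missed invoices from the Activity log, which corresponds to about 95.9% precision and 93% recall.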
Edge cases
It classified my invoice as irrelevant. Open the Activity log, find the email, click Mark as invoice + extract. The extraction runs immediately and trains the classifier for that sender. After 2–3 corrections per sender, the classifier usually generalises.
It classified my newsletter as invoice — there's now a fake document. Open the document, click Delete. For repeat offenders, use Mark as irrelevant + train from the Activity log.
Can I disable the classifier and force-process everything? No, by design. The classifier protects your quota; without it, every newsletter would consume a slot. If you want to force-extract a specific email that was classified irrelevant, use the per-email "Mark as invoice + extract" action. If you want every email to be processed (you're using forwarding for compliance archiving, not invoice automation), forwarding may not be the right tool — talk to us at [email protected] about your use case.
I forwarded a calendar invite — what happened? Classified as irrelevant. No document, no quota use. (Calendar invites have very specific MIME structure that the classifier recognises.)
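Calendar invites typically carry a MIME part of type text/calendar, which makes them easy to spot. The sketch below shows one way to check for that with Python's standard `email` package; `has_calendar_part` is a hypothetical helper, and whether the classifier uses this exact check is an assumption.

```python
from email.message import EmailMessage


def has_calendar_part(msg: EmailMessage) -> bool:
    """True if any MIME part of the message is text/calendar."""
    return any(part.get_content_type() == "text/calendar" for part in msg.walk())


# Build a small invite-like message to try the check on.
invite = EmailMessage()
invite["From"] = "organizer@example.com"
invite["Subject"] = "Invitation: planning meeting"
invite.set_content("You have been invited to a meeting.")
invite.add_attachment("BEGIN:VCALENDAR\nEND:VCALENDAR\n", subtype="calendar")

# An ordinary message with no calendar part, for comparison.
plain = EmailMessage()
plain["Subject"] = "Hello"
plain.set_content("No invite here.")
```

Here `has_calendar_part(invite)` is true and `has_calendar_part(plain)` is false.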
I forwarded an email with no attachment. Classification still runs on the body and subject. If the email itself contains the receipt (e.g. some merchants email plain-text receipts without attaching a PDF), the classifier might pick receipt and create a document from the email body. If you'd rather only PDFs become documents, configure your forwarding rule to only forward emails with attachments — most providers expose this as a filter condition.
The Activity log says "rejected — SPF/DKIM failed". Different problem — the email never reached the classifier because authentication failed. See the inbound address explained for sender authentication.
Two-step forwarding (my Gmail forwards to a shared inbox which forwards to TaxItEasy) is showing the wrong sender. Multi-hop forwarding rewrites the From: header on each hop, so the classifier sees the last hop's sender, not the original. For accurate sender attribution, configure the upstream rule to forward directly to TaxItEasy rather than through an intermediate inbox.
Can I see the classifier's confidence score per email? Yes — Activity log entry → expand → shows the score (0–100) and the top features that contributed to the call. Useful for understanding why an edge-case email went one way or another.
Related
- Set up email forwarding — the rules that produce the forwarding traffic
- What the AI reads, and how to correct it — the extraction step that runs after classification
- Understanding your document counter — what counts and what doesn't
- The inbound address explained — the address that receives forwarded mail