Language handling in exports

tax-advisor · Updated 14 July 2026

Exports preserve original-language vendor names and line items as-is; there is no automated translation. Column headers and status values are in English (the posting-batch file for accounting software keeps that format's own conventions). The original document files are available as a ZIP of receipts, so anything that has to be read in its original language stays accessible.

When this article is for you

You're a tax advisor working with multi-locale clients: a client in Madrid invoicing French customers, a Berlin freelancer with Italian and Greek suppliers, a Warsaw consultancy with English, Polish, and Czech documents. You want to know how TaxItEasy preserves the original-language information through the export pipeline, what gets normalised, and what stays as-is.

For the multi-currency story alongside multi-language (often the same clients), see multi-currency for cross-border clients. For the broader bulk-export flow, see bulk export as a tax advisor. For which export formats exist, see CSV and PDF export status.

What's English, what's original-language

The export files have a clear separation. The structure is in English (predictable for any downstream tool); the content extracted from documents stays in the language of the source document.

Always English

These are platform-side, always normalised:

Column headers — Invoice Number, Date, Due Date, Vendor, Net, VAT Rate, VAT, Gross, Currency, Status, Category, and so on.
Status values — pending / reviewed / flagged, paid / unpaid, matched / unmatched.
Categories — exported as their resolved English labels (never internal keys). Custom categories a company created export under the name they were given.
Metadata — export date, period parameters, report titles.

One deliberate exception: the posting-batch export (the hand-off file for your accounting software) mirrors that import format's own column conventions, because that's what the receiving software expects.

Original-language preserved

These are document-side content fields, kept verbatim as extracted from the original document:

Vendor name — "Société Générale" stays "Société Générale"; we don't transliterate or translate. Same for Cyrillic, Greek, or CJK scripts.
Invoice number / reference — exactly as on the invoice, including any prefix codes ("R-2026-0042", "FA-23-001").
Line-item descriptions — "Software-Lizenz Adobe Creative Cloud" stays German if the invoice was German; "Hébergement web mensuel" stays French.
Free-text fields written by humans — comments and notes are in whatever language the author wrote them.

The combined result: English column headers around original-language content. That keeps the files importable into downstream tools that expect standard columns, while the content stays readable in its original language for anyone reviewing the data.

The original document files

Exports carry the extracted, structured data — not embedded document images. When you need the originals (e.g. a Greek-language invoice you want to inspect by hand), use the receipts ZIP on the advisor Reports tab: it packages the client's original uploaded files as an encrypted ZIP, delivered as an emailed single-use download link (valid 24 hours) with a password shown once on screen.

For long-term archival, store the downloaded originals per your engagement-retention process.

Why no automated translation

The product positioning is that TaxItEasy extracts and organises; downstream tools translate if needed. There are three reasons:

Tax-substance terms don't translate cleanly. A German "Vorsteuer" isn't quite the same concept as a French "TVA déductible": the two are close, but translation flattens subtle legal differences. For tax purposes, that drift in meaning is a risk.
Audit defensibility. The original-language document is the document of record. A tax-authority audit asks "what did the actual invoice say?", and the answer is whatever script and language was on the document, not a translated rendition.
Most client-advisor setups already have the language alignment they need. Cross-language relationships are the exception.

If you need bulk translation for downstream purposes, standard tools (DeepL, Google Translate, an LLM with a tax-term glossary) work well on the exported files: translate the content columns, leave the headers alone.
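
As a rough sketch of "translate the content columns, leave the headers alone": the helper below walks exported rows and passes only selected content columns through a translation function. The `translate` callable, the `Description` and `Notes` column names, and the `_translated` suffix are assumptions for illustration, not part of the export format; plug in whichever tool you use and check the header row of your own file.

```python
from typing import Callable, Iterable

# Hypothetical content columns; check the header row of your own export.
CONTENT_COLUMNS = ("Description", "Notes")


def add_translations(
    rows: Iterable[dict[str, str]],
    translate: Callable[[str], str],
    columns: tuple[str, ...] = CONTENT_COLUMNS,
    suffix: str = "_translated",
) -> list[dict[str, str]]:
    """Return copies of rows with translated copies of the content columns.

    Headers and all other values are left untouched; the original text is
    kept and the translation goes into a new '<column><suffix>' field.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            text = row.get(column, "")
            if text:  # skip missing or empty cells
                new_row[column + suffix] = translate(text)
        result.append(new_row)
    return result
```

Writing the translation into a new column rather than overwriting keeps the original-language text next to it, which matters if the file is later used in an audit.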

Multi-locale clients in practice

Client invoices in 3 languages

A Lisbon-based SMB receives Portuguese, English, and Spanish invoices from EU vendors. Each invoice arrives in its own language, and the export carries each one in that original language. The client's accounting system handles the import with whatever per-language rules it has.

Multiple spellings for the same vendor

The vendor name on invoices can vary over time (rebrand, transliteration, formal versus informal spelling). Fix it at the vendor record: correct the name once (see how to fix a misread vendor name), or merge duplicate partner records. Exports then show the canonical partner name consistently.

Edge cases

"My client's bookkeeping system needs localised column headers." The export headers are English. Workaround: a one-time rename step in your import pipeline — most tools map columns on import anyway.

"Vendor changed to a transliterated name (e.g. 'IKEA' → 'ИКЕА')." We store what's on the document. If the vendor sends a Cyrillic-script invoice, the extracted name is Cyrillic. Merge or edit the partner record to your preferred canonical form if you want a single name across the books.

"My client has invoices in 5 different languages." Fine. Each invoice carries its own language; exports preserve all 5 in original. The English headers stay the only constant.

"Vendor name encoding is broken in my downstream tool (shows '???' for special chars)." The exports are UTF-8 (the company-export CSVs additionally carry a BOM so spreadsheet apps detect it). If your downstream tool defaults to a legacy encoding (Windows-1252, ISO-8859-1), force UTF-8 on the receive side.

"Special characters in vendor name break my downstream regex." Vendor names can contain any character that was on the document. If your tool's regex assumes ASCII, widen the character class (e.g. [\p{L}]+ instead of [a-zA-Z]+).

"Date formats — original or normalised?" Dates are normalised to ISO 8601 (YYYY-MM-DD) regardless of the document's locale. The AI parses locale formats (DE 01.02.2026, FR 01/02/2026, ES 01-02-2026) during extraction and stores the normalised date.

"Decimal comma or decimal point?" The original document's formatting (European 1.234,56 vs US 1,234.56) is parsed during extraction; exports carry normalised decimal numbers plus a currency column. Your downstream tool handles display formatting.

"What about the personal GDPR data export?" That's a different export: the account-level right-of-access export is a JSON file of your own account data — separate from the company/business exports this article covers.

Related articles

Multi-currency for cross-border clients: multi-locale and multi-currency often go together
CSV and PDF export status: which export formats exist
Bulk export as a tax advisor: the export flow that uses these conventions