DeepL Pre-Translation Quality Gate
A DeepL pre-translation quality gate runs every machine-translated string through automated checks — placeholder parity, length ratio, and glossary adherence — before it is written back as needs-review, so broken ICU never reaches a human or a build. DeepL is genuinely strong, but it will still occasionally drop a {count} placeholder, mangle a <b> tag, “translate” the ICU keyword other, or ignore a glossary term under inflection. Without a gate, those defects land in your catalog flagged as if they were merely unreviewed prose, and the next npm run build dies on Expected "}" but found "uno". This page wires DeepL into a machine-translation pre-fill workflow with tag handling, a glossary, and a confidence check that rejects defective output instead of trusting it.
Root cause analysis
DeepL’s neural model treats the whole input string as natural language. Anything that looks like prose — and ICU keywords, HTML tags, and named placeholders all do — is fair game for reordering, inflection, casing changes, or outright omission. Three behaviours cause almost every broken pre-fill:
- Placeholder loss or duplication. With tag_handling off, DeepL sees {count} as a word. German output frequently becomes {Anzahl} or drops the braces entirely; long sentences sometimes emit a placeholder twice. The ICU parser then throws on an unknown argument or a malformed token.
- ICU keyword translation. In {count, plural, one {...} other {...}} the tokens plural, one, and other are reserved selectors defined by CLDR plural rules, not words. DeepL has no way to know that and will happily render one as uno or eins, producing a syntactically invalid message.
- Glossary drift under morphology. Even with a glossary attached, DeepL applies terms case- and form-sensitively. A glossary entry Dashboard → Übersicht may be respected in nominative but silently re-translated when the sentence demands a different case, so your enforced glossary terms never actually appear.
The fix is two-sided: prevent corruption by masking placeholders into XML tags DeepL is contractually told to preserve, and detect residual corruption with a gate that compares the output against the source before trusting it.
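The ICU keyword failure described above can be detected by comparing the "argument, keyword" heads of source and target. What follows is a minimal sketch under stated assumptions: icuHeads and icuHeadsIntact are hypothetical helper names, not part of deepl-node or FormatJS, and the check covers only the argument name and the keyword (plural, select, and so on), not branch selectors such as one or other.

```typescript
// Hypothetical helper: collect "argument,keyword" heads such as "count,plural".
function icuHeads(msg: string): string[] {
  const heads: string[] = [];
  // Opening brace, argument name, comma, keyword, comma: the start of a plural/select.
  const re = /\{\s*([^\s,{}]+)\s*,\s*([^\s,{}]+)\s*,/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(msg)) !== null) heads.push(`${m[1]},${m[2]}`);
  return heads.sort();
}

// True when the target carries exactly the same heads as the source.
function icuHeadsIntact(source: string, target: string): boolean {
  const a = icuHeads(source);
  const b = icuHeads(target);
  return a.length === b.length && a.every((h, i) => h === b[i]);
}

const src = "{count, plural, one {# file} other {# files}}";
console.log(icuHeadsIntact(src, "{Anzahl, Plural, eins {# Datei} andere {# Dateien}}")); // → false
console.log(icuHeadsIntact(src, "{count, plural, one {# Datei} other {# Dateien}}")); // → true
```

Simple arguments such as {name} produce no head, so plain strings pass this check and are left to the placeholder parity comparison.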
Minimal reproducible example
Send an ICU plural straight to DeepL as plain text and the structure does not survive. This is the smallest call that reproduces the breakage:
import * as deepl from "deepl-node";
const translator = new deepl.Translator(process.env.DEEPL_KEY!);
const source = "{count, plural, one {# file} other {# files}}";
const r = await translator.translateText(source, "en", "de"); // no tag handling
console.log(r.text);
// → "{Anzahl, Plural, eins {# Datei} andere {# Dateien}}"
//     ^^^^^^  ^^^^^^  ^^^^           ^^^^^^
//     placeholder and selectors translated → ICU parser will throw
Feeding r.text back into your catalog and running the build yields the build-time error every team recognises:
$ npm run build
SyntaxError: Expected "plural", "select", "selectordinal" but "Plural" found. (de.json: messages.fileCount)
The string was written back as needs-review, which looks safe, but the defect is structural, not stylistic. A human post-edit step cannot protect the compiler, because CI compiles the string before any translator has seen it.
Fix with annotated code block
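Mask placeholders into DeepL’s XML tag format, translate with tag_handling: "xml" plus the glossary, unmask, then run the gate. Each placeholder becomes an empty <x id="…"/> tag; the original token stays in a local array and is never sent to DeepL, which carries the tag through to the output. Declaring x as an ignore tag additionally tells DeepL not to translate anything inside it.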
import * as deepl from "deepl-node";
const translator = new deepl.Translator(process.env.DEEPL_KEY!);
// 1. Mask every runtime argument ({count}) and markup tag (<b>, </b>).
//    The pattern matches innermost braces only, so it does NOT handle the
//    plural/select scaffold: split those messages into branches first and
//    reattach the ICU keywords in your own code so DeepL never sees them.
const PLACEHOLDER = /(\{[^{}]+\}|<\/?[a-z][^>]*>)/gi;

function mask(src: string) {
  const tokens: string[] = [];
  const masked = src.replace(PLACEHOLDER, (m) => {
    const id = tokens.push(m) - 1; // store original, index = id
    return `<x id="${id}"/>`; // opaque tag DeepL carries through
  });
  return { masked, tokens };
}

function unmask(text: string, tokens: string[]) {
  // Restore verbatim; an unknown id is left in place so the gate rejects it.
  return text.replace(/<x id="(\d+)"\s*\/>/g, (m, id) => tokens[+id] ?? m);
}

async function pretranslate(src: string) {
  const { masked, tokens } = mask(src);
  const res = await translator.translateText(masked, "en", "de", {
    tagHandling: "xml",
    ignoreTags: ["x"], // content inside <x> is never translated
    glossary: process.env.DEEPL_GLOSSARY_ID, // term overrides; glossaries need the explicit source language
  });
  return unmask(res.text, tokens);
}
For nested ICU ({count, plural, ...}) translate only the human-readable branch bodies, never the scaffold. Parse the message, run pretranslate on each leaf, and re-emit the structure so plural/one/other are reconstructed by your code, not the model. The gate below catches anything that still slips through.
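The parse, translate, and re-emit step can be sketched as follows. This is a hand-rolled illustration, not a full ICU parser: it assumes the whole string is one top-level plural, select, or selectordinal, it ignores ICU apostrophe quoting, and it passes any nested construct through as part of the branch body. The names splitIcu, joinIcu, and pretranslateIcu are hypothetical, and translateLeaf stands in for the pretranslate function above. A real pipeline would more likely use a proper ICU parser.

```typescript
// Hypothetical types for this sketch; not part of deepl-node or FormatJS.
type Branch = { key: string; body: string };
type IcuMessage = { arg: string; kind: string; branches: Branch[] };

// Split "{arg, plural|select|selectordinal, key {body} ...}" into its branches.
// Returns null for any other shape so the caller can fall back to plain masking.
function splitIcu(msg: string): IcuMessage | null {
  const head = /^\{\s*([\w-]+)\s*,\s*(plural|select|selectordinal)\s*,/.exec(msg);
  if (!head || msg[msg.length - 1] !== "}") return null;
  const branches: Branch[] = [];
  const end = msg.length - 1; // index of the brace that closes the whole message
  let i = head[0].length;
  while (i < end) {
    const open = msg.indexOf("{", i);
    if (open === -1 || open >= end) break;
    const key = msg.slice(i, open).trim(); // selector: one, other, =0, ...
    // Walk to the matching close brace, tracking nesting depth.
    let depth = 1;
    let j = open + 1;
    while (j < end && depth > 0) {
      if (msg[j] === "{") depth++;
      else if (msg[j] === "}") depth--;
      j++;
    }
    if (depth !== 0 || key === "") return null;
    branches.push({ key, body: msg.slice(open + 1, j - 1) });
    i = j;
  }
  return branches.length > 0 ? { arg: head[1], kind: head[2], branches } : null;
}

// Rebuild the message; the scaffold comes from our code, never from the model.
function joinIcu(m: IcuMessage): string {
  const body = m.branches.map((b) => `${b.key} {${b.body}}`).join(" ");
  return `{${m.arg}, ${m.kind}, ${body}}`;
}

// Translate only the branch bodies, then reassemble.
async function pretranslateIcu(
  msg: string,
  translateLeaf: (s: string) => Promise<string>,
): Promise<string> {
  const parsed = splitIcu(msg);
  if (!parsed) return translateLeaf(msg);
  const branches: Branch[] = [];
  for (const b of parsed.branches) {
    branches.push({ key: b.key, body: await translateLeaf(b.body) });
  }
  return joinIcu({ arg: parsed.arg, kind: parsed.kind, branches });
}
```

Calling pretranslateIcu(source, pretranslate) would send only "# file" and "# files" to DeepL. Note that the branch set is copied from the source, so a target locale that needs extra plural categories (few, many) still requires a human or a locale-aware expansion step.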
// QA gate — runs AFTER unmask, BEFORE writing the catalog.
type GateResult = { ok: boolean; reason?: string };

function gate(source: string, target: string, glossary: Record<string, string>): GateResult {
  // (a) Placeholder parity: same multiset of {args} and tags in/out.
  const tokens = (s: string) => (s.match(PLACEHOLDER) ?? []).sort();
  const a = tokens(source), b = tokens(target);
  if (a.length !== b.length || a.some((t, i) => t !== b[i]))
    return { ok: false, reason: "placeholder_parity" };

  // (b) Length ratio: target wildly off vs. source = likely truncation/hallucination.
  //     0.4–2.5 covers normal expansion (DE ~1.3×, FI ~1.4×); outside = reject.
  const ratio = target.length / Math.max(source.length, 1);
  if (ratio < 0.4 || ratio > 2.5) return { ok: false, reason: "length_ratio" };

  // (c) Glossary adherence: every required target term must appear literally.
  for (const [src, tgt] of Object.entries(glossary))
    if (source.includes(src) && !target.toLowerCase().includes(tgt.toLowerCase()))
      return { ok: false, reason: `glossary:${src}` };

  return { ok: true };
}

const out = await pretranslate(source);
const verdict = gate(source, out, { Dashboard: "Übersicht" });
writeEntry(out, verdict.ok ? "needs-review" : "rejected", verdict.reason);
The key discipline: a passing string still becomes needs-review, never translated. The gate only decides whether MT output is worth showing a human, not whether it is correct.
Verification snippet
Assert that the gate rejects the exact defects DeepL produces and accepts a clean translation. This is the test that belongs in CI alongside your other i18n CI gates:
import { test, expect } from "vitest";
import { gate } from "./gate"; // hypothetical path; import from wherever gate() is exported

const G = { Dashboard: "Übersicht" };

test("rejects dropped placeholder", () => {
  const r = gate("Hello {name}", "Hallo", G);
  expect(r).toEqual({ ok: false, reason: "placeholder_parity" });
});

test("rejects glossary miss", () => {
  const r = gate("Open the Dashboard", "Öffne das Panel", G);
  expect(r.reason).toBe("glossary:Dashboard");
});

test("accepts clean expansion", () => {
  const r = gate("Save {count} files", "{count} Dateien speichern", G);
  expect(r.ok).toBe(true);
});
To prove ICU survives end to end, compile every pre-filled string with @formatjs/cli:
$ npx formatjs compile de.json --out-file /dev/null && echo "ICU OK"
ICU OK # non-zero exit = a needs-review string still contains broken ICU
When to escalate
This gate is a structural safety net, not a quality judge. It guarantees that no string with mismatched placeholders, absurd length, or a missing glossary term enters review — but a translation can clear all three checks and still be wrong in tone, register, or meaning. DeepL also has no reliable per-string confidence score, so the length ratio is a heuristic proxy, not a true probability. When pre-fill output passes the gate but reviewers keep rejecting it (high post-edit distance over several sprints), the answer is not a tighter ratio; it is more source context, a richer glossary, or human-first translation for that surface. At that point, route the strings back through the full machine-translation pre-fill workflow and reconsider whether MT belongs on that namespace at all.
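Post-edit distance can be tracked with a normalised edit distance between the MT output and the text the reviewer finally approved. The sketch below uses plain Levenshtein distance over UTF-16 code units, which is one common choice rather than a method this page prescribes; levenshtein and postEditDistance are hypothetical helper names, and any escalation threshold is something each team has to set from its own data.

```typescript
// Levenshtein distance using two rolling rows of the dynamic-programming table.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Normalised post-edit distance: 0 = reviewer changed nothing, 1 = fully rewritten.
function postEditDistance(mt: string, reviewed: string): number {
  const longest = Math.max(mt.length, reviewed.length);
  return longest === 0 ? 0 : levenshtein(mt, reviewed) / longest;
}
```

Averaging this value per namespace and per sprint gives a trend line; a namespace whose average stays high while its strings keep passing the gate is the escalation case described above.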
FAQ
Why mask placeholders into <x/> tags instead of using DeepL’s HTML mode?
DeepL’s tag_handling: "xml" with ignoreTags gives you an explicit contract: content inside declared ignore tags is never translated, and each tag is carried through to the output intact. HTML mode infers structure and can still move or re-case real <b> tags. Masking everything, ICU args and markup alike, into uniform <x/> tags means a single, predictable round-trip, and your parity check compares the original tokens you stored, not whatever DeepL emits.
Can the gate run inside DeepL’s response, or must it be a separate step?
It must be separate. DeepL returns text only; it has no awareness of your ICU grammar or glossary intent beyond term seeding. The parity, length-ratio, and adherence checks all compare the unmasked output against your source string, so they can only run after the translation returns and tokens are restored. Treat the gate as a pure function you can unit-test independently of any network call.
What length-ratio bounds should I use for non-Latin or verbose languages?
The 0.4–2.5 default is deliberately wide. Tighten it per target: German and Finnish expand ~1.3–1.4×, while CJK targets contract sharply, so a 0.4 floor is appropriate there but a 2.5 ceiling is generous. Measure your own corpus: take the median target/source character ratio per locale (the same ratio the gate computes) and reject beyond roughly ±2 standard deviations rather than using one global band.
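A per-locale band along those lines could be derived as in the sketch below. It assumes you already have a set of reviewed source/target pairs for the locale; Pair, Band, median, and ratioBand are hypothetical names, and the fallback to 0.4–2.5 for an empty corpus simply reuses the default from this page.

```typescript
type Pair = { source: string; target: string };
type Band = { min: number; max: number };

function median(xs: number[]): number {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Band = median target/source ratio ± k standard deviations (k = 2 by default).
function ratioBand(pairs: Pair[], k = 2): Band {
  if (pairs.length === 0) return { min: 0.4, max: 2.5 }; // no data: keep the global default
  const ratios = pairs.map((p) => p.target.length / Math.max(p.source.length, 1));
  const mean = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
  const variance =
    ratios.reduce((sum, r) => sum + (r - mean) * (r - mean), 0) / ratios.length;
  const sd = Math.sqrt(variance);
  const centre = median(ratios);
  return { min: Math.max(centre - k * sd, 0), max: centre + k * sd };
}
```

The resulting min and max would replace the hard-coded 0.4 and 2.5 in the gate for that locale. With only a handful of pairs the band collapses toward a single value, so it is worth keeping the global default until the corpus is reasonably large.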
Related
- Machine-Translation Pre-fill Workflows — the full pipeline this gate plugs into, including write-back states and cost control.
- Enforcing glossary terms in CI — turn glossary adherence into a hard build gate beyond MT pre-fill.
- GitHub Actions i18n CI gates — where the verification tests above run on every pull request.
- Pluralization rules across languages — why ICU plural selectors must never be machine-translated.
- ICU MessageFormat syntax for complex plurals — the structure your mask/unmask logic must preserve.