PO / XLIFF Format Bridging
Bridging gettext PO and XLIFF 2.1 is where translation pipelines silently lose data: a po2xliff pass followed by an xliff2po round trip drops translator comments, collapses plural forms (the classic empty msgstr[1] after import), and flattens XLIFF review states so that reviewed and unreviewed strings come back indistinguishable. This page maps the two formats field by field, identifies which attributes survive conversion, and shows how to build a lossless bridge between gettext-based tooling and XLIFF-native translation management systems. It sits inside the broader Translation Workflows & CI/CD Pipeline Sync discipline, where format conversion is the most common source of regressions between an engineering repository and a translator’s editor.
Prerequisites
Concept & Spec
gettext PO and XLIFF describe the same idea — a source string paired with a translation — but their data models diverge enough that no conversion is automatically lossless. PO is a flat, line-oriented format defined by the GNU gettext manual: each entry is a msgid (source), a msgstr (target), optional plural arrays msgstr[0]…msgstr[n], and comment lines distinguished by prefix (#. extracted, #: reference, #, flag, # translator). XLIFF 2.1 is an XML vocabulary standardized by OASIS, structured as <xliff> → <file> → <unit> → <segment> → <source>/<target>, with translation status carried on the state attribute and commentary in <note> elements. The hierarchical, attribute-rich XLIFF model can express more than PO can, which is exactly why the PO-to-XLIFF direction is “expanding” and the XLIFF-to-PO direction is “narrowing” — and where data is silently discarded.
The lossy boundary follows from three structural mismatches. First, PO encodes plurals as an ordered array keyed by integer index, whose meaning depends on the file’s Plural-Forms formula. XLIFF 2.1 core defines no plural construct, so converters and TMSs express each variant by convention, usually as a separate <unit> (some tools use separate <segment> elements), tagged with a CLDR plural category (one, few, many, other) through the unit id, a note, or a custom attribute. Mapping integer index → CLDR category requires re-deriving the category from the Plural-Forms rule, which most naive converters skip. Second, PO has a single flat msgctxt disambiguation field, while XLIFF has no dedicated context field, so msgctxt must be folded into the unit id or a <note category="..."> element. Third, PO’s #, fuzzy flag is a boolean, while XLIFF’s state is an enumeration (initial, translated, reviewed, final) plus an optional subState, a one-to-many relationship that loses precision on the way back. This bridging discipline operationalizes the Translation Workflows & CI/CD Pipeline Sync goal of deterministic handoffs: if conversion is non-deterministic, your translation memory and review history drift on every sync.
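The one-to-many status mapping is easiest to see as two small functions, sketched here in Python. Mapping a fuzzy entry to state="translated" plus a custom subState po:fuzzy is an assumption of this sketch, not a standard mapping (XLIFF 2.1 only requires subState values to have a prefix:value shape); the point is that the return leg must squeeze four states into one boolean.

```python
from typing import Optional, Tuple

# The four segment states defined by XLIFF 2.1.
XLIFF_STATES = ("initial", "translated", "reviewed", "final")
# Hypothetical custom subState used to carry the PO fuzzy bit.
FUZZY_SUBSTATE = "po:fuzzy"


def po_to_xliff_state(fuzzy: bool, has_target: bool) -> Tuple[str, Optional[str]]:
    """Outbound leg: a PO entry carries two bits (translated?, fuzzy?)."""
    if not has_target:
        return "initial", None
    if fuzzy:
        # Keep the draft target, but flag it so the fuzzy bit is recoverable.
        return "translated", FUZZY_SUBSTATE
    return "translated", None


def xliff_state_to_po_fuzzy(state: str, sub_state: Optional[str] = None) -> bool:
    """Return leg: four states plus an optional subState collapse to one boolean."""
    if state not in XLIFF_STATES:
        raise ValueError(f"not an XLIFF 2.1 segment state: {state!r}")
    return sub_state == FUZZY_SUBSTATE or state == "initial"


# translated, reviewed and final all come back as "not fuzzy": the review
# distinction exists only on the XLIFF side.
collapsed = {s: xliff_state_to_po_fuzzy(s) for s in ("translated", "reviewed", "final")}
print(collapsed)  # {'translated': False, 'reviewed': False, 'final': False}
```

Three distinct XLIFF states land on the same PO value, which is why review status should be reconciled from the TMS rather than read back from the PO file.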
The OASIS XLIFF 2.1 specification and the GNU gettext manual are the two authorities to keep open while building a bridge. XLIFF 2.1 makes <unit> the unit of extraction and <segment> the unit of translation, allowing a single source phrase to be split into several segments for translation memory leverage — a granularity PO has no concept of, since a PO entry is atomic. The reverse implication matters for round trips: if a TMS re-segments a <source> that originated from one msgid, naive re-joining on the way back can reorder or merge text. Pin segmentation off (one segment per unit) when PO is the canonical store. Likewise, gettext’s msgid_plural carries exactly one alternate source form, whereas XLIFF lets each plural category own a distinct <source>; collapsing those extra source forms back into PO’s binary singular/plural model is inherently lossy for languages whose CLDR rules exceed two categories.
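When PO is the canonical store, the one-segment-per-unit rule is cheap to enforce on every file that comes back from the TMS. A minimal standard-library sketch, assuming the XLIFF 2.x core namespace (which XLIFF 2.1 documents reuse); the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# XLIFF 2.1 documents keep the XLIFF 2.0 core namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def resegmented_units(xliff_data: Union[str, bytes]) -> List[Tuple[str, int]]:
    """Return (unit id, segment count) for every unit that is not exactly one segment."""
    root = ET.fromstring(xliff_data)
    offenders = []
    for unit in root.iter(f"{{{XLIFF_NS}}}unit"):
        # Only direct <segment> children count; <ignorable> elements are skipped.
        segments = unit.findall(f"{{{XLIFF_NS}}}segment")
        if len(segments) != 1:
            offenders.append((unit.get("id"), len(segments)))
    return offenders
```

A non-empty result means the TMS re-segmented text that originated from a single msgid, and the file should be rejected before xliff2po re-joins it.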
A reference PO entry with the fields most at risk:
#. Shown on the cart page; {count} is the item total
#: src/cart/summary.ts:42
#, fuzzy
msgctxt "cart"
msgid "{count} item"
msgid_plural "{count} items"
msgstr[0] "{count} artigo"
msgstr[1] "{count} artigos"
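To make the comment prefixes concrete, here is a minimal Python sketch that parses one PO entry like the one above into a dict, keeping each comment kind separate. It is not a full PO parser: it assumes PO string escapes behave like Python string escapes (true for \n, \t, \" and \\), stores a non-plural msgstr under index 0, and ignores #| (previous msgid) and #~ (obsolete) lines.

```python
import ast
import re

# Comment prefixes from the GNU gettext manual.
COMMENT_KINDS = {"#.": "extracted", "#:": "reference", "#,": "flags"}
KEYWORD = re.compile(r'(msgctxt|msgid_plural|msgid|msgstr)(?:\[(\d+)\])?\s+(".*")$')


def unquote(token: str) -> str:
    # Assumption: common PO escapes match Python's; not every C escape does.
    return ast.literal_eval(token)


def parse_po_entry(text: str) -> dict:
    entry = {
        "comments": {"extracted": [], "reference": [], "flags": [], "translator": []},
        "msgctxt": None, "msgid": "", "msgid_plural": None,
        "msgstr": {},  # plural index -> text; plain msgstr is stored as index 0
    }
    current = None  # (field, plural index) that receives "..." continuation lines
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        kind = COMMENT_KINDS.get(line[:2])
        if kind == "flags":
            entry["comments"]["flags"] += [f.strip() for f in line[2:].split(",") if f.strip()]
        elif kind:
            entry["comments"][kind].append(line[2:].strip())
        elif line.startswith("# "):
            entry["comments"]["translator"].append(line[2:].strip())
        elif line.startswith("#"):
            continue  # "#|", "#~" and bare "#" are out of scope here
        elif line.startswith('"') and current is not None:
            field, index = current
            if field == "msgstr":
                entry["msgstr"][index] += unquote(line)
            else:
                entry[field] += unquote(line)
        else:
            match = KEYWORD.match(line)
            if match is None:
                raise ValueError(f"unrecognised PO line: {raw!r}")
            keyword, index, quoted = match.groups()
            if keyword == "msgstr":
                idx = int(index) if index is not None else 0
                entry["msgstr"][idx] = unquote(quoted)
                current = ("msgstr", idx)
            else:
                entry[keyword] = unquote(quoted)
                current = (keyword, None)
    return entry
```

Because extracted, reference, flag, and translator comments stay in separate lists, a converter built on this shape can route each kind to its own XLIFF note category instead of merging them.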
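The same content expressed as an XLIFF 2.1 unit, showing where the comments, context, and fuzzy flag land. The msgctxt is folded into the unit id, and the msgid_plural/msgstr[1] variant would need a unit of its own: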
<unit id="cart.item_count">
<notes>
<note category="developer">Shown on the cart page; {count} is the item total</note>
<note category="location">src/cart/summary.ts:42</note>
</notes>
<segment state="translated" subState="po:fuzzy">
<source>{count} item</source>
<target>{count} artigo</target>
</segment>
</unit>
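The forward mapping for one entry can be sketched with Python's ElementTree. The input is a plain dict shaped like a parsed PO entry (a hypothetical intermediate form, not a Translate Toolkit structure). The note category names other than "developer" and "location", the po:fuzzy subState, and the one-unit-per-plural-index layout are assumptions of this sketch; namespaces are omitted to match the fragment above.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.1 note categories are free text. "developer" and "location" follow
# the example above; "context" and "translator" are this sketch's assumptions.
NOTE_CATEGORIES = {"extracted": "developer", "reference": "location", "translator": "translator"}


def build_unit(entry: dict, unit_id: str, index: int = 0) -> ET.Element:
    """Emit an XLIFF 2.1 <unit> (no namespace) for one plural index of a PO entry."""
    unit = ET.Element("unit", id=unit_id)
    notes = ET.SubElement(unit, "notes")
    if entry.get("msgctxt"):
        ET.SubElement(notes, "note", category="context").text = entry["msgctxt"]
    for kind, category in NOTE_CATEGORIES.items():
        for text in entry["comments"].get(kind, []):
            ET.SubElement(notes, "note", category=category).text = text
    if len(notes) == 0:
        unit.remove(notes)  # <notes> must hold at least one <note>

    target = entry["msgstr"].get(index, "")
    segment = ET.SubElement(unit, "segment")
    if not target:
        segment.set("state", "initial")
    elif "fuzzy" in entry["comments"].get("flags", []):
        segment.set("state", "translated")
        segment.set("subState", "po:fuzzy")  # state must be explicit when subState is set
    else:
        segment.set("state", "translated")

    plural_source = entry.get("msgid_plural")
    use_singular = index == 0 or not plural_source
    ET.SubElement(segment, "source").text = entry["msgid"] if use_singular else plural_source
    if target:
        ET.SubElement(segment, "target").text = target
    return unit


ENTRY = {
    "msgctxt": "cart", "msgid": "{count} item", "msgid_plural": "{count} items",
    "msgstr": {0: "{count} artigo", 1: "{count} artigos"},
    "comments": {"extracted": ["Shown on the cart page; {count} is the item total"],
                 "reference": ["src/cart/summary.ts:42"], "flags": ["fuzzy"], "translator": []},
}
# The index -> category suffixes are hard-coded here; derive them from Plural-Forms in practice.
for index, category in ((0, "one"), (1, "other")):
    print(ET.tostring(build_unit(ENTRY, f"cart.item_count.{category}", index), encoding="unicode"))
```

Emitting a separate unit per plural index keeps each XLIFF unit to a single segment, so a TMS never concatenates the singular and plural sources as if they were one sentence.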
Step-by-Step Implementation
- Normalize the source PO first. Run msgcat --no-wrap over the input so multi-line msgstr blocks fold to a canonical shape. Converters that diff on whitespace will otherwise mark unchanged strings as modified, polluting the round-trip comparison.
  msgcat --no-wrap locales/pt_BR/messages.po -o locales/pt_BR/messages.norm.po
- Verify the Plural-Forms header is present and correct. The plural index → CLDR category mapping is derived from this header; a missing or wrong formula silently misroutes msgstr[1]. Confirm it before converting.
  grep -i "Plural-Forms" locales/pt_BR/messages.norm.po
  # expect: "Plural-Forms: nplurals=2; plural=(n > 1);\n"
- Convert PO to XLIFF 2.1 with Translate Toolkit. Pin the version, set the target language explicitly so trgLang is valid BCP 47, and keep the source PO for the reverse leg.
  po2xliff --version=2.1 -l pt-BR \
    locales/pt_BR/messages.norm.po locales/pt_BR/messages.xlf
- Inject the plural categories the converter omits. If po2xliff flattens plurals to a single segment, post-process with a small script that reads Plural-Forms, evaluates the formula to find which CLDR category each index covers, and emits one <unit> per category so the TMS sees one/other rather than [0]/[1].
  python tools/expand_plurals.py \
    --in locales/pt_BR/messages.xlf --locale pt-BR --in-place
- Hand off to the translation management system, then export the reviewed XLIFF. The TMS updates <target> and state. Pull the file back into the repo as a distinct artifact so the original is never overwritten in place.
  crowdin download --format xliff -l pt-BR \
    --dest locales/pt_BR/messages.reviewed.xlf
- Convert XLIFF back to PO against a template. Always pass the original PO as a template (-t) so references, ordering, and untranslated entries are preserved instead of being regenerated.
  xliff2po -t locales/pt_BR/messages.norm.po \
    locales/pt_BR/messages.reviewed.xlf locales/pt_BR/messages.out.po
- Run a semantic round-trip diff, not a byte diff. Compare msgid/msgstr pairs and plural arrays for equality while ignoring cosmetic differences, and fail CI on any field-level divergence (see Verification).
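The plural step can be sketched in standard-library Python as the core of a script like tools/expand_plurals.py, which this page names but does not show. The sketch evaluates the gettext formula and a CLDR rule over sample counts and requires every msgstr index to line up with exactly one CLDR category. It relies on gettext.c2py, an undocumented helper in CPython's gettext module that compiles a C plural expression into a Python function, and the two hard-coded integer-only CLDR rules are illustrative assumptions.

```python
import gettext
import re
from typing import Callable, Dict, Tuple

# Simplified CLDR cardinal rules for integer counts only. These entries are
# illustrative assumptions; load complete rules from CLDR data in practice.
CLDR_INTEGER_RULES: Dict[str, Callable[[int], str]] = {
    "pt-BR": lambda n: "one" if n in (0, 1) else "other",
    "en": lambda n: "one" if n == 1 else "other",
}


def parse_plural_forms(header: str) -> Tuple[int, str]:
    match = re.search(r"nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;]+)", header)
    if match is None:
        raise ValueError("Plural-Forms header is missing or malformed")
    return int(match.group(1)), match.group(2).strip()


def index_to_category(header: str, locale: str, max_n: int = 1000) -> Dict[int, str]:
    """Map each msgstr index to the single CLDR category it covers, or raise."""
    nplurals, expression = parse_plural_forms(header)
    po_index = gettext.c2py(expression)  # C plural expression -> Python function
    cldr_category = CLDR_INTEGER_RULES[locale]
    seen = {i: set() for i in range(nplurals)}
    for n in range(max_n + 1):
        index = po_index(n)
        if index not in seen:
            raise ValueError(f"plural={expression} yields index {index}, but nplurals={nplurals}")
        seen[index].add(cldr_category(n))
    mapping = {}
    for index, categories in seen.items():
        if len(categories) != 1:
            raise ValueError(f"msgstr[{index}] covers CLDR categories {sorted(categories)}")
        mapping[index] = categories.pop()
    return mapping


print(index_to_category("Plural-Forms: nplurals=2; plural=(n > 1);\n", "pt-BR"))  # {0: 'one', 1: 'other'}
```

Failing loudly is the design choice here: with the Portugal-style formula plural=(n != 1) against the pt-BR rule, msgstr[1] would cover both one (n = 0) and other, so the sketch raises instead of silently misrouting the variant.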
Configuration Reference
| Option | Type | Description / default |
|---|---|---|
| po2xliff --version | enum: 1.1, 1.2, 2.0, 2.1 | XLIFF schema version to emit. Default 1.1; set 2.1 for the OASIS 2.1 unit/segment model. |
| po2xliff -l / --language | BCP 47 string | Target language written to trgLang. Required; pass the BCP 47 form (pt-BR), not the gettext form (pt_BR). |
| xliff2po -t / --template | path | Original PO used to preserve references, comments, and entry order on reverse conversion. Strongly recommended. |
| --filteraction | enum: none, warn, exclude, error | How Translate Toolkit handles entries that fail filters. Default none; use error in CI. |
| --threshold (po2xliff) | int 0–100 | Minimum fuzzy match score to carry a <target>. Default unset (carry all). |
| xliffmerge --i18nFormat | enum: xlf, xlf2, xmb | Output flavour for Angular workspaces. Use xlf2 for XLIFF 2.x merge semantics. |
| xliffmerge --removeUnusedIds | boolean | Drop units absent from the new extraction. Default true; set false to retain history. |
| msgcat --no-wrap | flag | Disable line wrapping so converted output diffs cleanly. Off by default (msgcat wraps long lines), so pass it explicitly. |
Framework Variants
Angular (xliffmerge / @angular/localize)
Angular extracts to XLIFF natively via ng extract-i18n --format=xlf2, so the bridge is usually XLIFF-to-PO for teams whose translators prefer gettext editors. Use xliffmerge to reconcile newly extracted units against existing translations, then convert the merged XLIFF to PO only at the editor boundary. Keep XLIFF as the source of truth in the repo; the round-trip integrity rules here apply to the PO export, not the canonical store. For the extraction and merge configuration, see the Angular Localization Module Setup guide.
ng extract-i18n --format=xlf2 --output-path=src/locale
npx xliffmerge --profile xliffmerge.json en pt-BR
React / Next.js (formatjs, react-intl)
formatjs tooling speaks JSON and XLIFF, not PO directly. Bridge by exporting compiled messages to XLIFF, converting to PO with xliff2po for gettext-based reviewers, then converting back. Because formatjs message IDs are content-hashed, set the XLIFF <unit id> from the message id and never regenerate it, or the round trip will orphan every translation.
formatjs extract 'src/**/*.tsx' --out-file build/en.json
# bridge JSON -> XLIFF -> PO via a thin Translate Toolkit step
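The JSON → XLIFF leg of that bridge can be sketched in standard-library Python. The sketch assumes formatjs's default extract output shape ({id: {defaultMessage, description}}) and copies each id verbatim into <unit id>. The id check approximates the NMTOKEN datatype that XLIFF 2 specifies for id attributes, so hash ids containing characters such as + or / are rejected rather than silently rewritten; function and file names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"  # XLIFF 2.1 keeps the 2.0 namespace
ET.register_namespace("", XLIFF_NS)
# Approximation of NMTOKEN: word characters plus ".", ":" and "-".
NMTOKEN = re.compile(r"[\w.:-]+")


def q(tag: str) -> str:
    return f"{{{XLIFF_NS}}}{tag}"


def formatjs_to_xliff(messages: dict, src_lang: str = "en", file_id: str = "messages") -> ET.Element:
    root = ET.Element(q("xliff"), version="2.1", srcLang=src_lang)
    file_el = ET.SubElement(root, q("file"), id=file_id)
    for message_id in sorted(messages):
        if not NMTOKEN.fullmatch(message_id):
            raise ValueError(f"id {message_id!r} is not a valid XLIFF id; fix it at the source")
        message = messages[message_id]
        # Reuse the formatjs id verbatim: regenerating it orphans every translation.
        unit = ET.SubElement(file_el, q("unit"), id=message_id)
        description = message.get("description")
        if isinstance(description, str) and description:
            notes = ET.SubElement(unit, q("notes"))
            ET.SubElement(notes, q("note"), category="description").text = description
        segment = ET.SubElement(unit, q("segment"))
        ET.SubElement(segment, q("source")).text = message["defaultMessage"]
    return root
```

Loading build/en.json with json.load and writing the result through ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True) yields a file the Translate Toolkit step can take from there. Sorting by id keeps the output deterministic between runs.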
Vue / Nuxt (vue-i18n)
vue-i18n stores messages as JSON or YAML. Treat PO/XLIFF bridging as an import/export boundary at the TMS edge: convert JSON to PO with json2po, run the PO-to-XLIFF leg for the TMS, and reverse on the way back. Keep the JSON canonical so application code never reads PO at runtime.
Node.js backend (gettext-native services)
Services using node-gettext or gettext-parser keep .mo/.po as the runtime format. Here PO is canonical and XLIFF is the transient transport for the TMS. The lossy direction is therefore PO→XLIFF→PO, so always pass the original PO as the xliff2po template and gate the build on the round-trip diff. Self-hosted teams routing this through a TMS should follow the Weblate Self-Hosted Setup, which consumes PO and XLIFF components side by side.
Compile .po to binary .mo only after the round-trip gate passes, never before — msgfmt --check validates plural counts against the Plural-Forms header and will reject a file whose plurals were mangled in transit, catching corruption that a syntactic XML check on the XLIFF side cannot see. Keep .mo artifacts out of version control and regenerate them in CI so the PO file remains the single reviewed source.
msgfmt --check --check-format locales/pt_BR/messages.po -o locales/pt_BR/messages.mo
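msgfmt --check stays the authoritative gate. As a complementary pre-compile check, the Python sketch below reports the two shapes plural corruption usually takes: missing msgstr indices and partially filled arrays. It assumes entries were already parsed into dicts with msgid, msgid_plural, and an index → text msgstr map (a hypothetical shape) and that nplurals comes from the Plural-Forms header.

```python
from typing import Dict, List


def plural_problems(nplurals: int, entries: List[dict]) -> List[str]:
    """Report plural entries whose msgstr[] array does not match nplurals."""
    expected = set(range(nplurals))
    problems = []
    for entry in entries:
        if not entry.get("msgid_plural"):
            continue  # only plural entries carry msgstr[n] arrays
        msgstr: Dict[int, str] = entry["msgstr"]
        if set(msgstr) != expected:
            problems.append(f"{entry['msgid']!r}: indices {sorted(msgstr)}, expected {sorted(expected)}")
            continue
        empty = sorted(i for i in expected if not msgstr[i])
        # A fully empty array is simply untranslated; a partly empty one is damage.
        if empty and len(empty) < nplurals:
            problems.append(f"{entry['msgid']!r}: partially translated, empty msgstr{empty}")
    return problems
```

Running it before msgfmt gives a readable list of the affected msgids instead of a single compile failure.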
Verification
The build must fail when any msgid, msgstr, or plural variant diverges after a full round trip. Run the conversion both ways and compare semantically with pofilter/podiff from Translate Toolkit, plus a msgcmp check for missing entries:
#!/usr/bin/env bash
set -euo pipefail
po2xliff --version=2.1 -l pt-BR messages.norm.po /tmp/rt.xlf
xliff2po -t messages.norm.po /tmp/rt.xlf /tmp/rt.po
# msgcmp exits non-zero if a msgid from messages.norm.po is missing in /tmp/rt.po (it checks msgid coverage, not msgstr contents)
msgcmp --use-untranslated /tmp/rt.po messages.norm.po
echo "Round-trip integrity OK"
Expected output on success is the single line Round-trip integrity OK with exit code 0. A msgid missing from the round-tripped file makes msgcmp print "this message is used but not defined" and exit non-zero, which set -e turns into a failed job. msgcmp does not compare msgstr or plural contents, so pair it with a field-level semantic diff. Wire this into CI as a gate:
# .github/workflows/i18n.yml (excerpt)
- name: PO/XLIFF round-trip integrity
run: ./scripts/roundtrip-check.sh
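The field-level comparison the gate calls for can be sketched in standard-library Python. It parses both files into {(msgctxt, msgid): (msgid_plural, msgstr tuple, fuzzy)} maps, so wrapping, comment, and ordering differences drop out while any msgstr, plural, or fuzzy change is reported. The parser is deliberately minimal (it skips the header entry and obsolete #~ entries and assumes Python-compatible string escapes), and po_semantic_diff.py is a hypothetical file name.

```python
import ast
import re
import sys
from typing import Dict, List, Optional, Tuple

Key = Tuple[Optional[str], str]  # (msgctxt, msgid)
KEYWORD = re.compile(r'(msgctxt|msgid_plural|msgid|msgstr)(?:\[(\d+)\])?\s+(".*")$')


def load_po(text: str) -> Dict[Key, tuple]:
    """Map (msgctxt, msgid) -> (msgid_plural, msgstr tuple, fuzzy), ignoring layout."""
    catalog: Dict[Key, tuple] = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        fields: Dict[str, str] = {}
        msgstrs: Dict[int, str] = {}
        fuzzy, current = False, None
        for line in (raw.strip() for raw in block.splitlines()):
            if line.startswith("#,"):
                fuzzy = fuzzy or "fuzzy" in (f.strip() for f in line[2:].split(","))
            elif not line or line.startswith("#"):
                continue  # references and other comments are ignored by this comparison
            elif line.startswith('"') and current is not None:
                if current[0] == "msgstr":
                    msgstrs[current[1]] += ast.literal_eval(line)
                else:
                    fields[current[0]] += ast.literal_eval(line)
            else:
                match = KEYWORD.match(line)
                if match is None:
                    raise ValueError(f"unrecognised PO line: {line!r}")
                keyword, index, quoted = match.groups()
                if keyword == "msgstr":
                    current = ("msgstr", int(index or 0))
                    msgstrs[current[1]] = ast.literal_eval(quoted)
                else:
                    current = (keyword, None)
                    fields[keyword] = ast.literal_eval(quoted)
        if fields.get("msgid"):  # skips the header entry (msgid "")
            key = (fields.get("msgctxt"), fields["msgid"])
            catalog[key] = (fields.get("msgid_plural"),
                            tuple(msgstrs[i] for i in sorted(msgstrs)), fuzzy)
    return catalog


def semantic_diff(before: str, after: str) -> List[str]:
    a, b = load_po(before), load_po(after)
    problems = [f"missing after round trip: {key}" for key in a.keys() - b.keys()]
    problems += [f"new after round trip: {key}" for key in b.keys() - a.keys()]
    problems += [f"changed: {key}: {a[key]} -> {b[key]}"
                 for key in a.keys() & b.keys() if a[key] != b[key]]
    return sorted(problems)


def main(paths: List[str]) -> int:
    """CLI wrapper: returns 0 when both files are semantically equal."""
    if len(paths) != 2:
        print("usage: po_semantic_diff.py BEFORE.po AFTER.po", file=sys.stderr)
        return 2
    with open(paths[0], encoding="utf-8") as f1, open(paths[1], encoding="utf-8") as f2:
        problems = semantic_diff(f1.read(), f2.read())
    print("\n".join(problems) or "semantic round trip OK")
    return 1 if problems else 0
```

With a trailing sys.exit(main(sys.argv[1:])) in the script, a line such as python scripts/po_semantic_diff.py messages.norm.po /tmp/rt.po can sit right after the msgcmp call, and set -e fails the job on any field-level divergence.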
Common Pitfalls
- Plural arrays collapse to a single segment. A naive po2xliff that does not consult Plural-Forms writes only msgstr[0], dropping every other variant. Expand to one CLDR-categorized unit per plural index; see Converting gettext PO to XLIFF 2.1 Without Data Loss for the index-to-category derivation.
- Translator and developer comments merge or vanish. PO’s #. (extracted) and # (translator) comments must map to distinct <note category> values; converters that emit a single <note> lose the distinction and reviewers lose context.
- #, fuzzy flattens irreversibly. Mapping fuzzy to state="translated" with a custom subState is fine going out, but XLIFF’s translated, reviewed, and final states all collapse back to “not fuzzy”, so an unreviewed translation returns looking as settled as a final one.
- BCP 47 vs gettext locale codes. pt_BR is valid in PO headers but invalid as XLIFF trgLang; emit pt-BR or schema validation rejects the file.
- Reverse conversion without a template. Skipping xliff2po -t regenerates entry order and drops #: source references, producing noisy diffs and broken IDE jump-to-source.
- Whitespace/wrapping churn. Differing line-wrap widths between tools mark unchanged strings as modified; normalize with msgcat --no-wrap before every conversion.
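The locale-code pitfall is easy to guard against mechanically. A minimal Python sketch that turns a gettext locale name into a BCP 47 tag: it drops the codeset suffix (such as .UTF-8), swaps the underscore for a hyphen, and maps a small, illustrative set of @modifiers to ISO 15924 script subtags; unknown modifiers raise rather than guess.

```python
import re

# Illustrative subset of gettext @modifiers that correspond to ISO 15924
# script subtags; extend it for the locales your project ships.
MODIFIER_TO_SCRIPT = {"latin": "Latn", "cyrillic": "Cyrl"}
GETTEXT_LOCALE = re.compile(
    r"(?P<lang>[A-Za-z]{2,3})"
    r"(?:_(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?:\.[\w-]+)?"  # codeset such as .UTF-8 has no BCP 47 equivalent
    r"(?:@(?P<modifier>\w+))?"
)


def gettext_to_bcp47(code: str) -> str:
    match = GETTEXT_LOCALE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognised gettext locale: {code!r}")
    parts = [match["lang"].lower()]
    modifier = match["modifier"]
    if modifier:
        script = MODIFIER_TO_SCRIPT.get(modifier.lower())
        if script is None:
            raise ValueError(f"no script subtag known for @{modifier}")
        parts.append(script)
    if match["region"]:
        parts.append(match["region"].upper())
    return "-".join(parts)
```

Feeding the result of gettext_to_bcp47("pt_BR") to po2xliff -l keeps the trgLang value derived from the same source as the PO directory name.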
FAQ
Is converting between gettext PO and XLIFF 2.1 lossless?
Not by default. msgid and msgstr map cleanly, but plural arrays, the #, fuzzy flag, msgctxt, and the various comment types each need explicit handling. PO→XLIFF expands the model and XLIFF→PO narrows it, so the lossy risk concentrates on the reverse leg. Always pass the original PO as a template to xliff2po and gate CI on a semantic round-trip diff to make any loss visible.
Why do my plural forms break after a round trip?
PO encodes plurals as an integer-indexed array (msgstr[0], msgstr[1]) whose meaning is defined by the file’s Plural-Forms header, while XLIFF-side tooling expects each variant tagged with a CLDR category (one, few, many, other). A converter that ignores Plural-Forms cannot reconstruct the categories and typically keeps only index 0. Verify the header exists and post-process the XLIFF to emit one categorized unit per index.
Should PO or XLIFF be the source of truth in my repo?
Pick the format your runtime consumes natively. Node and Python gettext services keep PO canonical and treat XLIFF as transient TMS transport; Angular and formatjs projects keep XLIFF (or JSON) canonical and export PO only for translators who prefer gettext editors. Whichever you choose, never overwrite the canonical file in place during conversion — write a separate artifact and diff against it.
Which converter should I use, po2xliff or xliffmerge?
They solve different problems. Translate Toolkit’s po2xliff/xliff2po are general format converters for the bridge itself. xliffmerge is an Angular-focused tool that reconciles newly extracted XLIFF units against existing translations during extraction — it merges, it does not convert formats. Use Translate Toolkit for PO↔XLIFF and xliffmerge for XLIFF-to-XLIFF reconciliation in Angular workspaces.
How do I map the fuzzy flag to an XLIFF state?
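In XLIFF 2.1 the state attribute on <segment> takes only initial, translated, reviewed, or final; needs-review-translation is an XLIFF 1.2 value and fails 2.1 validation. One workable mapping is #, fuzzy → state="translated" with a custom subState such as po:fuzzy (subState values need a prefix, and state must be set explicitly whenever subState is used). The relationship is one-to-many: XLIFF’s reviewed and final states have no PO equivalent and both collapse to “not fuzzy” on the way back, so treat a return to PO as a loss of review precision and reconcile review status from the TMS, not the PO file.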
Related
- Converting gettext PO to XLIFF 2.1 Without Data Loss — the plural-index-to-CLDR-category derivation and a loss-free converter script.
- Angular Localization Module Setup — native XLIFF extraction and xliffmerge reconciliation for Angular workspaces.
- Weblate Self-Hosted Setup — running PO and XLIFF components side by side in a self-hosted TMS.
- Crowdin Integration for Dev Teams — pushing and pulling XLIFF artifacts through a managed translation platform.