PO / XLIFF Format Bridging

Bridging gettext PO and XLIFF 2.1 is where translation pipelines silently lose data: a seemingly clean po2xliff followed by xliff2po can drop translator comments, collapse plural forms, and return segments still in state="initial" as untranslated entries, producing the classic empty msgstr[1] after import. This page maps the two formats field by field, identifies which attributes survive conversion, and shows how to build a lossless bridge between gettext-based tooling and XLIFF-native translation management systems. It sits inside the broader Translation Workflows & CI/CD Pipeline Sync discipline, where format conversion is the most common source of regressions between an engineering repository and a translator’s editor.

Field-by-field bridge: only msgid/msgstr map cleanly; plurals, comments, and flags require deliberate handling.

Prerequisites

Python 3.9+ with translate-toolkit 3.6+ installed (pip install translate-toolkit) — provides po2xliff and xliff2po.
A source PO file with a valid Plural-Forms: header (e.g. nplurals=2; plural=(n != 1);).
Target language tags as BCP 47 (RFC 5646) codes, not gettext-style underscores (pt-BR, not pt_BR) for XLIFF trgLang.
Node 18+ if you intend to use xliffmerge (Angular workspaces) in the same pipeline.
An XLIFF 2.1-aware editor or TMS for review (the OASIS 2.1 schema differs materially from the older 1.2).
CI runner with the ability to fail the build on a non-zero exit from the round-trip check (see Verification).

Concept & Spec

gettext PO and XLIFF describe the same idea — a source string paired with a translation — but their data models diverge enough that no conversion is automatically lossless. PO is a flat, line-oriented format defined by the GNU gettext manual: each entry is a msgid (source), a msgstr (target), optional plural arrays msgstr[0]…msgstr[n], and comment lines distinguished by prefix (#. extracted, #: reference, #, flag, # translator). XLIFF 2.1 is an XML vocabulary standardized by OASIS, structured as <xliff> → <file> → <unit> → <segment> → <source>/<target>, with translation status carried on the state attribute and commentary in <note> elements. The hierarchical, attribute-rich XLIFF model can express more than PO can, which is exactly why the PO-to-XLIFF direction is “expanding” and the XLIFF-to-PO direction is “narrowing” — and where data is silently discarded.

The lossy boundary follows from three structural mismatches. First, PO encodes plurals as an ordered array keyed by integer index, whose meaning depends on the file’s Plural-Forms formula; XLIFF expresses each plural variant as a separate <segment> or a separate <unit> annotated with a CLDR plural category (one, few, many, other). Mapping integer index → CLDR category requires re-deriving the category from the Plural-Forms rule, which most naive converters skip. Second, PO has a single flat msgctxt disambiguation field, while XLIFF uses <note category="..."> plus the unit id. Third, PO’s #, fuzzy flag is a boolean, while XLIFF’s state is an enumeration (initial, translated, reviewed, final) plus an optional subState — a one-to-many relationship that loses precision on the way back. This bridging discipline operationalizes the Translation Workflows & CI/CD Pipeline Sync goal of deterministic handoffs: if conversion is non-deterministic, your translation memory and review history drift on every sync.
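The third mismatch is easiest to see in code. The following is a minimal sketch, not a converter: the function names are hypothetical, and it assumes a fuzzy entry is carried as state="initial" with a custom subState (po:fuzzy is an invented value, not something the XLIFF specification defines).

```python
from typing import Optional, Tuple

# The four segment states defined by XLIFF 2.x core.
XLIFF_STATES = ("initial", "translated", "reviewed", "final")

# Assumption: a custom subState (prefix:value form) marks a PO fuzzy entry.
FUZZY_SUBSTATE = "po:fuzzy"


def po_to_xliff_state(msgstr: str, fuzzy: bool) -> Tuple[str, Optional[str]]:
    """Expanding direction: PO's boolean flag becomes (state, subState)."""
    if not msgstr:
        return ("initial", None)            # untranslated entry
    if fuzzy:
        return ("initial", FUZZY_SUBSTATE)  # has a target, but unconfirmed
    return ("translated", None)


def xliff_state_to_fuzzy(state: str, has_target: bool) -> bool:
    """Narrowing direction: four states collapse into one boolean."""
    if state not in XLIFF_STATES:
        raise ValueError(f"not an XLIFF 2.x state: {state!r}")
    # Only an unconfirmed target counts as fuzzy. translated, reviewed and
    # final all become "not fuzzy", which is where review precision is lost.
    return state == "initial" and has_target


if __name__ == "__main__":
    for state in XLIFF_STATES:
        label = "fuzzy" if xliff_state_to_fuzzy(state, True) else "not fuzzy"
        print(f"{state} -> {label}")
```

Three of the four states map to the same PO value, so the return trip cannot tell a reviewed string from one that was only translated.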

The OASIS XLIFF 2.1 specification and the GNU gettext manual are the two authorities to keep open while building a bridge. XLIFF 2.1 makes <unit> the unit of extraction and <segment> the unit of translation, so a single source phrase can be split into several segments for translation memory leverage. PO has no such granularity, since a PO entry is atomic. The reverse implication matters for round trips: if a TMS re-segments a <source> that originated from one msgid, naive re-joining on the way back can reorder or merge text. When PO is the canonical store, pin segmentation off by keeping one segment per unit and setting canResegment="no" on the unit. Likewise, gettext’s msgid_plural carries exactly one alternate source form, whereas XLIFF lets each plural category own a distinct <source>; collapsing those extra source forms back into PO’s binary singular/plural model is inherently lossy for languages whose CLDR rules exceed two categories.
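One way to enforce one segment per unit before handing a file to a TMS is sketched below with the standard library. The function name pin_segmentation is hypothetical. The sketch assumes the file uses the XLIFF 2.x core namespace and that marking each unit with the core canResegment attribute is enough for the receiving tool to leave segmentation alone.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XLIFF 2.0 and 2.1 share this core namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def pin_segmentation(xliff_text: str) -> Tuple[str, List[str]]:
    """Set canResegment="no" on every unit.

    Returns the rewritten XML and the ids of units that already contain
    more than one segment, so the caller can fail the build on them.
    """
    ET.register_namespace("", XLIFF_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xliff_text)
    already_split: List[str] = []
    for unit in root.iter(f"{{{XLIFF_NS}}}unit"):
        unit.set("canResegment", "no")
        if len(unit.findall(f"{{{XLIFF_NS}}}segment")) > 1:
            already_split.append(unit.get("id", ""))
    return ET.tostring(root, encoding="unicode"), already_split
```

A non-empty second return value means some unit was already split upstream and cannot be mapped back to a single msgid without a joining rule.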

A reference PO entry with the fields most at risk:

```
#. Shown on the cart page; {count} is the item total
#: src/cart/summary.ts:42
#, fuzzy
msgctxt "cart"
msgid "{count} item"
msgid_plural "{count} items"
msgstr[0] "{count} artigo"
msgstr[1] "{count} artigos"
```

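The same content expressed as an XLIFF 2.1 unit, showing where the comments, the fuzzy flag, and the singular form land. The msgstr[1] variant is absent because a plain conversion carries only one segment:

```
<unit id="cart.item_count">
  <notes>
    <note category="developer">Shown on the cart page; {count} is the item total</note>
    <note category="location">src/cart/summary.ts:42</note>
  </notes>
  <!-- #, fuzzy: XLIFF 2.1 has no needs-review-translation state (that value is XLIFF 1.2);
       po:fuzzy is an assumed custom subState -->
  <segment state="initial" subState="po:fuzzy">
    <source>{count} item</source>
    <target>{count} artigo</target>
  </segment>
</unit>
```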
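The comment mapping in the example can be sketched as a small classifier. This is an illustration under assumptions rather than the behaviour of any particular converter: the function names are hypothetical, "developer" and "location" follow the unit shown above, and "translator" is an assumed category label for plain # comments.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

# Longer prefixes come first so "#." is not mistaken for a plain "#" comment.
# None means the line is not a note (flags, previous msgid, obsolete entries).
PREFIX_TO_CATEGORY = (
    ("#.", "developer"),
    ("#:", "location"),
    ("#,", None),
    ("#|", None),
    ("#~", None),
    ("#", "translator"),  # assumed label for translator comments
)


def classify_comments(po_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (category, text) pairs for every comment line that is a note."""
    notes: List[Tuple[str, str]] = []
    for raw in po_lines:
        line = raw.strip()
        if not line.startswith("#"):
            continue
        for prefix, category in PREFIX_TO_CATEGORY:
            if line.startswith(prefix):
                text = line[len(prefix):].strip()
                if category and text:
                    notes.append((category, text))
                break
    return notes


def build_notes(po_lines: Iterable[str]) -> ET.Element:
    """Build a <notes> element with one categorized <note> per comment."""
    notes_el = ET.Element("notes")
    for category, text in classify_comments(po_lines):
        ET.SubElement(notes_el, "note", {"category": category}).text = text
    return notes_el
```

Keeping one note per comment line, each with its own category, is what lets the reverse conversion put the text back under the right PO prefix.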

Step-by-Step Implementation

Normalize the source PO first. Run msgcat --no-wrap over the input so multi-line msgstr blocks fold to a canonical shape. Converters that diff on whitespace will otherwise mark unchanged strings as modified, polluting the round-trip comparison.
```
msgcat --no-wrap locales/pt_BR/messages.po -o locales/pt_BR/messages.norm.po
```
Verify the Plural-Forms header is present and correct. The plural index → CLDR category mapping is derived from this header; a missing or wrong formula silently misroutes msgstr[1]. Confirm it before converting.
```
grep -i "Plural-Forms" locales/pt_BR/messages.norm.po
# expect: "Plural-Forms: nplurals=2; plural=(n > 1);\n"
```
Convert PO to XLIFF 2.1 with Translate Toolkit. Pin the version, set the explicit target language so trgLang is valid BCP 47, and keep the source for the reverse leg.
```
po2xliff --version=2.1 -l pt-BR \
  locales/pt_BR/messages.norm.po locales/pt_BR/messages.xlf
```
Inject plural categories the converter omits. If po2xliff flattens plurals to a single segment, post-process with a small script that reads Plural-Forms, evaluates each index, and emits one <segment> per CLDR category so the TMS sees one/other rather than [0]/[1].
```
python tools/expand_plurals.py \
  --in locales/pt_BR/messages.xlf --locale pt-BR --in-place
```
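The index-to-category derivation that such a script needs might look like the sketch below. It is not the contents of tools/expand_plurals.py. It leans on gettext.c2py, an undocumented helper in the Python standard library that compiles a C plural expression. The two CLDR rule functions are hand-copied integer-only approximations for pt-BR and ru, so treat them as assumptions and prefer a real CLDR data source in production.

```python
import gettext
import re
from typing import Callable, Dict, Set, Tuple


def parse_plural_forms(header: str) -> Tuple[int, Callable[[int], int]]:
    """Extract nplurals and a callable plural(n) from a Plural-Forms header."""
    match = re.search(r"nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*([^;]+)", header)
    if not match:
        raise ValueError("no usable Plural-Forms header")
    # gettext.c2py is undocumented; it turns the C expression into a function.
    return int(match.group(1)), gettext.c2py(match.group(2).strip())


def _cldr_pt(n: int) -> str:
    return "one" if n in (0, 1) else "other"


def _cldr_ru(n: int) -> str:
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Assumption: integer-only CLDR rules, copied by hand for two locales.
CLDR_INTEGER_RULES: Dict[str, Callable[[int], str]] = {
    "pt-BR": _cldr_pt,
    "ru": _cldr_ru,
}


def index_to_category(header: str, locale: str, samples: int = 200) -> Dict[int, str]:
    """Map each msgstr index to the single CLDR category it covers."""
    nplurals, plural = parse_plural_forms(header)
    category_of = CLDR_INTEGER_RULES[locale]
    seen: Dict[int, Set[str]] = {}
    for n in range(samples):
        seen.setdefault(plural(n), set()).add(category_of(n))
    mapping: Dict[int, str] = {}
    for index in range(nplurals):
        categories = seen.get(index, set())
        if len(categories) != 1:
            found = sorted(categories) or "no category"
            raise ValueError(f"msgstr[{index}] maps to {found}; header and locale disagree")
        mapping[index] = next(iter(categories))
    return mapping
```

With the pt_BR header from step 2 this yields index 0 as one and index 1 as other. A header whose formula does not line up with the locale's categories raises instead of guessing, which is the failure to surface before a TMS sees the file.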
Hand off to the translation management system, then export the reviewed XLIFF. The TMS updates <target> and state. Pull the file back into the repo as a distinct artifact so the original is never overwritten in place.
```
crowdin download --format xliff -l pt-BR \
  --dest locales/pt_BR/messages.reviewed.xlf
```
Convert XLIFF back to PO against a template. Always pass the original PO as a template (-t) so references, ordering, and untranslated entries are preserved instead of being regenerated.
```
xliff2po -t locales/pt_BR/messages.norm.po \
  locales/pt_BR/messages.reviewed.xlf locales/pt_BR/messages.out.po
```
Run a semantic round-trip diff, not a byte diff. Compare msgid/msgstr pairs and plural arrays for equality while ignoring cosmetic differences, and fail CI on any field-level divergence (see Verification).
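A semantic diff of this kind can be sketched with the standard library alone. The parser below is deliberately minimal and its names (parse_po, semantic_diff) are hypothetical. It assumes well-formed PO input, compares strings in their escaped form, skips the header entry because its dates change on every export, and ignores comments, references, and line wrapping.

```python
import re
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (msgctxt, msgid)

_FIELD = re.compile(r'^(msgctxt|msgid_plural|msgid|msgstr)(?:\[(\d+)\])?\s+"(.*)"$')
_CONTINUATION = re.compile(r'^"(.*)"$')


def parse_po(text: str) -> Dict[Key, List[str]]:
    """Return {(msgctxt, msgid): [msgstr[0], msgstr[1], ...]}."""
    entries: Dict[Key, List[str]] = {}
    fields: Dict[str, str] = {}
    msgstr: Dict[int, str] = {}
    current = None  # ("field", name) or ("msgstr", index)

    def flush() -> None:
        if "msgid" in fields:
            key = (fields.get("msgctxt", ""), fields["msgid"])
            entries[key] = [msgstr[i] for i in sorted(msgstr)]

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments, flags and references are cosmetic here
        field = _FIELD.match(line)
        if field:
            name, index, value = field.groups()
            if name in ("msgctxt", "msgid") and msgstr:
                flush()  # previous entry is complete once it has a msgstr
                fields, msgstr = {}, {}
            if name == "msgstr":
                current = ("msgstr", int(index or 0))
                msgstr[current[1]] = value
            else:
                current = ("field", name)
                fields[name] = value
            continue
        cont = _CONTINUATION.match(line)
        if cont and current:
            kind, key = current
            if kind == "msgstr":
                msgstr[key] += cont.group(1)  # join wrapped lines
            else:
                fields[key] += cont.group(1)
    flush()
    return entries


def semantic_diff(original: str, roundtrip: str) -> List[str]:
    """List field-level divergences; an empty list means the round trip held."""
    before, after = parse_po(original), parse_po(roundtrip)
    problems: List[str] = []
    for key in sorted(before.keys() | after.keys()):
        if key[1] == "":
            continue  # header entry
        if key not in after:
            problems.append(f"missing after round trip: {key}")
        elif key not in before:
            problems.append(f"unexpected entry: {key}")
        elif before[key] != after[key]:
            problems.append(f"msgstr changed for {key}: {before[key]} -> {after[key]}")
    return problems
```

In CI, a wrapper would read both files, print the returned list, and exit non-zero when it is not empty.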

Configuration Reference

| Option | Type | Description / default |
| --- | --- | --- |
| `po2xliff --version` | enum `1.1\|1.2\|2.0\|2.1` | XLIFF schema version to emit. Default `1.1`; set `2.1` for OASIS 2.1 unit/segment model. |
| `po2xliff -l / --language` | BCP 47 string | Target language written to `trgLang`. Required; converts `pt_BR` intent to `pt-BR` form. |
| `xliff2po -t / --template` | path | Original PO used to preserve references, comments, and entry order on reverse conversion. Strongly recommended. |
| `--filteraction` | enum `none\|warn\|exclude\|error` | How Translate Toolkit handles entries failing filters. Default `none`. Use `error` in CI. |
| `--threshold` (po2xliff) | int 0–100 | Minimum fuzzy match score to carry a `<target>`. Default unset (carry all). |
| `xliffmerge --i18nFormat` | enum `xlf\|xlf2\|xmb` | Output flavour for Angular workspaces. Use `xlf2` for XLIFF 2.x merge semantics. |
| `xliffmerge --removeUnusedIds` | boolean | Drop units absent from the new extraction. Default `true`; set `false` to retain history. |
| `msgcat --no-wrap` | flag | Disable line wrapping so converted output diffs cleanly. No default; pass explicitly. |

Framework Variants

Angular (xliffmerge / `@angular/localize`)

Angular extracts to XLIFF natively via ng extract-i18n --format=xlf2, so the bridge is usually XLIFF-to-PO for teams whose translators prefer gettext editors. Use xliffmerge to reconcile newly extracted units against existing translations, then convert the merged XLIFF to PO only at the editor boundary. Keep XLIFF as the source of truth in the repo; the round-trip integrity rules here apply to the PO export, not the canonical store. For the extraction and merge configuration, see the Angular Localization Module Setup guide.

```
ng extract-i18n --format=xlf2 --output-path=src/locale
npx xliffmerge --profile xliffmerge.json en pt-BR
```

React / Next.js (formatjs, react-intl)

formatjs tooling speaks JSON and XLIFF, not PO directly. Bridge by exporting compiled messages to XLIFF, converting to PO with xliff2po for gettext-based reviewers, then converting back. Because formatjs message IDs are content-hashed, set the XLIFF <unit id> from the message id and never regenerate it, or the round trip will orphan every translation.

```
formatjs extract 'src/**/*.tsx' --out-file build/en.json
# bridge JSON -> XLIFF -> PO via a thin Translate Toolkit step
```

Vue / Nuxt (vue-i18n)

vue-i18n stores messages as JSON or YAML. Treat PO/XLIFF bridging as an import/export boundary at the TMS edge: convert JSON to PO with json2po, run the PO-to-XLIFF leg for the TMS, and reverse on the way back. Keep the JSON canonical so application code never reads PO at runtime.

Node.js backend (gettext-native services)

Services using node-gettext or gettext-parser keep .mo/.po as the runtime format. Here PO is canonical and XLIFF is the transient transport for the TMS. The lossy direction is therefore PO→XLIFF→PO, so always pass the original PO as the xliff2po template and gate the build on the round-trip diff. Self-hosted teams routing this through a TMS should follow the Weblate Self-Hosted Setup, which consumes PO and XLIFF components side by side.

Compile .po to binary .mo only after the round-trip gate passes, never before — msgfmt --check validates plural counts against the Plural-Forms header and will reject a file whose plurals were mangled in transit, catching corruption that a syntactic XML check on the XLIFF side cannot see. Keep .mo artifacts out of version control and regenerate them in CI so the PO file remains the single reviewed source.

```
msgfmt --check --check-format locales/pt_BR/messages.po -o locales/pt_BR/messages.mo
```

Verification

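The build must fail when any msgid, msgstr, or plural variant diverges after a full round trip. Run the conversion both ways, then use msgcmp to confirm that every entry in the original PO still exists in the round-tripped file. msgcmp checks that entries are present, not that msgstr text is unchanged, so treat it as the minimum gate and pair it with a field-level comparison of msgstr and plural values: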

```
#!/usr/bin/env bash
set -euo pipefail

po2xliff --version=2.1 -l pt-BR messages.norm.po /tmp/rt.xlf
xliff2po -t messages.norm.po /tmp/rt.xlf /tmp/rt.po

# msgcmp exits non-zero if a msgid from the original is missing in the round-tripped file.
# --use-fuzzy and --use-untranslated count fuzzy and empty entries as present.
msgcmp --use-fuzzy --use-untranslated /tmp/rt.po messages.norm.po

echo "Round-trip integrity OK"
```

Expected output on success is the single line Round-trip integrity OK with exit code 0. A divergence makes msgcmp print this message is used but not defined (or a plural-count warning) and exit non-zero, failing the job. Wire this into CI as a gate:

```
# .github/workflows/i18n.yml (excerpt)
- name: PO/XLIFF round-trip integrity
  run: ./scripts/roundtrip-check.sh
```

Common Pitfalls

Plural arrays collapse to a single segment. A naive po2xliff that does not consult Plural-Forms writes only msgstr[0], dropping every other variant. Expand to one CLDR-categorized segment per plural index; see Converting gettext PO to XLIFF 2.1 Without Data Loss for the index-to-category derivation.
Translator and developer comments merge or vanish. PO’s #. (extracted) and # (translator) comments must map to distinct <note category> values; converters that emit a single <note> lose the distinction and reviewers lose context.
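#, fuzzy flattens irreversibly. XLIFF 2.1 has no needs-review-translation value (that is an XLIFF 1.2 state), so carry fuzzy outbound as state="initial" with a custom subState. On the way back, XLIFF’s translated, reviewed, and final states have no separate PO equivalents and all collapse to “not fuzzy”, so a merely translated string becomes indistinguishable from a reviewed one.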
BCP 47 vs gettext locale codes. pt_BR is valid in PO headers but invalid as XLIFF trgLang; emit pt-BR or schema validation rejects the file.
Reverse conversion without a template. Skipping xliff2po -t regenerates entry order and drops #: source references, producing noisy diffs and broken IDE jump-to-source.
Whitespace/wrapping churn. Differing line-wrap widths between tools mark unchanged strings as modified; normalize with msgcat --no-wrap before every conversion.
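The locale-code pitfall above can be handled with a small normalizer run before trgLang is written. This is a sketch with a hypothetical function name. The modifier table is an assumed, incomplete mapping of common gettext @modifiers to BCP 47 subtags, and unknown modifiers are rejected instead of guessed.

```python
import re

# Assumption: a hand-picked subset of gettext @modifiers and their BCP 47 subtags.
MODIFIER_TO_SUBTAG = {"latin": "Latn", "cyrillic": "Cyrl", "valencia": "valencia"}

_LOCALE = re.compile(
    r"([A-Za-z]{2,3})"         # language
    r"(?:[_-]([A-Za-z]{2}))?"  # optional region
    r"(?:\.[\w-]+)?"           # optional encoding such as .UTF-8, dropped
    r"(?:@(\w+))?"             # optional modifier
)


def to_bcp47(locale: str) -> str:
    """Convert a gettext-style locale (pt_BR, sr_RS@latin) to a BCP 47 tag."""
    match = _LOCALE.fullmatch(locale)
    if not match:
        raise ValueError(f"unrecognized locale: {locale!r}")
    language, region, modifier = match.groups()
    script = variant = None
    if modifier:
        subtag = MODIFIER_TO_SUBTAG.get(modifier.lower())
        if subtag is None:
            raise ValueError(f"unknown modifier: {modifier!r}")
        if len(subtag) == 4:
            script = subtag   # script subtags sit between language and region
        else:
            variant = subtag  # variants follow the region
    parts = [language.lower(), script, region.upper() if region else None, variant]
    return "-".join(part for part in parts if part)
```

For example, to_bcp47("pt_BR") gives "pt-BR", and an already valid tag such as "pt-BR" passes through unchanged.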

FAQ

Is converting between gettext PO and XLIFF 2.1 lossless?

Not by default. msgid and msgstr map cleanly, but plural arrays, the #, fuzzy flag, msgctxt, and the various comment types each need explicit handling. PO→XLIFF expands the model and XLIFF→PO narrows it, so the lossy risk concentrates on the reverse leg. Always pass the original PO as a template to xliff2po and gate CI on a semantic round-trip diff to make any loss visible.

Why do my plural forms break after a round trip?

PO encodes plurals as an integer-indexed array (msgstr[0], msgstr[1]) whose meaning is defined by the file’s Plural-Forms header, while XLIFF 2.1 expects each variant tagged with a CLDR category (one, few, many, other). A converter that ignores Plural-Forms cannot reconstruct the categories and typically keeps only index 0. Verify the header exists and post-process the XLIFF to emit one categorized segment per index.

Should PO or XLIFF be the source of truth in my repo?

Pick the format your runtime consumes natively. Node and Python gettext services keep PO canonical and treat XLIFF as transient TMS transport; Angular and formatjs projects keep XLIFF (or JSON) canonical and export PO only for translators who prefer gettext editors. Whichever you choose, never overwrite the canonical file in place during conversion — write a separate artifact and diff against it.

Which converter should I use, po2xliff or xliffmerge?

They solve different problems. Translate Toolkit’s po2xliff/xliff2po are general format converters for the bridge itself. xliffmerge is an Angular-focused tool that reconciles newly extracted XLIFF units against existing translations during extraction — it merges, it does not convert formats. Use Translate Toolkit for PO↔XLIFF and xliffmerge for XLIFF-to-XLIFF reconciliation in Angular workspaces.

How do I map the fuzzy flag to an XLIFF state?

XLIFF 2.1 defines only initial/translated/reviewed/final for state on <segment>, with subState for finer grain; needs-review-translation is an XLIFF 1.2 value and fails 2.1 schema validation. Map #, fuzzy to state="initial" on a segment that keeps its <target>, optionally marked with a custom subState such as po:fuzzy. The relationship is one-to-many: XLIFF’s reviewed and final states have no PO equivalent and both collapse to “not fuzzy” on the way back, so treat a return to PO as a loss of review precision and reconcile review status from the TMS, not the PO file.

Related

Converting gettext PO to XLIFF 2.1 Without Data Loss — the plural-index-to-CLDR-category derivation and a loss-free converter script.
Angular Localization Module Setup — native XLIFF extraction and xliffmerge reconciliation for Angular workspaces.
Weblate Self-Hosted Setup — running PO and XLIFF components side by side in a self-hosted TMS.
Crowdin Integration for Dev Teams — pushing and pulling XLIFF artifacts through a managed translation platform.

Part of Translation Workflows & CI/CD Pipeline Sync.
