Region-to-Language Fallback Without a 404
A request for fr-CA when only a fr bundle exists should degrade to French and then to your default locale — never return a 404 or a blank page. This page traces why region-only requests hard-fail, applies the RFC 4647 lookup algorithm to build a deterministic fr-CA → fr → en chain, and shows where to run that resolution — at the edge/CDN as a rewrite or in the app as a resolver — so an unmatched region tag always lands on a real page.
The failure is specific: a user in Quebec hits /fr-CA/pricing, your router has routes for /fr/ and /en/ but not /fr-CA/, and the framework returns its catch-all 404 instead of serving the perfectly good French page. The same bug appears server-side as res.status(404) when Accept-Language: fr-CA finds no exact bundle, and client-side as a blank <RouterView> because no route segment matched. In every case the cause is identical: the matcher did exact-match-or-nothing instead of progressively truncating the region subtag.
Root cause: exact matching where the spec calls for lookup
A BCP 47 language tag like fr-CA is hierarchical: fr is the language subtag and CA is the region subtag. RFC 4647 §3.4 defines the lookup matching scheme for exactly this case. When the full tag has no match, you progressively remove the rightmost subtag and retry, and if no truncation matches, lookup returns a default value. So fr-CA is meant to fall back to fr, and zh-Hant-TW to zh-Hant and then zh. A router or resolver that keys off the literal string fr-CA and treats a miss as "not found" is doing naive string equality (or RFC 4647 filtering, which never matches a more specific request against a less specific tag) instead of lookup, and that mismatch is the entire bug.
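To make the truncation order concrete, here is a minimal sketch that lists the candidate tags lookup tries for one range, most specific first. The function name lookupChain is hypothetical. It also applies a rule from RFC 4647 §3.4 that the simpler resolver later on this page omits: when truncation leaves a single-character subtag (such as the x that introduces a private-use sequence) at the end, that subtag is removed as well. Skipping the rule only adds a candidate like zh-Hant-CN-x, which can never name a real bundle.

```typescript
/** Candidate tags that RFC 4647 §3.4 lookup tries for one range, most specific first (sketch). */
function lookupChain(range: string): string[] {
  const chain: string[] = [];
  let subtags = range.split('-').filter(Boolean);
  while (subtags.length > 0) {
    chain.push(subtags.join('-'));
    subtags = subtags.slice(0, -1); // drop the rightmost subtag
    // A single-character subtag (e.g. the "x" of a private-use sequence) must not end a candidate.
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags = subtags.slice(0, -1);
    }
  }
  return chain;
}
```

For fr-CA the chain is fr-CA, then fr. The RFC's own example range, zh-Hant-CN-x-private1-private2, yields zh-Hant-CN-x-private1, zh-Hant-CN, zh-Hant and zh after the full tag; only when all of them miss does the default apply.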
Two layers commonly get this wrong. The HTTP router (Next.js middleware, Express path matching, an Astro route) compares the path segment by string equality, so /fr-CA/ 404s because no static route or generated dynamic param exists for it. Separately, the translation resolver may lack a fallback of its own, but that is a different problem (a missing key within a locale that does exist), covered in graceful fallback chains for missing strings. Region-to-language fallback sits upstream of that: it decides which locale's page to serve before any key is looked up, and it is the same truncation logic that the locale negotiation strategies resolver applies to the Accept-Language header.
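Applied to a header, RFC 4647 lookup runs the same loop over a whole language priority list: each range is truncated to exhaustion before the next, lower-priority range is tried, and the default applies only after all of them miss. A minimal sketch, assuming a simplified q-value parser that does not validate malformed headers; parseAcceptLanguage and lookupPriorityList are hypothetical names:

```typescript
/** Parse an Accept-Language header into ranges ordered by descending q (simplified sketch). */
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(',')
    .map((part, index) => {
      const [range = '', ...params] = part.trim().split(';');
      const q = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      return { range: range.trim(), q: q ? Number(q.slice(2)) : 1, index };
    })
    // q=0 means "not acceptable"; "*" carries no preference for lookup, so the default covers it.
    .filter((r) => r.range && r.range !== '*' && r.q > 0 && !Number.isNaN(r.q))
    .sort((a, b) => b.q - a.q || a.index - b.index) // ties keep header order
    .map((r) => r.range);
}

/** RFC 4647 lookup over a priority list: truncate each range fully before trying the next. */
function lookupPriorityList(ranges: readonly string[], supported: readonly string[], fallback: string): string {
  const canonical = new Map(supported.map((s): [string, string] => [s.toLowerCase(), s]));
  for (const range of ranges) {
    for (let tag = range; tag; tag = tag.includes('-') ? tag.slice(0, tag.lastIndexOf('-')) : '') {
      const hit = canonical.get(tag.toLowerCase());
      if (hit) return hit;
    }
  }
  return fallback;
}
```

With Accept-Language: fr-CA,fr;q=0.9,en;q=0.8 and bundles en, fr, de, the first range already truncates to fr, so the lower-priority en is never consulted.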
Minimal reproducible example
The smallest reproduction is a locale list that holds only base languages plus an exact-match guard. Any region-suffixed request walks straight into the 404 branch:
```typescript
const SUPPORTED = ['en', 'fr', 'de']; // base languages only: no fr-CA, no de-AT

function resolveLocale(requested: string): string {
  if (SUPPORTED.includes(requested)) return requested;
  // exact-match-or-nothing: this is the bug
  throw new Response('Not Found', { status: 404 });
}

resolveLocale('fr-CA'); // throws a 404 Response, even though 'fr' is right there
```
In a Next.js App Router project the same defect shows up as middleware that only acts on requests whose first segment is in the supported list, letting everything else fall through to the framework's catch-all not-found.tsx. The user gets a 404 not because French content is missing, but because fr-CA was never truncated to fr.
The fix: an RFC 4647 lookup resolver
The corrected resolver truncates the requested tag one subtag at a time and returns the first supported match, defaulting only when every truncation misses. It never throws on an unmatched region.
```typescript
// locale-lookup.ts
const SUPPORTED = ['en', 'fr', 'de', 'pt-BR']; // base langs + a few real regional bundles
const DEFAULT_LOCALE = 'en';

// BCP 47 tags compare case-insensitively, so index the supported list by lowercase
// and map each key back to its canonical spelling.
const CANONICAL = new Map(SUPPORTED.map((s): [string, string] => [s.toLowerCase(), s]));

/** RFC 4647 §3.4 "lookup": drop the rightmost subtag until a supported tag is found. */
export function lookupLocale(requested: string): string {
  let tag = requested.trim();
  while (tag) {
    // Return the supported entry's canonical casing, not the requester's ('PT-br' → 'pt-BR').
    const hit = CANONICAL.get(tag.toLowerCase());
    if (hit) return hit;
    const lastDash = tag.lastIndexOf('-');
    if (lastDash === -1) break; // no more subtags to drop
    tag = tag.slice(0, lastDash); // fr-CA → fr, zh-Hant-TW → zh-Hant → zh
  }
  return DEFAULT_LOCALE; // every tier missed → default, NEVER a 404
}

lookupLocale('fr-CA'); // → 'fr'    (truncated, no 404)
lookupLocale('pt-BR'); // → 'pt-BR' (exact regional bundle exists)
lookupLocale('pt-PT'); // → 'en'    (pt-PT and pt both absent → default)
lookupLocale('de-AT'); // → 'de'    (region dropped to base)
```
Two details are easy to miss. Truncation removes only the rightmost subtag per step, so zh-Hant-TW tries zh-Hant before zh and keeps the script subtag. And the function returns the canonically cased supported string rather than echoing a caller's pt-br, so downstream bundle keys and URL prefixes stay consistent.
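Rightmost truncation can only preserve a script the request actually contains. A browser sending zh-TW carries no script subtag, so plain lookup tries zh-TW and then zh, which is typically the Simplified bundle. One way to close that gap, sketched here under the assumption that you ship a Simplified zh bundle alongside a separate zh-Hant bundle, is to ask Intl.Locale's maximize() (backed by CLDR likely-subtags data) for the probable script and try language-Script just before the bare language. The names scriptAwareChain and lookupWithScript are hypothetical.

```typescript
/** Truncation chain with the likely language-Script inserted before the bare language (sketch). */
function scriptAwareChain(requested: string): string[] {
  const chain: string[] = [];
  for (let t = requested.trim(); t; t = t.includes('-') ? t.slice(0, t.lastIndexOf('-')) : '') {
    chain.push(t); // e.g. zh-TW, zh
  }
  try {
    const max = new Intl.Locale(requested.trim()).maximize(); // zh-TW → zh-Hant-TW
    if (max.script) {
      const langScript = `${max.language}-${max.script}`;
      const present = chain.some((c) => c.toLowerCase() === langScript.toLowerCase());
      if (!present) chain.splice(chain.length - 1, 0, langScript); // just before the bare language
    }
  } catch {
    // Malformed tag (RangeError): keep the plain truncation chain.
  }
  return chain;
}

function lookupWithScript(requested: string, supported: readonly string[], fallback: string): string {
  const canonical = new Map(supported.map((s): [string, string] => [s.toLowerCase(), s]));
  for (const tag of scriptAwareChain(requested)) {
    const hit = canonical.get(tag.toLowerCase());
    if (hit) return hit;
  }
  return fallback;
}
```

With bundles en, zh and zh-Hant, this resolves zh-TW to zh-Hant and zh-CN (maximized to zh-Hans-CN) to zh. Every chain gains a language-Script candidate (fr-CA tries fr-Latn before fr), which is harmless unless you ship script-tagged bundles you do not want matched.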
Edge/CDN rewrite vs app resolver
Run this where it is cheapest and most cacheable. Two placements, with a clear trade-off:
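```typescript
// middleware.ts: Next.js middleware (the same logic ports to a Cloudflare Workers fetch handler)
import { NextResponse, type NextRequest } from 'next/server';
import { lookupLocale } from './locale-lookup';

// Heuristic, not a full BCP 47 parser: language, optional Script, optional REGION.
// Assumes no app slug has this shape (a page at /to-do would be read as a locale).
const LOCALE_SEGMENT = /^[a-z]{2,3}(?:-[a-z]{4})?(?:-(?:[a-z]{2}|\d{3}))?$/i;

export function middleware(request: NextRequest) {
  const url = request.nextUrl.clone();
  const [, first = '', ...rest] = url.pathname.split('/');
  // Only region/script-qualified tags need resolving; /fr/ and /pricing pass through untouched.
  if (!first.includes('-') || !LOCALE_SEGMENT.test(first)) return NextResponse.next();
  const resolved = lookupLocale(first);
  if (resolved === first) return NextResponse.next(); // exact regional bundle, e.g. /pt-BR/
  url.pathname = `/${resolved}/${rest.join('/')}`; // /fr-CA/pricing → /fr/pricing
  // 307 (temporary, not 301) so browsers and CDNs don't cache the redirect permanently.
  // Use NextResponse.rewrite(url) instead to keep /fr-CA/ in the address bar.
  return NextResponse.redirect(url, 307);
}

// Skip API routes and static assets.
export const config = { matcher: ['/((?!api|_next/static|_next/image|favicon.ico).*)'] };
```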
A CDN/edge rewrite resolves the region before the request reaches the origin, so the cache key collapses (/fr-CA/* and /fr/* share one cached page) and the origin never renders a 404 path. The cost is that the edge layer needs its own copy of the supported-locale list, which must stay in sync with the app's. An app-level resolver keeps all logic in one runtime and can read richer signals (cookies, user record), but every region variant hits the origin and may fragment the cache. Use a 307 redirect when you want the canonical /fr/ URL in the address bar, or an internal rewrite when you want /fr-CA/ to stay visible while serving /fr/ content. Either way, send a Content-Language header naming the locale actually served (Content-Language: fr for a /fr-CA/ request), and add Vary: Accept-Language to any response whose body or redirect target depended on that request header, so caches and crawlers see the right variant.
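A small helper can attach those headers wherever the final Response is built. This is a sketch using only the standard Fetch API Headers and Response classes; withLocaleHeaders and its negotiated flag are hypothetical names. The flag is how the caller says whether Accept-Language fed into the decision; a purely path-based rewrite can leave it off, since the URL already keys the cache. Copying into a new Response matters because Response.redirect() returns a response whose headers are immutable.

```typescript
/** Copy a Response, announcing the served locale and, if needed, the request header it varied on. */
function withLocaleHeaders(res: Response, servedLocale: string, negotiated: boolean): Response {
  const headers = new Headers(res.headers);
  headers.set('Content-Language', servedLocale); // e.g. 'fr' for a /fr-CA/ request
  const vary = headers.get('Vary') ?? '';
  if (negotiated && !/\baccept-language\b/i.test(vary)) {
    // append() keeps any existing value, producing e.g. "Accept-Encoding, Accept-Language".
    headers.append('Vary', 'Accept-Language');
  }
  return new Response(res.body, { status: res.status, statusText: res.statusText, headers });
}
```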
Verification
Assert that every region-only tag resolves to a supported locale and that no input makes the resolver itself throw:
```typescript
import { describe, it, expect } from 'vitest';
import { lookupLocale } from './locale-lookup';

describe('region-to-language lookup', () => {
  it('truncates an unmatched region to its base language', () => {
    expect(lookupLocale('fr-CA')).toBe('fr');
    expect(lookupLocale('de-AT')).toBe('de');
  });
  it('keeps an exact regional bundle when one exists', () => {
    expect(lookupLocale('pt-BR')).toBe('pt-BR');
  });
  it('matches case-insensitively and returns canonical casing', () => {
    expect(lookupLocale('PT-br')).toBe('pt-BR');
  });
  it('falls back to default, never throws, for fully unmatched tags', () => {
    expect(lookupLocale('pt-PT')).toBe('en');
    expect(() => lookupLocale('xx-YY-ZZ')).not.toThrow();
  });
});
```
Then prove it end to end against the running app. After following any redirect, a region-only path must finish on a 200 with the base language served, never a 404:
```sh
# fr-CA must resolve to French content, not a 404 (a 3xx plus its target for a redirect, a 200 for a rewrite):
curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' https://example.com/fr-CA/pricing

# Confirm the served locale is announced for caches/crawlers (-L also prints the final page's headers):
curl -sIL https://example.com/fr-CA/pricing | grep -iE '^(content-language|vary):'

# Fail CI unless every region path ends on a 200 after following redirects
# (a connection failure reports 000, which also fails):
for loc in fr-CA de-AT pt-PT zh-Hant-TW; do
  code=$(curl -s -L -o /dev/null -w '%{http_code}' "https://example.com/$loc/pricing")
  [ "$code" = 200 ] || { echo "FAIL $loc → $code"; exit 1; }
done
echo 'all region paths resolve OK'
```
When to escalate
Lookup truncation fixes the routing 404, but it does not fix content that genuinely should differ by region. If fr-CA and fr-FR must show different currency, legal copy, or spelling, silently serving fr to a Canadian user is a product gap, not a bug the resolver can close — you need a real fr-CA bundle that overrides only the divergent keys on top of fr. That work belongs to the broader Fallback Chain Configuration policy and the per-key cascade in graceful fallback chains for missing strings. Escalate when fallback telemetry shows a region consistently degrading: that signals demand for a dedicated regional bundle, and the decision moves from engineering into localization planning.
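The override model described above can be sketched as a merge along the fallback chain, least specific first, so a sparse fr-CA bundle only has to carry its divergent keys. The bundle contents and the layeredMessages name are hypothetical, for illustration only; a real setup would load bundles from files and take the chain from the resolver.

```typescript
type Messages = Record<string, string>;

// Hypothetical bundles: fr-CA ships only the keys that differ from fr.
const BUNDLES: Record<string, Messages> = {
  en: { currency: 'USD', checkout: 'Checkout', tax: 'Sales tax' },
  fr: { currency: 'EUR', checkout: 'Paiement', tax: 'TVA' },
  'fr-CA': { currency: 'CAD', tax: 'TPS/TVQ' },
};

/** Merge bundles along a chain like fr-CA → fr → en; earlier (more specific) tags win. */
function layeredMessages(chain: readonly string[]): Messages {
  return [...chain]
    .reverse() // start from the least specific bundle
    .reduce<Messages>((merged, tag) => ({ ...merged, ...(BUNDLES[tag] ?? {}) }), {});
}
```

For the chain fr-CA, fr, en this yields currency CAD and tax TPS/TVQ from fr-CA and checkout Paiement from fr; en only fills keys that neither French bundle defines.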
FAQ
Why does requesting fr-CA return a 404 when an fr page clearly exists?
Because the router or resolver matches fr-CA by exact string and treats "no exact route" as "not found", instead of applying the RFC 4647 lookup scheme. Lookup says: drop the rightmost subtag and retry, so fr-CA should truncate to fr. Implement progressive truncation (fr-CA → fr → default) and the existing French page resolves with a 200 instead of a 404.
Should I redirect fr-CA to fr, or rewrite it internally?
Use a redirect when you want the canonical /fr/ URL visible in the address bar: a 307 while the mapping may still change, or a permanent 308 once you want search engines to treat /fr/ as the URL to index. Use an internal rewrite when /fr-CA/ should stay in the URL while serving /fr/ content. Either way, resolving at the edge/CDN is preferable because it collapses the cache key and keeps the origin from ever rendering a 404 path. Send Content-Language with the locale actually served, plus Vary: Accept-Language on any response that header influenced, so caches serve the right variant.
Does lookup truncation ever lose the script subtag, like zh-Hant?
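No, as long as you truncate one subtag at a time from the right. zh-Hant-TW first drops TW to try zh-Hant (Traditional Chinese), and only drops Hant to reach zh if zh-Hant is also absent. Stripping everything down to the language in one step is the common mistake: it would serve a Simplified zh bundle to a Traditional reader. The caveat is that truncation can only preserve a script the request actually carries. zh-TW has no script subtag and truncates straight to zh, so if you ship a separate zh-Hant bundle, expand the tag first (new Intl.Locale('zh-TW').maximize() yields zh-Hant-TW).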
Related
- Fallback Chain Configuration — the parent policy covering cascade tiers, regional overrides, and persistent-fallback escalation.
- Setting up graceful fallback chains for missing strings — the per-key cascade that runs after this resolver picks a locale.
- Implementing locale negotiation in Express.js — applying the same truncation to the Accept-Language header.
- Handling pluralization in Arabic and Slavic languages — why a base-language fallback can still pick the wrong plural form.
Part of Fallback Chain Configuration.