Two-pass PII redaction with multi-country pattern packs

Pass 1: deterministic regex with Nordic/European/ECHR/Global packs
covering Norwegian fødselsnummer, Swedish personnummer, Danish CPR,
Finnish henkilötunnus, UK NI, French INSEE, IBAN, EU phones, ECHR
application numbers, DOB, and national ID label patterns.
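
A minimal sketch of how a region-keyed pattern pack for pass 1 could be
laid out. Everything here is hypothetical: the constant and function
names, the placeholder labels, and the regexes, which match shape only
and do no checksum validation. They are not the patterns shipped in this
commit. The sketch also assumes that a region pack is applied on top of a
shared global pack, which the commit message does not state.

```php
<?php
declare(strict_types=1);

// Hypothetical pack table. Patterns are illustrative shapes only.
const SKETCH_PATTERN_PACKS = [
    'nordic' => [
        // Norwegian fødselsnummer: 11 digits, optional space after the date part.
        'NO_FNR' => '/\b\d{6}\s?\d{5}\b/',
        // Swedish personnummer: (YY)YYMMDD, then - or +, then 4 digits.
        'SE_PNR' => '/\b(?:\d{2})?\d{6}[-+]\d{4}\b/',
    ],
    'global' => [
        // IBAN: country code, 2 check digits, then groups of up to 4 characters.
        'IBAN' => '/\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b/',
    ],
];

/**
 * Replace every match from the selected packs with a bracketed label.
 * Assumption: region-specific patterns run first, then the global ones.
 */
function sketchRegexRedact(string $text, string $region): string
{
    $patterns = SKETCH_PATTERN_PACKS['global'];
    if ($region !== 'global') {
        // Array union keeps the region's patterns first in iteration order.
        $patterns = (SKETCH_PATTERN_PACKS[$region] ?? []) + $patterns;
    }

    foreach ($patterns as $label => $pattern) {
        // preg_replace returns null on a regex error; keep the text in that case.
        $text = preg_replace($pattern, '[' . $label . ']', $text) ?? $text;
    }

    return $text;
}
```

Keeping the label in the placeholder (rather than a generic marker) lets
a later pass or a human reviewer see what kind of identifier was removed
without seeing its value.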

Pass 2: LLM semantic scan (Azure OpenAI) finds names, orgs, places
and identifying descriptions missed by regex. Runs on pre-redacted
text, so identifiers matched in pass 1 never reach the LLM.
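
The ordering of the two passes can be sketched as below. This is an
illustration under assumptions, not the code in this commit: the
function name is hypothetical, the LLM call is abstracted as a callable,
and the model is assumed to return the exact substrings it wants
redacted. The real Azure OpenAI request and response format are not
shown here.

```php
<?php
declare(strict_types=1);

/**
 * Run pass 1, then hand only its output to pass 2.
 *
 * @param callable $regexPass    fn(string): string, the deterministic pass
 * @param callable $semanticScan fn(string): array of substrings to redact
 *                               (hypothetical wrapper around the LLM call)
 */
function sketchTwoPassRedact(string $text, callable $regexPass, callable $semanticScan): string
{
    // Pass 1: structured identifiers are replaced before anything leaves the process.
    $preRedacted = $regexPass($text);

    // Pass 2: the model only ever receives the pre-redacted text.
    $findings = $semanticScan($preRedacted);

    foreach ($findings as $span) {
        // Ignore anything that is not a usable substring.
        if (is_string($span) && $span !== '') {
            $preRedacted = str_replace($span, '[REDACTED]', $preRedacted);
        }
    }

    return $preRedacted;
}
```

Passing both passes in as callables keeps the ordering guarantee in one
place and makes it easy to test with a stub instead of a live model.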

Adds region selector (Nordic/European/ECHR/Global) to the Redact UI.
Falls back gracefully when Azure is not yet configured.
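
A rough sketch of the two fallbacks mentioned above. The function
dbnToolsNormalizeRegion is reproduced from this commit's diff; the rest
is hypothetical, including the config keys (endpoint, apiKey), the
helper names, and the shape of the returned array. It assumes that
"falls back gracefully" means returning the pass 1 result unchanged when
the semantic pass cannot run.

```php
<?php
declare(strict_types=1);

// From this commit: unknown or missing regions fall back to 'nordic'.
function dbnToolsNormalizeRegion(mixed $value): string
{
    $region = strtolower(trim((string)$value));
    return in_array($region, ['nordic', 'european', 'echr', 'global'], true) ? $region : 'nordic';
}

// Hypothetical check: both settings must be non-empty strings.
function sketchAzureConfigured(array $azure): bool
{
    return trim((string)($azure['endpoint'] ?? '')) !== ''
        && trim((string)($azure['apiKey'] ?? '')) !== '';
}

/**
 * Apply the semantic pass if possible, otherwise keep the regex-only text.
 *
 * @param callable $semanticPass fn(string): string (hypothetical LLM-backed pass)
 * @return array{text: string, semanticPass: string}
 */
function sketchRedactWithFallback(string $preRedacted, array $azure, callable $semanticPass): array
{
    if (!sketchAzureConfigured($azure)) {
        // Not configured yet: return the pass 1 result as-is.
        return ['text' => $preRedacted, 'semanticPass' => 'skipped'];
    }

    try {
        return ['text' => $semanticPass($preRedacted), 'semanticPass' => 'applied'];
    } catch (Throwable $e) {
        // A failing call must not lose the deterministic redaction.
        return ['text' => $preRedacted, 'semanticPass' => 'failed'];
    }
}
```

Reporting the status alongside the text would let the UI tell the user
that only the regex pass ran, rather than silently returning a weaker
result.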

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 01:27:52 +02:00
parent 2d8d1c7409
commit 9b22947eb2
5 changed files with 229 additions and 42 deletions
+6
@@ -195,6 +195,12 @@ function dbnToolsNormalizeLanguage(mixed $value): string
    return in_array($language, ['no', 'en'], true) ? $language : 'en';
}

function dbnToolsNormalizeRegion(mixed $value): string
{
    $region = strtolower(trim((string)$value));
    return in_array($region, ['nordic', 'european', 'echr', 'global'], true) ? $region : 'nordic';
}

function dbnToolsString(array $input, string $key, int $maxChars, bool $required = true): string
{
    $value = trim((string)($input[$key] ?? ''));