Two-pass PII redaction with multi-country pattern packs
Pass 1: deterministic regex with Nordic/European/ECHR/Global packs covering Norwegian fødselsnummer, Swedish personnummer, Danish CPR, Finnish henkilötunnus (HETU), UK National Insurance numbers, French INSEE numbers, IBANs, EU phone numbers, ECHR application numbers, dates of birth, and national ID label patterns.

Pass 2: LLM semantic scan (Azure OpenAI) that finds names, organisations, places, and identifying descriptions missed by the regex pass. It runs on the pre-redacted output of Pass 1, so PII already matched by the regex packs never reaches the LLM.

Adds a region selector (Nordic/European/ECHR/Global) to the Redact UI. Falls back gracefully when Azure is not yet configured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
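The two-pass flow described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the function names, the two simplified patterns, and the shape of the `$semanticScan` callback (a stand-in for the Azure OpenAI call that returns a list of phrases to mask) are all hypothetical. It also assumes that "falls back gracefully" means returning the regex-only result when no LLM is configured.

```php
<?php

// Hypothetical, heavily simplified patterns for illustration only.
// The commit's real pattern packs are not shown in this diff.
const EXAMPLE_PATTERNS = [
    'NATIONAL_ID' => '/\b\d{6}\s?\d{5}\b/',                  // 11-digit ID shape
    'IBAN'        => '/\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/',   // rough IBAN shape
];

// Pass 1: replace every regex match with a bracketed label.
function examplePassOne(string $text, array $patterns): string
{
    foreach ($patterns as $label => $regex) {
        // preg_replace returns null on error; keep the text unchanged then.
        $text = preg_replace($regex, '[' . $label . ']', $text) ?? $text;
    }
    return $text;
}

// Pass 2 runs on the output of pass 1, so the callback never sees
// values that the regex pass already masked.
function exampleTwoPass(string $text, array $patterns, ?callable $semanticScan = null): string
{
    $preRedacted = examplePassOne($text, $patterns);

    // Assumed fallback: no LLM configured means regex-only output.
    if ($semanticScan === null) {
        return $preRedacted;
    }

    foreach ($semanticScan($preRedacted) as $phrase) {
        if (is_string($phrase) && $phrase !== '') {
            $preRedacted = str_replace($phrase, '[REDACTED]', $preRedacted);
        }
    }
    return $preRedacted;
}

// Usage with a stub in place of the LLM call.
$input = 'Ola Nordmann, fnr 01019912345, konto NO9386011117947.';
$seenByScanner = null;
$stubScan = function (string $preRedacted) use (&$seenByScanner): array {
    $seenByScanner = $preRedacted;   // record what the "LLM" would receive
    return ['Ola Nordmann'];
};

$regexOnly = exampleTwoPass($input, EXAMPLE_PATTERNS);
$twoPass   = exampleTwoPass($input, EXAMPLE_PATTERNS, $stubScan);
```

In this sketch the stub receives `Ola Nordmann, fnr [NATIONAL_ID], konto [IBAN].`, which illustrates the ordering guarantee: only text the regex pass could not match is exposed to the second pass.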
@@ -195,6 +195,12 @@ function dbnToolsNormalizeLanguage(mixed $value): string
    return in_array($language, ['no', 'en'], true) ? $language : 'en';
}

function dbnToolsNormalizeRegion(mixed $value): string
{
    $region = strtolower(trim((string)$value));
    return in_array($region, ['nordic', 'european', 'echr', 'global'], true) ? $region : 'nordic';
}

function dbnToolsString(array $input, string $key, int $maxChars, bool $required = true): string
{
    $value = trim((string)($input[$key] ?? ''));
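To make the behaviour of the new `dbnToolsNormalizeRegion` helper concrete, the snippet below repeats its definition from the hunk above and calls it with a few inputs. The sample inputs are illustrative assumptions rather than values taken from the Redact UI. Note that the `mixed` parameter type requires PHP 8.0 or later.

```php
<?php

// Definition copied from the diff above so the example is self-contained.
function dbnToolsNormalizeRegion(mixed $value): string
{
    $region = strtolower(trim((string)$value));
    return in_array($region, ['nordic', 'european', 'echr', 'global'], true) ? $region : 'nordic';
}

// Whitespace and letter case are normalised before the allow-list check.
$fromPadded  = dbnToolsNormalizeRegion('  ECHR ');   // 'echr'
$fromMixed   = dbnToolsNormalizeRegion('Global');    // 'global'

// Anything outside the allow-list, including null, falls back to 'nordic'.
$fromUnknown = dbnToolsNormalizeRegion('asia');      // 'nordic'
$fromNull    = dbnToolsNormalizeRegion(null);        // 'nordic'
```

Because the fallback is a fixed default rather than an error, a missing or tampered region value silently selects the Nordic pack. Whether that is the desired behaviour for callers outside the Nordic region is a design choice worth confirming.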