api/extract.php — new endpoint accepting .pdf/.docx/.txt up to 4 MB;
extracts text via pdftotext for PDFs, ZipArchive+DOMXPath for DOCX, and
mb_convert_encoding for TXT; truncates output to 32 000 chars to stay
within the redact limit.
index.php — drop/browse upload zone above the textarea, visible only
in Redact mode.
tools.js — setupUpload(), handleFileUpload(), resetUpload(); drag-and-drop
and file picker both call the extract endpoint, then populate the textarea.
tools.css — upload zone, drag-over, file-info, clear button styles.
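A minimal sketch of the upload limits described in this commit, written as a
client-side pre-check. The helper names checkUpload and truncateForRedact are
hypothetical and not taken from tools.js; reading "4 MB" as 4 * 1024 * 1024
bytes is an assumption, and the authoritative truncation happens server-side
in api/extract.php.

```javascript
// Hypothetical helpers; names and structure are illustrative only.
const MAX_UPLOAD_BYTES = 4 * 1024 * 1024; // assumes "4 MB" means 4 MiB
const MAX_REDACT_CHARS = 32000; // redact limit stated in this commit
const ALLOWED_EXTENSIONS = ["pdf", "docx", "txt"];

// Returns null when the file looks acceptable, otherwise an error string.
function checkUpload(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return "Unsupported file type";
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return "File exceeds 4 MB";
  }
  return null;
}

// Cuts text to the redact limit, counting Unicode code points rather than
// UTF-16 code units so a surrogate pair is never split in half.
function truncateForRedact(text) {
  const chars = Array.from(text);
  return chars.length <= MAX_REDACT_CHARS
    ? text
    : chars.slice(0, MAX_REDACT_CHARS).join("");
}
```

A pre-check like this only saves a round trip; the endpoint still has to
enforce the same limits itself.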
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- index.php: public showcase landing page (hero, how-it-works, capabilities,
evidence mock, login form) visible to unauthenticated visitors; full OG/SEO
meta; app shell hidden behind auth as before
- tools.css: showcase section styles (gradient hero, step cards, capability
grid, CTA button, evidence mock, footer)
- LegalTools.php: sourceFromChunk() batch-fetches doc_summaries from RAG DB
for non-private chunks; excerpt shows doc summary when available, falls back
to raw chunk text; chunk_text field always carries the raw excerpt
- tools.js: renderEvidenceItem() shows doc summary as card body; adds a
collapsible "View chunk" toggle when summary differs from raw chunk text
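The summary/chunk display rule above can be sketched roughly as follows. The
function name describeEvidenceCard is hypothetical, and it is an assumption
that the payload exposes the excerpt as an `excerpt` property next to
`chunk_text`; the real renderEvidenceItem() builds DOM nodes rather than a
plain object.

```javascript
// Hypothetical view-model builder for one evidence item.
// Assumes item.excerpt holds the doc summary when one exists (else the raw
// chunk) and item.chunk_text always holds the raw chunk.
function describeEvidenceCard(item) {
  const chunkText = item.chunk_text ?? "";
  const excerpt = item.excerpt ?? chunkText;
  return {
    body: excerpt,
    // Offer the "View chunk" toggle only when it would reveal something new.
    showChunkToggle: excerpt.trim() !== chunkText.trim(),
    chunkText,
  };
}
```

Comparing trimmed strings is a design choice of this sketch, so whitespace
differences alone do not trigger the toggle.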
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass 1: deterministic regex with Nordic/European/ECHR/Global packs
covering Norwegian fødselsnummer, Swedish personnummer, Danish CPR,
Finnish henkilötunnus, UK NI numbers, French INSEE numbers, IBANs, EU
phone numbers, ECHR application numbers, dates of birth, and national ID
label patterns.
Pass 2: LLM semantic scan (Azure OpenAI) finds names, orgs, places,
and identifying descriptions missed by regex. Runs on the Pass 1 output,
so identifiers already matched by regex never reach the LLM.
Adds region selector (Nordic/European/ECHR/Global) to the Redact UI.
Falls back gracefully when Azure is not yet configured.
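A rough sketch of the two-pass flow, under stated assumptions. The two
patterns are simplified stand-ins, not the packs shipped in this commit; the
names PACKS, regexPass, and redact, the placeholder labels, and the shape of
the semanticScan callback (resolving to an array of strings to mask) are all
hypothetical. Returning regex-only output when no scanner is available is
one possible reading of "falls back gracefully", not a confirmed behavior.

```javascript
// Illustrative patterns only; the real packs are far more thorough.
const PACKS = {
  nordic: [
    // Simplified 11-digit shape (e.g. a fødselsnummer), no checksum validation.
    { label: "NATIONAL_ID", re: /\b\d{6}[- ]?\d{5}\b/g },
  ],
  global: [
    // Compact IBAN shape only (no spaces), no checksum validation.
    { label: "IBAN", re: /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g },
  ],
};

// Pass 1: replace every regex match with a bracketed label.
// Assumes each region also applies the global pack.
function regexPass(text, region) {
  const regional = region === "global" ? [] : PACKS[region] || [];
  let out = text;
  for (const { label, re } of [...regional, ...PACKS.global]) {
    out = out.replace(re, `[${label}]`);
  }
  return out;
}

// Pass 2: hand only the Pass 1 output to the semantic scanner, then mask
// whatever spans it reports. Any failure keeps the regex-only result.
async function redact(text, region, semanticScan) {
  const preRedacted = regexPass(text, region);
  if (typeof semanticScan !== "function") {
    return preRedacted; // assumed fallback when no LLM is configured
  }
  try {
    const spans = await semanticScan(preRedacted);
    return spans
      .filter((span) => typeof span === "string" && span !== "")
      .reduce((out, span) => out.split(span).join("[PII]"), preRedacted);
  } catch {
    return preRedacted;
  }
}
```

Injecting the scanner as a callback keeps the regex pass testable without
network access.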
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>