Commit Graph

172 Commits

Author SHA1 Message Date
daveadmin c77efa241c feat(transcribe): English UI default, language switcher (NO/UK/PL), fix 504 timeout
- Default UI language to English; lang switcher (EN/NO/UK/PL) persisted in localStorage
- Rename 'rettssak/tingrett' preset to 'Mediation / legal meeting' — court recording is illegal
- Add Ukrainian (uk) and Polish (pl) as selectable audio transcription languages
- TRANSCRIBE_I18N translation object drives all status messages, labels, and trace text
- Apache ProxyTimeout raised to 1800s on server (was 300s — caused 504 on large files)
- set_time_limit(0) + ignore_user_abort(true) in api/transcribe.php
- applyTranscribeI18n() patches data-i18n / data-i18n-placeholder / data-i18n-aria attrs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 22:47:32 +02:00
daveadmin 26f4e2231b feat(transcribe): Norwegian defaults, vocabulary presets, multi-file court day queue
- Default language → nb (Bokmål); auto-detect demoted with warning note
- Default model → large-v3; VAD filter on by default
- Vocabulary prompt promoted to main form with 4 preset buttons
  (Barnerett/CPS, Rettssak/tingrett, Generell norsk, Egendefinert)
- Multi-file upload queue: drop/select multiple clips, numbered list UI
- Sequential queue processing with cumulative time_offset per clip
- Backend shifts segment timestamps so SRT/VTT covers full court day
- Merged transcript + segments across all clips for single download

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 22:20:11 +02:00
daveadmin df31674f2e SSO integration: validate dobetternorge.no signed tokens, update landing page
- bootstrap.php: dbnToolsValidateSsoToken(), SSO session check in dbnToolsIsAuthenticated()
- index.php: SSO handler at top, Do Better Norge member panel in login card
- .env: DBN_SSO_SECRET placeholder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 18:47:05 +02:00
daveadmin eaff2a4d86 Per-tool pages + multi-engine transcribe with expert controls
- Split monolithic index.php into per-tool pages (ask, search, summarize,
  timeline, redact, transcribe), each with its own URL and bookmarkable state
- Shared shell: includes/layout.php + layout_footer.php; shared form:
  includes/tool_form.php used by all text-tool pages
- index.php now redirects authenticated users to ask.php; unauthenticated
  users see the login gate only
- transcribe.php: engine selector (GPU/OpenAI/Azure), model size (small/
  medium/large-v3), diarize, language, expert settings (beam, VAD, task,
  initial prompt)
- api/transcribe.php: engine routing — GPU (cuttlefish), OpenAI BYOK,
  Azure AI Speech; passes model/beam/task/vad/prompt to Whisper server
- tools.js: data-active-tool body attr drives setTool() on load; <a> nav
  tabs skip click listeners; null guards on form/passcodeForm; engine radio
  toggle shows/hides BYOK key inputs and model selector; RTF shown in status
- tools.css: styles for BYOK inputs, expert settings panel, prompt textarea

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 22:14:20 +02:00
daveadmin d178fbf295 Fix double file picker on audioZone click
Guard against INPUT clicks bubbling up to zone handler,
which caused the file picker to open twice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 20:28:24 +02:00
daveadmin 59644414f5 Fix transcribe form blocked by required textarea validation
The hidden textarea still had required=true, so browser-native form
validation silently blocked submit when no audio was the only input.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 19:19:30 +02:00
daveadmin aa2d64b599 Add transcribe progress indicator — elapsed timer and progressive trace messages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 19:12:09 +02:00
daveadmin e81b8f8ecc Webhook: full paths for nohup/bash in FPM environment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:59:14 +02:00
daveadmin 6f91bfb575 Fix webhook: absolute paths, nohup exec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:54:30 +02:00
daveadmin 8d116828f5 Add Gitea push webhook endpoint for automated deploy
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:51:54 +02:00
daveadmin d425c99e8e Transcribe: audio-to-text tool with diarization and speaker role labelling
New sixth tool in the hub. Accepts MP3/WAV/OGG/M4A/FLAC/WEBM up to 200 MB,
proxies to Whisper on cuttlefish GPU. Optional speaker separation with LLM
role labelling (dommer, advokat, forelder, sakkyndig, etc. via GPT-4o-mini).
Client-side TXT / SRT / VTT download from segment data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 18:43:22 +02:00
daveadmin bddafea049 Timeline: document upload, upgraded prompt, CSV export, date_type badge 2026-05-13 08:10:40 +02:00
daveadmin 634a4fa154 Raise MAX_PASTE_CHARS to 128K and redaction max_tokens to 8000 2026-05-13 07:41:41 +02:00
daveadmin 95685862ab Redact: multi-doc upload, contextual person naming, aliases
- Extract limit raised from 32K to 128K chars per file (long legal docs now fit)
- Redact API body/text limits raised (400KB / 128K chars) to match
- Upload zone accepts multiple files (up to 5); extracted text concatenated with
  doc separator and combined before redaction; shows per-file char counts
- LLM redact pass now infers contextual person roles (FATHER, MOTHER, CHILD,
  ATTORNEY, JUDGE, etc.) instead of generic [PERSON] for all names; same
  individual gets consistent tag throughout the document
- Tag validation widened to allow any [A-Za-z0-9_- ] pattern (not just the
  five hardcoded tags), supporting contextual and alias tags
- Alias UI added to Redact mode: user maps real names to bracketed aliases
  (e.g. "David Jr" -> [Junior]); aliases injected into LLM system prompt as
  override instructions; max 20 aliases, 100 chars each
- max_tokens raised from 2000 to 4000; timeout from 60s to 90s for larger docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 07:17:02 +02:00
daveadmin bbe5307c03 Add document upload to Redact tool
api/extract.php — new endpoint accepting .pdf/.docx/.txt up to 4 MB;
pdftotext for PDFs, ZipArchive+DOMXPath for DOCX, mb_convert_encoding
for TXT; truncates to 32 000 chars to stay within redact limit.

index.php — drop/browse upload zone above the textarea, visible only
in Redact mode.

tools.js — setupUpload(), handleFileUpload(), resetUpload(); drag-and-drop
and file picker both call the extract endpoint then populate the textarea.

tools.css — upload zone, drag-over, file-info, clear button styles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 06:52:14 +02:00
daveadmin 3c8d7ebc34 feat: pass temporal_mode and as_of_date through DBN search API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 18:45:54 +02:00
daveadmin 1f4f01bda3 Add public showcase landing, doc summary cards, and chunk toggle
- index.php: public showcase landing page (hero, how-it-works, capabilities,
  evidence mock, login form) visible to unauthenticated visitors; full OG/SEO
  meta; app shell hidden behind auth as before
- tools.css: showcase section styles (gradient hero, step cards, capability
  grid, CTA button, evidence mock, footer)
- LegalTools.php: sourceFromChunk() batch-fetches doc_summaries from RAG DB
  for non-private chunks; excerpt shows doc summary when available, falls back
  to raw chunk text; chunk_text field always carries the raw excerpt
- tools.js: renderEvidenceItem() shows doc summary as card body; adds a
  collapsible "View chunk" toggle when summary differs from raw chunk text

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 08:37:36 +02:00
daveadmin 3442b1b6d3 Add Caveau corpus ownership setup 2026-05-08 17:32:14 +02:00
daveadmin 3282db7c19 Support live Caveau plan enum 2026-05-08 17:15:41 +02:00
daveadmin 62dbb8d900 Gate tools login with Caveau access 2026-05-08 17:12:38 +02:00
daveadmin 9b22947eb2 Two-pass PII redaction with multi-country pattern packs
Pass 1: deterministic regex with Nordic/European/ECHR/Global packs
covering fødselsnummer, Swedish personnummer, Danish/Finnish CPR,
UK NI, French INSEE, IBAN, EU phones, ECHR application numbers, DOB,
and national ID label patterns.

Pass 2: LLM semantic scan (Azure OpenAI) finds names, orgs, places
and identifying descriptions missed by regex. Runs on pre-redacted
text so no raw PII reaches the LLM.

Adds region selector (Nordic/European/ECHR/Global) to the Redact UI.
Falls back gracefully when Azure is not yet configured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 01:27:52 +02:00
daveadmin 2d8d1c7409 Initial release: Do Better Norge Legal Tools Hub
Five MVP tools (Ask, Search, Summarize, Timeline, Redact) with
email+password auth, Azure OpenAI gateway, evidence trail panel,
and process-and-forget privacy default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:01:07 +02:00