dobetternorge-tools

Author	SHA1	Message	Date
daveadmin	bc44b0eee2	Add My Documents panel to workbench + user-docs API - api/user-docs.php: GET/DELETE on the shared dbn_user_docs table (SSO users only); connects to the dobetternorge DB via DBN_DB_* env vars - workbench.php: My Documents panel (section 05) for SSO/free-tier users; shows docs uploaded from either AI Chat or the tools and links to AI Chat for uploading - workbench.js: fetch + render doc list, delete with Qdrant cleanup - tools.css: workbench-docs panel + item styles - i18n.php: my_docs_* strings in all 4 languages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:37:19 +02:00
daveadmin	e977bbb6b3	Add Document Discrepancy Finder tool. 8-step NDJSON-streaming pipeline that compares two Barnevernet documents: classifies each doc, extracts parties and timelines, cross-references both for contradictions/deletions/additions, retrieves corpus legal context, and synthesises a full discrepancy report with a tabbed UI. New files: DiscrepancyAgent.php, api/discrepancy.php, discrepancy.php, discrepancy.js. Modified: FreeTier.php (cost=4), i18n.php (all 4 langs), tool-svgs.php (DC icon), tools.css (dc-* component styles). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 19:30:38 +02:00
daveadmin	2e2dfd7310	feat(corpus): category filter, passage expand, drill enhancements, URL hash state - Search: category filter pills scope results to a legal domain - Search: full chunk text returned; click to expand inline beyond the 600-char excerpt - Drill panel: total count label ("Showing X of Y"), sort dropdown, title filter (300ms debounce) - URL hash: preserves query/mode/lang/category/drill state for bookmarking. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 15:47:56 +02:00
daveadmin	ffcf887428	feat(timeline): add live filter, actor chips, group headers, copy button, source toggle, count badge - Live search/filter bar: filters events by keyword across event, actor, source_excerpt, date - Actor filter chips: click to filter by actor, multi-select, teal active state - Year/month group headers when sorted chronologically (── 2023 ──, Mar 2024 ──) - Per-event copy button (hover-revealed 📋): copies "date · actor · event" to clipboard - "Hide/show sources" toggle: collapses all source excerpts without re-rendering - Count badge: "23 events · 3 actors · 2022–2025" above the list - applyTimelineFilters() unifies sort + actor + text filters in one re-render pass - CSV export now includes end_date column - Reset all filter state on each new run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 15:46:59 +02:00
daveadmin	c4362738c1	feat(transcribe): GPT cleanup pass + advanced options i18n. Adds an optional post-transcription cleanup pass via GPT-4o/GPT-4o-mini to fix mishearings, punctuation, and domain terms. Speaker role labelling now accepts a deployment param. Adds i18n strings for the advanced options panel (task, VAD filter, Whisper model, AI cleanup) in all four languages. Updates BvjAnalyzerAgent and DeepResearchAgent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 07:23:01 +02:00
daveadmin	8b77acb828	feat: free-tier credit system + Syttende Mai access for Google users - FreeTier.php: credit check/deduct/reset engine with hourly rate limit - bootstrap.php: dbnmDb() singleton, dbnToolsIsFreeTier(), credit gate helpers - index.php: store tier=free|approved in session from SSO JWT - All 7 API endpoints: credit gate (402/429) + X-Credits-Remaining header - layout.php: credit meta tag, JS balance var, Syttende Mai banner (05-17 only) - tools.js: credit badge in topbar, 402 modal, 429 toast, dbnUpdateCredits() - barnevernet.js + deep-research.js: wire 402/429 handling for NDJSON streams - tools.css: styles for credit badge, no-credits modal, rate-limit toast. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 21:05:08 +02:00
daveadmin	568314c554	fix: wire GCP Speech client into tools transcribe (was using unreachable ai-portal path). Copies GcpSpeechClient into the tools repo so it's deployed with the code; removes the broken dbnToolsAiPortalRoot() path that resolved to a nonexistent /home/dobetternorge/ai-portal directory. Also restarted the CPU Whisper service, which had a stuck CLOSE_WAIT socket causing silent fetch failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 13:43:28 +02:00
daveadmin	08d1e3cee3	feat: auto-select STT engine (Azure → Google Cloud → Whisper) and show provider in results. Removes user-facing engine/model/key/beam controls. The server now picks the best available engine automatically: 1. Microsoft Azure Speech: short clips (≤1MB, no diarization, audio/*); 2. Google Cloud Speech v2: long audio, diarization, all languages; 3. OpenAI Whisper GPU: local fallback. Results display which provider was used (e.g. "Transcribed with Google Cloud Speech") via transcript-engine-badge and traceMeta. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 13:22:24 +02:00
daveadmin	43cf5b8ce4	feat: Barnevernet Analyzer (document analysis + partisan RAG brief). 7-step agent pipeline: document classification, party extraction, timeline extraction, corpus RAG (child_welfare/echr/family_core/bufdir_guidance), and synthesis using the user's chosen engine (including dbn-legal-agent). Progressive NDJSON streaming renders doc_meta, parties, and timeline cards before the final advocacy brief and procedural red flags arrive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 20:49:46 +02:00
daveadmin	343b19d0b4	Add sub-question branching + document summary modals - Source modal now shows an LLM-generated document summary (lazily generated + cached in documents.summary) instead of raw chunk text; a toggle reveals the matched chunk; a "View all chunks" button fetches every chunk of the document via the new api/document-chunks.php endpoint - Each sub-question card gets a "Branch ↓" button that pre-fills the query with that sub-question and shows a context panel with the prior brief summary; prior_context + branch_notes are injected into interpretSeed() and synthesise() so the LLM knows where the research is coming from - Summaries of uploaded documents are generated at synthesis time and attached to upload sources alongside corpus summaries - DB: documents.summary TEXT column added to bnl_corpus on chloe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:44:27 +02:00
daveadmin	a61329eb85	Route Whisper to chloe localhost (127.0.0.1:20019). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 18:56:02 +02:00
daveadmin	464b8572d3	Wire Azure AI Search into dobetternorge-tools. health.php: add azure_search check, which calls the /$count endpoint and reports the doc count in the index. Reads DBN_AZURE_SEARCH_{ENDPOINT,KEY,INDEX}. corpus-search.php: add azure mode, a semantic + vector hybrid search via Azure AI Search bnl-legal-v2. Embeds the query with LiteLLM nomic-embed-text; expands keepCats to include government-policy, health-law, social-services, labour-law, immigration (previously blocked by a contamination workaround, now safe to include). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 13:32:15 +02:00
daveadmin	d5e61d656a	Fix MariaDB LIMIT/OFFSET bound-parameter error in corpus API. MariaDB rejects ? placeholders for LIMIT/OFFSET when emulate_prepares=false. Interpolate $limit and $offset as ints directly into SQL strings in both corpus-documents.php and corpus-search.php BM25 paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:31:20 +02:00
daveadmin	640778454f	Add Case Advocate tab: partisan brief grounded in Norwegian law. New /advocate.php tab: user selects who they represent (biological father, mother, foster carer, CWS, etc.) and the agent takes their side entirely. Adversarial sub-questions target supporting Lovdata statutes + ECHR precedents; synthesis returns client_strengths[] and opposing_weaknesses[] alongside the advocate brief. - DeepResearchAgent: add advocateRole param to run(), interpretSeed(), expandQueries(), synthesise(). Neutral path unchanged (empty string). - api/deep-research.php: extract + validate advocate_role from payload; telemetry logs tool='advocate' vs 'deep_research'. - advocate.php: new page with role dropdown (presets + custom), same corpus slices/engine/controls/upload zone as deep research. - assets/js/advocate.js: page-scoped JS; renders advocate banner, client strengths card (teal), advocate brief, opposing weaknesses card (amber), sub-Q cards, sources, uncertainty, next step. - assets/css/tools.css: append .adv-* rules (~120 lines). - includes/layout.php: add Advocate nav tab between Deep research and Summarize. - index.php: add Advocate cap-card tile. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:26:05 +02:00
daveadmin	85a6bc8134	Exclude dobetternorge.no docs from all corpus search modes. BM25: adds a NOT LIKE filter to the SQL WHERE clause in both the FULLTEXT and LIKE paths. Hybrid + Vector: post-filters the hits array by source_url after results return. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:10:46 +02:00
daveadmin	38255669a9	Add corpus explorer: search bar (Hybrid/BM25/Vector), category drill-down, source row expand - api/corpus-search.php: new endpoint with three search modes (hybrid RAG, BM25 keyword, Qdrant vector) - api/corpus-documents.php: paginated document browser by category or source name - corpus.php: search bar with mode+language pills, Browse docs button on each category card with drill-down panel, expand toggle on each source row showing doc count and scraper class - tools.css: all new corpus interactive styles appended. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:55:54 +02:00
daveadmin	d2f9831472	feat: Corpus Intelligence page + timeline background events. Adds /corpus.php, a data transparency page showing what powers the legal tools: 9 coverage categories with live doc counts, a full sources table pulled from the corpus DB, the AI stack (LLMs, Whisper, Qdrant, Azure AI Search, embeddings, chunking), and a pipeline flow diagram. Stats are live via a new /api/corpus-stats.php endpoint (queries dobetter_rag + bnl_admin). The reasoning sidebar is repurposed as a Corpus health panel on this page. Also ships the in-progress timeline background events toggle: API and UI wired together via the include_background param. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:31:24 +02:00
daveadmin	a1a7f442a7	Deep Research: NDJSON streaming so the connection survives long runs. Previously the endpoint returned a single JSON object at the end. Apache + PHP-FPM buffers the entire body until PHP exits, so during a 160s azure_full run the browser dropped the fetch as "Failed to fetch" while the server was still synthesising; the response then arrived at a dead socket. Switch to application/x-ndjson with one event per line. The endpoint emits 'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a final 'final' event carrying the full result payload. Output buffering is explicitly disabled so each line flushes through Apache as soon as the agent emits it. DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and fires step:running before each step + step:complete after, plus a subq event per sub-question retrieval round. JS reads response.body as a stream, splits on newlines, updates the trace panel live, and renders the final result when the final event arrives. Status pill shows live progress detail (e.g. "Synthesising with Azure gpt-4o — this is the slowest step…"). Engine row in the form now shows expected duration per engine (~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in for before clicking Run.	2026-05-15 10:47:35 +02:00
daveadmin	4cbe0a4ac4	Add Deep Research tool: agent + rank/rerank RAG. New surface at /deep-research.php where the user pastes a question or uploads PDF/DOCX/TXT case files and an LLM-orchestrated agent researches the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval, cross-encoder rerank, and synthesis that emits an inline-[n]-cited markdown brief plus a numbered sources panel. Uploaded documents are chunked + embedded in memory only (nomic-embed-text via LiteLLM) and searched alongside the shared corpus during the same request; they are never persisted to disk, DB, or Qdrant. Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice helpers, and the existing extract.php text-extraction logic via a new dbnToolsExtractUploadedFile() helper. Also adds a dbnToolsCallGpuLlm() helper in bootstrap.php, fixing a latent bug where LegalTools.php was already calling that name with no definition. Search.php is unchanged.	2026-05-15 10:30:47 +02:00
daveadmin	d429e785e8	feat(feedback): thumbs up/down + missed-items widget across all tools. New api/feedback.php stores the rating + correction text in the tool_feedback table in bnl_admin. renderFeedbackWidget() is appended to all tool results (timeline, redact, transcribe, ask, summarize, search). Clicking a thumb reveals a textarea for missed/wrong items; submit POSTs asynchronously. The engine from the last run is stored alongside the rating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 01:13:42 +02:00
daveadmin	7690ed17ee	feat(timeline): full form UI with engine selection and advanced settings. Add a 4-language switcher (EN/NO/UK/PL), engine choice (Azure mini/full, GPU/cuttlefish), and an expandable Advanced panel (Focus, Confidence filter, Date types) to timeline.php. Wire the new params through api/timeline.php and LegalTools::timeline() with engine routing, focus-aware prompt injection, and confidence/date-type post-filters. Add TIMELINE_I18N to tools.js with improved renderTimeline() confidence colour-coding and new CSS classes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:59:12 +02:00
daveadmin	30915bcb09	Redact: collapsible advanced settings, download TXT/DOCX/copy - Wrap Mode/Region/Entities/Officials/Output/Exempt/Aliases in a <details> toggle so the form opens clean with only engine + input visible - After redaction: Copy, Download .txt, and Download .docx buttons appear below the redacted output (translated in all four languages) - New api/redact-download.php: returns plain text or a minimal valid DOCX built from scratch with ZipArchive (no external dependencies). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:33:50 +02:00
daveadmin	8c12d5e778	Redact tool: rich UI, multilingual, engine choice, output formats - Custom inline form (EN/NO/UK/PL lang switcher) replacing generic stub - Engine selector: Azure gpt-4o-mini (default), gpt-4o, GPU cuttlefish, regex-only - Entity type toggles: names, organisations, places, dates of birth - Output formats: contextual role tags, generic [PERSON], Norwegian pseudonyms - Keep officials mode: judges/experts kept as [JUDGE: Andersen] format - Exempt names list: specific names excluded from redaction - Hint paragraphs explaining each option in all four languages - Backend: engine routing, callGpuLlm(), applyGenericTags(), applyPseudonymization() - AzureOpenAiGateway: withDeployment() clone pattern for per-call model override. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:20:16 +02:00
daveadmin	e3d8daf6ca	feat(transcribe): Azure Speech server-side key, remove translate option, add beam/VAD hints - api/transcribe.php falls back to DBN_AZURE_SPEECH_KEY/REGION env vars so BYOK is not required - JS hides Azure key input when DBN_AZURE_SPEECH_CONFIGURED is true - Remove Translate to English task option from Advanced settings - Add explanatory hint text for Beam size and VAD filter in all 4 languages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 23:23:33 +02:00
daveadmin	c77efa241c	feat(transcribe): English UI default, language switcher (NO/UK/PL), fix 504 timeout - Default UI language to English; lang switcher (EN/NO/UK/PL) persisted in localStorage - Rename 'rettssak/tingrett' (trial/district court) preset to 'Mediation / legal meeting', since court recording is illegal - Add Ukrainian (uk) and Polish (pl) as selectable audio transcription languages - TRANSCRIBE_I18N translation object drives all status messages, labels, and trace text - Apache ProxyTimeout raised to 1800s on server (was 300s, which caused 504s on large files) - set_time_limit(0) + ignore_user_abort(true) in api/transcribe.php - applyTranscribeI18n() patches data-i18n / data-i18n-placeholder / data-i18n-aria attrs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 22:47:32 +02:00
daveadmin	26f4e2231b	feat(transcribe): Norwegian defaults, vocabulary presets, multi-file court day queue - Default language → nb (Bokmål); auto-detect demoted with warning note - Default model → large-v3; VAD filter on by default - Vocabulary prompt promoted to main form with 4 preset buttons (Barnerett/CPS, Rettssak/tingrett, Generell norsk, Egendefinert; i.e. child law/CPS, trial/district court, general Norwegian, custom) - Multi-file upload queue: drop/select multiple clips, numbered list UI - Sequential queue processing with cumulative time_offset per clip - Backend shifts segment timestamps so SRT/VTT covers the full court day - Merged transcript + segments across all clips for a single download. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 22:20:11 +02:00
daveadmin	eaff2a4d86	Per-tool pages + multi-engine transcribe with expert controls - Split monolithic index.php into per-tool pages (ask, search, summarize, timeline, redact, transcribe), each with its own URL and bookmarkable state - Shared shell: includes/layout.php + layout_footer.php; shared form: includes/tool_form.php used by all text-tool pages - index.php now redirects authenticated users to ask.php; unauthenticated users see the login gate only - transcribe.php: engine selector (GPU/OpenAI/Azure), model size (small/medium/large-v3), diarize, language, expert settings (beam, VAD, task, initial prompt) - api/transcribe.php: engine routing to GPU (cuttlefish), OpenAI BYOK, or Azure AI Speech; passes model/beam/task/vad/prompt to the Whisper server - tools.js: data-active-tool body attr drives setTool() on load; <a> nav tabs skip click listeners; null guards on form/passcodeForm; engine radio toggle shows/hides BYOK key inputs and model selector; RTF shown in status - tools.css: styles for BYOK inputs, expert settings panel, prompt textarea. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 22:14:20 +02:00
daveadmin	d425c99e8e	Transcribe: audio-to-text tool with diarization and speaker role labelling. New sixth tool in the hub. Accepts MP3/WAV/OGG/M4A/FLAC/WEBM up to 200 MB and proxies to Whisper on the cuttlefish GPU. Optional speaker separation with LLM role labelling via GPT-4o-mini (dommer, advokat, forelder, sakkyndig, etc.; i.e. judge, lawyer, parent, expert). Client-side TXT / SRT / VTT download from segment data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 18:43:22 +02:00
daveadmin	bddafea049	Timeline: document upload, upgraded prompt, CSV export, date_type badge	2026-05-13 08:10:40 +02:00
daveadmin	95685862ab	Redact: multi-doc upload, contextual person naming, aliases - Extract limit raised from 32K to 128K chars per file (long legal docs now fit) - Redact API body/text limits raised (400KB / 128K chars) to match - Upload zone accepts multiple files (up to 5); extracted texts are joined with a doc separator before redaction; shows per-file char counts - LLM redact pass now infers contextual person roles (FATHER, MOTHER, CHILD, ATTORNEY, JUDGE, etc.) instead of generic [PERSON] for all names; the same individual gets a consistent tag throughout the document - Tag validation widened to allow any [A-Za-z0-9_- ] pattern (not just the five hardcoded tags), supporting contextual and alias tags - Alias UI added to Redact mode: user maps real names to bracketed aliases (e.g. "David Jr" -> [Junior]); aliases injected into LLM system prompt as override instructions; max 20 aliases, 100 chars each - max_tokens raised from 2000 to 4000; timeout from 60s to 90s for larger docs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 07:17:02 +02:00
daveadmin	bbe5307c03	Add document upload to Redact tool. api/extract.php: new endpoint accepting .pdf/.docx/.txt up to 4 MB; pdftotext for PDFs, ZipArchive+DOMXPath for DOCX, mb_convert_encoding for TXT; truncates to 32,000 chars to stay within the redact limit. index.php: drop/browse upload zone above the textarea, visible only in Redact mode. tools.js: setupUpload(), handleFileUpload(), resetUpload(); drag-and-drop and the file picker both call the extract endpoint, then populate the textarea. tools.css: upload zone, drag-over, file-info, clear button styles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 06:52:14 +02:00
daveadmin	3c8d7ebc34	feat: pass temporal_mode and as_of_date through DBN search API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 18:45:54 +02:00
daveadmin	62dbb8d900	Gate tools login with Caveau access	2026-05-08 17:12:38 +02:00
daveadmin	9b22947eb2	Two-pass PII redaction with multi-country pattern packs. Pass 1: deterministic regex with Nordic/European/ECHR/Global packs covering fødselsnummer (Norwegian national identity number), Swedish personnummer, Danish/Finnish CPR, UK NI, French INSEE, IBAN, EU phones, ECHR application numbers, DOB, and national ID label patterns. Pass 2: LLM semantic scan (Azure OpenAI) finds names, orgs, places, and identifying descriptions missed by regex. It runs on the pre-redacted text, so PII already caught by the regex pass never reaches the LLM. Adds a region selector (Nordic/European/ECHR/Global) to the Redact UI. Falls back gracefully when Azure is not yet configured. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 01:27:52 +02:00
daveadmin	2d8d1c7409	Initial release: Do Better Norge Legal Tools Hub. Five MVP tools (Ask, Search, Summarize, Timeline, Redact) with email+password auth, Azure OpenAI gateway, evidence trail panel, and process-and-forget privacy default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 00:01:07 +02:00

35 Commits
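Commit 8b77acb828 gates each API call on two checks. If the credit balance is too low, the call gets HTTP 402; if the hourly rate limit is hit, it gets HTTP 429. The Python sketch below is illustrative only and is not the PHP in FreeTier.php. It makes three assumptions: state is held in memory, the hour is a sliding window, and the credit check runs before the rate-limit check. `FreeTierGate` and `charge` are hypothetical names.

```python
import time
from collections import deque
from typing import Callable


class FreeTierGate:
    """Hypothetical in-memory sketch of a credit check plus an hourly rate limit."""

    WINDOW_SECONDS = 3600  # assumption: sliding one-hour window

    def __init__(self, credits: int, max_per_hour: int,
                 clock: Callable[[], float] = time.monotonic):
        self.credits = credits
        self.max_per_hour = max_per_hour
        self.clock = clock
        self.recent = deque()  # timestamps of successfully charged requests

    def charge(self, cost: int) -> int:
        """Return 200 if charged, 402 if credits are too low, 429 if the window is full."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.recent and now - self.recent[0] >= self.WINDOW_SECONDS:
            self.recent.popleft()
        if self.credits < cost:
            return 402  # Payment Required (assumed to be checked first)
        if len(self.recent) >= self.max_per_hour:
            return 429  # Too Many Requests
        self.credits -= cost
        self.recent.append(now)
        return 200
```

Passing the clock in as a parameter lets the window be tested without waiting an hour.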
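The engine priority in commit 08d1e3cee3 reduces to a small decision function. This Python sketch rests on two assumptions: "≤1MB" means 1 MiB, and each provider is simply either configured or not. The engine identifiers and `pick_stt_engine` are hypothetical, not the repo's API.

```python
from dataclasses import dataclass
from typing import Optional

AZURE_MAX_BYTES = 1024 * 1024  # assumption: the commit's "≤1MB" read as 1 MiB


@dataclass
class AudioRequest:
    size_bytes: int
    mime_type: str
    diarize: bool


def pick_stt_engine(req: AudioRequest, available: set[str]) -> Optional[str]:
    """Return the first configured engine, in priority order, that fits the request."""
    azure_fits = (
        req.size_bytes <= AZURE_MAX_BYTES
        and not req.diarize
        and req.mime_type.startswith("audio/")
    )
    if "azure" in available and azure_fits:
        return "azure"    # short clips, no diarization
    if "google" in available:
        return "google"   # long audio, diarization, all languages
    if "whisper" in available:
        return "whisper"  # local GPU fallback
    return None           # nothing configured
```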
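Commit d5e61d656a interpolates $limit and $offset directly into the SQL string. That is only safe if both values are coerced to integers first, as the commit's "as ints" implies. Here is the pattern in Python. The `with_pagination` helper and its `max_limit` cap of 100 are hypothetical, and the query text is illustrative.

```python
def with_pagination(base_sql: str, limit, offset, max_limit: int = 100) -> str:
    """Append LIMIT/OFFSET as literal integers instead of bound placeholders."""
    # int() raises ValueError on non-numeric strings, so arbitrary text
    # cannot be spliced into the query; clamping bounds the page size.
    safe_limit = max(1, min(int(limit), max_limit))
    safe_offset = max(0, int(offset))
    return f"{base_sql} LIMIT {safe_limit} OFFSET {safe_offset}"
```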
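Commit a1a7f442a7 has the browser read response.body as a stream and split it on newlines. A network chunk can end mid-line, so the reader must carry the unfinished tail over to the next chunk. The sketch below shows that buffering in Python rather than the repo's JavaScript. `iter_ndjson` is a hypothetical name, and the event shapes in the example are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded event per complete line, buffering partial lines across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every piece except the last is a complete line; the last may be partial.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line.strip():  # ignore blank lines
                yield json.loads(line)
    # A stream that does not end with a newline may still leave one full event.
    if buffer.strip():
        yield json.loads(buffer)


# Example: the second event is split across two network chunks.
chunks = [b'{"type": "start"}\n{"type": "st', b'ep", "status": "running"}\n']
for event in iter_ndjson(chunks):
    print(event["type"])
```

Splitting raw bytes on the newline byte is safe for UTF-8, because that byte never appears inside a multi-byte character. A letter such as ø that is split across two network chunks is therefore reassembled before its line is decoded.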
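Commit 30915bcb09 builds a "minimal valid DOCX" from scratch with ZipArchive. A DOCX is a ZIP package. The smallest package Word is generally known to open has three parts: the content-types manifest, the package relationships file, and word/document.xml. The sketch below uses Python's zipfile in place of PHP's ZipArchive. `minimal_docx` is a hypothetical name, and writing one paragraph per input line is an assumption about how the output is laid out.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def minimal_docx(text: str) -> bytes:
    """Return DOCX bytes containing one Word paragraph per line of plain text."""
    # escape() turns &, < and > into entities so redaction tags stay valid XML.
    paragraphs = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(line)}</w:t></w:r></w:p>'
        for line in text.split("\n")
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{paragraphs}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)  # str parts are stored as UTF-8
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        zf.writestr("word/document.xml", document)
    return buf.getvalue()
```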
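Commit 26f4e2231b shifts each clip's segment timestamps by a cumulative time_offset, so that one SRT/VTT file spans the whole court day. This minimal sketch makes two assumptions: each clip's duration is known after transcription, and a clip's offset is the sum of the durations of the clips before it. `Segment`, `merge_clips`, `srt_timestamp`, and `to_srt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of its own clip
    end: float
    text: str


def merge_clips(clips: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Shift each clip's segments by the summed duration of all earlier clips."""
    merged: list[Segment] = []
    time_offset = 0.0
    for duration, segments in clips:
        for seg in segments:
            merged.append(Segment(seg.start + time_offset, seg.end + time_offset, seg.text))
        time_offset += duration  # assumption: offset grows by the full clip length
    return merged


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = [
        f"{i}\n{srt_timestamp(s.start)} --> {srt_timestamp(s.end)}\n{s.text}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(cues)
```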