dobetternorge-tools

Author	SHA1	Message	Date
daveadmin	d2f9831472	feat: Corpus Intelligence page + timeline background events Adds /corpus.php — a data transparency page showing what powers the legal tools: 9 coverage categories with live doc counts, a full sources table pulled from the corpus DB, the AI stack (LLMs, Whisper, Qdrant, Azure AI Search, embeddings, chunking), and a pipeline flow diagram. Stats are live via a new /api/corpus-stats.php endpoint (queries dobetter_rag + bnl_admin). The reasoning sidebar is repurposed as a Corpus health panel on this page. Also ships the in-progress timeline background events toggle: API and UI wired together via include_background param. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:31:24 +02:00
daveadmin	e130db8119	Deep Research v2: exclude marketing site, deep-link sources, per-agent reports Three user-flagged issues after the first real run with a 920KB sakkyndig PDF: 1. dobetternorge.no marketing-website chunks leaked into the retrieval pool. ClientRagPipeline::searchAll defaults include_beta_website=true; we now pass false for both website flags, AND defensively drop any returned chunk whose source_name contains "website" or title contains "dobetternorge.no" before it can pollute synthesis. 2. Brief returned was "just a paragraph". Bumped synthesis max_tokens 2200→3200, raised timeout 120→180s, and rewrote the prompt to require 400-900 words with min 4 paragraphs when source_count>=3, covering EACH sub-question in its own paragraph. Now also passes authority + jurisdiction into the sources block so the model can pinpoint statutes correctly. 3. No way to see what each "sub-question agent" researched or click through to the source articles. Restructured the results panel so per-sub-question report cards now render ABOVE the synthesised brief. Each report shows the question, the rationale, and the top 3 retrieved sources for that sub-Q with title→deep link + 1-line excerpt. Brief follows. Consolidated numbered sources list at the bottom, with titles as deep links too. Deep-link construction: source_url is hydrated via dbnV6QueryDocumentMeta in a single batched call after retrieval. For Lovdata sources with a section_title containing §<n>, the link is path-anchored to that section (/§43). For other hosts (HUDOC, Regjeringen, Bufdir, etc.) we link to the document root URL. Telemetry: trace_metadata now carries retrieval_counts {raw_corpus, filtered_website, post_filter_corpus, raw_upload, after_dedupe, after_topk} so future regressions are diagnosable from the metadata.jsonl log alone. The completion status pill surfaces the corpus/website/upload split.	2026-05-15 11:12:13 +02:00
daveadmin	4cbe0a4ac4	Add Deep Research tool — agent + rank/rerank RAG New surface at /deep-research.php where the user pastes a question or uploads PDF/DOCX/TXT case files and an LLM-orchestrated agent researches the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval, cross-encoder rerank, and synthesis that emits an inline-[n]-cited markdown brief plus a numbered sources panel. Uploaded documents are chunked + embedded in memory only (nomic-embed-text via LiteLLM) and searched alongside the shared corpus during the same request — never persisted to disk, DB, or Qdrant. Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice helpers, and the existing extract.php text-extraction logic via a new dbnToolsExtractUploadedFile() helper. Also adds dbnToolsCallGpuLlm() helper in bootstrap.php — fixes a latent bug where LegalTools.php was already calling that name with no definition. Search.php is unchanged.	2026-05-15 10:30:47 +02:00
daveadmin	e156ed4553	Add timeline sort toggle (doc order / chronological) with CSS - Wire sortDocOrder / sortChronological click handlers in renderResults() - Add .timeline-sort-bar and .sort-btn styles to tools.css Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 01:50:13 +02:00
daveadmin	d429e785e8	feat(feedback): thumbs up/down + missed-items widget across all tools New api/feedback.php stores rating + correction text to tool_feedback table in bnl_admin. renderFeedbackWidget() appended to all tool results (timeline, redact, transcribe, ask, summarize, search). Thumbs reveal a textarea for missed/wrong items on click; submit POSTs asynchronously. Engine from last run is stored alongside the rating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 01:13:42 +02:00
daveadmin	7690ed17ee	feat(timeline): full form UI with engine selection and advanced settings Add 4-language switcher (EN/NO/UK/PL), engine choice (Azure mini/full, GPU/cuttlefish), and expandable Advanced panel (Focus, Confidence filter, Date types) to timeline.php. Wire new params through api/timeline.php and LegalTools::timeline() with engine routing, focus-aware prompt injection, and confidence/date-type post-filters. Add TIMELINE_I18N to tools.js with improved renderTimeline() confidence colour-coding and new CSS classes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:59:12 +02:00
daveadmin	30915bcb09	Redact: collapsible advanced settings, download TXT/DOCX/copy - Wrap Mode/Region/Entities/Officials/Output/Exempt/Aliases in a <details> toggle so the form opens clean with only engine + input visible - After redaction: Copy, Download .txt, Download .docx buttons appear below the redacted output (all four languages translated) - New api/redact-download.php: returns plain text or a minimal valid DOCX built from scratch with ZipArchive (no external dependencies) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:33:50 +02:00
daveadmin	8c12d5e778	Redact tool: rich UI, multilingual, engine choice, output formats - Custom inline form (EN/NO/UK/PL lang switcher) replacing generic stub - Engine selector: Azure gpt-4o-mini (default), gpt-4o, GPU cuttlefish, regex-only - Entity type toggles: names, organisations, places, dates of birth - Output formats: contextual role tags, generic [PERSON], Norwegian pseudonyms - Keep officials mode: judges/experts kept as [JUDGE: Andersen] format - Exempt names list: specific names excluded from redaction - Hint paragraphs explaining each option in all four languages - Backend: engine routing, callGpuLlm(), applyGenericTags(), applyPseudonymization() - AzureOpenAiGateway: withDeployment() clone pattern for per-call model override Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 00:20:16 +02:00
daveadmin	c77efa241c	feat(transcribe): English UI default, language switcher (NO/UK/PL), fix 504 timeout - Default UI language to English; lang switcher (EN/NO/UK/PL) persisted in localStorage - Rename 'rettssak/tingrett' preset to 'Mediation / legal meeting' — court recording is illegal - Add Ukrainian (uk) and Polish (pl) as selectable audio transcription languages - TRANSCRIBE_I18N translation object drives all status messages, labels, and trace text - Apache ProxyTimeout raised to 1800s on server (was 300s — caused 504 on large files) - set_time_limit(0) + ignore_user_abort(true) in api/transcribe.php - applyTranscribeI18n() patches data-i18n / data-i18n-placeholder / data-i18n-aria attrs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 22:47:32 +02:00
daveadmin	26f4e2231b	feat(transcribe): Norwegian defaults, vocabulary presets, multi-file court day queue - Default language → nb (Bokmål); auto-detect demoted with warning note - Default model → large-v3; VAD filter on by default - Vocabulary prompt promoted to main form with 4 preset buttons (Barnerett/CPS, Rettssak/tingrett, Generell norsk, Egendefinert) - Multi-file upload queue: drop/select multiple clips, numbered list UI - Sequential queue processing with cumulative time_offset per clip - Backend shifts segment timestamps so SRT/VTT covers full court day - Merged transcript + segments across all clips for single download Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 22:20:11 +02:00
daveadmin	eaff2a4d86	Per-tool pages + multi-engine transcribe with expert controls - Split monolithic index.php into per-tool pages (ask, search, summarize, timeline, redact, transcribe), each with its own URL and bookmarkable state - Shared shell: includes/layout.php + layout_footer.php; shared form: includes/tool_form.php used by all text-tool pages - index.php now redirects authenticated users to ask.php; unauthenticated users see the login gate only - transcribe.php: engine selector (GPU/OpenAI/Azure), model size (small/medium/large-v3), diarize, language, expert settings (beam, VAD, task, initial prompt) - api/transcribe.php: engine routing — GPU (cuttlefish), OpenAI BYOK, Azure AI Speech; passes model/beam/task/vad/prompt to Whisper server - tools.js: data-active-tool body attr drives setTool() on load; <a> nav tabs skip click listeners; null guards on form/passcodeForm; engine radio toggle shows/hides BYOK key inputs and model selector; RTF shown in status - tools.css: styles for BYOK inputs, expert settings panel, prompt textarea Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 22:14:20 +02:00
daveadmin	aa2d64b599	Add transcribe progress indicator — elapsed timer and progressive trace messages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 19:12:09 +02:00
daveadmin	d425c99e8e	Transcribe: audio-to-text tool with diarization and speaker role labelling New sixth tool in the hub. Accepts MP3/WAV/OGG/M4A/FLAC/WEBM up to 200 MB, proxies to Whisper on cuttlefish GPU. Optional speaker separation with LLM role labelling (dommer, advokat, forelder, sakkyndig, etc. via GPT-4o-mini). Client-side TXT / SRT / VTT download from segment data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 18:43:22 +02:00
daveadmin	bddafea049	Timeline: document upload, upgraded prompt, CSV export, date_type badge	2026-05-13 08:10:40 +02:00
daveadmin	95685862ab	Redact: multi-doc upload, contextual person naming, aliases - Extract limit raised from 32K to 128K chars per file (long legal docs now fit) - Redact API body/text limits raised (400KB / 128K chars) to match - Upload zone accepts multiple files (up to 5); extracted text concatenated with doc separator and combined before redaction; shows per-file char counts - LLM redact pass now infers contextual person roles (FATHER, MOTHER, CHILD, ATTORNEY, JUDGE, etc.) instead of generic [PERSON] for all names; same individual gets consistent tag throughout the document - Tag validation widened to allow any [A-Za-z0-9_- ] pattern (not just the five hardcoded tags), supporting contextual and alias tags - Alias UI added to Redact mode: user maps real names to bracketed aliases (e.g. "David Jr" -> [Junior]); aliases injected into LLM system prompt as override instructions; max 20 aliases, 100 chars each - max_tokens raised from 2000 to 4000; timeout from 60s to 90s for larger docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 07:17:02 +02:00
daveadmin	bbe5307c03	Add document upload to Redact tool api/extract.php — new endpoint accepting .pdf/.docx/.txt up to 4 MB; pdftotext for PDFs, ZipArchive+DOMXPath for DOCX, mb_convert_encoding for TXT; truncates to 32 000 chars to stay within redact limit. index.php — drop/browse upload zone above the textarea, visible only in Redact mode. tools.js — setupUpload(), handleFileUpload(), resetUpload(); drag-and-drop and file picker both call the extract endpoint then populate the textarea. tools.css — upload zone, drag-over, file-info, clear button styles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-13 06:52:14 +02:00
daveadmin	1f4f01bda3	Add public showcase landing, doc summary cards, and chunk toggle - index.php: public showcase landing page (hero, how-it-works, capabilities, evidence mock, login form) visible to unauthenticated visitors; full OG/SEO meta; app shell hidden behind auth as before - tools.css: showcase section styles (gradient hero, step cards, capability grid, CTA button, evidence mock, footer) - LegalTools.php: sourceFromChunk() batch-fetches doc_summaries from RAG DB for non-private chunks; excerpt shows doc summary when available, falls back to raw chunk text; chunk_text field always carries the raw excerpt - tools.js: renderEvidenceItem() shows doc summary as card body; adds a collapsible "View chunk" toggle when summary differs from raw chunk text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 08:37:36 +02:00
daveadmin	2d8d1c7409	Initial release: Do Better Norge Legal Tools Hub Five MVP tools (Ask, Search, Summarize, Timeline, Redact) with email+password auth, Azure OpenAI gateway, evidence trail panel, and process-and-forget privacy default. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 00:01:07 +02:00
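
Commit 26f4e2231b describes sequential queue processing with a cumulative time_offset per clip, so that the merged SRT/VTT covers the full day. The repository implements this in PHP and JavaScript; the sketch below is only an illustration in Python of how such an offset could be applied. The names Segment, merge_clips, and srt_timestamp are hypothetical, and it assumes each clip's duration in seconds is known before merging.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """One transcript segment, timed relative to the start of its own clip."""
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_clips(clips: List[List[Segment]], durations: List[float]) -> List[Segment]:
    """Shift every segment by the total duration of the clips that precede it.

    Assumption: durations[i] is the length of clip i in seconds.
    """
    if len(clips) != len(durations):
        raise ValueError("need exactly one duration per clip")
    merged: List[Segment] = []
    offset = 0.0  # cumulative time offset, grows after each clip
    for segments, duration in zip(clips, durations):
        for seg in segments:
            merged.append(replace(seg, start=seg.start + offset, end=seg.end + offset))
        offset += duration
    return merged


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

With a 600-second first clip, a segment starting 1.0 s into the second clip lands at 601.0 s in the merged transcript, which srt_timestamp renders as 00:10:01,000.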

18 Commits
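
Commit e130db8119 describes a defensive filter that drops any retrieved chunk whose source_name contains "website" or whose title contains "dobetternorge.no", and records retrieval counts for later diagnosis. The actual code is PHP and is not shown in this log; the following is a minimal Python sketch of that rule. The function name drop_website_chunks and the dict-shaped chunk are hypothetical, and case-insensitive matching is an assumption, since the commit message does not say how the comparison is done. Only three of the count keys named in the commit are reproduced here.

```python
from typing import Dict, List, Tuple


def drop_website_chunks(
    chunks: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, int]]:
    """Remove marketing-website chunks and report how many were dropped.

    Assumption: each chunk is a dict with optional "source_name" and "title"
    keys; matching is case-insensitive (not confirmed by the commit message).
    """
    kept: List[Dict[str, str]] = []
    for chunk in chunks:
        source_name = chunk.get("source_name", "").lower()
        title = chunk.get("title", "").lower()
        if "website" in source_name or "dobetternorge.no" in title:
            continue  # would pollute synthesis, so skip it
        kept.append(chunk)
    counts = {
        "raw_corpus": len(chunks),
        "filtered_website": len(chunks) - len(kept),
        "post_filter_corpus": len(kept),
    }
    return kept, counts
```

Returning the counts alongside the filtered list mirrors the stated goal of making regressions diagnosable from logged metadata alone, without re-running retrieval.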