dobetternorge-tools

Author	SHA1	Message	Date
daveadmin	bc44b0eee2	Add My Documents panel to workbench + user-docs API - api/user-docs.php: GET/DELETE on the shared dbn_user_docs table (SSO users only); connects to the dobetternorge DB via DBN_DB_* env vars - workbench.php: My Documents panel (section 05) for SSO/free-tier users; shows docs uploaded from either AI chat or tools, links to AI Chat for upload - workbench.js: fetch + render doc list, delete with Qdrant cleanup - tools.css: workbench-docs panel + item styles - i18n.php: my_docs_* strings in all 4 languages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 09:37:19 +02:00
daveadmin	47aa35e946	Add dbn-legal-agent targeted check step to BVJ Analyzer (Step 6b) Probe testing revealed the fine-tune loops when asked to check a brief directly (tool-planning architecture conflict) but answers focused legal Q&A reliably in ~55s. New step 6b asks one targeted question per document type (akuttvedtak → § 4-25 klar nødvendighet, adopsjon → Strand Lobben, undersøkelse → fvl § 17/§ 41) and merges the finding into procedural_red_flags with check_model provenance. Silent on timeout/error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 22:38:29 +02:00
daveadmin	04555a96b1	Add Citation Explorer tool and graph-expansion badges to Advocate results - citations.php + assets/js/citations.js: new tool page for browsing the FalkorDB citation graph by title/ID, with autocomplete, action pills (cites/cited_by/implements/chain), hop-by-hop navigation, and exploration trail - advocate.js: tag graph-expanded source cards with 'via citation graph' badge - DeepResearchAgent: propagate _graph_expanded flag through normalizeCorpusChunk and top_sources serialization so it reaches the frontend - tools.css: add .dr-source-tag--graph variant (green pill) - i18n.php: register 'citations' tool in all 4 languages with CIT icon Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 22:30:04 +02:00
daveadmin	60e341e98a	corpus: expand catIds map and add health-law card - Map 10 additional DB category slugs to UI cards (social-services, echr-case-law, child-abduction, anti-discrimination, children-rights, legislation, parliamentary, immigration, legal, civil-litigation) - Add health-law card (1,874 docs were invisible; it was the largest unmapped category) - Add patient-rights, government-policy, policy-reports, ombudsman, bankruptcy mappings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 20:46:52 +02:00
daveadmin	e977bbb6b3	Add Document Discrepancy Finder tool 8-step NDJSON-streaming pipeline that compares two Barnevernet documents: classifies each doc, extracts parties and timelines, cross-references both for contradictions/deletions/additions, retrieves corpus legal context, and synthesises a full discrepancy report with tabbed UI. New files: DiscrepancyAgent.php, api/discrepancy.php, discrepancy.php, discrepancy.js. Modified: FreeTier.php (cost=4), i18n.php (all 4 langs), tool-svgs.php (DC icon), tools.css (dc-* component styles). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 19:30:38 +02:00
daveadmin	1246b7a804	feat(workbench): add DBN Case Workbench guided case-preparation hub Additive-only change: new workbench.php authenticated page with guided intake flow, evidence map, tool sequence, output checklist, and sessionStorage-only note persistence. Dashboard and public index get a new Case Workbench card. No existing tools, APIs, or prompts modified. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 19:08:16 +02:00
daveadmin	b495ff29fd	feat(corpus): make source links visible on search result cards - a.passage-card__title now shows teal+underline (was ink with no decoration, so it was not recognisable as a link) - adds a "View source ↗" link below each passage excerpt when source_url is present Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 16:16:04 +02:00
daveadmin	2e2dfd7310	feat(corpus): category filter, passage expand, drill enhancements, URL hash state - Search: category filter pills scope results to a legal domain - Search: full chunk text returned; click to expand inline beyond 600-char excerpt - Drill panel: total count label ("Showing X of Y"), sort dropdown, title filter (300ms debounce) - URL hash: preserves query/mode/lang/category/drill state for bookmarking Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 15:47:56 +02:00
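A minimal sketch of the URL-hash state described above, using only the standard `URLSearchParams` API. The short key names and both function names are hypothetical; the commit only says that query, mode, lang, category and drill state survive in the hash.

```javascript
// Hypothetical hash keys for the state the commit says is preserved:
// query, mode, lang, category and drill.
const HASH_KEYS = ["q", "mode", "lang", "category", "drill"];

// Serialise the non-empty state fields into a "#k=v&..." fragment.
function encodeHashState(state) {
  const params = new URLSearchParams();
  for (const key of HASH_KEYS) {
    const value = state[key];
    if (value !== undefined && value !== null && value !== "") {
      params.set(key, String(value));
    }
  }
  return "#" + params.toString();
}

// Parse a fragment such as location.hash back into a state object.
// URLSearchParams only strips a leading "?", so the "#" is removed first.
function decodeHashState(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const state = {};
  for (const key of HASH_KEYS) {
    if (params.has(key)) state[key] = params.get(key);
  }
  return state;
}
```

After each search, `location.hash = encodeHashState(state)` would keep the URL current; on load, `decodeHashState(location.hash)` would restore a bookmarked view before re-running the search.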
daveadmin	ffcf887428	feat(timeline): add live filter, actor chips, group headers, copy button, source toggle, count badge - Live search/filter bar: filters events by keyword across event, actor, source_excerpt, date - Actor filter chips: click to filter by actor, multi-select, teal active state - Year/month group headers when sorted chronologically (── 2023 ──, Mar 2024 ──) - Per-event copy button (hover-revealed 📋): copies "date · actor · event" to clipboard - "Hide/show sources" toggle: collapses all source excerpts without re-rendering - Count badge: "23 events · 3 actors · 2022–2025" above the list - applyTimelineFilters() unifies sort + actor + text filters in one re-render pass - CSV export now includes end_date column - Reset all filter state on each new run Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 15:46:59 +02:00
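The single-pass idea behind `applyTimelineFilters()` (actor filter, text filter and sort applied together before one re-render) and the count badge can be sketched as pure functions. This is a hypothetical reconstruction, not the shipped code: the event fields follow the commit (`event`, `actor`, `source_excerpt`, `date`), while the option names and `timelineCountBadge` are assumptions, and dates are assumed to be ISO `YYYY-MM-DD` strings.

```javascript
// Plain string comparison: chronological for ISO "YYYY-MM-DD" dates.
const compareStrings = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Filter by selected actors and a free-text needle, then sort by date.
// Returns a new array; the input array is not mutated.
function applyTimelineFilters(events, { text = "", actors = [], ascending = true } = {}) {
  const needle = text.trim().toLowerCase();
  const actorSet = new Set(actors);
  const filtered = events.filter((ev) => {
    if (actorSet.size > 0 && !actorSet.has(ev.actor)) return false;
    if (!needle) return true;
    return [ev.event, ev.actor, ev.source_excerpt, ev.date].some((field) =>
      String(field ?? "").toLowerCase().includes(needle)
    );
  });
  const dir = ascending ? 1 : -1;
  return filtered.sort((a, b) => dir * compareStrings(String(a.date), String(b.date)));
}

// Summary line such as "3 events · 2 actors · 2022–2025".
function timelineCountBadge(events) {
  const actorCount = new Set(events.map((ev) => ev.actor)).size;
  const years = events
    .map((ev) => parseInt(String(ev.date).slice(0, 4), 10))
    .filter((y) => !Number.isNaN(y));
  const parts = [`${events.length} events`, `${actorCount} actors`];
  if (years.length > 0) {
    const first = Math.min(...years);
    const last = Math.max(...years);
    parts.push(first === last ? String(first) : `${first}–${last}`);
  }
  return parts.join(" · ");
}
```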
daveadmin	59b39ff85b	feat(redact): tag highlighting, inventory panel, before/after toggle, gpt-4o upgrade - CSS: colour-coded [TAG] spans by entity type (person=pink, org=blue, place=green, date=amber, id=purple) - Inventory panel: collapsible list showing tag → original text mappings with occurrence counts, sourced from new redaction_map API response key - Before/after toggle: Redacted / Original view-switch buttons wired to lastOriginalText captured at submission time - One-click gpt-4o upgrade button when mini or GPU engine was used - Backend: redaction_map built from applied LLM entities (tag → originals + occurrence count via substr_count on final text) - renderResults now calls setupRedactViewToggle() after DOM is written Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 08:22:41 +02:00
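A sketch of how a `redaction_map` (tag → originals + occurrence count) could be built while applying entity replacements. Everything here is hypothetical: `applyRedactions` and the `{ tag, original }` entity shape are assumptions, replacing longer originals first is a design choice of this sketch, and `countOccurrences` is a JavaScript stand-in for PHP's non-overlapping `substr_count`. The sketch counts each original in the text as it is being redacted; the commit's exact counting target may differ.

```javascript
// Non-overlapping occurrence count, like PHP's substr_count().
function countOccurrences(haystack, needle) {
  if (!needle) return 0;
  let count = 0;
  let pos = 0;
  while ((pos = haystack.indexOf(needle, pos)) !== -1) {
    count++;
    pos += needle.length;
  }
  return count;
}

// Replace each entity with "[TAG]" and record tag -> originals + count.
function applyRedactions(text, entities) {
  const redactionMap = {};
  let redacted = text;
  // Longest originals first, so "Kari Nordmann" is replaced before "Kari".
  const ordered = [...entities].sort((a, b) => b.original.length - a.original.length);
  for (const { tag, original } of ordered) {
    const n = countOccurrences(redacted, original);
    if (n === 0) continue; // absent, or already covered by a longer match
    redacted = redacted.split(original).join(`[${tag}]`);
    if (!redactionMap[tag]) redactionMap[tag] = { originals: [], count: 0 };
    redactionMap[tag].originals.push(original);
    redactionMap[tag].count += n;
  }
  return { redacted, redactionMap };
}
```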
daveadmin	850937e4b3	feat(transcribe): UX improvements — progress bar, stats row, copy btn, char counter, batch errors - Vocab textarea now shows live 0/500 char counter (turns amber at 450+) - Animated progress bar during transcription; determinate for multi-clip, indeterminate for single - Results card shows inline stats row (duration, language, speakers) and AI cleanup badge - Copy button + Download TXT moved above transcript box; SRT/VTT remain below - Speaker role legend repeats inside Segments panel for easy cross-reference - Batch errors no longer halt the queue; remaining clips continue, failed files named in status bar Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 08:21:19 +02:00
daveadmin	d1ad19d3c2	Add lt-footer to all tool pages via layout_footer.php Replaces the one-liner workbench-attribution div with the shared footer include so all seven tool pages (transcribe, timeline, redact, barnevernet, advocate, deep-research, corpus) show the same compact 2-column footer as the landing and dashboard pages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 07:52:20 +02:00
daveadmin	c4362738c1	feat(transcribe): GPT cleanup pass + advanced options i18n Adds optional post-transcription cleanup via GPT-4o/GPT-4o-mini to fix mishearing errors, punctuation, and domain terms. Speaker role labelling now accepts a deployment param. Adds i18n strings for advanced options panel (task, VAD filter, Whisper model, AI cleanup) in all four languages. Updates BvjAnalyzerAgent and DeepResearchAgent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 07:23:01 +02:00
daveadmin	e32ee60e78	feat(timeline): tighten prompt for accuracy — year inference, month names, actor normalization, confidence calibration - Add 4-step year inference rule for DD.MM. entries (scan backward/forward for anchor year) - Add Norwegian month-name formats (18. september, den 18. september 2025, etc.) with month lookup table - Add $relativeInstruction to tell LLM upfront when relative dates are excluded (not just PHP-filtered post-hoc) - Define confidence calibration criteria explicitly (high/medium/low) - Improve source_excerpt guidance: most diagnostic phrase, not just any verbatim phrase - Add actor normalization for Norwegian institutions (Barnevernstjenesten, Fylkesnemnda, Statsforvalteren, etc.) - Add deduplication rule for events appearing across multiple documents - Add end_date field for date_type=period events - Improve what_we_found schema hint to require count/range/actors/gaps - Increase max_tokens to 8000 for azure_full (gpt-4o) to avoid truncation on large documents - Tighten system prompt with Norwegian CPS legal chain context Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 07:11:31 +02:00
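The year-inference and month-name rules above are instructions in the LLM prompt, not PHP parsing code. As a deterministic analogue, a minimal sketch might look like this; `parseNorwegianDate` and `inferYear` are hypothetical, and the backward-then-forward scan simplifies the commit's 4-step rule into two steps.

```javascript
// Norwegian month names mapped to month numbers.
const NO_MONTHS = {
  januar: 1, februar: 2, mars: 3, april: 4, mai: 5, juni: 6,
  juli: 7, august: 8, september: 9, oktober: 10, november: 11, desember: 12,
};

// "18. september", "den 18. september 2025", case-insensitive.
const MONTH_DATE_RE =
  /(?:den\s+)?(\d{1,2})\.\s*(januar|februar|mars|april|mai|juni|juli|august|september|oktober|november|desember)\b(?:\s+(\d{4}))?/i;

// Returns "YYYY-MM-DD", using anchorYear when the text omits the year,
// or null if no date is found or no year can be determined.
function parseNorwegianDate(text, anchorYear = null) {
  const m = MONTH_DATE_RE.exec(text);
  if (!m) return null;
  const year = m[3] ? Number(m[3]) : anchorYear;
  if (year === null) return null;
  const month = NO_MONTHS[m[2].toLowerCase()];
  return `${year}-${String(month).padStart(2, "0")}-${m[1].padStart(2, "0")}`;
}

// For a year-less "DD.MM." entry at index i, borrow the nearest explicit
// four-digit year: scan backwards first, then forwards.
function inferYear(entries, i) {
  const yearOf = (s) => {
    const m = /\b(?:19|20)\d{2}\b/.exec(s);
    return m ? Number(m[0]) : null;
  };
  for (let j = i - 1; j >= 0; j--) {
    const y = yearOf(entries[j]);
    if (y !== null) return y;
  }
  for (let j = i + 1; j < entries.length; j++) {
    const y = yearOf(entries[j]);
    if (y !== null) return y;
  }
  return null;
}
```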
daveadmin	f2fbb69e0a	feat: lightweight header/footer — IBM Plex Sans, slimmed nav badge, compact footer Drops Roboto + IBM Plex Mono from Google Fonts, replaces with IBM Plex Sans (matching dobetternorge.no). Nav badge loses bordered pill, becomes plain uppercase label with slash separator. Footer cut from 3-column text-wall (~300 words) to compact 2-column layout (~50 words) — logo + tagline + privacy note on left, 5 links in 2 columns on right. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 07:10:46 +02:00
daveadmin	f0b7d343a3	feat: unified landing page with auth-aware gate + /dashboard.php Removes the logged-in vs logged-out page bifurcation. index.php now always renders the public landing (tools overview, hero, trust section) with auth-conditional nav/hero CTAs and a two-column member/register gate shown only to unauthenticated visitors. Authenticated workbench extracted to new dashboard.php. Adds 8 new i18n keys across all 4 languages and new CSS for auth-nav, hero CTA, two-column gate, and register buttons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 06:45:44 +02:00
daveadmin	93b28b8783	feat: rebuild preview pages with real API content and full i18n - preview.php: localized pitch + features for en/no (uk/pl fall back to en) - Sample outputs now match actual API response format: streaming pipeline steps, confidence fields, entity counts, corpus slice names, speaker roles - i18n.php: add 10 preview-specific keys across all 4 languages (en/no/uk/pl) - Transcribe: shows 3-engine cascade + real speaker roles (saksbehandler/dommer/advokat) - Timeline: shows date_type, confidence, what_remains_uncertain, next_practical_step - Redact: shows two-pass pipeline (regex Nordic pack + LLM NER) + contextual tags - Barnevernet: shows 7-step streaming trace + procedural flag severity levels - Advocate: shows partisan brief with advocate_role + citation confidence - Deep Research: shows corpus slices + sub-questions + contradiction-aware synthesis - Corpus: shows real Qdrant + Azure AI Search config, hybrid search result Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 14:39:12 +02:00
daveadmin	849a7cf434	feat: add public tool preview pages with realistic samples Each landing card now links to preview.php?tool=SLUG — a dedicated public page with an expanded pitch, 4 capability bullets, and a realistic Norwegian-language sample input+output for all 7 tools. - preview.php — new public page (no auth required), switch-driven content - includes/tool-svgs.php — extracted $toolSvgs into shared include - index.php — require tool-svgs.php, card href → preview.php?tool=SLUG - assets/css/tools.css — lt-preview-* component styles appended Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 13:08:46 +02:00
daveadmin	c350750b7e	fix: serve logo locally to fix broken image on nav/footer External URL was unreachable from tools subdomain (CSP or cross-origin block), causing a grey placeholder rectangle. Logo now served from assets/images/ and brightness/invert filter removed — logo is white-on-transparent, displays correctly on dark nav and footer without filtering. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 12:41:11 +02:00
daveadmin	38683cffc0	feat: rebrand landing page to match dobetternorge.no - Add sticky navy nav with logo-header.webp, Legal Tools badge, lang switcher, red CTA - Replace showcase-hero with full-bleed dark hero (Crimson Pro, IBM Plex Mono, stat pills) - Redesign tool cards: 3-col grid, 178px illustrated SVG art per card (7 unique illustrations) - Add lt-trust 3-col strip and lt-access navy gate panel - Rebuild footer with 3-col navy layout matching main site - Add Crimson Pro / Roboto / IBM Plex Mono Google Fonts via <link> + @import - CSS: new lt-* variables, all new landing component styles appended to tools.css Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 12:33:31 +02:00
daveadmin	8b77acb828	feat: free-tier credit system + Syttende Mai access for Google users - FreeTier.php: credit check/deduct/reset engine with hourly rate limit - bootstrap.php: dbnmDb() singleton, dbnToolsIsFreeTier(), credit gate helpers - index.php: store tier=free|approved in session from SSO JWT - All 7 API endpoints: credit gate (402/429) + X-Credits-Remaining header - layout.php: credit meta tag, JS balance var, Syttende Mai banner (05-17 only) - tools.js: credit badge in topbar, 402 modal, 429 toast, dbnUpdateCredits() - barnevernet.js + deep-research.js: wire 402/429 handling for NDJSON streams - tools.css: styles for credit badge, no-credits modal, rate-limit toast Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 21:05:08 +02:00
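The credit gate can be illustrated with an in-memory stand-in for FreeTier.php. This is a sketch under stated assumptions: the class name, the starting balance, the hourly call cap and the check order (credits before rate limit) are all hypothetical, and a real deployment would persist balances in the database rather than in a `Map`. Status 402 (Payment Required) signals an empty balance and 429 (Too Many Requests) the hourly limit, matching the codes named in the commit; `remaining` is the value an endpoint would send as `X-Credits-Remaining`.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical in-memory ledger; the clock is injectable for testing.
class FreeTierLedger {
  constructor({ startingCredits = 10, maxCallsPerHour = 20, now = () => Date.now() } = {}) {
    this.startingCredits = startingCredits;
    this.maxCallsPerHour = maxCallsPerHour;
    this.now = now;
    this.users = new Map();
  }

  account(userId) {
    if (!this.users.has(userId)) {
      this.users.set(userId, { credits: this.startingCredits, calls: [] });
    }
    return this.users.get(userId);
  }

  // Check, then deduct. Returns the HTTP status an API endpoint would send.
  charge(userId, cost = 1) {
    const acct = this.account(userId);
    const t = this.now();
    acct.calls = acct.calls.filter((ts) => t - ts < HOUR_MS); // sliding 1-hour window
    if (acct.credits < cost) return { status: 402, remaining: acct.credits };
    if (acct.calls.length >= this.maxCallsPerHour) return { status: 429, remaining: acct.credits };
    acct.credits -= cost;
    acct.calls.push(t);
    return { status: 200, remaining: acct.credits };
  }

  reset(userId) {
    this.account(userId).credits = this.startingCredits;
  }
}
```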
daveadmin	568314c554	fix: wire GCP Speech client into tools transcribe (was using unreachable ai-portal path) Copies GcpSpeechClient into the tools repo so it's deployed with the code; removes the broken dbnToolsAiPortalRoot() path that resolved to a nonexistent /home/dobetternorge/ai-portal directory. Also restarted the CPU Whisper service which had a stuck CLOSE_WAIT socket causing silent fetch failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 13:43:28 +02:00
daveadmin	08d1e3cee3	feat: auto-select STT engine (Azure → Google Cloud → Whisper) and show provider in results Removes user-facing engine/model/key/beam controls. The server now picks the best available engine automatically: 1. Microsoft Azure Speech — short clips (≤1MB, no diarization, audio/*) 2. Google Cloud Speech v2 — long audio, diarization, all languages 3. OpenAI Whisper GPU — local fallback Results display which provider was used (e.g. "Transcribed with Google Cloud Speech") via transcript-engine-badge and traceMeta. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-16 13:22:24 +02:00
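The three-step engine cascade can be written as a small pure function. `pickSttEngine`, the `available` flags and reading "≤1MB" as 1 MiB are assumptions for illustration; the real selection logic may check more conditions than this.

```javascript
const AZURE_MAX_BYTES = 1024 * 1024; // "≤1MB" read as 1 MiB (assumption)

// Returns "azure" | "google" | "whisper", or null if nothing is available.
function pickSttEngine({ sizeBytes, diarization, mimeType }, available) {
  // 1. Azure Speech: short clips only, no diarization, audio/* MIME types.
  const azureFits =
    sizeBytes <= AZURE_MAX_BYTES && !diarization && mimeType.startsWith("audio/");
  if (available.azure && azureFits) return "azure";
  // 2. Google Cloud Speech v2: long audio, diarization, all languages.
  if (available.google) return "google";
  // 3. Local Whisper GPU as the last fallback.
  if (available.whisper) return "whisper";
  return null;
}
```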
daveadmin	c6a9cc9199	feat: add site footer with privacy statement, CaveauAI attribution, and AI disclaimer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 23:24:49 +02:00
daveadmin	13572e9dfb	feat: extract and display event times on timeline (kl. HH:MM etc.) Prompt now instructs the model to extract time of day (HH:MM) when present in Norwegian formats: kl. 14:30, kl 09.00, 14:30, 14.30. renderTimeline shows time as a muted inline annotation next to the date. CSV export gains a Time column after Date. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 23:03:20 +02:00
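The prompt asks the model to recognise `kl. 14:30`, `kl 09.00`, `14:30` and `14.30`. A deterministic regex version, sketched here as a hypothetical `extractTime` helper, has to be stricter: a bare `14.30` cannot be told apart from a `DD.MM` date such as `18.09.` by pattern alone, so this sketch only accepts the dot separator after `kl` and leaves the ambiguous case to the LLM.

```javascript
// Alternative 1: "kl" / "kl." prefix, ":" or "." separator ("kl. 14:30", "kl 09.00").
// Alternative 2: bare time with ":" only ("14:30").
const TIME_RE = /\bkl\.?\s*([01]?\d|2[0-3])[.:]([0-5]\d)\b|\b([01]?\d|2[0-3]):([0-5]\d)\b/i;

// Returns "HH:MM" or null.
function extractTime(text) {
  const m = TIME_RE.exec(text);
  if (!m) return null;
  const hh = m[1] ?? m[3];
  const mm = m[2] ?? m[4];
  return `${hh.padStart(2, "0")}:${mm}`;
}
```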
daveadmin	c5c90d92f3	feat: add Redact tool to launched nav and dashboard (all 4 languages) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 22:58:20 +02:00
daveadmin	a3d46f9756	feat: Legal Tools v1 — multilingual landing, dashboard, SSO bridge - Public landing page at / for unauthenticated users (EN/NO/UK/PL) - Authenticated / shows Case Workbench dashboard with manifesto strip, stats, and launched-tool grid (Transcribe, Timeline, BVJ, Advocate, Deep Research, Corpus) - Added includes/i18n.php with full 4-language translation layer - Extended layout.php to Case Workbench shell with tool rail, lang switcher - AI output language normalization extended to en/no/uk/pl in PHP agents - SSO token validation in bootstrap.php / index.php (dobetternorge.no bridge) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 22:53:27 +02:00
daveadmin	ba6c197f1b	refactor: remove dbn_legal engine from BVJ Analyzer dbn-legal-agent is not suitable for structured RAG synthesis: - Fine-tune contamination appends feedback loops after JSON output - 7-min latency vs 45s for gpt-4o-mini - 8B base gives weaker instruction-following on complex JSON contracts - No improvement in citation accuracy (RAG provides the legal content) dbn-legal-agent kept for open-ended freeform Norwegian legal Q&A where citation structure isn't required. BVJ synthesis now uses azure_mini|azure_full|gpu only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 22:13:16 +02:00
daveadmin	7e0fce4167	fix: rein in dbn-legal-agent feedback-loop contamination (stop seqs + JSON extract + system prompt)	2026-05-15 22:05:49 +02:00
daveadmin	6161ceea75	fix: pass $emit into synthesiseBvj so dbn-legal-agent keepalives fire	2026-05-15 21:51:16 +02:00
daveadmin	bc52690472	fix: BVJ party extraction robustness + dbn-legal-agent streaming Party extraction: wider excerpt (12k chars), cleaner prompt, fallback for root-level array responses, log raw response on unexpected structure. dbn-legal-agent synthesis: replace blocking curl (200s timeout) with an SSE streaming approach (CURLOPT_WRITEFUNCTION). PHP now emits keepalive progress events every 15 s during generation, preventing browser network errors on slow ~6 t/s cuttlefish inference. Timeout extended to 660 s. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 21:35:18 +02:00
daveadmin	9b8cb9c6dc	fix: raise file upload limit from 4 MB to 8 MB. The PHP constant and all JS client-side guards are updated; the server's PHP ini limit is 64M. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 20:57:25 +02:00
daveadmin	43cf5b8ce4	feat: Barnevernet Analyzer — document analysis + partisan RAG brief 7-step agent pipeline: document classification, party extraction, timeline extraction, corpus RAG (child_welfare/echr/family_core/bufdir_guidance), and synthesis using the user's chosen engine (including dbn-legal-agent). Progressive NDJSON streaming renders doc_meta, parties, and timeline cards before the final advocacy brief and procedural red flags arrive. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 20:49:46 +02:00
daveadmin	343b19d0b4	Add sub-question branching + document summary modals - Source modal now shows LLM-generated document summary (lazy-gen + cached in documents.summary) instead of raw chunk text; toggle reveals matched chunk; "View all chunks" button fetches every chunk of the document via new api/document-chunks.php endpoint - Each sub-question card gets a "Branch ↓" button that pre-fills the query with that sub-question and shows a context panel with the prior brief summary; prior_context + branch_notes are injected into interpretSeed() and synthesise() so the LLM knows where the research is coming from - Upload document summaries generated at synthesis time and attached to upload sources alongside corpus summaries - DB: documents.summary TEXT column added to bnl_corpus on chloe Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:44:27 +02:00
daveadmin	0ff4eb6d31	Add dbn-legal-agent to deep-research and advocate pipelines - interpretSeed: uses dbn-legal-agent for Norwegian/advocate queries - expandQueries: uses dbn-legal-agent for Norwegian sub-question generation - synthesise: adds dbn_legal engine option (dbn-legal-agent via LiteLLM GPU) - advocate.php: adds Norwegian specialist radio button in engine selector Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:12:19 +02:00
daveadmin	a61329eb85	Route Whisper to chloe localhost (127.0.0.1:20019) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 18:56:02 +02:00
daveadmin	7bccd8c010	Expand corpus slices to 8: split ECHR/Hague, add Norwegian Courts, Bufdir, DBN Resources - Replace combined echr_hague slice with echr (Art.8+9, HUDOC, NIM) and hague (INCADAT, cross-border abduction) as separate toggles; echr defaults ON, hague defaults OFF - Add norwegian_courts slice: Domstol (src 5,26) + Rettspraksis.no (src 33, 482 docs) - Add bufdir_guidance slice: Barneombudet (19), Bufdir (20), Statsforvalteren (31) - Add dbn_resources slice: DBN website pages (flashcards, resource directory), defaults OFF - Replace isWebsiteChunk() with slice-aware shouldExcludeChunk(): always strips EU AI Act chunks (EUR-Lex source 7 leaks through when Qdrant runs unconstrained) and DBN website pages unless dbn_resources slice is explicitly ON - Update SLICE_DEFS in advocate.js and deep-research.js to match all 8 slices - Backward compat: echr_hague key in incoming requests fans out to echr+hague Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:01:05 +02:00
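The backward-compatible fan-out and the exclusion rule can be sketched as below. The chunk fields (`source_id`, `source_name`, `title`) and the website heuristic are assumptions, the heuristic being borrowed from the defensive filter described in the Deep Research v2 commit; treating every EUR-Lex source 7 chunk as EU AI Act leakage is likewise a simplification.

```javascript
// Expand the legacy combined key so older clients keep working.
function normaliseSlices(requested) {
  const out = new Set();
  for (const slice of requested) {
    if (slice === "echr_hague") {
      out.add("echr");
      out.add("hague");
    } else {
      out.add(slice);
    }
  }
  return [...out];
}

const EURLEX_SOURCE_ID = 7; // EUR-Lex source id named in the commit

// Hypothetical heuristic: source_name mentions "website" or title mentions the site.
function isDbnWebsiteChunk(chunk) {
  return /website/i.test(chunk.source_name ?? "") || /dobetternorge\.no/i.test(chunk.title ?? "");
}

function shouldExcludeChunk(chunk, activeSlices) {
  if (chunk.source_id === EURLEX_SOURCE_ID) return true; // always strip EU AI Act leakage
  if (isDbnWebsiteChunk(chunk) && !activeSlices.includes("dbn_resources")) return true;
  return false;
}
```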
daveadmin	464b8572d3	Wire Azure AI Search into dobetternorge-tools health.php: Add azure_search check — calls /$count endpoint and reports doc count in the index. Reads DBN_AZURE_SEARCH_{ENDPOINT,KEY,INDEX}. corpus-search.php: Add azure mode — semantic + vector hybrid search via Azure AI Search bnl-legal-v2. Embeds query with LiteLLM nomic-embed-text; expands keepCats to include government-policy, health-law, social-services, labour-law, immigration (previously blocked by contamination workaround, now safe to include). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 13:32:15 +02:00
daveadmin	d5e61d656a	Fix MariaDB LIMIT/OFFSET bound-parameter error in corpus API MariaDB rejects ? placeholders for LIMIT/OFFSET when emulate_prepares=false. Interpolate $limit and $offset as ints directly into SQL strings in both corpus-documents.php and corpus-search.php BM25 paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:31:20 +02:00
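Interpolating LIMIT/OFFSET instead of binding them is only safe because the values are forced to integers first; the commit does this in PHP. A JavaScript analogue of the same idea is sketched below, with hypothetical helper names and illustrative bounds (default 20, maximum 100) that the commit does not specify.

```javascript
// Parse to an integer and clamp; fall back when the input is not numeric.
function toBoundedInt(value, fallback, min, max) {
  const n = Number.parseInt(value, 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(Math.max(n, min), max);
}

// Only integers ever reach the SQL string, so no placeholder is needed.
function paginationClause(limit, offset) {
  const lim = toBoundedInt(limit, 20, 1, 100);
  const off = toBoundedInt(offset, 0, 0, Number.MAX_SAFE_INTEGER);
  return `LIMIT ${lim} OFFSET ${off}`;
}
```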
daveadmin	640778454f	Add Case Advocate tab — partisan brief grounded in Norwegian law New /advocate.php tab: user selects who they represent (biological father, mother, foster carer, CWS, etc.) and the agent takes their side entirely. Adversarial sub-questions target supporting Lovdata statutes + ECHR precedents; synthesis returns client_strengths[] and opposing_weaknesses[] alongside the advocate brief. - DeepResearchAgent: add advocateRole param to run(), interpretSeed(), expandQueries(), synthesise(). Neutral path unchanged (empty string). - api/deep-research.php: extract + validate advocate_role from payload; telemetry logs tool='advocate' vs 'deep_research'. - advocate.php: new page with role dropdown (presets + custom), same corpus slices/engine/controls/upload zone as deep research. - assets/js/advocate.js: page-scoped JS; renders advocate banner, client strengths card (teal), advocate brief, opposing weaknesses card (amber), sub-Q cards, sources, uncertainty, next step. - assets/css/tools.css: append .adv-* rules (~120 lines). - includes/layout.php: add Advocate nav tab between Deep research and Summarize. - index.php: add Advocate cap-card tile. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:26:05 +02:00
daveadmin	85a6bc8134	Exclude dobetternorge.no docs from all corpus search modes BM25: adds NOT LIKE filter to SQL WHERE in both FULLTEXT and LIKE paths. Hybrid + Vector: post-filter hits array by source_url after results return. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 12:10:46 +02:00
daveadmin	38255669a9	Add corpus explorer: search bar (Hybrid/BM25/Vector), category drill-down, source row expand - api/corpus-search.php: new endpoint with three search modes (hybrid RAG, BM25 keyword, Qdrant vector) - api/corpus-documents.php: paginated document browser by category or source name - corpus.php: search bar with mode+language pills, Browse docs button on each category card with drill-down panel, expand toggle on each source row showing doc count and scraper class - tools.css: all new corpus interactive styles appended Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:55:54 +02:00
daveadmin	785de04f05	fix: batch embed 5 chunks at a time with flush between; fix hydrateSourceUrls SQL Embed timeout: bnl_corpus Ollama embeds ~49 chunks sequentially in CPU mode, easily exceeding the 60s cURL timeout. Now truncates upload text to MAX_UPLOAD_CHARS before chunking (~21 chunks max) and embeds in batches of 5 with a progress flush between batches to keep the stream alive. SQL error: bnl_corpus.documents lacks the temporal columns added in migration 136 (valid_from, valid_until, etc.). dbnV6QueryDocumentMeta uses IFNULL which doesn't protect against missing columns. Replaced with a direct query using only the columns confirmed to exist on this instance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 11:42:38 +02:00
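A sketch of truncate-then-chunk followed by batched embedding with a progress callback between batches. `MAX_UPLOAD_CHARS` and the chunk size here are illustrative values, not the ones in the repo, and `embedBatch` is a hypothetical async function standing in for the embedding call.

```javascript
const MAX_UPLOAD_CHARS = 30000; // illustrative value, not the repo's
const CHUNK_CHARS = 1500;       // illustrative chunk size (30000 / 1500 = 20 chunks max)

// Truncate before chunking so the number of chunks to embed is bounded.
function chunkUpload(text) {
  const truncated = text.slice(0, MAX_UPLOAD_CHARS);
  const chunks = [];
  for (let i = 0; i < truncated.length; i += CHUNK_CHARS) {
    chunks.push(truncated.slice(i, i + CHUNK_CHARS));
  }
  return chunks;
}

// Embed in fixed-size batches, reporting progress after each batch so a
// streaming response can flush a line and keep the connection alive.
async function embedInBatches(chunks, embedBatch, onProgress, batchSize = 5) {
  const vectors = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    vectors.push(...(await embedBatch(batch)));
    onProgress(Math.min(i + batchSize, chunks.length), chunks.length);
  }
  return vectors;
}
```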
daveadmin	d2f9831472	feat: Corpus Intelligence page + timeline background events Adds /corpus.php — a data transparency page showing what powers the legal tools: 9 coverage categories with live doc counts, a full sources table pulled from the corpus DB, the AI stack (LLMs, Whisper, Qdrant, Azure AI Search, embeddings, chunking), and a pipeline flow diagram. Stats are live via a new /api/corpus-stats.php endpoint (queries dobetter_rag + bnl_admin). The reasoning sidebar is repurposed as a Corpus health panel on this page. Also ships the in-progress timeline background events toggle: API and UI wired together via include_background param. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 11:31:24 +02:00
daveadmin	3196c33ebb	fix: replace AiGateway.embedBatch with direct LiteLLM cURL for upload indexing AiGateway uses getenv(LITELLM_MASTER_KEY) + stream_context HTTP which was failing on the chloe virtualhost process. New dbnToolsLiteLLMEmbedBatch() helper mirrors dbnToolsCallGpuLlm — hardcoded URL + key, cURL-first, same pattern already proven for LLM calls. Removes AiGateway dependency from DeepResearchAgent entirely. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-15 11:30:25 +02:00
daveadmin	e130db8119	Deep Research v2: exclude marketing site, deep-link sources, per-agent reports Three user-flagged issues after the first real run with a 920KB sakkyndig PDF: 1. dobetternorge.no marketing-website chunks leaked into the retrieval pool. ClientRagPipeline::searchAll defaults include_beta_website=true; we now pass false for both website flags, AND defensively drop any returned chunk whose source_name contains "website" or title contains "dobetternorge.no" before it can pollute synthesis. 2. Brief returned was "just a paragraph". Bumped synthesis max_tokens 2200→3200, raised timeout 120→180s, and rewrote the prompt to require 400-900 words with min 4 paragraphs when source_count>=3, covering EACH sub-question in its own paragraph. Now also passes authority + jurisdiction into the sources block so the model can pinpoint statutes correctly. 3. No way to see what each "sub-question agent" researched or click through to the source articles. Restructured the results panel so per-sub-question report cards now render ABOVE the synthesised brief. Each report shows the question, the rationale, and the top 3 retrieved sources for that sub-Q with title→deep link + 1-line excerpt. Brief follows. Consolidated numbered sources list at the bottom, with titles as deep links too. Deep-link construction: source_url is hydrated via dbnV6QueryDocumentMeta in a single batched call after retrieval. For Lovdata sources with a section_title containing §<n>, the link is path-anchored to that section (/§43). For other hosts (HUDOC, Regjeringen, Bufdir, etc.) we link to the document root URL. Telemetry: trace_metadata now carries retrieval_counts {raw_corpus, filtered_website, post_filter_corpus, raw_upload, after_dedupe, after_topk} so future regressions are diagnosable from the metadata.jsonl log alone. The completion status pill surfaces the corpus/website/upload split.	2026-05-15 11:12:13 +02:00
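The Lovdata deep-link rule can be sketched as follows. `buildDeepLink` is hypothetical, and accepting chapter-style section numbers such as `§ 4-25` goes beyond the `§<n>` form the commit describes; whether Lovdata resolves every such path is not verified here.

```javascript
// Lovdata links get a "/§<n>" path suffix when the section title names a
// section; other hosts (HUDOC, Regjeringen, ...) keep the document root URL.
function buildDeepLink(sourceUrl, sectionTitle) {
  let url;
  try {
    url = new URL(sourceUrl);
  } catch {
    return null; // missing or malformed URL
  }
  const isLovdata = url.hostname === "lovdata.no" || url.hostname.endsWith(".lovdata.no");
  // Assumption: also accept chapter-style numbers such as "§ 4-25".
  const m = /§\s*(\d+[a-z]?(?:-\d+[a-z]?)?)/i.exec(sectionTitle ?? "");
  if (isLovdata && m) {
    return sourceUrl.replace(/\/+$/, "") + "/§" + m[1];
  }
  return sourceUrl;
}
```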
daveadmin	a1a7f442a7	Deep Research: NDJSON streaming so the connection survives long runs. Previously the endpoint returned a single JSON object at the end. Apache + PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run caused the browser to drop the fetch as "Failed to fetch" while the server was still synthesising; the response then arrived at a dead socket. Switch to application/x-ndjson with one event per line. The endpoint emits 'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a final 'final' event carrying the full result payload. Output buffering is explicitly disabled so each line flushes through Apache as soon as the agent emits it. DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and fires step:running before each step + step:complete after, plus a subq event per sub-question retrieval round. JS reads response.body as a stream, splits on newlines, updates the trace panel live, and renders the final result when the final event arrives. Status pill shows live progress detail (e.g. "Synthesising with Azure gpt-4o — this is the slowest step…"). Engine row in the form now shows expected duration per engine (~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in for before clicking Run.	2026-05-15 10:47:35 +02:00
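The client side of the NDJSON protocol comes down to buffering partial lines between network chunks. A minimal sketch follows: `createNdjsonParser` and `readNdjsonStream` are hypothetical names, while `getReader()`, `reader.read()` and `TextDecoder.decode()` with `{ stream: true }` are standard Web Streams and Encoding APIs.

```javascript
// Accumulates text, emitting one parsed object per complete line.
function createNdjsonParser(onEvent) {
  let buffer = "";
  return {
    push(text) {
      buffer += text;
      let nl;
      while ((nl = buffer.indexOf("\n")) !== -1) {
        const line = buffer.slice(0, nl).trim();
        buffer = buffer.slice(nl + 1);
        if (line) onEvent(JSON.parse(line));
      }
    },
    // Parse a trailing line that arrived without a final newline.
    end() {
      const line = buffer.trim();
      buffer = "";
      if (line) onEvent(JSON.parse(line));
    },
  };
}

// Reads a fetch Response body as it streams in.
async function readNdjsonStream(response, onEvent) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const parser = createNdjsonParser(onEvent);
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact.
    parser.push(decoder.decode(value, { stream: true }));
  }
  parser.push(decoder.decode()); // flush any buffered bytes
  parser.end();
}
```

In the page, `readNdjsonStream(await fetch(url, options), handleEvent)` would dispatch each `progress`, `step`, `subq` and `final` event as soon as its line is complete.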
daveadmin	4cbe0a4ac4	Add Deep Research tool: agent + rank/rerank RAG. New surface at /deep-research.php where the user pastes a question or uploads PDF/DOCX/TXT case files and an LLM-orchestrated agent researches the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval, cross-encoder rerank, and synthesis that emits an inline-[n]-cited markdown brief plus a numbered sources panel. Uploaded documents are chunked + embedded in memory only (nomic-embed-text via LiteLLM) and searched alongside the shared corpus during the same request; they are never persisted to disk, DB, or Qdrant. Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice helpers, and the existing extract.php text-extraction logic via a new dbnToolsExtractUploadedFile() helper. Also adds a dbnToolsCallGpuLlm() helper in bootstrap.php, fixing a latent bug where LegalTools.php was already calling that name with no definition. Search.php is unchanged.	2026-05-15 10:30:47 +02:00
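Uploaded chunks are searched in memory alongside the corpus. One plausible way to score them, assuming embeddings are plain numeric arrays and cosine similarity is the metric (the commit does not say which metric is used), is sketched below; `topKInMemory` is hypothetical and leaves out the cross-encoder rerank step.

```javascript
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// chunks: [{ text, vector }] held only in memory for the current request.
function topKInMemory(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```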
daveadmin	55e11cb649	Azure: route azure_mini engine to gpt-4o-mini explicitly The .env default DBN_AZURE_OPENAI_CHAT_DEPLOYMENT is gpt-4o, so the azure_mini branch (which just called ->chat() without withDeployment) was silently hitting gpt-4o too. Both UI engine options resolved to the same model, and timed out together on long Norwegian documents. Fix: explicitly route azure_mini → gpt-4o-mini in both timeline and redact paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 09:38:55 +02:00
daveadmin	85c3cee719	Azure: raise chat timeout 45s → 90s default; timeline uses 120s Timeline was using no explicit timeout, falling back to the gateway's 45s default, which timed out on long Norwegian legal documents. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 02:09:02 +02:00

81 Commits