- api/user-docs.php: GET/DELETE shared dbn_user_docs table (SSO users only)
connects to dobetternorge DB via DBN_DB_* env vars
- workbench.php: My Documents panel (section 05) for SSO/free-tier users;
shows docs uploaded from either AI chat or tools, links to AI Chat for upload
- workbench.js: fetch + render doc list, delete with Qdrant cleanup
- tools.css: workbench-docs panel + item styles
- i18n.php: my_docs_* strings in all 4 languages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Probe testing revealed the fine-tune loops when asked to check a brief
directly (tool-planning architecture conflict) but answers focused legal
Q&A reliably in ~55s. New step 6b asks one targeted question per document
type (akuttvedtak → § 4-25 klar nødvendighet, adopsjon → Strand Lobben,
undersøkelse → fvl § 17/§ 41) and merges the finding into
procedural_red_flags with check_model provenance. Silent on timeout/error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- citations.php + assets/js/citations.js: new tool page for browsing the
FalkorDB citation graph by title/ID, with autocomplete, action pills
(cites/cited_by/implements/chain), hop-by-hop navigation, and exploration trail
- advocate.js: tag graph-expanded source cards with 'via citation graph' badge
- DeepResearchAgent: propagate _graph_expanded flag through normalizeCorpusChunk
and top_sources serialization so it reaches the frontend
- tools.css: add .dr-source-tag--graph variant (green pill)
- i18n.php: register 'citations' tool in all 4 languages with CIT icon
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8-step NDJSON-streaming pipeline that compares two Barnevernet documents:
classifies each doc, extracts parties and timelines, cross-references both
for contradictions/deletions/additions, retrieves corpus legal context, and
synthesises a full discrepancy report with tabbed UI.
New files: DiscrepancyAgent.php, api/discrepancy.php, discrepancy.php,
discrepancy.js. Modified: FreeTier.php (cost=4), i18n.php (all 4 langs),
tool-svgs.php (DC icon), tools.css (dc-* component styles).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Additive-only change: new workbench.php authenticated page with guided
intake flow, evidence map, tool sequence, output checklist, and
sessionStorage-only note persistence. Dashboard and public index get
a new Case Workbench card. No existing tools, APIs, or prompts modified.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Search: category filter pills scope results to a legal domain
- Search: full chunk text returned; click to expand inline beyond 600-char excerpt
- Drill panel: total count label ("Showing X of Y"), sort dropdown, title filter (300ms debounce)
- URL hash: preserves query/mode/lang/category/drill state for bookmarking
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Live search/filter bar: filters events by keyword across event, actor, source_excerpt, date
- Actor filter chips: click to filter by actor, multi-select, teal active state
- Year/month group headers when sorted chronologically (── 2023 ──, Mar 2024 ──)
- Per-event copy button (hover-revealed 📋): copies "date · actor · event" to clipboard
- "Hide/show sources" toggle: collapses all source excerpts without re-rendering
- Count badge: "23 events · 3 actors · 2022–2025" above the list
- applyTimelineFilters() unifies sort + actor + text filters in one re-render pass
- CSV export now includes end_date column
- Reset all filter state on each new run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CSS: colour-coded [TAG] spans by entity type (person=pink, org=blue,
place=green, date=amber, id=purple)
- Inventory panel: collapsible list showing tag → original text mappings
with occurrence counts, sourced from new redaction_map API response key
- Before/after toggle: Redacted / Original view-switch buttons wired to
lastOriginalText captured at submission time
- One-click gpt-4o upgrade button when mini or GPU engine was used
- Backend: redaction_map built from applied LLM entities (tag → originals
+ occurrence count via substr_count on final text)
- renderResults now calls setupRedactViewToggle() after DOM is written
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Vocab textarea now shows live 0/500 char counter (turns amber at 450+)
- Animated progress bar during transcription; determinate for multi-clip, indeterminate for single
- Results card shows inline stats row (duration, language, speakers) and AI cleanup badge
- Copy button + Download TXT moved above transcript box; SRT/VTT remain below
- Speaker role legend repeats inside Segments panel for easy cross-reference
- Batch errors no longer halt the queue; remaining clips continue, failed files named in status bar
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the one-liner workbench-attribution div with the shared footer
include so all seven tool pages (transcribe, timeline, redact, barnevernet,
advocate, deep-research, corpus) show the same compact 2-column footer as
the landing and dashboard pages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds optional post-transcription cleanup via GPT-4o/GPT-4o-mini to fix
mishearing errors, punctuation, and domain terms. Speaker role labelling
now accepts a deployment param. Adds i18n strings for advanced options
panel (task, VAD filter, Whisper model, AI cleanup) in all four languages.
Updates BvjAnalyzerAgent and DeepResearchAgent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add 4-step year inference rule for DD.MM. entries (scan backward/forward for anchor year)
- Add Norwegian month-name formats (18. september, den 18. september 2025, etc.) with month lookup table
- Add $relativeInstruction to tell LLM upfront when relative dates are excluded (not just PHP-filtered post-hoc)
- Define confidence calibration criteria explicitly (high/medium/low)
- Improve source_excerpt guidance: most diagnostic phrase, not just any verbatim phrase
- Add actor normalization for Norwegian institutions (Barnevernstjenesten, Fylkesnemnda, Statsforvalteren, etc.)
- Add deduplication rule for events appearing across multiple documents
- Add end_date field for date_type=period events
- Improve what_we_found schema hint to require count/range/actors/gaps
- Increase max_tokens to 8000 for azure_full (gpt-4o) to avoid truncation on large documents
- Tighten system prompt with Norwegian CPS legal chain context
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drops Roboto + IBM Plex Mono from Google Fonts, replaces with IBM Plex
Sans (matching dobetternorge.no). Nav badge loses bordered pill, becomes
plain uppercase label with slash separator. Footer cut from 3-column
text-wall (~300 words) to compact 2-column layout (~50 words) — logo +
tagline + privacy note on left, 5 links in 2 columns on right.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes the logged-in vs logged-out page bifurcation. index.php now
always renders the public landing (tools overview, hero, trust section)
with auth-conditional nav/hero CTAs and a two-column member/register
gate shown only to unauthenticated visitors. Authenticated workbench
extracted to new dashboard.php. Adds 8 new i18n keys across all 4
languages and new CSS for auth-nav, hero CTA, two-column gate, and
register buttons.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each landing card now links to preview.php?tool=SLUG — a dedicated
public page with an expanded pitch, 4 capability bullets, and a
realistic Norwegian-language sample input+output for all 7 tools.
- preview.php — new public page (no auth required), switch-driven content
- includes/tool-svgs.php — extracted $toolSvgs into shared include
- index.php — require tool-svgs.php, card href → preview.php?tool=SLUG
- assets/css/tools.css — lt-preview-* component styles appended
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
External URL was unreachable from tools subdomain (CSP or cross-origin block),
causing a grey placeholder rectangle. Logo now served from assets/images/ and
brightness/invert filter removed — logo is white-on-transparent, displays
correctly on dark nav and footer without filtering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add sticky navy nav with logo-header.webp, Legal Tools badge, lang switcher, red CTA
- Replace showcase-hero with full-bleed dark hero (Crimson Pro, IBM Plex Mono, stat pills)
- Redesign tool cards: 3-col grid, 178px illustrated SVG art per card (7 unique illustrations)
- Add lt-trust 3-col strip and lt-access navy gate panel
- Rebuild footer with 3-col navy layout matching main site
- Add Crimson Pro / Roboto / IBM Plex Mono Google Fonts via <link> + @import
- CSS: new lt-* variables, all new landing component styles appended to tools.css
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copies GcpSpeechClient into the tools repo so it's deployed with the code;
removes the broken dbnToolsAiPortalRoot() path that resolved to a nonexistent
/home/dobetternorge/ai-portal directory. Also restarted the CPU Whisper
service which had a stuck CLOSE_WAIT socket causing silent fetch failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes user-facing engine/model/key/beam controls. The server now picks
the best available engine automatically:
1. Microsoft Azure Speech — short clips (≤1MB, no diarization, audio/*)
2. Google Cloud Speech v2 — long audio, diarization, all languages
3. OpenAI Whisper GPU — local fallback
Results display which provider was used (e.g. "Transcribed with Google
Cloud Speech") via transcript-engine-badge and traceMeta.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prompt now instructs the model to extract time of day (HH:MM) when
present in Norwegian formats: kl. 14:30, kl 09.00, 14:30, 14.30.
renderTimeline shows time as a muted inline annotation next to the date.
CSV export gains a Time column after Date.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Public landing page at / for unauthenticated users (EN/NO/UK/PL)
- Authenticated / shows Case Workbench dashboard with manifesto strip,
stats, and launched-tool grid (Transcribe, Timeline, BVJ, Advocate,
Deep Research, Corpus)
- Added includes/i18n.php with full 4-language translation layer
- Extended layout.php to Case Workbench shell with tool rail, lang switcher
- AI output language normalization extended to en/no/uk/pl in PHP agents
- SSO token validation in bootstrap.php / index.php (dobetternorge.no bridge)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dbn-legal-agent is not suitable for structured RAG synthesis:
- Fine-tune contamination appends feedback loops after JSON output
- 7-min latency vs 45s for gpt-4o-mini
- 8B base gives weaker instruction-following on complex JSON contracts
- No improvement in citation accuracy (RAG provides the legal content)
dbn-legal-agent kept for open-ended freeform Norwegian legal Q&A
where citation structure isn't required. BVJ synthesis now uses
azure_mini|azure_full|gpu only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Party extraction: wider excerpt (12k chars), cleaner prompt, fallback for
root-level array responses, log raw response on unexpected structure.
dbn-legal-agent synthesis: replace blocking curl (200s timeout) with an
SSE streaming approach (CURLOPT_WRITEFUNCTION). PHP now emits keepalive
progress events every 15 s during generation, preventing browser network
errors on slow ~6 t/s cuttlefish inference. Timeout extended to 660 s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7-step agent pipeline: document classification, party extraction, timeline
extraction, corpus RAG (child_welfare/echr/family_core/bufdir_guidance),
and synthesis using the user's chosen engine (including dbn-legal-agent).
Progressive NDJSON streaming renders doc_meta, parties, and timeline cards
before the final advocacy brief and procedural red flags arrive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Source modal now shows LLM-generated document summary (lazy-gen + cached
in documents.summary) instead of raw chunk text; toggle reveals matched
chunk; "View all chunks" button fetches every chunk of the document via
new api/document-chunks.php endpoint
- Each sub-question card gets a "Branch ↓" button that pre-fills the query
with that sub-question and shows a context panel with the prior brief
summary; prior_context + branch_notes are injected into interpretSeed()
and synthesise() so the LLM knows where the research is coming from
- Upload document summaries generated at synthesis time and attached to
upload sources alongside corpus summaries
- DB: documents.summary TEXT column added to bnl_corpus on chloe
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace combined echr_hague slice with echr (Art.8+9, HUDOC, NIM) and hague (INCADAT,
cross-border abduction) as separate toggles; echr defaults ON, hague defaults OFF
- Add norwegian_courts slice: Domstol (src 5,26) + Rettspraksis.no (src 33, 482 docs)
- Add bufdir_guidance slice: Barneombudet (19), Bufdir (20), Statsforvalteren (31)
- Add dbn_resources slice: DBN website pages (flashcards, resource directory), defaults OFF
- Replace isWebsiteChunk() with slice-aware shouldExcludeChunk(): always strips EU AI Act
chunks (EUR-Lex source 7 leaks through when Qdrant runs unconstrained) and DBN website
pages unless dbn_resources slice is explicitly ON
- Update SLICE_DEFS in advocate.js and deep-research.js to match all 8 slices
- Backward compat: echr_hague key in incoming requests fans out to echr+hague
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
health.php: Add azure_search check — calls /$count endpoint and
reports doc count in the index. Reads DBN_AZURE_SEARCH_{ENDPOINT,KEY,INDEX}.
corpus-search.php: Add azure mode — semantic + vector hybrid search
via Azure AI Search bnl-legal-v2. Embeds query with LiteLLM
nomic-embed-text; expands keepCats to include government-policy,
health-law, social-services, labour-law, immigration (previously
blocked by contamination workaround, now safe to include).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MariaDB rejects ? placeholders for LIMIT/OFFSET when emulate_prepares=false.
Interpolate $limit and $offset as ints directly into SQL strings in both
corpus-documents.php and corpus-search.php BM25 paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New /advocate.php tab: user selects who they represent (biological
father, mother, foster carer, CWS, etc.) and the agent takes their
side entirely. Adversarial sub-questions target supporting Lovdata
statutes + ECHR precedents; synthesis returns client_strengths[] and
opposing_weaknesses[] alongside the advocate brief.
- DeepResearchAgent: add advocateRole param to run(), interpretSeed(),
expandQueries(), synthesise(). Neutral path unchanged (empty string).
- api/deep-research.php: extract + validate advocate_role from payload;
telemetry logs tool='advocate' vs 'deep_research'.
- advocate.php: new page with role dropdown (presets + custom), same
corpus slices/engine/controls/upload zone as deep research.
- assets/js/advocate.js: page-scoped JS; renders advocate banner,
client strengths card (teal), advocate brief, opposing weaknesses
card (amber), sub-Q cards, sources, uncertainty, next step.
- assets/css/tools.css: append .adv-* rules (~120 lines).
- includes/layout.php: add Advocate nav tab between Deep research and
Summarize.
- index.php: add Advocate cap-card tile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BM25: adds NOT LIKE filter to SQL WHERE in both FULLTEXT and LIKE paths.
Hybrid + Vector: post-filter hits array by source_url after results return.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- api/corpus-search.php: new endpoint with three search modes (hybrid RAG, BM25 keyword, Qdrant vector)
- api/corpus-documents.php: paginated document browser by category or source name
- corpus.php: search bar with mode+language pills, Browse docs button on each category card with drill-down panel, expand toggle on each source row showing doc count and scraper class
- tools.css: all new corpus interactive styles appended
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Embed timeout: bnl_corpus Ollama embeds ~49 chunks sequentially in CPU mode,
easily exceeding the 60s cURL timeout. Now truncates upload text to
MAX_UPLOAD_CHARS before chunking (~21 chunks max) and embeds in batches of 5
with a progress flush between batches to keep the stream alive.
SQL error: bnl_corpus.documents lacks the temporal columns added in migration
136 (valid_from, valid_until, etc.). dbnV6QueryDocumentMeta uses IFNULL which
doesn't protect against missing columns. Replaced with a direct query using
only the columns confirmed to exist on this instance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds /corpus.php — a data transparency page showing what powers the
legal tools: 9 coverage categories with live doc counts, a full
sources table pulled from the corpus DB, the AI stack (LLMs, Whisper,
Qdrant, Azure AI Search, embeddings, chunking), and a pipeline flow
diagram. Stats are live via a new /api/corpus-stats.php endpoint
(queries dobetter_rag + bnl_admin). The reasoning sidebar is repurposed
as a Corpus health panel on this page.
Also ships the in-progress timeline background events toggle:
API and UI wired together via include_background param.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AiGateway uses getenv(LITELLM_MASTER_KEY) + stream_context HTTP which was
failing on the chloe virtualhost process. New dbnToolsLiteLLMEmbedBatch()
helper mirrors dbnToolsCallGpuLlm — hardcoded URL + key, cURL-first, same
pattern already proven for LLM calls. Removes AiGateway dependency from
DeepResearchAgent entirely.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three user-flagged issues after the first real run with a 920KB sakkyndig PDF:
1. dobetternorge.no marketing-website chunks leaked into the retrieval pool.
ClientRagPipeline::searchAll defaults include_beta_website=true; we now
pass false for both website flags, AND defensively drop any returned
chunk whose source_name contains "website" or title contains
"dobetternorge.no" before it can pollute synthesis.
2. Brief returned was "just a paragraph". Bumped synthesis max_tokens
2200→3200, raised timeout 120→180s, and rewrote the prompt to require
400-900 words with min 4 paragraphs when source_count>=3, covering EACH
sub-question in its own paragraph. Now also passes authority + jurisdiction
into the sources block so the model can pinpoint statutes correctly.
3. No way to see what each "sub-question agent" researched or click through
to the source articles. Restructured the results panel so per-sub-question
report cards now render ABOVE the synthesised brief. Each report shows the
question, the rationale, and the top 3 retrieved sources for that sub-Q
with title→deep link + 1-line excerpt. Brief follows. Consolidated
numbered sources list at the bottom, with titles as deep links too.
Deep-link construction: source_url is hydrated via dbnV6QueryDocumentMeta
in a single batched call after retrieval. For Lovdata sources with a
section_title containing §<n>, the link is path-anchored to that section
(/§43). For other hosts (HUDOC, Regjeringen, Bufdir, etc.) we link to the
document root URL.
Telemetry: trace_metadata now carries retrieval_counts {raw_corpus,
filtered_website, post_filter_corpus, raw_upload, after_dedupe, after_topk}
so future regressions are diagnosable from the metadata.jsonl log alone.
The completion status pill surfaces the corpus/website/upload split.
Previously the endpoint returned a single JSON object at the end. Apache+
PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run
caused the browser to drop the fetch as "Failed to fetch" while the server
was still synthesising — the response then arrived to a dead socket.
Switch to application/x-ndjson with one event per line. The endpoint emits
'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a
final 'final' event carrying the full result payload. Output buffering is
explicitly disabled so each line flushes through Apache as soon as the
agent emits it.
DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and
fires step:running before each step + step:complete after, plus a subq
event per sub-question retrieval round.
JS reads response.body as a stream, splits on newlines, updates the
trace panel live, and renders the final result when the final event
arrives. Status pill shows live progress detail (e.g. "Synthesising with
Azure gpt-4o — this is the slowest step…").
Engine row in the form now shows expected duration per engine
(~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in
for before clicking Run.
New surface at /deep-research.php where the user pastes a question or
uploads PDF/DOCX/TXT case files and a LLM-orchestrated agent researches
the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval,
cross-encoder rerank, and synthesis that emits an inline-[n]-cited
markdown brief plus a numbered sources panel.
Uploaded documents are chunked + embedded in memory only (nomic-embed-text
via LiteLLM) and searched alongside the shared corpus during the same
request — never persisted to disk, DB, or Qdrant.
Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice
helpers, and the existing extract.php text-extraction logic via a new
dbnToolsExtractUploadedFile() helper. Also adds dbnToolsCallGpuLlm()
helper in bootstrap.php — fixes a latent bug where LegalTools.php
was already calling that name with no definition.
Search.php is unchanged.
The .env default DBN_AZURE_OPENAI_CHAT_DEPLOYMENT is gpt-4o, so the
azure_mini branch (which just called ->chat() without withDeployment)
was silently hitting gpt-4o too. Both UI engine options resolved to
the same model, and timed out together on long Norwegian documents.
Fix: explicitly route azure_mini → gpt-4o-mini in both timeline and
redact paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Timeline was using no explicit timeout, falling back to the gateway's
45s default, which timed out on long Norwegian legal documents.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>