dobetternorge-tools/assets/js
daveadmin e130db8119 Deep Research v2: exclude marketing site, deep-link sources, per-agent reports
Three user-flagged issues after the first real run with a 920KB sakkyndig PDF:

1. dobetternorge.no marketing-website chunks leaked into the retrieval pool.
   ClientRagPipeline::searchAll defaults include_beta_website=true; we now
   pass false for both website flags, AND defensively drop any returned
   chunk whose source_name contains "website" or title contains
   "dobetternorge.no" before it can pollute synthesis.

2. The returned brief was "just a paragraph". Bumped synthesis max_tokens
   2200→3200, raised the timeout 120→180s, and rewrote the prompt to require
   400-900 words with a minimum of 4 paragraphs when source_count>=3, covering
   EACH sub-question in its own paragraph. The sources block now also carries
   authority + jurisdiction so the model can pinpoint statutes correctly.

3. There was no way to see what each "sub-question agent" researched or to
   click through to the source articles. Restructured the results panel so
   per-sub-question report cards now render ABOVE the synthesised brief. Each
   report shows the question, the rationale, and the top 3 retrieved sources
   for that sub-Q, each with title→deep link + 1-line excerpt. The brief
   follows, then a consolidated numbered sources list at the bottom, with
   titles as deep links too.
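
Items 1 and 3 above can be sketched together as follows. This is a minimal illustration, not the code in this commit: the chunk fields (`source_name`, `title`, `source_url`, `excerpt`, `score`, `sub_question_id`), the case-insensitive matching, and the helper names `isWebsiteChunk`, `buildReportCards` and `firstLine` are all assumptions.

```javascript
// Hypothetical chunk shape:
// { source_name, title, source_url, excerpt, score, sub_question_id }

// True when a chunk looks like it came from the marketing website.
// Case-insensitive matching is an assumption; the commit only says "contains".
function isWebsiteChunk(chunk) {
  const sourceName = String(chunk.source_name || "").toLowerCase();
  const title = String(chunk.title || "").toLowerCase();
  return sourceName.includes("website") || title.includes("dobetternorge.no");
}

// Build one report card per sub-question, keeping its top `limit` sources
// by descending score after website chunks have been dropped.
function buildReportCards(subQuestions, chunks, limit = 3) {
  const usable = chunks.filter((chunk) => !isWebsiteChunk(chunk));
  return subQuestions.map((sq) => ({
    question: sq.question,
    rationale: sq.rationale,
    sources: usable
      .filter((chunk) => chunk.sub_question_id === sq.id)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((chunk) => ({
        title: chunk.title,
        url: chunk.source_url || null,
        excerpt: firstLine(chunk.excerpt),
      })),
  }));
}

// Reduce an excerpt to its first line for the 1-line preview.
function firstLine(text) {
  return String(text || "").split("\n")[0].trim();
}
```

Filtering before grouping means a website chunk can never take one of the three slots on a card, even if it scored highest for that sub-question.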

Deep-link construction: source_url is hydrated via dbnV6QueryDocumentMeta
in a single batched call after retrieval. For Lovdata sources with a
section_title containing §<n>, the link is path-anchored to that section
(/§43). For other hosts (HUDOC, Regjeringen, Bufdir, etc.) we link to the
document root URL.
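
The deep-link rule above could look roughly like this. It is a sketch under stated assumptions: `buildDeepLink` is a hypothetical helper, the host check against `lovdata.no` and the section-number pattern (digits, optional hyphenated parts, optional letter suffix) are guesses at what "§<n>" covers, and the hydrated `source_url` is assumed to be a plain document URL with no query string.

```javascript
// Build a deep link for one source.
// `sourceUrl` is the hydrated document URL; `sectionTitle` may be empty.
function buildDeepLink(sourceUrl, sectionTitle) {
  if (!sourceUrl) return null;

  let host;
  try {
    host = new URL(sourceUrl).hostname.toLowerCase();
  } catch (err) {
    return sourceUrl; // not an absolute URL: leave it untouched
  }

  const isLovdata = host === "lovdata.no" || host.endsWith(".lovdata.no");
  // Assumed pattern: "§" followed by a number such as 43, 43a or 4-12.
  const match = /§\s*(\d+(?:-\d+)*[a-z]?)/i.exec(sectionTitle || "");

  // Other hosts, or Lovdata chunks without a section number,
  // fall back to the document root URL.
  if (!isLovdata || !match) return sourceUrl;

  // Path-anchor the link to the section, e.g. ".../§43".
  return sourceUrl.replace(/\/+$/, "") + "/§" + match[1];
}
```

Returning the unmodified URL on every fallback path keeps the link clickable even when the section title carries no usable number.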

Telemetry: trace_metadata now carries retrieval_counts {raw_corpus,
filtered_website, post_filter_corpus, raw_upload, after_dedupe, after_topk}
so future regressions are diagnosable from the metadata.jsonl log alone.
The completion status pill surfaces the corpus/website/upload split.
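
As an illustration of the counters above, the sketch below derives `retrieval_counts` from the chunk arrays at each pipeline stage and serialises one trace entry as a JSONL line. The function names, the argument names, and the reading of `filtered_website` as "corpus chunks dropped by the website filter" are assumptions, not taken from the commit.

```javascript
// Each argument is the chunk array at one pipeline stage (hypothetical names).
function buildRetrievalCounts({ rawCorpus, postFilterCorpus, rawUpload, afterDedupe, afterTopK }) {
  return {
    raw_corpus: rawCorpus.length,
    // Assumed meaning: how many corpus chunks the website filter removed.
    filtered_website: rawCorpus.length - postFilterCorpus.length,
    post_filter_corpus: postFilterCorpus.length,
    raw_upload: rawUpload.length,
    after_dedupe: afterDedupe.length,
    after_topk: afterTopK.length,
  };
}

// Serialise one trace entry as a single line, following the JSONL
// convention of one JSON object per line.
function toJsonlLine(traceMetadata) {
  return JSON.stringify(traceMetadata) + "\n";
}
```

With counts at every stage in one object, a run where `filtered_website` is unexpectedly large or `after_topk` is zero can be spotted from the log without re-running retrieval.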
2026-05-15 11:12:13 +02:00