e130db8119
Fixes three user-flagged issues from the first real run with a 920 KB sakkyndig (court-appointed expert) report PDF:
1. dobetternorge.no marketing-website chunks leaked into the retrieval pool.
ClientRagPipeline::searchAll defaults to include_beta_website=true; we now
pass false for both website flags and, as a defensive second layer, drop any
returned chunk whose source_name contains "website" or whose title contains
"dobetternorge.no" before it can pollute synthesis.
2. Brief returned was "just a paragraph". Bumped synthesis max_tokens
2200→3200, raised the timeout 120s→180s, and rewrote the prompt to require
400-900 words in at least 4 paragraphs when source_count>=3, covering EACH
sub-question in its own paragraph. The sources block now also carries
authority + jurisdiction so the model can pinpoint statutes correctly.
3. No way to see what each "sub-question agent" researched or click through
to the source articles. Restructured the results panel so per-sub-question
report cards now render ABOVE the synthesised brief. Each report shows the
question, the rationale, and the top 3 retrieved sources for that sub-Q
with title→deep link + 1-line excerpt. The brief follows, then a consolidated
numbered sources list at the bottom, with titles as deep links too.
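
The defensive filter from item 1 can be sketched roughly as below. This is an illustration in Python, not the code in this commit: the dict-shaped chunks and the function name drop_website_chunks are hypothetical, and case-insensitive matching is an assumption (the commit only states "contains").

```python
from typing import Any, Dict, List, Tuple

Chunk = Dict[str, Any]


def drop_website_chunks(chunks: List[Chunk]) -> Tuple[List[Chunk], int]:
    """Return (kept_chunks, dropped_count) after removing website chunks.

    A chunk is dropped when its source_name contains "website" or its
    title contains "dobetternorge.no". Matching is case-insensitive here,
    which is an assumption rather than documented behaviour.
    """
    kept: List[Chunk] = []
    for chunk in chunks:
        # Missing or null fields are treated as empty strings.
        source_name = (chunk.get("source_name") or "").lower()
        title = (chunk.get("title") or "").lower()
        if "website" in source_name or "dobetternorge.no" in title:
            continue
        kept.append(chunk)
    return kept, len(chunks) - len(kept)
```

Returning the dropped count alongside the kept chunks is one way to feed a filtered_website counter without a second pass over the results.
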
Deep-link construction: source_url is hydrated via dbnV6QueryDocumentMeta
in a single batched call after retrieval. For Lovdata sources with a
section_title containing §<n>, the link is path-anchored to that section
(/§43). For other hosts (HUDOC, Regjeringen, Bufdir, etc.) we link to the
document root URL.
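
A minimal sketch of the link rule described above, in Python for illustration only. The function name build_deep_link, the hostname check for lovdata.no, and the exact section pattern (which also accepts forms such as "4-12" or "43a") are assumptions; the commit itself only specifies §<n> and the /§43 shape.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed pattern: "§" followed by a number, optionally "-<n>" or one letter.
SECTION_RE = re.compile(r"§\s*(\d+(?:-\d+)?[a-z]?)")


def build_deep_link(source_url: Optional[str],
                    section_title: Optional[str]) -> Optional[str]:
    """Return a section-anchored link for Lovdata, else the document URL."""
    if not source_url:
        return None
    host = urlparse(source_url).hostname or ""
    is_lovdata = host == "lovdata.no" or host.endswith(".lovdata.no")
    match = SECTION_RE.search(section_title or "")
    if is_lovdata and match:
        # Path-anchor to the section, e.g. <document URL>/§43.
        return source_url.rstrip("/") + "/§" + match.group(1)
    # HUDOC, Regjeringen, Bufdir, etc.: link to the document root URL.
    return source_url
```

Sources with no hydrated source_url yield None in this sketch, so the caller can render the title as plain text instead of a broken link.
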
Telemetry: trace_metadata now carries retrieval_counts {raw_corpus,
filtered_website, post_filter_corpus, raw_upload, after_dedupe, after_topk}
so future regressions are diagnosable from the metadata.jsonl log alone.
The completion status pill surfaces the corpus/website/upload split.
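
As a rough illustration of how retrieval_counts could be checked from a single metadata.jsonl record, here is a Python sketch. The six key names come from this commit; the function find_anomalies and the relationships it checks between the counters (for example post_filter_corpus = raw_corpus - filtered_website) are assumptions about what each counter means, not confirmed semantics.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = (
    "raw_corpus", "filtered_website", "post_filter_corpus",
    "raw_upload", "after_dedupe", "after_topk",
)


def find_anomalies(counts: Dict[str, int]) -> List[str]:
    """Return human-readable warnings for one retrieval_counts dict."""
    missing = [key for key in REQUIRED_KEYS if key not in counts]
    if missing:
        return ["missing keys: " + ", ".join(missing)]
    warnings: List[str] = []
    if counts["filtered_website"] > 0:
        # Website flags are passed as false, so any hit here is a leak
        # that only the defensive filter caught.
        warnings.append("website chunks returned despite flags being off")
    if counts["post_filter_corpus"] != counts["raw_corpus"] - counts["filtered_website"]:
        warnings.append("post_filter_corpus does not match raw minus filtered")
    if counts["after_dedupe"] > counts["post_filter_corpus"] + counts["raw_upload"]:
        warnings.append("dedupe output exceeds its combined input")
    if counts["after_topk"] > counts["after_dedupe"]:
        warnings.append("top-k output exceeds dedupe output")
    return warnings


def check_jsonl_line(line: str) -> List[str]:
    """Parse one metadata.jsonl line and check its retrieval_counts."""
    record = json.loads(line)
    return find_anomalies(record.get("retrieval_counts", {}))
```

The sketch assumes each log line carries retrieval_counts at the top level of the record; if it is nested under trace_metadata, the lookup in check_jsonl_line would need adjusting.
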