Add Deep Research tool — agent + rank/rerank RAG

New surface at /deep-research.php where the user pastes a question or
uploads PDF/DOCX/TXT case files and a LLM-orchestrated agent researches
the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval,
cross-encoder rerank, and synthesis that emits an inline-[n]-cited
markdown brief plus a numbered sources panel.

Uploaded documents are chunked + embedded in memory only (nomic-embed-text
via LiteLLM) and searched alongside the shared corpus during the same
request — never persisted to disk, DB, or Qdrant.

Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice
helpers, and the existing extract.php text-extraction logic via a new
dbnToolsExtractUploadedFile() helper. Also adds dbnToolsCallGpuLlm()
helper in bootstrap.php — fixes a latent bug where LegalTools.php
was already calling that name with no definition.

Search.php is unchanged.
This commit is contained in:
2026-05-15 10:30:47 +02:00
parent 55e11cb649
commit 4cbe0a4ac4
10 changed files with 2119 additions and 125 deletions
+162
View File
@@ -0,0 +1,162 @@
<?php
declare(strict_types=1);
$toolName = 'deep-research';
$toolTitle = 'Deep Research';
$toolKind = 'Agent + Rank/Rerank RAG';
$toolBadge = 'family-legal';
$extraScripts = ['assets/js/deep-research.js'];
require_once __DIR__ . '/includes/layout.php';
?>
<form id="deepResearchForm" class="tool-form deep-research" enctype="multipart/form-data">
<div class="lang-switcher" id="drLangSwitcher" role="group" aria-label="UI language">
<button type="button" class="lang-btn is-active" data-lang="en">&#127468;&#127463; EN</button>
<button type="button" class="lang-btn" data-lang="no">&#127475;&#127476; NO</button>
</div>
<div class="control-row" id="drEngineControl">
<span class="control-label">Engine</span>
<label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(fast)</small></label>
<label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best)</small></label>
<label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local)</small></label>
</div>
<p class="upload-hint">Azure engines use your BNL Azure credits. GPU runs qwen2.5:14b via LiteLLM on cuttlefish.</p>
<div class="dr-slice-section">
<p class="control-label">Corpus slices</p>
<p class="upload-hint">Select which slices of the Do Better Norge legal corpus the agent searches. Toggle Broader Legal on when the question reaches beyond family law.</p>
<div class="dr-slice-grid">
<button type="button" class="dr-slice is-on" data-slice="family_core" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">Family Law Core</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Barneloven, custody, samvær, mediation</p>
</button>
<button type="button" class="dr-slice is-on" data-slice="child_welfare" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">Child Welfare</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Barnevern, omsorgsovertakelse, foster care</p>
</button>
<button type="button" class="dr-slice is-on" data-slice="echr_hague" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">ECHR and Hague</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Article 8, EMD, HCCH, cross-border family</p>
</button>
<button type="button" class="dr-slice" data-slice="broader_legal" aria-pressed="false">
<div class="dr-slice__head">
<span class="dr-slice__title">Broader Legal Support</span>
<span class="dr-slice__badge">off</span>
</div>
<p class="dr-slice__tagline">Arbeidsmiljøloven, NOUer, statutes, government background</p>
</button>
</div>
</div>
<details class="advanced-panel" id="drAdvanced">
<summary class="advanced-toggle">Advanced controls</summary>
<div class="dr-control-grid">
<div class="dr-control-card">
<label>Sub-questions <span id="drSubQValue" class="dr-control-value">4</span></label>
<input type="range" id="drSubQ" min="3" max="5" step="1" value="4">
<small>How many angles the agent expands the question into before retrieval.</small>
</div>
<div class="dr-control-card">
<label>Chunks / sub-Q <span id="drChunkLimitValue" class="dr-control-value">6</span></label>
<input type="range" id="drChunkLimit" min="4" max="10" step="1" value="6">
<small>How many corpus chunks the hybrid retriever pulls per sub-question.</small>
</div>
<div class="dr-control-card">
<label>Similarity floor <span id="drSimValue" class="dr-control-value">0.30</span></label>
<input type="range" id="drSim" min="0.20" max="0.60" step="0.05" value="0.30">
<small>Minimum cosine similarity for uploaded-doc chunks to count as a match.</small>
</div>
<div class="dr-control-card">
<label>Sources kept <span id="drTopKValue" class="dr-control-value">12</span></label>
<input type="range" id="drTopK" min="8" max="14" step="1" value="12">
<small>Top sources kept after dedupe + rerank to feed synthesis.</small>
</div>
<div class="dr-control-card">
<label>Temperature <span id="drTempValue" class="dr-control-value">0.15</span></label>
<input type="range" id="drTemp" min="0.05" max="0.40" step="0.05" value="0.15">
<small>Synthesis creativity. Keep low for grounded legal briefs.</small>
</div>
</div>
</details>
<div class="upload-zone" id="drUploadZone" role="region" aria-label="File upload">
<input type="file" id="drUploadInput" multiple accept=".pdf,.docx,.txt" aria-label="Choose files">
<div id="drUploadPrompt" class="upload-prompt">
<span class="upload-icon" aria-hidden="true">&#8679;</span>
<p>Drop up to 5 case files here, or <label for="drUploadInput" class="upload-browse">browse</label></p>
<p class="upload-hint"><strong>PDF</strong>, <strong>DOCX</strong>, <strong>TXT</strong> &mdash; chunked + embedded in memory only, never stored.</p>
</div>
<div id="drUploadFileInfo" class="upload-file is-hidden">
<ul id="drUploadFileList" class="upload-file-list"></ul>
<button type="button" id="drUploadClear" class="upload-clear">&times; Clear</button>
</div>
</div>
<label class="input-label" for="drInput">Question or pasted text</label>
<textarea id="drInput" name="drInput" rows="8" placeholder="Describe the legal question, paste case notes, or both. The agent will research the corpus from 35 angles."></textarea>
<div class="form-footer">
<p id="drStatus" class="form-status" role="status" aria-live="polite"></p>
<button id="drRunButton" type="submit">Run deep research</button>
</div>
</form>
<section id="drResults" class="results deep-research-results" aria-live="polite">
<div class="empty-state">
<h3>Ready</h3>
<p>Pick slices, drop a case file or paste a question, then run. The agent will expand the question, retrieve from the corpus + your upload, rerank, and synthesise a cited brief.</p>
</div>
</section>
<!-- Source modal -->
<div id="drSourceModal" class="dr-source-modal is-hidden" role="dialog" aria-modal="true" aria-labelledby="drSourceModalTitle">
<div class="dr-source-modal__dialog">
<header class="dr-source-modal__head">
<div>
<p class="eyebrow" id="drSourceModalEyebrow">Source</p>
<h3 id="drSourceModalTitle"></h3>
</div>
<button type="button" id="drSourceModalClose" class="upload-clear" aria-label="Close">&times;</button>
</header>
<div class="dr-source-modal__body">
<aside class="dr-source-modal__meta" id="drSourceModalMeta"></aside>
<div class="dr-source-modal__text" id="drSourceModalText"></div>
</div>
</div>
</div>
<!-- Hidden stubs so tools.js element refs don't crash on this page -->
<div class="is-hidden" id="languageControl" aria-hidden="true"><input type="radio" name="language" value="en" checked></div>
<div class="is-hidden" id="redactionControl" aria-hidden="true"></div>
<div class="is-hidden" id="audioZone" aria-hidden="true">
<input type="file" id="audioInput" style="display:none">
<div id="audioPrompt"></div>
<div id="audioFileInfo"><ol id="audioQueueList"></ol><button type="button" id="audioClear"></button></div>
</div>
<div class="is-hidden" id="diarizeControl" aria-hidden="true">
<input type="checkbox" id="diarizeCheck">
<input type="number" id="numSpeakersInput">
</div>
<div class="is-hidden" id="transcribeLangControl" aria-hidden="true"><input type="radio" name="transcribeLang" value="no" checked></div>
<div class="is-hidden" id="vocabControl" aria-hidden="true">
<div id="vocabPresets"></div>
<textarea id="initPromptInput"></textarea>
</div>
<div class="is-hidden" id="aliasSection" aria-hidden="true">
<button type="button" id="addAliasRow"></button>
<div id="aliasRows"></div>
</div>
<div class="is-hidden" id="exemptSection" aria-hidden="true">
<button type="button" id="addExemptRow"></button>
<div id="exemptRows"></div>
</div>
<?php require_once __DIR__ . '/includes/layout_footer.php'; ?>