dobetternorge-tools/deep-research.php
daveadmin 4cbe0a4ac4 Add Deep Research tool — agent + rank/rerank RAG
New surface at /deep-research.php where the user pastes a question or
uploads PDF/DOCX/TXT case files and an LLM-orchestrated agent researches
the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval,
cross-encoder rerank, and synthesis that emits an inline-[n]-cited
markdown brief plus a numbered sources panel.
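
A brief with inline [n] markers is only useful if every marker resolves to an entry in the numbered sources panel. The following is a minimal sketch of such a check, not code from this commit: `drCitedNumbers()`, `drDanglingCitations()`, and the sample data are hypothetical names introduced for illustration, and the sources are assumed to be keyed by their 1-based number.

```php
<?php
declare(strict_types=1);

/**
 * Collect the distinct [n] markers used in a markdown brief,
 * in order of first appearance. Hypothetical helper.
 *
 * @return int[]
 */
function drCitedNumbers(string $markdown): array
{
    // Matches bracketed integers such as [1] or [12].
    preg_match_all('/\[(\d+)\]/', $markdown, $matches);

    $seen = [];
    foreach ($matches[1] as $raw) {
        $seen[(int) $raw] = true; // array keys keep insertion order
    }

    return array_keys($seen);
}

/**
 * Return the cited numbers that have no entry in the sources list.
 * Assumes $sources is keyed by the 1-based citation number.
 *
 * @param array<int, string> $sources
 * @return int[]
 */
function drDanglingCitations(string $markdown, array $sources): array
{
    return array_values(array_filter(
        drCitedNumbers($markdown),
        static fn (int $n): bool => !array_key_exists($n, $sources)
    ));
}

// Illustrative data only.
$brief   = 'Contact may be limited [2]. See also Article 8 [1][2] and [4].';
$sources = [1 => 'ECHR Article 8', 2 => 'Barneloven', 3 => 'Hague 1980'];

$cited    = drCitedNumbers($brief);                 // [2, 1, 4]
$dangling = drDanglingCitations($brief, $sources);  // [4]
```

The pattern also matches the label of a numeric markdown link such as `[1](url)`, so a real check would need to account for whatever link syntax the synthesis step is allowed to emit.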

Uploaded documents are chunked + embedded in memory only (nomic-embed-text
via LiteLLM) and searched alongside the shared corpus during the same
request — never persisted to disk, DB, or Qdrant.
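
The in-memory search over uploaded chunks can be pictured as a cosine-similarity filter with a floor, which is what the "Similarity floor" control in the form suggests (default 0.30). This is a sketch under stated assumptions rather than the commit's implementation: `drCosine()` and `drMatchUploadedChunks()` are hypothetical names, the vectors are toy values standing in for nomic-embed-text embeddings, and the embedding call itself is deliberately left out.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length numeric lists.
 * Hypothetical helper; returns 0.0 when either vector has zero length.
 */
function drCosine(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Keep the chunks whose similarity to the query meets the floor,
 * best match first. Everything lives in the $chunks array for the
 * duration of the request; nothing is written anywhere.
 *
 * @param array<int, array{text: string, vector: float[]}> $chunks
 * @return array<int, array{text: string, score: float}>
 */
function drMatchUploadedChunks(array $queryVec, array $chunks, float $floor = 0.30): array
{
    $hits = [];
    foreach ($chunks as $chunk) {
        $score = drCosine($queryVec, $chunk['vector']);
        if ($score >= $floor) {
            $hits[] = ['text' => $chunk['text'], 'score' => $score];
        }
    }

    // Sort descending by score.
    usort($hits, static fn (array $x, array $y): int => $y['score'] <=> $x['score']);

    return $hits;
}

// Toy two-dimensional vectors, for illustration only.
$chunks = [
    ['text' => 'custody ruling',      'vector' => [1.0, 0.0]],
    ['text' => 'unrelated invoice',   'vector' => [0.0, 1.0]],
    ['text' => 'visitation schedule', 'vector' => [1.0, 1.0]],
];

$hits = drMatchUploadedChunks([1.0, 0.0], $chunks);
```

With the default floor, the orthogonal "unrelated invoice" chunk is dropped and the remaining two come back ordered by score.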

Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice
helpers, and the existing extract.php text-extraction logic via a new
dbnToolsExtractUploadedFile() helper. Also adds a dbnToolsCallGpuLlm()
helper in bootstrap.php, fixing a latent bug where LegalTools.php
was already calling that function with no definition.

Search.php is unchanged.
2026-05-15 10:30:47 +02:00

163 lines
11 KiB
PHP
<?php
declare(strict_types=1);
$toolName = 'deep-research';
$toolTitle = 'Deep Research';
$toolKind = 'Agent + Rank/Rerank RAG';
$toolBadge = 'family-legal';
$extraScripts = ['assets/js/deep-research.js'];
require_once __DIR__ . '/includes/layout.php';
?>
<form id="deepResearchForm" class="tool-form deep-research" enctype="multipart/form-data">
<div class="lang-switcher" id="drLangSwitcher" role="group" aria-label="UI language">
<button type="button" class="lang-btn is-active" data-lang="en">&#127468;&#127463; EN</button>
<button type="button" class="lang-btn" data-lang="no">&#127475;&#127476; NO</button>
</div>
<div class="control-row" id="drEngineControl">
<span class="control-label">Engine</span>
<label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(fast)</small></label>
<label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best)</small></label>
<label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local)</small></label>
</div>
<p class="upload-hint">Azure engines use your BNL Azure credits. GPU runs qwen2.5:14b via LiteLLM on cuttlefish.</p>
<div class="dr-slice-section">
<p class="control-label">Corpus slices</p>
<p class="upload-hint">Select which slices of the Do Better Norge legal corpus the agent searches. Toggle Broader Legal on when the question reaches beyond family law.</p>
<div class="dr-slice-grid">
<button type="button" class="dr-slice is-on" data-slice="family_core" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">Family Law Core</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Barneloven, custody, samvær, mediation</p>
</button>
<button type="button" class="dr-slice is-on" data-slice="child_welfare" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">Child Welfare</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Barnevern, omsorgsovertakelse, foster care</p>
</button>
<button type="button" class="dr-slice is-on" data-slice="echr_hague" aria-pressed="true">
<div class="dr-slice__head">
<span class="dr-slice__title">ECHR and Hague</span>
<span class="dr-slice__badge">on</span>
</div>
<p class="dr-slice__tagline">Article 8, EMD, HCCH, cross-border family</p>
</button>
<button type="button" class="dr-slice" data-slice="broader_legal" aria-pressed="false">
<div class="dr-slice__head">
<span class="dr-slice__title">Broader Legal Support</span>
<span class="dr-slice__badge">off</span>
</div>
<p class="dr-slice__tagline">Arbeidsmiljøloven, NOUer, statutes, government background</p>
</button>
</div>
</div>
<details class="advanced-panel" id="drAdvanced">
<summary class="advanced-toggle">Advanced controls</summary>
<div class="dr-control-grid">
<div class="dr-control-card">
<label for="drSubQ">Sub-questions <span id="drSubQValue" class="dr-control-value">4</span></label>
<input type="range" id="drSubQ" min="3" max="5" step="1" value="4">
<small>How many angles the agent expands the question into before retrieval.</small>
</div>
<div class="dr-control-card">
<label for="drChunkLimit">Chunks / sub-Q <span id="drChunkLimitValue" class="dr-control-value">6</span></label>
<input type="range" id="drChunkLimit" min="4" max="10" step="1" value="6">
<small>How many corpus chunks the hybrid retriever pulls per sub-question.</small>
</div>
<div class="dr-control-card">
<label for="drSim">Similarity floor <span id="drSimValue" class="dr-control-value">0.30</span></label>
<input type="range" id="drSim" min="0.20" max="0.60" step="0.05" value="0.30">
<small>Minimum cosine similarity for uploaded-doc chunks to count as a match.</small>
</div>
<div class="dr-control-card">
<label for="drTopK">Sources kept <span id="drTopKValue" class="dr-control-value">12</span></label>
<input type="range" id="drTopK" min="8" max="14" step="1" value="12">
<small>Top sources kept after dedupe + rerank to feed synthesis.</small>
</div>
<div class="dr-control-card">
<label for="drTemp">Temperature <span id="drTempValue" class="dr-control-value">0.15</span></label>
<input type="range" id="drTemp" min="0.05" max="0.40" step="0.05" value="0.15">
<small>Synthesis creativity. Keep low for grounded legal briefs.</small>
</div>
</div>
</details>
<div class="upload-zone" id="drUploadZone" role="region" aria-label="File upload">
<input type="file" id="drUploadInput" multiple accept=".pdf,.docx,.txt" aria-label="Choose files">
<div id="drUploadPrompt" class="upload-prompt">
<span class="upload-icon" aria-hidden="true">&#8679;</span>
<p>Drop up to 5 case files here, or <label for="drUploadInput" class="upload-browse">browse</label></p>
<p class="upload-hint"><strong>PDF</strong>, <strong>DOCX</strong>, <strong>TXT</strong> &mdash; chunked + embedded in memory only, never stored.</p>
</div>
<div id="drUploadFileInfo" class="upload-file is-hidden">
<ul id="drUploadFileList" class="upload-file-list"></ul>
<button type="button" id="drUploadClear" class="upload-clear">&times; Clear</button>
</div>
</div>
<label class="input-label" for="drInput">Question or pasted text</label>
<textarea id="drInput" name="drInput" rows="8" placeholder="Describe the legal question, paste case notes, or both. The agent will research the corpus from 3-5 angles."></textarea>
<div class="form-footer">
<p id="drStatus" class="form-status" role="status" aria-live="polite"></p>
<button id="drRunButton" type="submit">Run deep research</button>
</div>
</form>
<section id="drResults" class="results deep-research-results" aria-live="polite">
<div class="empty-state">
<h3>Ready</h3>
<p>Pick slices, drop a case file or paste a question, then run. The agent will expand the question, retrieve from the corpus + your upload, rerank, and synthesise a cited brief.</p>
</div>
</section>
<!-- Source modal -->
<div id="drSourceModal" class="dr-source-modal is-hidden" role="dialog" aria-modal="true" aria-labelledby="drSourceModalTitle">
<div class="dr-source-modal__dialog">
<header class="dr-source-modal__head">
<div>
<p class="eyebrow" id="drSourceModalEyebrow">Source</p>
<h3 id="drSourceModalTitle"></h3>
</div>
<button type="button" id="drSourceModalClose" class="upload-clear" aria-label="Close">&times;</button>
</header>
<div class="dr-source-modal__body">
<aside class="dr-source-modal__meta" id="drSourceModalMeta"></aside>
<div class="dr-source-modal__text" id="drSourceModalText"></div>
</div>
</div>
</div>
<!-- Hidden stubs so tools.js element refs don't crash on this page -->
<div class="is-hidden" id="languageControl" aria-hidden="true"><input type="radio" name="language" value="en" checked></div>
<div class="is-hidden" id="redactionControl" aria-hidden="true"></div>
<div class="is-hidden" id="audioZone" aria-hidden="true">
<input type="file" id="audioInput" style="display:none">
<div id="audioPrompt"></div>
<div id="audioFileInfo"><ol id="audioQueueList"></ol><button type="button" id="audioClear"></button></div>
</div>
<div class="is-hidden" id="diarizeControl" aria-hidden="true">
<input type="checkbox" id="diarizeCheck">
<input type="number" id="numSpeakersInput">
</div>
<div class="is-hidden" id="transcribeLangControl" aria-hidden="true"><input type="radio" name="transcribeLang" value="no" checked></div>
<div class="is-hidden" id="vocabControl" aria-hidden="true">
<div id="vocabPresets"></div>
<textarea id="initPromptInput"></textarea>
</div>
<div class="is-hidden" id="aliasSection" aria-hidden="true">
<button type="button" id="addAliasRow"></button>
<div id="aliasRows"></div>
</div>
<div class="is-hidden" id="exemptSection" aria-hidden="true">
<button type="button" id="addExemptRow"></button>
<div id="exemptRows"></div>
</div>
<?php require_once __DIR__ . '/includes/layout_footer.php'; ?>