dobetternorge-tools

daveadmin/dobetternorge-tools

Fork 0

Commit Graph

Author	SHA1	Message	Date
daveadmin	7bccd8c010	Expand corpus slices to 8: split ECHR/Hague, add Norwegian Courts, Bufdir, DBN Resources - Replace combined echr_hague slice with echr (Art.8+9, HUDOC, NIM) and hague (INCADAT, cross-border abduction) as separate toggles; echr defaults ON, hague defaults OFF - Add norwegian_courts slice: Domstol (src 5,26) + Rettspraksis.no (src 33, 482 docs) - Add bufdir_guidance slice: Barneombudet (19), Bufdir (20), Statsforvalteren (31) - Add dbn_resources slice: DBN website pages (flashcards, resource directory), defaults OFF - Replace isWebsiteChunk() with slice-aware shouldExcludeChunk(): always strips EU AI Act chunks (EUR-Lex source 7 leaks through when Qdrant runs unconstrained) and DBN website pages unless dbn_resources slice is explicitly ON - Update SLICE_DEFS in advocate.js and deep-research.js to match all 8 slices - Backward compat: echr_hague key in incoming requests fans out to echr+hague Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:01:05 +02:00
daveadmin	a1a7f442a7	Deep Research: NDJSON streaming so the connection survives long runs Previously the endpoint returned a single JSON object at the end. Apache+ PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run caused the browser to drop the fetch as "Failed to fetch" while the server was still synthesising — the response then arrived to a dead socket. Switch to application/x-ndjson with one event per line. The endpoint emits 'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a final 'final' event carrying the full result payload. Output buffering is explicitly disabled so each line flushes through Apache as soon as the agent emits it. DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and fires step:running before each step + step:complete after, plus a subq event per sub-question retrieval round. JS reads response.body as a stream, splits on newlines, updates the trace panel live, and renders the final result when the final event arrives. Status pill shows live progress detail (e.g. "Synthesising with Azure gpt-4o — this is the slowest step…"). Engine row in the form now shows expected duration per engine (~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in for before clicking Run.	2026-05-15 10:47:35 +02:00
daveadmin	4cbe0a4ac4	Add Deep Research tool — agent + rank/rerank RAG New surface at /deep-research.php where the user pastes a question or uploads PDF/DOCX/TXT case files and a LLM-orchestrated agent researches the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval, cross-encoder rerank, and synthesis that emits an inline-[n]-cited markdown brief plus a numbered sources panel. Uploaded documents are chunked + embedded in memory only (nomic-embed-text via LiteLLM) and searched alongside the shared corpus during the same request — never persisted to disk, DB, or Qdrant. Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice helpers, and the existing extract.php text-extraction logic via a new dbnToolsExtractUploadedFile() helper. Also adds dbnToolsCallGpuLlm() helper in bootstrap.php — fixes a latent bug where LegalTools.php was already calling that name with no definition. Search.php is unchanged.	2026-05-15 10:30:47 +02:00

Author

SHA1

Message

Date

daveadmin

7bccd8c010

Expand corpus slices to 8: split ECHR/Hague, add Norwegian Courts, Bufdir, DBN Resources

- Replace combined echr_hague slice with echr (Art.8+9, HUDOC, NIM) and hague (INCADAT,
  cross-border abduction) as separate toggles; echr defaults ON, hague defaults OFF
- Add norwegian_courts slice: Domstol (src 5,26) + Rettspraksis.no (src 33, 482 docs)
- Add bufdir_guidance slice: Barneombudet (19), Bufdir (20), Statsforvalteren (31)
- Add dbn_resources slice: DBN website pages (flashcards, resource directory), defaults OFF
- Replace isWebsiteChunk() with slice-aware shouldExcludeChunk(): always strips EU AI Act
  chunks (EUR-Lex source 7 leaks through when Qdrant runs unconstrained) and DBN website
  pages unless dbn_resources slice is explicitly ON
- Update SLICE_DEFS in advocate.js and deep-research.js to match all 8 slices
- Backward compat: echr_hague key in incoming requests fans out to echr+hague

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-15 16:01:05 +02:00

daveadmin

a1a7f442a7

Deep Research: NDJSON streaming so the connection survives long runs

Previously the endpoint returned a single JSON object at the end. Apache+
PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run
caused the browser to drop the fetch as "Failed to fetch" while the server
was still synthesising — the response then arrived to a dead socket.

Switch to application/x-ndjson with one event per line. The endpoint emits
'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a
final 'final' event carrying the full result payload. Output buffering is
explicitly disabled so each line flushes through Apache as soon as the
agent emits it.

DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and
fires step:running before each step + step:complete after, plus a subq
event per sub-question retrieval round.

JS reads response.body as a stream, splits on newlines, updates the
trace panel live, and renders the final result when the final event
arrives. Status pill shows live progress detail (e.g. "Synthesising with
Azure gpt-4o — this is the slowest step…").

Engine row in the form now shows expected duration per engine
(~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in
for before clicking Run.

2026-05-15 10:47:35 +02:00

daveadmin

4cbe0a4ac4

Add Deep Research tool — agent + rank/rerank RAG

New surface at /deep-research.php where the user pastes a question or
uploads PDF/DOCX/TXT case files and a LLM-orchestrated agent researches
the Do Better Norge legal corpus from 3-5 angles, with hybrid retrieval,
cross-encoder rerank, and synthesis that emits an inline-[n]-cited
markdown brief plus a numbered sources panel.

Uploaded documents are chunked + embedded in memory only (nomic-embed-text
via LiteLLM) and searched alongside the shared corpus during the same
request — never persisted to disk, DB, or Qdrant.

Reuses ClientRagPipeline::searchAll (hybrid + rerank), dbnV6 slice
helpers, and the existing extract.php text-extraction logic via a new
dbnToolsExtractUploadedFile() helper. Also adds dbnToolsCallGpuLlm()
helper in bootstrap.php — fixes a latent bug where LegalTools.php
was already calling that name with no definition.

Search.php is unchanged.

2026-05-15 10:30:47 +02:00

3 Commits