Deep Research: NDJSON streaming so the connection survives long runs

Previously the endpoint returned a single JSON object at the end. Apache+
PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run
caused the browser to drop the fetch as "Failed to fetch" while the server
was still synthesising — the response then arrived to a dead socket.

Switch to application/x-ndjson with one event per line. The endpoint emits
'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a
final 'final' event carrying the full result payload. Output buffering is
explicitly disabled so each line flushes through Apache as soon as the
agent emits it.

DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and
fires step:running before each step + step:complete after, plus a subq
event per sub-question retrieval round.

JS reads response.body as a stream, splits on newlines, updates the
trace panel live, and renders the final result when the final event
arrives. Status pill shows live progress detail (e.g. "Synthesising with
Azure gpt-4o — this is the slowest step…").

Engine row in the form now shows expected duration per engine
(~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in
for before clicking Run.
This commit is contained in:
2026-05-15 10:47:35 +02:00
parent 4cbe0a4ac4
commit a1a7f442a7
4 changed files with 296 additions and 77 deletions
+4 -4
View File
@@ -16,11 +16,11 @@ require_once __DIR__ . '/includes/layout.php';
<div class="control-row" id="drEngineControl">
<span class="control-label">Engine</span>
<label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(fast)</small></label>
<label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best)</small></label>
<label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local)</small></label>
<label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(~15-45s)</small></label>
<label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best · ~60-180s)</small></label>
<label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local · ~30-90s)</small></label>
</div>
<p class="upload-hint">Azure engines use your BNL Azure credits. GPU runs qwen2.5:14b via LiteLLM on cuttlefish.</p>
<p class="upload-hint">Azure mini is the default and finishes fastest. Azure full is the most thorough but can take 1-3 minutes. GPU keeps everything inside the BNL fleet. Live progress shown in the right-hand reasoning panel.</p>
<div class="dr-slice-section">
<p class="control-label">Corpus slices</p>