Deep Research: NDJSON streaming so the connection survives long runs

Previously the endpoint returned a single JSON object at the end. Apache+ PHP-FPM buffers the entire body until PHP exits, so a 160s azure_full run caused the browser to drop the fetch as "Failed to fetch" while the server was still synthesising — the response then arrived to a dead socket. Switch to application/x-ndjson with one event per line. The endpoint emits 'progress', 'start', 'step' (running/complete/warning/error), 'subq', and a final 'final' event carrying the full result payload. Output buffering is explicitly disabled so each line flushes through Apache as soon as the agent emits it. DbnDeepResearchAgent::run() now accepts an optional ?callable $emit and fires step:running before each step + step:complete after, plus a subq event per sub-question retrieval round. JS reads response.body as a stream, splits on newlines, updates the trace panel live, and renders the final result when the final event arrives. Status pill shows live progress detail (e.g. "Synthesising with Azure gpt-4o — this is the slowest step…"). Engine row in the form now shows expected duration per engine (~15-45s mini, ~60-180s full, ~30-90s GPU) so users know what they're in for before clicking Run.
2026-05-15 10:47:35 +02:00
parent 4cbe0a4ac4
commit a1a7f442a7
4 changed files with 296 additions and 77 deletions
@@ -16,11 +16,11 @@ require_once __DIR__ . '/includes/layout.php';

                    <div class="control-row" id="drEngineControl">
                        <span class="control-label">Engine</span>
-                        <label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(fast)</small></label>
-                        <label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best)</small></label>
-                        <label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local)</small></label>
+                        <label><input type="radio" name="drEngine" value="azure_mini" checked> Azure gpt-4o-mini &#9733; <small class="control-hint">(~15-45s)</small></label>
+                        <label><input type="radio" name="drEngine" value="azure_full"> Azure gpt-4o <small class="control-hint">(best · ~60-180s)</small></label>
+                        <label><input type="radio" name="drEngine" value="gpu"> GPU (cuttlefish) <small class="control-hint">(local · ~30-90s)</small></label>
                    </div>
-                    <p class="upload-hint">Azure engines use your BNL Azure credits. GPU runs qwen2.5:14b via LiteLLM on cuttlefish.</p>
+                    <p class="upload-hint">Azure mini is the default and finishes fastest. Azure full is the most thorough but can take 1-3 minutes. GPU keeps everything inside the BNL fleet. Live progress shown in the right-hand reasoning panel.</p>

                    <div class="dr-slice-section">
                        <p class="control-label">Corpus slices</p>