fix(tools): parse-harden Do Better Legal ask against leaky fine-tune output

The dbn-legal-agent-v3 fine-tune (Track 1 / family) emits a labelled-prose
template — duplicate `answer:` prefixes, markdown-escaped underscores (`\_`),
and a trailing raw JSON blob — rather than the strict JSON the Azure/gpt-4o
path produces via response_format. decodeJsonObject() returned null on that
invalid JSON, so ask() dumped the entire raw blob into `answer`.

Fix at the parse layer (no upstream response_format change, to avoid fighting
the fine-tune's training):
- dbnToolsRepairJsonText(): strip fences, drop only invalid `\_`/`\*` escapes,
  then balanced-brace scan collecting every top-level {...} (longest first) to
  recover an appended JSON object. Shared by both gateways' decodeJsonObject(),
  so all JSON tools benefit.
- dbnToolsParseLabeledFields(): parse labelled-prose into real fields when no
  JSON decodes, tolerating escaped key names and collapsing duplicate prefixes.
- ask() null-fallback now builds clean structured fields from the parsed prose
  instead of dumping raw; what_remains_uncertain becomes a proper list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-02 17:36:35 +02:00
parent 7fcd317205
commit c84ed2ed78
4 changed files with 136 additions and 44 deletions
+1 -20
View File
@@ -148,26 +148,7 @@ final class DbnAzureOpenAiGateway
public function decodeJsonObject(string $content): ?array
{
$content = trim($content);
$content = (string)preg_replace('/^```(?:json)?\s*\n?/i', '', $content);
$content = (string)preg_replace('/\n?```\s*$/', '', $content);
$content = trim($content);
$decoded = json_decode($content, true);
if (is_array($decoded)) {
return $decoded;
}
$start = strpos($content, '{');
$end = strrpos($content, '}');
if ($start !== false && $end !== false && $end > $start) {
$candidate = substr($content, $start, $end - $start + 1);
$decoded = json_decode($candidate, true);
if (is_array($decoded)) {
return $decoded;
}
}
return null;
return dbnToolsRepairJsonText($content);
}
private function postJson(string $url, array $payload, int $timeout): array