feat(transcribe): GPT cleanup pass + advanced options i18n

Adds optional post-transcription cleanup via GPT-4o/GPT-4o-mini to fix mishearing errors, punctuation, and domain terms. Speaker role labelling now accepts a deployment param.

Adds i18n strings for advanced options panel (task, VAD filter, Whisper model, AI cleanup) in all four languages.

Updates BvjAnalyzerAgent and DeepResearchAgent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+131
-62
@@ -493,7 +493,7 @@ PROMPT;
private function extractParties(string $docText, string $language): array
{
$locale = dbnToolsLanguageName($language);
$excerpt = mb_substr($docText, 0, 12000, 'UTF-8');
$excerpt = mb_substr($docText, 0, 20000, 'UTF-8');

$prompt = <<<PROMPT
You are analysing a Norwegian child welfare (Barnevernet) document.
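Note on the excerpt limit in the hunk above: `mb_substr` with an explicit `'UTF-8'` encoding counts characters rather than bytes, so the cap applies to characters and cannot split a multi-byte letter such as `å`. A minimal sketch, assuming the `mbstring` extension is loaded; the sample text and the limit of 28 are illustrative and not taken from the commit:

```php
<?php
declare(strict_types=1);

// Illustrative input only; the real $docText comes from the uploaded document.
$docText = 'Bekymringsmelding mottatt på kontoret før påske';

// Keep the first 28 characters (not bytes) of the UTF-8 string.
$excerpt = mb_substr($docText, 0, 28, 'UTF-8');

// The character count and the byte count differ because "å" takes two bytes.
echo $excerpt, "\n";
echo mb_strlen($excerpt, 'UTF-8'), ' characters, ', strlen($excerpt), " bytes\n";
```

A byte-based `substr` with the same limit could cut a character in half, which is presumably why the code uses the multibyte variant.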
@@ -502,15 +502,16 @@ Identify ALL named parties — every person or institution referred to by name o
Respond in {$locale}. Return a JSON object with a single key "parties" containing an array of objects.
Each object must have these four fields:
- "name": full name or institution name (string)
- "role": their role in the case, e.g. Biological mother, Child, Barnevernarbeider, Saksbehandler, Melder, Politi, Lege, Advokat, Foster carer, Rusklinikk
- "role": their role in the case, e.g. Biological mother, Biological father, Child, Barnevernarbeider, Saksbehandler, Leder, Melder, Politi, Lege, Psykolog, Advokat, Talsperson for barnet, Tilsynsfører, Sakkyndig, Foster carer (fosterforelder), Rusklinikk, Statsforvalter
- "organization": employer or institution if mentioned, otherwise null
- "relationship_to_child": relationship to the child in the document, e.g. Mother, Father, Caseworker, Melder, or null
- "relationship_to_child": relationship to the child in the document, e.g. Mother, Father, Sibling, Caseworker, Melder, Supervisor, or null

Rules:
- Include every named person and named institution — even peripheral ones.
- Include Barnevernvakta (bvv) as an institution even if no individual caseworkers are named.
- If a name appears to be redacted or anonymised (e.g. "mor", "far", "barnet", initials like "A.B."), include them with role inferred from context.
- Do not invent parties not present in the text.
- Maximum 20 parties.
- Maximum 25 parties.

Document text:
{$excerpt}
@@ -520,14 +521,14 @@ PROMPT;
$raw = $this->azure->chatText([
['role' => 'system', 'content' => 'You return valid JSON only. No markdown fences.'],
['role' => 'user', 'content' => $prompt],
], ['json' => true, 'temperature' => 0.05, 'max_tokens' => 1500, 'timeout' => 40]);
], ['json' => true, 'temperature' => 0.05, 'max_tokens' => 2000, 'timeout' => 45]);
$json = $this->azure->decodeJsonObject($raw);
if (is_array($json) && is_array($json['parties'] ?? null)) {
return array_slice($json['parties'], 0, 20);
return array_slice($json['parties'], 0, 25);
}
// Fallback: model returned an array at root level instead of {parties:[...]}
if (is_array($json) && isset($json[0]['name'])) {
return array_slice($json, 0, 20);
return array_slice($json, 0, 25);
}
error_log('BVJ extractParties unexpected structure: ' . substr($raw, 0, 300));
} catch (Throwable $e) {
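Note on the response handling in the hunk above: the code accepts either the wrapped shape `{"parties": [...]}` or a bare list at the root, and caps both at the same limit. The sketch below isolates that logic so the two shapes can be checked without calling the model. `normalizeParties` is a hypothetical helper name and is not part of the commit; it takes already-decoded JSON rather than going through `decodeJsonObject`, whose behaviour is not shown in the diff.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not in the commit): accept either {"parties": [...]}
 * or a bare root-level list, and cap the result at $limit entries.
 */
function normalizeParties(mixed $json, int $limit = 25): array
{
    if (!is_array($json)) {
        return [];
    }
    // Expected shape: an object with a "parties" array.
    if (is_array($json['parties'] ?? null)) {
        return array_slice($json['parties'], 0, $limit);
    }
    // Fallback: the model returned the list at the root level.
    if (isset($json[0]['name'])) {
        return array_slice($json, 0, $limit);
    }
    return [];
}

$wrapped = json_decode('{"parties":[{"name":"A"},{"name":"B"}]}', true);
$bare    = json_decode('[{"name":"A"},{"name":"B"},{"name":"C"}]', true);

echo count(normalizeParties($wrapped)), "\n";  // wrapped shape
echo count(normalizeParties($bare, 2)), "\n";  // bare list, capped at 2
```

Keeping the cap in one parameter would also avoid having to change the limit in two `array_slice` calls, as this commit does.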
@@ -541,7 +542,7 @@ PROMPT;
private function extractTimeline(string $docText, string $language): array
{
$locale = dbnToolsLanguageName($language);
$excerpt = mb_substr($docText, 0, 12000, 'UTF-8');
$excerpt = mb_substr($docText, 0, 20000, 'UTF-8');

$prompt = <<<PROMPT
Build a chronological timeline from this Norwegian child welfare (Barnevernet) document in {$locale}.
@@ -557,14 +558,24 @@ IMPORTANT — Norwegian date and time formats to recognise:
- Diary/log format: lines beginning with a date or time are always events.
- Two-digit years: interpret as 20YY (20 → 2020, 21 → 2021).

Barnevernet-specific events that are ALWAYS high significance:
- Akuttvedtak (emergency placement) under §4-6 or §4-25
- Omsorgsovertakelse (care order) under §4-12
- Police involvement or assistance (politibistand)
- Formal decision (vedtak) or court order (kjennelse)
- Deadline breaches: bekymringsmelding not processed within 7 days; investigation not opened within 6 weeks
- Forhandlingsmøte (negotiation hearing) or Fylkesnemnda hearing
- Supervised contact visits (samvær) being reduced or denied
- Placement in foster care or institution (fosterhjem, institusjon)

For each event provide:
- "date": ISO 8601 date (YYYY-MM-DD) if determinable, otherwise best-effort description
- "time_of_day": HH:MM if present, otherwise null
- "actor": person, institution, or party involved
- "action": concise description (≤ 80 chars) of what happened
- "significance": high (acute measure, removal, police involvement, formal decision) | medium (home visit, phone call, meeting) | low (minor update, note)
- "significance": high (acute measure, removal, police involvement, formal decision, statutory deadline breach) | medium (home visit, phone call, meeting, assessment) | low (minor update, note)

Sort chronologically. Maximum 30 events.
Sort chronologically. Maximum 40 events.

Document text:
{$excerpt}
@@ -579,10 +590,10 @@ PROMPT;
$raw = $this->azure->chatText([
['role' => 'system', 'content' => 'You return valid JSON only. No markdown fences.'],
['role' => 'user', 'content' => $prompt],
], ['json' => true, 'temperature' => 0.05, 'max_tokens' => 3000, 'timeout' => 45]);
], ['json' => true, 'temperature' => 0.05, 'max_tokens' => 4000, 'timeout' => 55]);
$json = $this->azure->decodeJsonObject($raw);
if (is_array($json) && is_array($json['events'] ?? null)) {
return array_slice($json['events'], 0, 30);
return array_slice($json['events'], 0, 40);
}
} catch (Throwable $e) {
error_log('BVJ extractTimeline failed: ' . $e->getMessage());
@@ -600,52 +611,84 @@ PROMPT;
int $count,
string $language
): array {
$locale = dbnToolsLanguageName($language);
$docType = $docMeta['doc_type'] ?? 'BVJ document';
$roleStr = $advocateRole !== '' ? $advocateRole : 'the affected party';
$locale = dbnToolsLanguageName($language);
$docType = $docMeta['doc_type'] ?? 'BVJ document';
$docDate = $docMeta['doc_date'] ?? 'unknown date';
$authority = $docMeta['issuing_authority'] ?? 'the municipality';
$roleStr = $advocateRole !== '' ? $advocateRole : 'the affected party';

// Summarise the top events to give the model context
// Summarise high-significance events first, then others
$highEvents = array_values(array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') === 'high'));
$otherEvents = array_values(array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') !== 'high'));
$topEvents = array_slice(array_merge($highEvents, $otherEvents), 0, 12);
$eventSummary = '';
$highEvents = array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') === 'high');
$topEvents = array_slice(array_merge(array_values($highEvents),
array_values(array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') !== 'high'))), 0, 8);
foreach ($topEvents as $ev) {
$eventSummary .= sprintf("- %s: %s (%s)\n", $ev['date'] ?? '?', $ev['action'] ?? '', $ev['actor'] ?? '');
$sig = ($ev['significance'] ?? 'low') === 'high' ? '[HIGH] ' : '';
$eventSummary .= sprintf("- %s %s%s (%s)\n",
$ev['date'] ?? '?', $sig, $ev['action'] ?? '', $ev['actor'] ?? '');
}

// Summarise parties
$partyList = '';
foreach (array_slice($parties, 0, 8) as $p) {
$partyList .= sprintf("- %s (%s)\n", $p['name'] ?? '', $p['role'] ?? '');
foreach (array_slice($parties, 0, 10) as $p) {
$org = !empty($p['organization']) ? ' at ' . $p['organization'] : '';
$partyList .= sprintf("- %s (%s%s)\n", $p['name'] ?? '?', $p['role'] ?? '?', $org);
}

$angleGuidance = match (true) {
$count >= 5 => <<<ANGLES
Cover these five distinct legal angles (one per question):
1. Statutory rights and obligations under Barnevernloven (e.g. §4-2, §4-6, §4-12) specific to the measures taken
2. ECHR Article 8 proportionality and procedural safeguards — cite the specific measures and dates from this case
3. Procedural obligations BVV must fulfil (advance notice, documentation, hearing rights) — anchor to documented events
4. Bufdir/Statsforvalter guidance on investigation standards and thresholds for intervention
5. Norwegian appellate court decisions on comparable measures and family circumstances
ANGLES,
$count === 4 => <<<ANGLES
Cover these four distinct legal angles (one per question):
1. Statutory rights under Barnevernloven anchored to the specific measures and dates in this case
2. ECHR Article 8 — proportionality of the specific intervention and any procedural violations
3. BVV's procedural obligations — documentation, notice, and hearing rights — as evidenced by the timeline
4. Bufdir guidance and Norwegian court decisions on comparable fact patterns
ANGLES,
default => <<<ANGLES
Cover three distinct legal angles (one per question):
1. Statutory rights under Barnevernloven for the specific type of measure documented
2. ECHR Article 8 proportionality and procedural safeguards
3. BVV's procedural obligations and whether the documented timeline shows any breach
ANGLES,
};

$prompt = <<<PROMPT
You are a Norwegian family-law research assistant building a case for: {$roleStr}.

A {$docType} has been uploaded. Key events:
Case facts extracted from the uploaded document:
- Document type: {$docType}
- Date: {$docDate}
- Issuing authority: {$authority}
- Key events (chronological):
{$eventSummary}
Key parties:
- Key parties:
{$partyList}

Generate exactly {$count} targeted sub-questions to research the legal corpus for arguments that SUPPORT {$roleStr}'s position. Each question should explore a different angle:
1. Statutory rights and obligations (Barnevernloven, Barneloven)
2. ECHR Article 8 and 9 precedents vs Norway
3. Procedural requirements BVV must follow (notice, documentation, proportionality)
4. Bufdir guidance on case handling standards
5. Norwegian court decisions on similar fact patterns
Generate exactly {$count} sub-questions to search the Norwegian legal corpus for arguments that SUPPORT {$roleStr}'s position.

{$angleGuidance}

CRITICAL: Every question MUST embed specific facts from this case — use the actual authority name, document date, type of measure, and parties where relevant. Generic questions ("What are parental rights?") are useless for retrieval. Specific questions ("What notice requirements must {$authority} meet before issuing an emergency placement under Barnevernloven §4-6?") are highly effective.

Return JSON only in {$locale}:
{
"sub_questions": [
{"id":"q1","question":"...","rationale":"how this angle strengthens {$roleStr}'s position (≤ 120 chars)"}
{"id":"q1","question":"...","rationale":"why this angle strengthens {$roleStr}'s position (≤ 120 chars)"}
]
}

Rules:
- Exactly {$count} sub-questions, no more no fewer.
- Every question must be answerable from Norwegian family-law, child-welfare, or ECHR sources.
- Each question must cover a DIFFERENT legal angle.
- Questions must be self-contained without needing the raw document.
- Exactly {$count} sub-questions.
- Each question targets a DIFFERENT legal angle.
- Include specific case details (authority, date, measure type) in each question.
- Questions must be self-contained and answerable from Norwegian family-law, child-welfare, or ECHR sources.
- Respond in {$locale}.
PROMPT;
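Note on the `match (true)` selector in the hunk above: the arms are tested top to bottom, so any `$count` of 5 or more gets the five-angle text, exactly 4 gets the four-angle text, and everything else falls through to the three-angle default. The sketch below reduces each arm to the number of angles it lists so the branching can be checked in isolation. `angleCount` is a hypothetical name used only for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reduction of the commit's match (true) block:
 * returns how many angles the selected guidance text would list.
 */
function angleCount(int $count): int
{
    return match (true) {
        $count >= 5  => 5,
        $count === 4 => 4,
        default      => 3,
    };
}

foreach ([2, 3, 4, 5, 7] as $n) {
    echo $n, ' questions requested, ', angleCount($n), " angles listed\n";
}
```

For a `$count` outside 3 to 5 the prompt would ask for exactly `$count` questions while listing a different number of angles. Whether callers can pass such values is not visible in this diff.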
@@ -734,16 +777,16 @@ PROMPT;

// Build parties summary (top 8)
$partiesSummary = '';
foreach (array_slice($parties, 0, 8) as $i => $p) {
foreach (array_slice($parties, 0, 12) as $i => $p) {
$org = $p['organization'] ? ' (' . $p['organization'] . ')' : '';
$rel = $p['relationship_to_child'] ? ' — rel: ' . $p['relationship_to_child'] : '';
$partiesSummary .= sprintf("%d. %s — %s%s%s\n", $i + 1, $p['name'] ?? '', $p['role'] ?? '', $org, $rel);
}

// Build timeline summary (top 15 most significant events)
// Build timeline summary (top 20 most significant events)
$highEvents = array_values(array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') === 'high'));
$otherEvents = array_values(array_filter($timelineEvents, fn($e) => ($e['significance'] ?? '') !== 'high'));
$topEvents = array_slice(array_merge($highEvents, $otherEvents), 0, 15);
$topEvents = array_slice(array_merge($highEvents, $otherEvents), 0, 20);
$timelineSummary = '';
foreach ($topEvents as $ev) {
$time = $ev['time_of_day'] ? ' kl.' . $ev['time_of_day'] : '';
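Note on the event selection in the hunk above: filtering into high and non-high groups, re-indexing each with `array_values`, and concatenating with `array_merge` means high-significance events are always kept ahead of the rest when the list is cut to the limit. A minimal sketch of that selection follows. `prioritiseEvents` and the sample events are hypothetical and only illustrate the pattern.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not in the commit): high-significance events first,
 * then all others, each group in its original order, capped at $limit.
 */
function prioritiseEvents(array $events, int $limit): array
{
    $isHigh = fn(array $e): bool => ($e['significance'] ?? '') === 'high';

    $high  = array_values(array_filter($events, $isHigh));
    $other = array_values(array_filter($events, fn(array $e): bool => !$isHigh($e)));

    return array_slice(array_merge($high, $other), 0, $limit);
}

// Made-up sample data, already in chronological order.
$events = [
    ['date' => '2024-01-02', 'action' => 'Phone call',    'significance' => 'medium'],
    ['date' => '2024-01-05', 'action' => 'Akuttvedtak',   'significance' => 'high'],
    ['date' => '2024-01-09', 'action' => 'Note on file'], // no significance key
    ['date' => '2024-01-12', 'action' => 'Politibistand', 'significance' => 'high'],
];

$top = prioritiseEvents($events, 3);
foreach ($top as $ev) {
    echo $ev['date'], ' ', $ev['action'], "\n";
}
```

With this sample the two high events come out first and the phone call fills the third slot. The result is therefore chronological only within each group, not across the whole list.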
@@ -783,14 +826,17 @@ PROMPT;
? "\n== ADDITIONAL CONTEXT FROM ADVOCATE ==\n{$additionalNotes}\n"
: '';

$docExcerpt = mb_substr($docText, 0, 3000, 'UTF-8');
$docExcerpt = mb_substr($docText, 0, 8000, 'UTF-8');

$prompt = <<<PROMPT
You are Do Better Norge Legal Tools producing a structured Barnevernet case analysis brief.
You are representing: {$roleStr}
You are Do Better Norge Legal Tools. Produce a structured Barnevernet case analysis for: {$roleStr}.

HALLUCINATION RULES — READ FIRST:
- You may ONLY cite statute sections (§), ECHR article numbers, ECHR application numbers, case names, and Bufdir/Statsforvalter circular references that appear verbatim in the numbered corpus sources below.
- Do NOT cite statute sections, case names, or ECHR applications from your training memory — they may be misremembered or no longer in force.
- If no source supports a claim, omit the claim rather than invent support.
- Every factual legal claim in advocacy_brief MUST end with at least one [n] or [DOC] citation. Unsupported claims are a liability for the client.

Ground every claim in the numbered corpus sources below using [n] markers, OR in the uploaded document using [DOC].
Do NOT invent statutes, paragraph numbers, case names, ECHR applications, dates, or parties.
Return valid JSON only. No markdown fences.

== DOCUMENT METADATA ==
@@ -805,51 +851,74 @@ Child: {$childInfo}
== TIMELINE (from document) ==
{$timelineSummary}

== CORPUS SOURCES ({$sourceCount} numbered) ==
== CORPUS SOURCES ({$sourceCount} numbered — cite as [n]) ==
{$sourcesText}
{$notesSection}
{$subQText}

== DOCUMENT EXCERPT (first 3000 chars — use [DOC] to cite) ==
== DOCUMENT EXCERPT (first 8000 chars — cite as [DOC]) ==
{$docExcerpt}

Return JSON in {$locale}:
== ADVOCACY BRIEF FORMAT ==
Write the advocacy_brief as a Markdown document with these sections:

## Case Overview
Summarise what happened: document type, issuing authority, key events from the timeline. Every factual statement must cite [DOC].

## {$roleStr}'s Core Legal Position
The strongest statutory and ECHR arguments in favour of {$roleStr}. Cite [n] for each legal point. Only cite statutes and cases that appear in the corpus sources above.

## Procedural Compliance Issues
Where BVV/the authority may have failed their own procedural obligations. Ground each point in a specific documented action from [DOC] and the applicable statute or guidance from [n].

## Client Strengths
3-6 factual and legal advantages for {$roleStr}, each anchored with [n] or [DOC].

## Counter-Arguments and Responses
The most likely opposing arguments and how to rebut them. Cite [n] for rebuttal sources.

## Recommended Next Steps
2-4 concrete legal actions {$roleStr} should take now.

End with one line: "*This brief is AI-assisted and for discussion purposes only — verify all legal references with a qualified Norwegian family-law lawyer.*"

Target length: 600-1000 words.

== JSON OUTPUT ==
{
"advocacy_brief": "Partisan legal brief in Markdown. Structure:\n## Case Overview\n(What happened according to [DOC] — doc type, authority, key events)\n\n## {$roleStr}'s Core Legal Position\n(Strongest statutory and ECHR arguments — cite [n] and [DOC])\n\n## Procedural Compliance Issues\n(Where BVV may have failed their own procedural obligations — cite [DOC][n])\n\n## Client Strengths\n(Factual and legal advantages for {$roleStr} — cite [n][DOC])\n\n## Counter-Arguments and Responses\n(Likely opposing arguments and how to rebut — cite [n])\n\n## Recommended Next Steps\n(Concrete legal actions)\n\nEnd with a one-line disclaimer. Length: 500-1000 words.",
"advocacy_brief": "<the Markdown brief following the format above>",

"procedural_red_flags": [
{
"description": "Concise description of the potential procedural violation",
"legal_basis": "Statute or ECHR article potentially violated, e.g. Barnevernloven §6-1, ECHR Art.8",
"severity": "high",
"legal_basis": "Statute or ECHR article from a corpus source — e.g. Barnevernloven §4-2 [3]",
"severity": "high|medium|low",
"source_refs": ["[n]", "[DOC]"],
"what_to_check": "Specific document text or action requiring legal verification"
"what_to_check": "Exact document text or action to verify with a lawyer"
}
],

"client_strengths": ["3-6 items anchored with [n] or [DOC]"],
"opposing_weaknesses": ["2-5 vulnerabilities in BVV or opposing party position — omit if unsupported by sources"],
"what_we_found": "2-sentence plain-language summary of the most critical finding",
"what_remains_uncertain": ["3-5 specific gaps — missing information, unclear authority, conflicting sources"],
"next_practical_step": "The single most important concrete legal action for {$roleStr}"
"client_strengths": ["3-6 items, each ending with [n] or [DOC]"],
"opposing_weaknesses": ["2-5 documented vulnerabilities in BVV or opposing position — OMIT if not supported by at least one [n]"],
"what_we_found": "2-sentence plain-language summary of the single most critical finding",
"what_remains_uncertain": ["3-5 specific information gaps or legal questions that need clarification"],
"next_practical_step": "The single most important concrete legal action for {$roleStr} to take within the next 7 days"
}

Rules:
- Every factual claim in advocacy_brief must end with [n] or [DOC].
- procedural_red_flags must be grounded in documented BVV actions — no speculation.
- severity: high = likely violation of a codified right; medium = procedural irregularity; low = best-practice gap.
- If no corpus source supports a claimed weakness, omit it from opposing_weaknesses.
- Cite statute sections and ECHR articles as they appear in the corpus excerpts.
- severity: high = likely violation of a codified statutory right or ECHR guarantee; medium = procedural irregularity; low = best-practice gap only.
- procedural_red_flags must be grounded in documented BVV actions visible in [DOC] or the timeline.
- If fewer than 2 corpus sources support opposing_weaknesses, return an empty array.
- Respond in {$locale}.
PROMPT;

$sysPrompt = 'You return valid JSON only. No markdown fences.';
$sysPrompt = 'You return valid JSON only. No markdown fences. Every legal citation must come from the provided corpus sources, not from training memory.';

$messages = [
['role' => 'system', 'content' => $sysPrompt],
['role' => 'user', 'content' => $prompt],
];
$opts = ['json' => true, 'temperature' => $temperature, 'max_tokens' => 3000, 'timeout' => 200];
$opts = ['json' => true, 'temperature' => $temperature, 'max_tokens' => 4500, 'timeout' => 240];

$deployLabel = match ($engine) {
'gpu' => 'GPU (cuttlefish)',