81ba20c44a
3 new public pages matching the korrespond doc pattern: timeline-about.php (marketing), timeline-guide.php (user guide), timeline-tech.php (tech showcase). Hero images + 4 screenshots in assets/images/timeline/. Doc links added to timeline.php and preview.php?tool=timeline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
420 lines
20 KiB
PHP
420 lines
20 KiB
PHP
<?php
|
|
declare(strict_types=1);
|
|
require_once __DIR__ . '/includes/bootstrap.php';
|
|
|
|
$uiLang = dbnToolsCurrentLanguage();
|
|
$isAuthed = dbnToolsIsAuthenticated();
|
|
$langPath = '/timeline-tech.php';
|
|
$toolsLogin = 'https://dobetternorge.no/tools-login.php?return=' . urlencode('/timeline.php');
|
|
$registerUrl = 'https://dobetternorge.no/register.php';
|
|
?>
|
|
<!doctype html>
|
|
<html lang="<?= htmlspecialchars($uiLang) ?>">
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
|
<title>How Timeline works — 3-pass extraction, Norwegian date recognition, fine-tuned LLM</title>
|
|
<meta name="description" content="Technical deep-dive: how Timeline uses a 3-pass pipeline to extract and classify temporal events from Norwegian legal documents, with 12+ date format recognition and confidence scoring.">
|
|
<meta name="robots" content="index, follow">
|
|
<link rel="canonical" href="https://tools.dobetternorge.no/timeline-tech.php">
|
|
<meta property="og:title" content="How Timeline works — AI date extraction pipeline">
|
|
<meta property="og:description" content="Rule-based pre-pass, LLM extraction, and post-processing. Norwegian date format recognition, event classification schema, multi-engine support.">
|
|
<meta name="theme-color" content="#00205B">
|
|
<link rel="preconnect" href="https://fonts.googleapis.com">
|
|
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
|
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Crimson+Pro:wght@400;600;700&family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&display=swap">
|
|
<link rel="stylesheet" href="assets/css/tools.css">
|
|
</head>
|
|
<body class="kdoc-page">
|
|
|
|
<header class="lt-nav">
|
|
<a href="https://dobetternorge.no" class="lt-nav__brand">
|
|
<picture>
|
|
<source srcset="assets/images/logo-header.webp" type="image/webp">
|
|
<img class="lt-nav__logo" src="assets/images/logo-header.png" alt="Do Better Norge" width="140" height="36" loading="eager">
|
|
</picture>
|
|
<span class="lt-nav__badge">Legal Tools</span>
|
|
</a>
|
|
<div class="lt-nav__right">
|
|
<nav class="shell-lang-switcher" aria-label="Language">
|
|
<?php foreach (dbnToolsSupportedLanguages() as $langCode): ?>
|
|
<a href="<?= htmlspecialchars($langPath . '?lang=' . $langCode) ?>" class="<?= $langCode === $uiLang ? 'is-active' : '' ?>"><?= htmlspecialchars(dbnToolsLanguageLabel($langCode)) ?></a>
|
|
<?php endforeach; ?>
|
|
</nav>
|
|
<?php if ($isAuthed): ?>
|
|
<a href="/timeline.php" class="lt-nav__cta lt-nav__cta--enter">Open Timeline →</a>
|
|
<?php else: ?>
|
|
<a href="<?= htmlspecialchars($toolsLogin) ?>" class="lt-nav__cta">Sign in</a>
|
|
<?php endif; ?>
|
|
</div>
|
|
</header>
|
|
|
|
<nav class="kdoc-doc-nav" aria-label="Timeline documentation">
|
|
<div class="kdoc-doc-nav__inner">
|
|
<a href="/timeline-about.php">About</a>
|
|
<a href="/timeline-guide.php">User guide</a>
|
|
<a href="/timeline-tech.php" class="is-active">How it works</a>
|
|
<?php if ($isAuthed): ?><a href="/timeline.php">← Open the tool</a><?php endif; ?>
|
|
</div>
|
|
</nav>
|
|
|
|
<!-- Hero -->
|
|
<section class="kdoc-hero" style="background: linear-gradient(rgba(5,15,40,0.85),rgba(5,15,40,0.92)), url('assets/images/timeline/hero-tech.png') center/cover no-repeat;">
|
|
<div class="kdoc-hero__inner">
|
|
<p class="kdoc-hero__kicker">Technical Showcase · How the AI reads time</p>
|
|
<h1 class="kdoc-hero__title">How Timeline knows when things happened.</h1>
|
|
<p class="kdoc-hero__sub">A full walkthrough of the 3-pass extraction pipeline, Norwegian date format recognition, event classification schema, multi-engine architecture, and the fine-tuned dbn-legal-agent model.</p>
|
|
|
|
<div class="kdoc-hero__stats">
|
|
<div class="kdoc-hero__stat">
|
|
<strong>12+</strong>
|
|
<span>date formats</span>
|
|
</div>
|
|
<div class="kdoc-hero__stat">
|
|
<strong>5</strong>
|
|
<span>event types</span>
|
|
</div>
|
|
<div class="kdoc-hero__stat">
|
|
<strong>3</strong>
|
|
<span>pipeline passes</span>
|
|
</div>
|
|
<div class="kdoc-hero__stat">
|
|
<strong>3</strong>
|
|
<span>engine options</span>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
</section>
|
|
|
|
<!-- Architecture overview -->
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Architecture</p>
|
|
<h2 class="kdoc-section__title">Three passes. Each with a distinct job.</h2>
|
|
<p class="kdoc-section__sub">The pipeline is intentionally sequential — Pass 1 is rule-based and near-instant; Pass 2 is the LLM extraction; Pass 3 post-processes and scores the output.</p>
|
|
|
|
<div class="kdoc-pipeline">
|
|
<div class="kdoc-pipeline__pass">
|
|
<span class="kdoc-pipeline__pass-badge kdoc-pipeline__pass-badge--mini">Pass 1 · PHP / regex</span>
|
|
<h3 class="kdoc-pipeline__pass-title">Detect & normalise known formats</h3>
|
|
<p class="kdoc-pipeline__pass-body">A deterministic pattern-matching pass runs before any LLM call. It scans the full input for dates matching 12+ Norwegian formats and normalises them to ISO 8601:</p>
|
|
<ul>
|
|
<li><code>dd.mm.yyyy</code> → <code>YYYY-MM-DD</code></li>
|
|
<li><code>d. månedsnavn yyyy</code> → resolved calendar date</li>
|
|
<li>Diary-format lines (starting with a date + colon) → auto-tagged as events</li>
|
|
<li>Two-digit years → always interpreted as <code>20YY</code></li>
|
|
</ul>
|
|
<p class="kdoc-pipeline__pass-body" style="margin-top:0.7rem;">Normalised anchors are injected into the LLM prompt to reduce hallucinated or misread dates.</p>
|
|
</div>
|
|
<div class="kdoc-pipeline__arrow-down" aria-hidden="true">→</div>
|
|
<div class="kdoc-pipeline__pass">
|
|
<span class="kdoc-pipeline__pass-badge">Pass 2 · gpt-4o-mini / gpt-4o / dbn-legal-agent</span>
|
|
<h3 class="kdoc-pipeline__pass-title">Extract, classify & score</h3>
|
|
<p class="kdoc-pipeline__pass-body">The LLM reads the full document alongside the pre-pass anchors. For every temporal reference it returns a structured JSON event object:</p>
|
|
<ul>
|
|
<li><code>date</code> — resolved ISO date, or verbatim string if unresolvable</li>
|
|
<li><code>date_type</code> — <code>absolute</code> | <code>relative</code> | <code>recurring</code> | <code>conditional</code> | <code>period</code></li>
|
|
<li><code>confidence</code> — <code>high</code> | <code>medium</code> | <code>low</code></li>
|
|
<li><code>actor</code> — attributed entity (from source text, not inferred)</li>
|
|
<li><code>description</code> — one-sentence event summary</li>
|
|
<li><code>source_excerpt</code> — verbatim text fragment (max 200 chars)</li>
|
|
</ul>
|
|
<p class="kdoc-pipeline__pass-body" style="margin-top:0.7rem;">The prompt explicitly instructs the model not to invent dates or actors not present in the source. Temperature is set to 0.1 for deterministic output.</p>
|
|
</div>
|
|
<div class="kdoc-pipeline__arrow-down" aria-hidden="true">→</div>
|
|
<div class="kdoc-pipeline__pass">
|
|
<span class="kdoc-pipeline__pass-badge kdoc-pipeline__pass-badge--optional">Pass 3 · PHP post-processor</span>
|
|
<h3 class="kdoc-pipeline__pass-title">Filter, sort & assemble</h3>
|
|
<p class="kdoc-pipeline__pass-body">PHP applies all active filters before returning the result:</p>
|
|
<ul>
|
|
<li><strong>Focus filter</strong> — strips events not matching the requested focus mode (deadlines / hearings / CPS)</li>
|
|
<li><strong>Confidence filter</strong> — removes LOW-confidence events if requested</li>
|
|
<li><strong>Background filter</strong> — strips background/narrative events if unchecked</li>
|
|
<li><strong>Date-type filter</strong> — strips relative/recurring events if unchecked</li>
|
|
</ul>
|
|
<p class="kdoc-pipeline__pass-body" style="margin-top:0.7rem;">The post-processor then assembles the <code>what_remains_uncertain</code> list and the <code>next_practical_step</code> recommendation.</p>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<!-- Date recognition -->
|
|
<section class="kdoc-section--alt">
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Date recognition</p>
|
|
<h2 class="kdoc-section__title">12+ Norwegian date formats, all recognised.</h2>
|
|
<p class="kdoc-section__sub">Norwegian legal documents use a wide variety of date notations. The Pass 1 pre-pass recognises all of these deterministically; the LLM handles the rest in Pass 2.</p>
|
|
|
|
<table class="kdoc-table">
|
|
<thead>
|
|
<tr>
|
|
<th>Format</th>
|
|
<th>Example</th>
|
|
<th>Notes</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td><code>dd.mm.yyyy</code></td>
|
|
<td>30.07.2015</td>
|
|
<td>Standard Norwegian numeric</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>dd.mm.yy</code></td>
|
|
<td>09.04.25</td>
|
|
<td>Two-digit year → always 20YY</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>d. månedsnavn yyyy</code></td>
|
|
<td>3. mars 2024</td>
|
|
<td>Written month in bokmål/nynorsk</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>d. månedsnavn</code></td>
|
|
<td>15. januar</td>
|
|
<td>Year inferred by proximity scanning</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>yyyy-mm-dd</code></td>
|
|
<td>2024-03-12</td>
|
|
<td>ISO 8601</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>månedsnavn yyyy</code></td>
|
|
<td>mars 2024</td>
|
|
<td>Month + year only</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>yyyy</code></td>
|
|
<td>2024</td>
|
|
<td>Year-only reference</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Season + year</td>
|
|
<td>høsten 2023</td>
|
|
<td>Seasonal reference → Q3/Q4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Diary-format line</td>
|
|
<td>18.09.2025: Møte avholdt</td>
|
|
<td>Date + colon → auto-tagged as event</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Relative reference</td>
|
|
<td>tre uker etter vedtaket</td>
|
|
<td>Anchored to nearest resolved event</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Recurring pattern</td>
|
|
<td>hver mandag</td>
|
|
<td>Classified as <code>recurring</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Period / range</td>
|
|
<td>fra mars til juni 2024</td>
|
|
<td>Yields <code>start_date</code> + <code>end_date</code></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</section>
|
|
|
|
<!-- Classification schema -->
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Classification schema</p>
|
|
<h2 class="kdoc-section__title">Five event types. Three confidence levels.</h2>
|
|
|
|
<h3 style="font-family:'Crimson Pro',serif; font-size:1.15rem; font-weight:700; margin:0 0 0.8rem; color:var(--dbn-blue);">date_type values</h3>
|
|
<table class="kdoc-table">
|
|
<thead>
|
|
<tr>
|
|
<th>date_type</th>
|
|
<th>Definition</th>
|
|
<th>Example</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td><code>absolute</code></td>
|
|
<td>A specific, resolvable calendar date</td>
|
|
<td><em>30.07.2015</em> → 2015-07-30</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>relative</code></td>
|
|
<td>A date expressed relative to another event</td>
|
|
<td><em>tre uker etter vedtaket</em></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>recurring</code></td>
|
|
<td>A pattern that repeats on a schedule</td>
|
|
<td><em>each Monday</em>, <em>every 6 months</em></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>conditional</code></td>
|
|
<td>A date contingent on a condition being met</td>
|
|
<td><em>if no response within 14 days</em></td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>period</code></td>
|
|
<td>A date range or duration with start and end</td>
|
|
<td><em>fra mars til juni 2024</em></td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<h3 style="font-family:'Crimson Pro',serif; font-size:1.15rem; font-weight:700; margin:2rem 0 0.8rem; color:var(--dbn-blue);">confidence levels</h3>
|
|
<table class="kdoc-table">
|
|
<thead>
|
|
<tr>
|
|
<th>confidence</th>
|
|
<th>Meaning</th>
|
|
<th>Visual in timeline</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td><code>high</code></td>
|
|
<td>Date is explicitly and unambiguously stated in the source text</td>
|
|
<td>Green badge</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>medium</code></td>
|
|
<td>Date is inferred, approximate, or stated with slight ambiguity</td>
|
|
<td>Amber badge</td>
|
|
</tr>
|
|
<tr>
|
|
<td><code>low</code></td>
|
|
<td>Date is implied, undated, or extracted from a degraded/ambiguous passage</td>
|
|
<td>Grey badge</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<h3 style="font-family:'Crimson Pro',serif; font-size:1.15rem; font-weight:700; margin:2rem 0 0.8rem; color:var(--dbn-blue);">Actor attribution rules</h3>
|
|
<table class="kdoc-table">
|
|
<thead>
|
|
<tr>
|
|
<th>Rule</th>
|
|
<th>Example</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>Named entity in the same sentence</td>
|
|
<td><em>“Trude [saksbehandler] ringte 14. mars”</em> → actor: Trude</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Role label without a name</td>
|
|
<td><em>“Barnevernet fattet vedtak”</em> → actor: Barnevernet</td>
|
|
</tr>
|
|
<tr>
|
|
<td>No clear attribution in sentence</td>
|
|
<td>actor: <code>[unattributed]</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Document-level default</td>
|
|
<td>If no per-event actor, defaults to the document sender/issuing body</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
|
|
<!-- Multi-engine -->
|
|
<section class="kdoc-section--alt">
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Engines</p>
|
|
<h2 class="kdoc-section__title">Three engines, one structured output.</h2>
|
|
<p class="kdoc-section__sub">All engines return the same JSON schema — the post-processor handles all three identically. Engine choice affects speed, quality, and privacy only.</p>
|
|
|
|
<table class="kdoc-table">
|
|
<thead>
|
|
<tr>
|
|
<th>Engine</th>
|
|
<th>Model</th>
|
|
<th>Latency</th>
|
|
<th>Best for</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>Azure gpt-4o-mini ★</td>
|
|
<td><code>gpt-4o-mini</code> (Azure West Europe)</td>
|
|
<td>~15 s</td>
|
|
<td>Default. Fast, cost-efficient, handles most legal documents well.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>Azure gpt-4o</td>
|
|
<td><code>gpt-4o</code> (Azure West Europe)</td>
|
|
<td>~45 s</td>
|
|
<td>Complex documents, overlapping events, poor-quality or dense source text.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>GPU / cuttlefish</td>
|
|
<td><code>dbn-legal-agent</code> via LiteLLM proxy</td>
|
|
<td>~25 s</td>
|
|
<td>Maximum privacy. Entirely local. Fine-tuned on Norwegian legal corpus.</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</section>
|
|
|
|
<!-- Fine-tuned LLM -->
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Fine-tuned model</p>
|
|
<h2 class="kdoc-section__title">dbn-legal-agent: trained on Norwegian legal text.</h2>
|
|
|
|
<div class="kdoc-finetune">
|
|
<span class="kdoc-finetune__badge">QLoRA fine-tune</span>
|
|
<h3 class="kdoc-finetune__title">dbn-legal-agent</h3>
|
|
<p class="kdoc-finetune__body">A QLoRA (Quantized Low-Rank Adaptation) fine-tune trained on Norwegian child-welfare and administrative law text — case notes, court decisions, Barnevernet correspondence, Fylkesnemnda decisions, and Statsforvalter rulings. The model has internalised the temporal patterns of Norwegian legal proceedings: the procedural sequence of an omsorgsovertakelse, the typical timeline of a tiltaksplan review cycle, what <em>akutt</em> means as a temporal signal, how Fylkesnemnda milestones are ordered.</p>
|
|
<p class="kdoc-finetune__body" style="margin-top:0.8rem;">In the Timeline GPU engine, dbn-legal-agent runs as the primary extraction model via the LiteLLM proxy on cuttlefish. The structured JSON output schema is identical to the Azure engines — the same post-processing pipeline applies regardless of which engine produced the extraction. No Azure API calls are made when the GPU engine is selected.</p>
|
|
<div class="kdoc-finetune__chips">
|
|
<span class="kdoc-finetune__chip">QLoRA</span>
|
|
<span class="kdoc-finetune__chip">Norwegian legal corpus</span>
|
|
<span class="kdoc-finetune__chip">case notes</span>
|
|
<span class="kdoc-finetune__chip">court decisions</span>
|
|
<span class="kdoc-finetune__chip">Barnevernet</span>
|
|
<span class="kdoc-finetune__chip">Fylkesnemnda</span>
|
|
<span class="kdoc-finetune__chip">LiteLLM proxy</span>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
|
|
<!-- Privacy by design -->
|
|
<section class="kdoc-section--alt">
|
|
<div class="kdoc-section">
|
|
<p class="kdoc-section__eyebrow">Privacy & security</p>
|
|
<h2 class="kdoc-section__title">Your documents never leave your session.</h2>
|
|
|
|
<div class="kdoc-privacy">
|
|
<p class="kdoc-privacy__title">Privacy by design</p>
|
|
<ul>
|
|
<li>All uploaded files are extracted to text <strong>in memory</strong> using PHP's in-process file handlers. The raw binary is never written to disk on the server.</li>
|
|
<li>Session context (pasted text, uploaded content, extracted timeline events) is scoped to your authenticated session and discarded when the session ends.</li>
|
|
<li>Azure OpenAI (<code>gpt-4o</code>, <code>gpt-4o-mini</code>) is configured on the <strong>West Europe</strong> region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.</li>
|
|
<li>The GPU/cuttlefish engine processes entirely locally — no data leaves your network. The LiteLLM proxy on cuttlefish receives your document text and returns structured JSON; nothing is forwarded to an external API.</li>
|
|
<li>Telemetry logged: tool name, engine, focus mode, event count, latency. <strong>No document text, case references, actor names, or extracted events are logged.</strong></li>
|
|
</ul>
|
|
</div>
|
|
</div>
|
|
</section>
|
|
|
|
<!-- CTA -->
|
|
<section class="kdoc-cta-strip">
|
|
<h2 class="kdoc-cta-strip__title">See it work on your case.</h2>
|
|
<p class="kdoc-cta-strip__sub">Free for Do Better Norge members. All engines available to every member.</p>
|
|
<div class="kdoc-hero__ctas">
|
|
<?php if ($isAuthed): ?>
|
|
<a href="/timeline.php" class="kdoc-btn-primary">Open Timeline →</a>
|
|
<?php else: ?>
|
|
<a href="<?= htmlspecialchars($toolsLogin) ?>" class="kdoc-btn-primary">Sign in to use Timeline →</a>
|
|
<a href="<?= htmlspecialchars($registerUrl) ?>" class="kdoc-btn-secondary">Register free</a>
|
|
<?php endif; ?>
|
|
<a href="/timeline-guide.php" class="kdoc-btn-secondary">User guide</a>
|
|
</div>
|
|
</section>
|
|
|
|
<?php require_once __DIR__ . '/includes/footer.php'; ?>
|
|
<script src="assets/js/tools.js" defer></script>
|
|
</body>
|
|
</html>
|