How Timeline works — 3-pass extraction, Norwegian date recognition, fine-tuned LLM

Architecture

Three passes. Each with a distinct job.

The pipeline is intentionally sequential — Pass 1 is rule-based and near-instant; Pass 2 is the LLM extraction; Pass 3 post-processes and scores the output.

Pass 1 · PHP / regex

Detect & normalise known formats

A deterministic pattern-matching pass runs before any LLM call. It scans the full input for dates matching 12+ Norwegian formats and normalises them to ISO 8601:

dd.mm.yyyy → YYYY-MM-DD
d. månedsnavn yyyy → resolved calendar date
Diary-format lines (starting with a date + colon) → auto-tagged as events
Two-digit years → always interpreted as 20YY

Normalised anchors are injected into the LLM prompt to reduce hallucinated or misread dates.

Pass 2 · gpt-4o-mini / gpt-4o / dbn-legal-agent

Extract, classify & score

The LLM reads the full document alongside the pre-pass anchors. For every temporal reference it returns a structured JSON event object:

date — resolved ISO date, or verbatim string if unresolvable
date_type — absolute | relative | recurring | conditional | period
confidence — high | medium | low
actor — attributed entity (from source text, not inferred)
description — one-sentence event summary
source_excerpt — verbatim text fragment (max 200 chars)

The prompt explicitly instructs the model not to invent dates or actors not present in the source. Temperature is set to 0.1 for deterministic output.

Pass 3 · PHP post-processor

Filter, sort & assemble

PHP applies all active filters before returning the result:

Focus filter — strips events not matching the requested focus mode (deadlines / hearings / CPS)
Confidence filter — removes LOW-confidence events if requested
Background filter — strips background/narrative events if unchecked
Date-type filter — strips relative/recurring events if unchecked

The post-processor then assembles the what_remains_uncertain list and the next_practical_step recommendation.

Date recognition

12+ Norwegian date formats, all recognised.

Norwegian legal documents use a wide variety of date notations. The Pass 1 pre-pass recognises all of these deterministically; the LLM handles the rest in Pass 2.

Format	Example	Notes
`dd.mm.yyyy`	30.07.2015	Standard Norwegian numeric
`dd.mm.yy`	09.04.25	Two-digit year → always 20YY
`d. månedsnavn yyyy`	3. mars 2024	Written month in bokmål/nynorsk
`d. månedsnavn`	15. januar	Year inferred by proximity scanning
`yyyy-mm-dd`	2024-03-12	ISO 8601
`månedsnavn yyyy`	mars 2024	Month + year only
`yyyy`	2024	Year-only reference
Season + year	høsten 2023	Seasonal reference → Q3/Q4
Diary-format line	18.09.2025: Møte avholdt	Date + colon → auto-tagged as event
Relative reference	tre uker etter vedtaket	Anchored to nearest resolved event
Recurring pattern	hver mandag	Classified as `recurring`
Period / range	fra mars til juni 2024	Yields `start_date` + `end_date`

Classification schema

Five event types. Three confidence levels.

date_type values

date_type	Definition	Example
`absolute`	A specific, resolvable calendar date	30.07.2015 → 2015-07-30
`relative`	A date expressed relative to another event	tre uker etter vedtaket
`recurring`	A pattern that repeats on a schedule	each Monday, every 6 months
`conditional`	A date contingent on a condition being met	if no response within 14 days
`period`	A date range or duration with start and end	fra mars til juni 2024

confidence levels

confidence	Meaning	Visual in timeline
`high`	Date is explicitly and unambiguously stated in the source text	Green badge
`medium`	Date is inferred, approximate, or stated with slight ambiguity	Amber badge
`low`	Date is implied, undated, or extracted from a degraded/ambiguous passage	Grey badge

Actor attribution rules

Rule	Example
Named entity in the same sentence	“Trude [saksbehandler] ringte 14. mars” → actor: Trude
Role label without a name	“Barnevernet fattet vedtak” → actor: Barnevernet
No clear attribution in sentence	actor: `[unattributed]`
Document-level default	If no per-event actor, defaults to the document sender/issuing body

Engines

Three engines, one structured output.

All engines return the same JSON schema — the post-processor handles all three identically. Engine choice affects speed, quality, and privacy only.

Engine	Model	Latency	Best for
Azure gpt-4o-mini ★	`gpt-4o-mini` (Azure West Europe)	~15 s	Default. Fast, cost-efficient, handles most legal documents well.
Azure gpt-4o	`gpt-4o` (Azure West Europe)	~45 s	Complex documents, overlapping events, poor-quality or dense source text.
GPU / cuttlefish	`dbn-legal-agent` via LiteLLM proxy	~25 s	Maximum privacy. Entirely local. Fine-tuned on Norwegian legal corpus.

Fine-tuned model

dbn-legal-agent: trained on Norwegian legal text.

QLoRA fine-tune

dbn-legal-agent

A QLoRA (Quantized Low-Rank Adaptation) fine-tune trained on Norwegian child-welfare and administrative law text — case notes, court decisions, Barnevernet correspondence, Fylkesnemnda decisions, and Statsforvalter rulings. The model has internalised the temporal patterns of Norwegian legal proceedings: the procedural sequence of an omsorgsovertakelse, the typical timeline of a tiltaksplan review cycle, what akutt means as a temporal signal, how Fylkesnemnda milestones are ordered.

In the Timeline GPU engine, dbn-legal-agent runs as the primary extraction model via the LiteLLM proxy on cuttlefish. The structured JSON output schema is identical to the Azure engines — the same post-processing pipeline applies regardless of which engine produced the extraction. No Azure API calls are made when the GPU engine is selected.

QLoRA Norwegian legal corpus case notes court decisions Barnevernet Fylkesnemnda LiteLLM proxy

Privacy & security

Your documents never leave your session.

Privacy by design

All uploaded files are extracted to text in memory using PHP's in-process file handlers. The raw binary is never written to disk on the server.
Session context (pasted text, uploaded content, extracted timeline events) is scoped to your authenticated session and discarded when the session ends.
Azure OpenAI (gpt-4o, gpt-4o-mini) is configured on the West Europe region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.
The GPU/cuttlefish engine processes entirely locally — no data leaves your network. The LiteLLM proxy on cuttlefish receives your document text and returns structured JSON; nothing is forwarded to an external API.
Telemetry logged: tool name, engine, focus mode, event count, latency. No document text, case references, actor names, or extracted events are logged.

How Timeline knows when things happened.