Technical Showcase · How the AI reads time
A full walkthrough of the 3-pass extraction pipeline, Norwegian date format recognition, event classification schema, multi-engine architecture, and the fine-tuned dbn-legal-agent model.
Architecture
The pipeline is intentionally sequential — Pass 1 is rule-based and near-instant; Pass 2 is the LLM extraction; Pass 3 post-processes and scores the output.
A deterministic pattern-matching pass runs before any LLM call. It scans the full input for dates matching 12+ Norwegian formats and normalises them to ISO 8601:
dd.mm.yyyy → YYYY-MM-DDd. månedsnavn yyyy → resolved calendar date20YYNormalised anchors are injected into the LLM prompt to reduce hallucinated or misread dates.
The LLM reads the full document alongside the pre-pass anchors. For every temporal reference it returns a structured JSON event object:
date — resolved ISO date, or verbatim string if unresolvabledate_type — absolute | relative | recurring | conditional | periodconfidence — high | medium | lowactor — attributed entity (from source text, not inferred)description — one-sentence event summarysource_excerpt — verbatim text fragment (max 200 chars)The prompt explicitly instructs the model not to invent dates or actors not present in the source. Temperature is set to 0.1 for deterministic output.
PHP applies all active filters before returning the result:
The post-processor then assembles the what_remains_uncertain list and the next_practical_step recommendation.
Date recognition
Norwegian legal documents use a wide variety of date notations. The Pass 1 pre-pass recognises all of these deterministically; the LLM handles the rest in Pass 2.
| Format | Example | Notes |
|---|---|---|
dd.mm.yyyy |
30.07.2015 | Standard Norwegian numeric |
dd.mm.yy |
09.04.25 | Two-digit year → always 20YY |
d. månedsnavn yyyy |
3. mars 2024 | Written month in bokmål/nynorsk |
d. månedsnavn |
15. januar | Year inferred by proximity scanning |
yyyy-mm-dd |
2024-03-12 | ISO 8601 |
månedsnavn yyyy |
mars 2024 | Month + year only |
yyyy |
2024 | Year-only reference |
| Season + year | høsten 2023 | Seasonal reference → Q3/Q4 |
| Diary-format line | 18.09.2025: Møte avholdt | Date + colon → auto-tagged as event |
| Relative reference | tre uker etter vedtaket | Anchored to nearest resolved event |
| Recurring pattern | hver mandag | Classified as recurring |
| Period / range | fra mars til juni 2024 | Yields start_date + end_date |
Classification schema
| date_type | Definition | Example |
|---|---|---|
absolute |
A specific, resolvable calendar date | 30.07.2015 → 2015-07-30 |
relative |
A date expressed relative to another event | tre uker etter vedtaket |
recurring |
A pattern that repeats on a schedule | each Monday, every 6 months |
conditional |
A date contingent on a condition being met | if no response within 14 days |
period |
A date range or duration with start and end | fra mars til juni 2024 |
| confidence | Meaning | Visual in timeline |
|---|---|---|
high |
Date is explicitly and unambiguously stated in the source text | Green badge |
medium |
Date is inferred, approximate, or stated with slight ambiguity | Amber badge |
low |
Date is implied, undated, or extracted from a degraded/ambiguous passage | Grey badge |
| Rule | Example |
|---|---|
| Named entity in the same sentence | “Trude [saksbehandler] ringte 14. mars” → actor: Trude |
| Role label without a name | “Barnevernet fattet vedtak” → actor: Barnevernet |
| No clear attribution in sentence | actor: [unattributed] |
| Document-level default | If no per-event actor, defaults to the document sender/issuing body |
Engines
All engines return the same JSON schema — the post-processor handles all three identically. Engine choice affects speed, quality, and privacy only.
| Engine | Model | Latency | Best for |
|---|---|---|---|
| Azure gpt-4o-mini ★ | gpt-4o-mini (Azure West Europe) |
~15 s | Default. Fast, cost-efficient, handles most legal documents well. |
| Azure gpt-4o | gpt-4o (Azure West Europe) |
~45 s | Complex documents, overlapping events, poor-quality or dense source text. |
| GPU / cuttlefish | dbn-legal-agent via LiteLLM proxy |
~25 s | Maximum privacy. Entirely local. Fine-tuned on Norwegian legal corpus. |
Fine-tuned model
A QLoRA (Quantized Low-Rank Adaptation) fine-tune trained on Norwegian child-welfare and administrative law text — case notes, court decisions, Barnevernet correspondence, Fylkesnemnda decisions, and Statsforvalter rulings. The model has internalised the temporal patterns of Norwegian legal proceedings: the procedural sequence of an omsorgsovertakelse, the typical timeline of a tiltaksplan review cycle, what akutt means as a temporal signal, how Fylkesnemnda milestones are ordered.
In the Timeline GPU engine, dbn-legal-agent runs as the primary extraction model via the LiteLLM proxy on cuttlefish. The structured JSON output schema is identical to the Azure engines — the same post-processing pipeline applies regardless of which engine produced the extraction. No Azure API calls are made when the GPU engine is selected.
Privacy & security
Privacy by design
gpt-4o, gpt-4o-mini) is configured on the West Europe region. Data processed via Azure OpenAI is not used for model training under the default enterprise agreement.Free for Do Better Norge members. All engines available to every member.