Corpus health

Vector index

Collection
bnl_chunks
Dimensions
768 (nomic-embed-text)
Similarity
Cosine
RAG strategy
Hybrid vector + keyword
Reciprocal rank fusion
Private boost
1.5×
Temporal mode
legal_conservative
Chunk target
600 words · 75 overlap
Vector DB
Qdrant on Colin Docker
10.0.2.10:6333
Hybrid search
Azure AI Search
bnl-legal-search
West Europe · Basic SKU
Indexed passages
Source documents
Active scrapers
Last ingested

Coverage

Legal categories

Family Law

Barneloven, child custody (foreldreansvar), samvær, mediation (mekling), separation and divorce proceedings.

Child Welfare

Barnevernloven, omsorgsovertakelse, emergency care orders, foster placement, CPS (barnevernet) case law.

Labour Law

Arbeidsmiljøloven, collective agreements (tariffavtaler), Arbeidsretten rulings, dismissal, sick leave obligations.

Social Welfare

NAV guidance on sykepenger, dagpenger, AAP, uføretrygd, alderspensjon, yrkesskade and social assistance.

Tax Law

Skatteetaten's Skatte-ABC, binding advance rulings (BFU), Skatteklagenemnda decisions, income and capital tax.

Administrative Law

Sivilombudet reports, Forvaltningsloven, procedural rights, official complaints, Stortinget oversight.

Consumer & Housing

HTU (rental disputes), Finansklagenemnda, Forbrukertilsynet, Forbrukerrådet, Pakkereisenemnda decisions.

Immigration & International

UNE (Utlendingsnemnda) decisions, ECHR Art. 8 family rights, EMD case law, Hague Convention (cross-border child abduction).

Government Documents

NOUer, Stortingsmeldinger, government white papers and regulatory guidance from Regjeringen.no.

Data sources

Active scrapers

Source Type Category Lang Schedule Status
Loading sources…

Software

AI stack

Reasoning LLMs

  • Azure gpt-4o-mini ★ default — fast, cost-efficient
  • Azure gpt-4o — highest quality
  • GPU qwen2.5:14b — local, private
  • GPU qwen3:14b — reasoning mode
  • GPU dbn-legal-agent — Norwegian law fine-tune (QLoRA on qwen2.5:7b, NorwAI-24B distillation)

All routed via LiteLLM on Colin · 10.0.1.10:4000

Transcription

  • GPU Whisper large-v3 ★ primary
    Cuttlefish · RTX 3060 12 GB VRAM
  • API OpenAI Whisper API
  • Azure AI Speech nb-NO (Norway East)

Speaker diarization · VAD silence filter · beam size 5 · vocabulary presets (barnerett, mediation)

Embeddings

  • nomic-embed-text — 768-dim dense vectors
  • Ollama on Chloe 10.0.1.11:11434
  • Cosine similarity in Qdrant

All documents chunked and embedded before indexing; chunks stored in both Qdrant (vector) and MariaDB (keyword fallback)

Vector & Hybrid Search

  • Qdrant bnl_chunks · ~220 K vectors
    Colin Docker · 10.0.2.10:6333
  • Azure AI Search bnl-legal-search
    Basic SKU · West Europe · hybrid keyword + semantic
  • Reciprocal rank fusion (vector + keyword)
  • Private corpus boosted 1.5×

Chunking pipeline

  • Heading-aware semantic splitting
  • 600-word target · 75-word overlap
  • 50-word minimum chunk
  • SHA-256 deduplication
  • PDF, DOCX, HTML text extraction
  • Temporal metadata (valid_from / valid_until)

Legal temporal reranking: legal_conservative — surfaces current versions first

How it works

Ingestion pipeline

🌐 Source gov websites, APIs, PDFs
🕷 Scraper HTTP / API / PDF
📝 Text extract PDF, DOCX, HTML
TextChunker 600w · 75w overlap
🔢 Embed nomic · 768-dim
Qdrant cosine upsert
🤖 LiteLLM RAG + LLM
🔍 Your tool Ask, Search, Research…