36 Commits

Author SHA1 Message Date
daveadmin 3f7d4eef13 feat(tools): add letter length + summary depth controls; harden korrespond §-discipline
- Summarize: new depth param (brief/standard/detailed) with depth-aware prompt
  instructions and coverage mandate; wired through API + JS
- Korrespond: new letter length param (concise/standard/detailed) injected as
  Lengde: instruction in draft pass; wired through API + JS
- Korrespond draft prompt: add §-discipline rule (cite only directly relevant §§)
  plus Opphevet guard (aligned with dobetterlegal-tools)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 13:44:02 +02:00
daveadmin 8b99ceec3b feat(rag): add doc-summary pre-filtering to DbnLegalToolsService::search
Before chunk retrieval, embed the query against bnl_doc_summaries Qdrant
collection to identify the most semantically relevant documents. The
resulting document IDs are passed as shared_doc_ids to searchAll(),
narrowing the shared-corpus chunk search to those documents only.

Applied to the 'shared' and 'both' scope paths (not 'private', which
has no shared corpus). Non-fatal: on any error preFilterDocIds stays
empty and search falls back to current unfiltered chunk retrieval.
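
Illustrative sketch of the non-fatal pre-filter described above, written in Python rather than the repository's PHP. The names `pre_filter_doc_ids`, `summary_search` and `chunk_search` are hypothetical stand-ins for the Qdrant summary lookup and for searchAll(); the `limit` default is an assumption.

```python
from typing import Callable, Optional, Sequence


def pre_filter_doc_ids(
    query: str,
    summary_search: Callable[[str, int], Sequence[str]],
    limit: int = 8,  # assumed cap, not taken from the commit
) -> list:
    """Return IDs of the most relevant documents, or [] on any failure."""
    try:
        return list(summary_search(query, limit))
    except Exception:
        # Non-fatal: an empty list means "do not narrow the chunk search".
        return []


def search(query: str, scope: str, summary_search, chunk_search):
    doc_ids: list = []
    if scope in ("shared", "both"):  # 'private' has no shared corpus
        doc_ids = pre_filter_doc_ids(query, summary_search)
    # None signals unfiltered chunk retrieval (the fallback path).
    shared_doc_ids: Optional[list] = doc_ids or None
    return chunk_search(query, scope, shared_doc_ids)
```

Because the pre-filter swallows its own errors, a failure in the summary collection degrades result quality but never breaks the search call.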

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 10:15:57 +02:00
daveadmin c84ed2ed78 fix(tools): parse-harden Do Better Legal ask against leaky fine-tune output
The dbn-legal-agent-v3 fine-tune (Track 1 / family) emits a labelled-prose
template — duplicate `answer:` prefixes, markdown-escaped underscores (`\_`),
and a trailing raw JSON blob — rather than the strict JSON the Azure/gpt-4o
path produces via response_format. decodeJsonObject() returned null on that
invalid JSON, so ask() dumped the entire raw blob into `answer`.

Fix at the parse layer (no upstream response_format change, to avoid fighting
the fine-tune's training):
- dbnToolsRepairJsonText(): strip fences, drop only invalid `\_`/`\*` escapes,
  then balanced-brace scan collecting every top-level {...} (longest first) to
  recover an appended JSON object. Shared by both gateways' decodeJsonObject(),
  so all JSON tools benefit.
- dbnToolsParseLabeledFields(): parse labelled-prose into real fields when no
  JSON decodes, tolerating escaped key names and collapsing duplicate prefixes.
- ask() null-fallback now builds clean structured fields from the parsed prose
  instead of dumping raw; what_remains_uncertain becomes a proper list.
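
A minimal Python sketch of the repair steps listed above (fence stripping, dropping the invalid `\_`/`\*` escapes, balanced-brace scan, longest candidate first). It is not the PHP dbnToolsRepairJsonText() itself; the function name `repair_json_text` and the string-tracking details are assumptions.

```python
import json
import re


def repair_json_text(raw: str):
    """Recover a JSON object from leaky model output; None if nothing decodes."""
    text = re.sub(r"```(?:json)?", "", raw)      # strip code-fence markers
    text = re.sub(r"\\([_*])", r"\1", text)      # \_ and \* are not valid JSON escapes

    candidates = []
    depth, start = 0, 0
    in_string, escaped = False, False
    for i, ch in enumerate(text):
        if in_string:                            # braces inside strings do not count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:                       # closed a top-level {...}
                candidates.append(text[start:i + 1])

    for candidate in sorted(candidates, key=len, reverse=True):
        try:
            decoded = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(decoded, dict):
            return decoded
    return None
```

Trying the longest candidate first favours the full trailing blob over any small object that happens to appear earlier in the prose.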

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 17:36:35 +02:00
daveadmin 7fcd317205 feat(tools): reposition as Do Better Legal two-track Norwegian-law MCP
De-family-ify shared JSON tools (persona-aware routing + neutral base
prompt), make the verification review pick its engine per track
(family/child-welfare -> dbn-legal-agent-v3, others -> gpt-4o interim),
and route product-name strings through dbnToolsProductName(). Rebrand the
MCP/tools surface (mcp.php + i18n mcp_* strings) to Do Better Legal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 07:45:17 +02:00
daveadmin 662fbf7d6d feat(tools): persona-driven multi-domain corpus + model routing
Generalize the family-locked legal tools into caveauAI persona profiles
(client 57 chat profiles, resolved in-process via the chat_profiles bridge).
Each tool accepts an optional `profile` slug that scopes the corpus package(s),
search method, system prompt and synthesis model; omitting it falls back to the
family-legal package so existing behaviour is unchanged.

- dbnToolsResolvePersona / dbnToolsListPersonas / dbnToolsBootChatProfiles in
  bootstrap.php; new api/personas.php + dbn.list_personas MCP tool.
- LegalTools search/ask/corpusContextForSummarize and the BvjAnalyzer /
  LegalAnalysis / translate paths take the persona's packages + prompt + model.
- Persona <select> on ask/search/summarize (populated from api/personas.php).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 20:49:58 +02:00
daveadmin 5a0ef89dca feat(mcp): expose corpus_search, korrespond_refine, extract_text tools
Restores the 3 tools (manifest + invoke arms + invokeExtract helper),
the citation-atom RAG lever in LegalTools/corpus-search, and the catalog
icons. These were live on prod via rsync but uncommitted, so a git-pull
deploy reverted the manifest from 22 to 19 tools.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 16:45:41 +02:00
daveadmin 234ab7278b Timeline: group same-date/actor events, clean badges, Bedrock routing
- renderTimeline(): group consecutive same-date+actor events into one card
  with a bullet list; single events keep their current layout
- Date format: YYYY-MM-DD → "1 Jun 2023" (3-letter month, international)
- Time shown in header when available
- Remove date_type badge; confidence badge replaced by amber ⚠ flag on
  low-confidence events only (high/medium border colour still shows)
- LegalTools.php: resolve azure_full/azure_mini to Bedrock Sonnet/Haiku
  when DbnBedrockGateway is active; claude_sonnet/claude_haiku also handled
- timeline.php + api/timeline.php: engine labels updated (Claude Haiku/Sonnet);
  claude_haiku + claude_sonnet added to valid engine list
- i18n engine labels updated in all 4 languages
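
The grouping and date-format rules above can be sketched as follows. This is Python for illustration only (the real renderTimeline() lives in the JS front end); the event keys `date`, `actor` and `text` and the card shape are assumptions.

```python
from datetime import date
from itertools import groupby

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(iso: str) -> str:
    """YYYY-MM-DD -> '1 Jun 2023' (no leading zero, 3-letter month)."""
    d = date.fromisoformat(iso)
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def group_events(events):
    """Merge consecutive events sharing date and actor into one card."""
    cards = []
    for (day, actor), run in groupby(events, key=lambda e: (e["date"], e["actor"])):
        cards.append({
            "date": format_date(day),
            "actor": actor,
            "items": [e["text"] for e in run],  # one bullet per event
        })
    return cards
```

`groupby` only merges adjacent runs, which matches "consecutive" in the commit: the same date and actor reappearing later starts a new card.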

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 23:21:35 +02:00
daveadmin 8a11001bff Add AWS Bedrock three-tier gateway routing (LiteLLM via Colin)
Routes AI tools across three tiers based on task complexity:
- Azure GPT-4o-mini always: redact, translate, timeline-basic, search-legal (mechanical tasks)
- Claude Haiku 4.5 (Bedrock): ask, summarize, timeline-deep, citations (Norwegian nuance)
- Claude Sonnet 4.6 (Bedrock): korrespond, legal-analysis, deep-research, barnevernet-analyze,
  discrepancy-find, advocate (public-facing legal output)

No AWS credentials in app — credentials live in LiteLLM on Colin (same as nova-lite).
Rollback: DBN_BEDROCK_ENABLED=false in .env, no code push needed.

Includes extended thinking support for Pro deep-research via chatWithThinking().
Claude Opus 4.7 constant added for future premium tier (needs litellm_config.yaml entry).
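
A rough Python sketch of the three-tier routing table and the env-flag rollback described above. The tier labels, the default for unlisted tools, the flag defaulting to enabled, and the Azure fallback target when Bedrock is disabled are all assumptions; the commit only states that DBN_BEDROCK_ENABLED=false rolls the change back.

```python
import os

# Tool -> tier, following the three lists in the commit message.
TIER_BY_TOOL = {
    **dict.fromkeys(
        ["redact", "translate", "timeline-basic", "search-legal"], "azure_mini"),
    **dict.fromkeys(
        ["ask", "summarize", "timeline-deep", "citations"], "bedrock_haiku"),
    **dict.fromkeys(
        ["korrespond", "legal-analysis", "deep-research",
         "barnevernet-analyze", "discrepancy-find", "advocate"], "bedrock_sonnet"),
}


def resolve_tier(tool: str, env=None) -> str:
    env = os.environ if env is None else env
    tier = TIER_BY_TOOL.get(tool, "azure_mini")  # assumed default for unlisted tools
    enabled = env.get("DBN_BEDROCK_ENABLED", "true").strip().lower() != "false"
    if tier.startswith("bedrock_") and not enabled:
        return "azure_full"  # assumed rollback target, not stated in the commit
    return tier
```

Reading the flag at call time rather than at import time is what makes a rollback possible without a code push.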

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 15:22:48 +02:00
daveadmin 17ad54cf36 Add chunked timeline routing 2026-05-25 12:34:41 +02:00
daveadmin 3ad8f4843c Harden timeline quick extraction 2026-05-25 11:14:21 +02:00
daveadmin 983c423740 Fix nova-lite JSON: drop response_format, strip markdown fences
nova-lite ignores the json_object constraint and returns an empty {}; without
the constraint it wraps its output in ```json fences. Strip the fences before
calling decodeJsonObject().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 10:51:24 +02:00
daveadmin f00d3d68e5 Add Quick mode (nova-lite/Bedrock) as 3rd tier for timeline tool
Timeline now offers Quick/Standard/Deep: nova_lite routes to Amazon
Bedrock nova-lite via LiteLLM (1 credit, ~2s faster), azure_mini stays
gpt-4o-mini (1 credit), azure_full stays gpt-4o (2 credits, Pro only).
ToolModels tier rules: free→nova_lite only, plus→quick/standard,
pro→all three.
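
The tier rules above amount to a small lookup. A hedged Python sketch, assuming Quick/Standard/Deep map to nova_lite/azure_mini/azure_full and that a disallowed request is downgraded to the best engine the plan permits (the commit does not say whether it is downgraded or rejected):

```python
# Engines each plan may use, ordered from cheapest to most capable.
ENGINES_BY_PLAN = {
    "free": ["nova_lite"],
    "plus": ["nova_lite", "azure_mini"],
    "pro": ["nova_lite", "azure_mini", "azure_full"],
}

# Credit cost per engine, as stated in the commit message.
CREDITS = {"nova_lite": 1, "azure_mini": 1, "azure_full": 2}


def resolve_engine(plan: str, requested: str):
    """Return (engine, credit_cost) for a plan and a requested engine."""
    allowed = ENGINES_BY_PLAN.get(plan, ENGINES_BY_PLAN["free"])
    engine = requested if requested in allowed else allowed[-1]
    return engine, CREDITS[engine]
```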

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 10:26:07 +02:00
daveadmin d47024ed67 timeline: remove GPU, add SSE status updates, DOCX export, single-file, engine-aware credits
- Remove GPU/cuttlefish engine from timeline.php, api/timeline.php, LegalTools.php, tools.js (all 4 langs)
- Add engine-aware credit cost: gpt-4o-mini=1 credit, gpt-4o=2 credits (matches redact pattern)
- Remove multiple attribute from file input (single document only)
- New api/timeline-stream.php: SSE endpoint emitting status events + final result
- New api/timeline-download.php: DOCX export of timeline events
- LegalTools::timeline() gains ?callable $onProgress for live status updates
- tools.js: spinner on run, SSE streaming fetch, Export to Word button
- Save to My Docs was already wired (showSaveResultButton at line 1136)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 09:32:28 +02:00
daveadmin 56cd87dd7b redact: UX overhaul — engine simplification, credits, spinner, save-to-docs, badges
- Remove GPU/regex engine options; keep only azure_mini (1 credit) and azure_full (2 credits)
- Variable credit cost: engine-aware pre-check and charge in api/redact.php; PricingCatalog base = 1
- Fix ATTORNEY not preserved when keepOfficials=true: add to LLM prompt, generic-tag, pseudonym regexes
- Replace Azure credits hint with per-engine credit cost text (all 4 languages)
- Single-file upload only (was: up to 5); simplify status messages
- Clear previous redaction output and show pulsing spinner when a new run starts
- Add "Save to My Docs" button in redact output panel (corpus-save.js path)
- corpus-save.js: capture source_doc_ids from button dataset, pass in POST payload
- api/save-to-corpus.php: accept source_doc_ids, store first as source_url=corpus-doc:{id}
- doc-picker.js: show "✂ Redacted" badge for documents saved from the redact tool
- CSS: .redact-working spinner, doc-item__badge--redact pill styles

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 08:18:51 +02:00
daveadmin b21bfb2f1d Add NOK pricing catalog, credit ledger, success-based charging, and tier-gated model routing
- PricingCatalog.php: single source of truth for plans (free/plus/pro), top-ups,
  Stripe price env keys, tool costs (0–6 credits), STT variable billing, feature limits
- FreeTier.php: monthly-first credit deduction, ledger (user_tool_credit_ledger),
  STT reservation/settle/release, monthly reset, trial logic
- StripeClient.php: canonical SKUs (plus/pro/topup_100/300/1000), legacy aliases kept
- stripe-checkout.php: subscription vs payment mode, trial gating, catalog metadata
- stripe-webhook.php: idempotent via stripe_events, handles subscription lifecycle +
  invoice.paid renewal + one-time topup credit grants
- All API tools: success-based credit deduction (check before, charge after)
- transcribe.php: file-size heuristic reservation, settle from actual provider duration
- ask.php + LegalTools.php: ToolModels engine resolution — Pro gets gpt-4o
- KorrespondAgent.php + korrespond.php: tier-gated draft deployment —
  Free/Plus gets gpt-4o-mini, Pro gets gpt-4o
- pricing.php: NOK-only, plan cards, top-up packs, Organisation contact card,
  tool cost table, separate monthly/prepaid balance display
- 003_pricing_credit_catalog.sql: ledger and STT reservation tables
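
Success-based charging ("check before, charge after") combined with monthly-first deduction might look like the sketch below. It is Python for illustration, not the FreeTier.php code; `Wallet`, `run_tool` and `InsufficientCredits` are hypothetical names, and the ledger write is omitted.

```python
from dataclasses import dataclass


class InsufficientCredits(Exception):
    pass


@dataclass
class Wallet:
    monthly: int   # plan credits, reset each month
    prepaid: int   # top-up credits

    def can_afford(self, cost: int) -> bool:
        return self.monthly + self.prepaid >= cost

    def charge(self, cost: int) -> None:
        # Monthly-first: spend plan credits before touching top-ups.
        from_monthly = min(self.monthly, cost)
        self.monthly -= from_monthly
        self.prepaid -= cost - from_monthly


def run_tool(wallet: Wallet, cost: int, tool):
    if not wallet.can_afford(cost):          # check before
        raise InsufficientCredits(f"need {cost} credits")
    result = tool()                          # if this raises, nothing is charged
    wallet.charge(cost)                      # charge after success
    return result
```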

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 13:42:27 +02:00
daveadmin e768662efe Add Summarize Document tool — engine selector, file upload, optional corpus enrichment
- summarize.php: full custom inline form (replaces tool_form.php wrapper) with
  lang switcher, azure_mini/azure_full/gpu engine selector, 8 corpus-slice
  toggles (all off by default), doc picker, file upload zone, and textarea
- api/summarize.php: rewritten to streaming NDJSON (matches barnevernet pattern);
  accepts JSON payload with text, language, engine, slices[], doc_ids[]
- includes/LegalTools.php: adds corpusContextForSummarize() (keyword search via
  ClientRagPipeline) and summarizeWithContext() (engine-aware LLM call with
  optional corpus prepend); returns structured JSON matching existing summarize format
- assets/js/summarize.js: self-contained IIFE handling file upload via
  api/extract.php, slice toggles, NDJSON stream reader, result rendering,
  and trace panel update
- includes/i18n.php: adds 'summarize' to nav in all 4 languages (EN/NO/UK/PL),
  inserted after 'redact' in the tool order with icon 'SZ'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 23:25:40 +02:00
daveadmin b014638f39 feat(corpus): add save-to-corpus + private corpus search scope
- POST /api/save-to-corpus.php — saves tool output text to user's default CaveauAI corpus via ClientRagPipeline
- api/case/upload.php — dual-writes uploaded PDFs to CaveauAI client_documents (best-effort)
- assets/js/corpus-save.js — shared <dialog> handler for .js-save-corpus buttons on all tool pages
- includes/layout_footer.php — injects corpus-save.js + shared save dialog markup
- korrespond/deep-research/barnevernet/discrepancy JS — save-to-corpus buttons on output sections
- api/search.php + LegalTools::search() — corpus_scope param ('shared'|'private'|'both'), merges personal CaveauAI corpus with shared legal library when 'both'
- includes/tool_form.php + assets/js/tools.js — corpus scope radio toggle shown on search tab
- api/user-docs.php — add POST upload method for non-SSO authenticated users

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 17:50:32 +02:00
daveadmin 28932297b3 Add user context notes field to timeline tool
Adds an optional textarea below the main text input where users can provide
clarifications to guide the LLM — e.g. year anchors, actor aliases, or focus
instructions. Notes are injected into the prompt as a clearly delimited block
and translated across all four UI languages (en/no/uk/pl).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:36:37 +02:00
daveadmin 59b39ff85b feat(redact): tag highlighting, inventory panel, before/after toggle, gpt-4o upgrade
- CSS: colour-coded [TAG] spans by entity type (person=pink, org=blue,
  place=green, date=amber, id=purple)
- Inventory panel: collapsible list showing tag → original text mappings
  with occurrence counts, sourced from new redaction_map API response key
- Before/after toggle: Redacted / Original view-switch buttons wired to
  lastOriginalText captured at submission time
- One-click gpt-4o upgrade button when mini or GPU engine was used
- Backend: redaction_map built from applied LLM entities (tag → originals
  + occurrence count via substr_count on final text)
- renderResults now calls setupRedactViewToggle() after DOM is written
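
A small Python sketch of how a redaction_map of this shape could be built. The entity keys `tag` and `original` and the output keys `originals` and `count` are assumptions; `str.count` plays the role of PHP's substr_count (non-overlapping occurrences in the final text).

```python
def build_redaction_map(entities, final_text: str) -> dict:
    """Map each tag to the originals it replaced and its occurrence count."""
    mapping = {}
    for entity in entities:
        entry = mapping.setdefault(entity["tag"], {"originals": [], "count": 0})
        if entity["original"] not in entry["originals"]:
            entry["originals"].append(entity["original"])
    for tag, entry in mapping.items():
        # Count on the final text, so the number reflects what the user sees.
        entry["count"] = final_text.count(f"[{tag}]")
    return mapping
```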

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 08:22:41 +02:00
daveadmin e32ee60e78 feat(timeline): tighten prompt for accuracy — year inference, month names, actor normalization, confidence calibration
- Add 4-step year inference rule for DD.MM. entries (scan backward/forward for anchor year)
- Add Norwegian month-name formats (18. september, den 18. september 2025, etc.) with month lookup table
- Add $relativeInstruction to tell LLM upfront when relative dates are excluded (not just PHP-filtered post-hoc)
- Define confidence calibration criteria explicitly (high/medium/low)
- Improve source_excerpt guidance: most diagnostic phrase, not just any verbatim phrase
- Add actor normalization for Norwegian institutions (Barnevernstjenesten, Fylkesnemnda, Statsforvalteren, etc.)
- Add deduplication rule for events appearing across multiple documents
- Add end_date field for date_type=period events
- Improve what_we_found schema hint to require count/range/actors/gaps
- Increase max_tokens to 8000 for azure_full (gpt-4o) to avoid truncation on large documents
- Tighten system prompt with Norwegian CPS legal chain context
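
The commit applies year inference through prompt instructions and does not spell out the four steps. As a deterministic illustration of the "scan backward/forward for anchor year" idea only, a Python sketch (the function name and the backward-before-forward preference are assumptions):

```python
import re

YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def infer_year(lines, index: int):
    """Find an anchor year for a DD.MM. entry at lines[index], or None."""
    # Look backward first (including the entry's own line) ...
    for i in range(index, -1, -1):
        match = YEAR_RE.search(lines[i])
        if match:
            return int(match.group(0))
    # ... then forward through the rest of the document.
    for i in range(index + 1, len(lines)):
        match = YEAR_RE.search(lines[i])
        if match:
            return int(match.group(0))
    return None
```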

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 07:11:31 +02:00
daveadmin 13572e9dfb feat: extract and display event times on timeline (kl. HH:MM etc.)
Prompt now instructs the model to extract time of day (HH:MM) when
present in Norwegian formats: kl. 14:30, kl 09.00, 14:30, 14.30.
renderTimeline shows time as a muted inline annotation next to the date.
CSV export gains a Time column after Date.
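
In the commit, time extraction is done by the model via the prompt. For reference, the four listed formats can also be matched with a regex; a Python sketch under that assumption (`extract_time` is a hypothetical helper):

```python
import re

# Optional "kl." / "kl" prefix, hour 0-23, ':' or '.' separator, minutes 00-59.
# The lookarounds avoid matching inside dotted dates such as 18.09.25.
TIME_RE = re.compile(
    r"(?<![\d.:])(?:kl\.?\s*)?([01]?\d|2[0-3])[:.]([0-5]\d)(?![\d.:])"
)


def extract_time(text: str):
    """Return the first time of day as HH:MM, or None."""
    match = TIME_RE.search(text)
    if not match:
        return None
    return f"{int(match.group(1)):02d}:{match.group(2)}"
```

A bare dotted form is ambiguous with a D.MM date (for example 1.12), which is one reason to leave the final call to the model or to require the `kl` prefix for dotted times.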

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 23:03:20 +02:00
daveadmin a3d46f9756 feat: Legal Tools v1 — multilingual landing, dashboard, SSO bridge
- Public landing page at / for unauthenticated users (EN/NO/UK/PL)
- Authenticated / shows Case Workbench dashboard with manifesto strip,
  stats, and launched-tool grid (Transcribe, Timeline, BVJ, Advocate,
  Deep Research, Corpus)
- Added includes/i18n.php with full 4-language translation layer
- Extended layout.php to Case Workbench shell with tool rail, lang switcher
- AI output language normalization extended to en/no/uk/pl in PHP agents
- SSO token validation in bootstrap.php / index.php (dobetternorge.no bridge)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 22:53:27 +02:00
daveadmin 55e11cb649 Azure: route azure_mini engine to gpt-4o-mini explicitly
The .env default DBN_AZURE_OPENAI_CHAT_DEPLOYMENT is gpt-4o, so the
azure_mini branch (which just called ->chat() without withDeployment)
was silently hitting gpt-4o too. Both UI engine options resolved to
the same model, and timed out together on long Norwegian documents.

Fix: explicitly route azure_mini → gpt-4o-mini in both timeline and
redact paths.
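
The bug and the fix can be illustrated with the clone pattern in a few lines of Python. `Gateway`, `with_deployment` and `chat_for_engine` are hypothetical stand-ins for the PHP gateway and its withDeployment(); the default mirrors the .env value named above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gateway:
    deployment: str = "gpt-4o"  # stands in for DBN_AZURE_OPENAI_CHAT_DEPLOYMENT

    def with_deployment(self, name: str) -> "Gateway":
        # Clone with an override; the shared instance is left untouched.
        return replace(self, deployment=name)

    def chat(self, prompt: str) -> str:
        return f"[{self.deployment}] {prompt}"


ENGINE_DEPLOYMENTS = {"azure_mini": "gpt-4o-mini", "azure_full": "gpt-4o"}


def chat_for_engine(gateway: Gateway, engine: str, prompt: str) -> str:
    # Always route explicitly; calling gateway.chat() directly would use
    # the default deployment for both engines, which was the bug.
    return gateway.with_deployment(ENGINE_DEPLOYMENTS[engine]).chat(prompt)
```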

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:38:55 +02:00
daveadmin 85c3cee719 Azure: raise chat timeout 45s → 90s default; timeline uses 120s
Timeline was using no explicit timeout, falling back to the gateway's
45s default, which timed out on long Norwegian legal documents.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:09:02 +02:00
daveadmin f183678f35 Redact: catch soft dates (years, month+year, ranges, prepositions)
Adds Nordic-pack regex patterns for:
- DD.MM.YYYY / DD/MM/YYYY / YYYY-MM-DD
- Year ranges (2011/2012, 2018-2019)
- Month + year (Norwegian + English, with optional day)
- Year preceded by temporal preposition (i 2015, fra 2019, rundt 2018)

Also renames the entity toggle from "Dates of birth" to "Dates" (broader
scope) in all four languages, and expands the LLM prompt so soft date
references in free text are caught even when regex misses them.
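
A hedged Python sketch of regexes covering the four pattern families listed above. These are not the Nordic-pack patterns themselves; only Norwegian month names and the three prepositions given as examples are included, and the `[DATE]` tag is an assumption.

```python
import re

MONTHS = ("januar|februar|mars|april|mai|juni|juli|august|"
          "september|oktober|november|desember")

# (pattern, replacement) pairs, applied in order.
DATE_RULES = [
    # DD.MM.YYYY, DD/MM/YYYY, YYYY-MM-DD
    (re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{4}\b|\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),
    # Year ranges: 2011/2012, 2018-2019
    (re.compile(r"\b(?:19|20)\d{2}[/-](?:19|20)\d{2}\b"), "[DATE]"),
    # Month + year with optional day: 18. september 2019
    (re.compile(rf"\b(?:\d{{1,2}}\.?\s+)?(?:{MONTHS})\s+(?:19|20)\d{{2}}\b",
                re.IGNORECASE), "[DATE]"),
    # Year after a temporal preposition; the preposition itself is kept.
    (re.compile(r"\b(i|fra|rundt)\s+(?:19|20)\d{2}\b", re.IGNORECASE), r"\1 [DATE]"),
]


def redact_dates(text: str) -> str:
    for pattern, replacement in DATE_RULES:
        text = pattern.sub(replacement, text)
    return text
```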

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:58:35 +02:00
daveadmin cdd0fb970b fix(timeline): explicit Norwegian date format recognition in prompt
Add DD.MM.YY, D.M., diary-line format instructions so the model doesn't
skip short Norwegian dates like 18.09.25 or 6.1. Two-digit years always
treated as 20YY. Lines starting with date+colon are always events.
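
The commit teaches these rules to the model through the prompt. Expressed as deterministic code for illustration, the same rules could look like this Python sketch (`parse_diary_date` and its `default_year` parameter are hypothetical):

```python
import re
from datetime import date

# DD.MM.YYYY, DD.MM.YY, D.M. or D.M at the start of a line.
DIARY_DATE_RE = re.compile(r"^(\d{1,2})\.(\d{1,2})(?:\.(\d{4}|\d{2})?)?")


def parse_diary_date(line: str, default_year: int):
    """Parse a leading Norwegian short date; None if the line has none."""
    match = DIARY_DATE_RE.match(line.strip())
    if not match:
        return None
    day, month, year_text = int(match.group(1)), int(match.group(2)), match.group(3)
    if year_text is None:
        year = default_year               # D.M. entries carry no year
    elif len(year_text) == 2:
        year = 2000 + int(year_text)      # two-digit years are always 20YY
    else:
        year = int(year_text)
    try:
        return date(year, month, day)
    except ValueError:                    # e.g. 31.02. or a time like 14.30
        return None
```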

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 01:10:16 +02:00
daveadmin 7690ed17ee feat(timeline): full form UI with engine selection and advanced settings
Add 4-language switcher (EN/NO/UK/PL), engine choice (Azure mini/full,
GPU/cuttlefish), and expandable Advanced panel (Focus, Confidence filter,
Date types) to timeline.php. Wire new params through api/timeline.php and
LegalTools::timeline() with engine routing, focus-aware prompt injection,
and confidence/date-type post-filters. Add TIMELINE_I18N to tools.js with
improved renderTimeline() confidence colour-coding and new CSS classes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:59:12 +02:00
daveadmin 8c12d5e778 Redact tool: rich UI, multilingual, engine choice, output formats
- Custom inline form (EN/NO/UK/PL lang switcher) replacing generic stub
- Engine selector: Azure gpt-4o-mini (default), gpt-4o, GPU cuttlefish, regex-only
- Entity type toggles: names, organisations, places, dates of birth
- Output formats: contextual role tags, generic [PERSON], Norwegian pseudonyms
- Keep officials mode: judges/experts kept as [JUDGE: Andersen] format
- Exempt names list: specific names excluded from redaction
- Hint paragraphs explaining each option in all four languages
- Backend: engine routing, callGpuLlm(), applyGenericTags(), applyPseudonymization()
- AzureOpenAiGateway: withDeployment() clone pattern for per-call model override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:20:16 +02:00
daveadmin bddafea049 Timeline: document upload, upgraded prompt, CSV export, date_type badge 2026-05-13 08:10:40 +02:00
daveadmin 634a4fa154 Raise MAX_PASTE_CHARS to 128K and redaction max_tokens to 8000 2026-05-13 07:41:41 +02:00
daveadmin 95685862ab Redact: multi-doc upload, contextual person naming, aliases
- Extract limit raised from 32K to 128K chars per file (long legal docs now fit)
- Redact API body/text limits raised (400KB / 128K chars) to match
- Upload zone accepts multiple files (up to 5); extracted text concatenated with
  doc separator and combined before redaction; shows per-file char counts
- LLM redact pass now infers contextual person roles (FATHER, MOTHER, CHILD,
  ATTORNEY, JUDGE, etc.) instead of generic [PERSON] for all names; same
  individual gets consistent tag throughout the document
- Tag validation widened to allow any [A-Za-z0-9_- ] pattern (not just the
  five hardcoded tags), supporting contextual and alias tags
- Alias UI added to Redact mode: user maps real names to bracketed aliases
  (e.g. "David Jr" -> [Junior]); aliases injected into LLM system prompt as
  override instructions; max 20 aliases, 100 chars each
- max_tokens raised from 2000 to 4000; timeout from 60s to 90s for larger docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-13 07:17:02 +02:00
daveadmin 3c8d7ebc34 feat: pass temporal_mode and as_of_date through DBN search API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 18:45:54 +02:00
daveadmin 1f4f01bda3 Add public showcase landing, doc summary cards, and chunk toggle
- index.php: public showcase landing page (hero, how-it-works, capabilities,
  evidence mock, login form) visible to unauthenticated visitors; full OG/SEO
  meta; app shell hidden behind auth as before
- tools.css: showcase section styles (gradient hero, step cards, capability
  grid, CTA button, evidence mock, footer)
- LegalTools.php: sourceFromChunk() batch-fetches doc_summaries from RAG DB
  for non-private chunks; excerpt shows doc summary when available, falls back
  to raw chunk text; chunk_text field always carries the raw excerpt
- tools.js: renderEvidenceItem() shows doc summary as card body; adds a
  collapsible "View chunk" toggle when summary differs from raw chunk text

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 08:37:36 +02:00
daveadmin 62dbb8d900 Gate tools login with Caveau access 2026-05-08 17:12:38 +02:00
daveadmin 9b22947eb2 Two-pass PII redaction with multi-country pattern packs
Pass 1: deterministic regex with Nordic/European/ECHR/Global packs
covering fødselsnummer, Swedish personnummer, Danish/Finnish CPR,
UK NI, French INSEE, IBAN, EU phones, ECHR application numbers, DOB,
and national ID label patterns.

Pass 2: LLM semantic scan (Azure OpenAI) finds names, orgs, places
and identifying descriptions missed by regex. Runs on pre-redacted
text so no raw PII reaches the LLM.

Adds region selector (Nordic/European/ECHR/Global) to the Redact UI.
Falls back gracefully when Azure is not yet configured.
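
A minimal Python sketch of the two-pass flow. The two regexes are simplified shape checks (no checksum validation) rather than the actual pattern packs, and `llm_scan` is a hypothetical callable standing in for the Azure OpenAI pass.

```python
import re

# Pass 1 patterns: 11-digit fødselsnummer shape and a generic IBAN shape.
FNR_RE = re.compile(r"\b\d{6}\s?\d{5}\b")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def regex_pass(text: str) -> str:
    text = FNR_RE.sub("[ID]", text)
    return IBAN_RE.sub("[IBAN]", text)


def redact(text: str, llm_scan=None) -> str:
    pre_redacted = regex_pass(text)
    if llm_scan is None:
        # Graceful fallback: regex-only output when no LLM is configured.
        return pre_redacted
    # Pass 2 sees only pre-redacted text, so raw IDs never reach the LLM.
    for phrase, tag in llm_scan(pre_redacted):
        pre_redacted = pre_redacted.replace(phrase, f"[{tag}]")
    return pre_redacted
```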

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 01:27:52 +02:00
daveadmin 2d8d1c7409 Initial release: Do Better Norge Legal Tools Hub
Five MVP tools (Ask, Search, Summarize, Timeline, Redact) with
email+password auth, Azure OpenAI gateway, evidence trail panel,
and process-and-forget privacy default.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 00:01:07 +02:00