Add corpus explorer: search bar (Hybrid/BM25/Vector), category drill-down, source row expand

- api/corpus-search.php: new endpoint with three search modes (hybrid RAG, BM25 keyword, Qdrant vector)
- api/corpus-documents.php: paginated document browser by category or source name
- corpus.php: search bar with mode and language pills; a "Browse docs" button on each category card that opens a drill-down panel; an expand toggle on each source row showing its document count and scraper class
- tools.css: styles for all the new interactive corpus components, appended to the file
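The two endpoints above are each described in a single line. The sketch below reconstructs the request contract the corpus.php client appears to rely on. The helper names `buildSearchRequest`, `buildDocumentsUrl` and `nextPage` are hypothetical and are not part of this commit. The field names (`query`, `mode`, `language`, `limit`, `category`, `source_name`, `offset`) come from the page's own fetch calls. The allowed values and the server-side behaviour of the PHP endpoints are assumptions, because their source is not shown in this view.

```javascript
// Hypothetical client-side helpers. Request shapes are inferred from the
// fetch() calls in corpus.php; the PHP endpoints may accept more fields.
const SEARCH_MODES = new Set(['hybrid', 'bm25', 'vector']);
const SEARCH_LANGUAGES = new Set(['en', 'no']);

// Build url + init for fetch() against the search endpoint (POST, JSON body).
function buildSearchRequest(query, mode = 'hybrid', language = 'en', limit = 8) {
  const q = String(query).trim();
  // corpus.php refuses to search with fewer than 3 characters.
  if (q.length < 3) throw new RangeError('query must be at least 3 characters');
  if (!SEARCH_MODES.has(mode)) throw new RangeError(`unknown mode: ${mode}`);
  if (!SEARCH_LANGUAGES.has(language)) throw new RangeError(`unknown language: ${language}`);
  return {
    url: '/api/corpus-search.php',
    init: {
      method: 'POST',
      credentials: 'same-origin',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: q, mode, language, limit }),
    },
  };
}

// Build the URL for one page of the document browser (GET, query string).
// The drill-down panel filters by either a category slug or a source name.
function buildDocumentsUrl({ category = null, sourceName = null, offset = 0, limit = 20 } = {}) {
  const qs = new URLSearchParams({ offset: String(offset), limit: String(limit) });
  if (category) qs.set('category', category);
  if (sourceName) qs.set('source_name', sourceName);
  return '/api/corpus-documents.php?' + qs.toString();
}

// "Load more" bookkeeping: advance the offset by the page just received and
// report whether the server-reported total says more documents remain.
function nextPage(offset, received, total) {
  const loaded = offset + received;
  return { offset: loaded, hasMore: loaded < total };
}

// Example usage (no network access needed):
const req = buildSearchRequest('  samvær ', 'bm25', 'no');
console.log(req.init.body); // {"query":"samvær","mode":"bm25","language":"no","limit":8}
console.log(buildDocumentsUrl({ sourceName: 'Example Source', offset: 20 }));
// /api/corpus-documents.php?offset=20&limit=20&source_name=Example+Source
console.log(nextPage(40, 5, 45)); // { offset: 45, hasMore: false }
```

In corpus.php the equivalent logic is inline in `runSearch()` and `fetchDrillPage()`. Factoring it out like this would mainly make it testable without a browser.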

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 11:55:54 +02:00
parent 785de04f05
commit 38255669a9
4 changed files with 962 additions and 42 deletions
+363 -42
@@ -37,6 +37,7 @@ $reasoningPanelOverride = ob_get_clean();
require_once __DIR__ . '/includes/layout.php';
?>
<!-- STATS BAR -->
<div class="corpus-stats-bar" id="corpusStatsBar">
<div class="corpus-stat" id="statChunks">
<span class="corpus-stat__value is-loading">—</span>
@@ -56,6 +57,28 @@ require_once __DIR__ . '/includes/layout.php';
</div>
</div>
<!-- CORPUS SEARCH -->
<div class="corpus-search-box">
<div class="corpus-search-row">
<input type="search" id="corpusSearchInput" class="corpus-search-input"
placeholder="Search 220 K passages — try «samvær», «arbeidsgiver», «barnevernloven»…"
autocomplete="off" spellcheck="false">
<button id="corpusSearchBtn" class="primary-button" type="button">Search</button>
</div>
<div class="corpus-search-controls">
<div class="search-modes" role="group" aria-label="Search mode">
<button class="mode-pill is-active" data-mode="hybrid" type="button">Hybrid</button>
<button class="mode-pill" data-mode="bm25" type="button">BM25</button>
<button class="mode-pill" data-mode="vector" type="button">Vector</button>
</div>
<div class="lang-pills" role="group" aria-label="Language">
<button class="mode-pill is-active" data-lang="en" type="button">EN</button>
<button class="mode-pill" data-lang="no" type="button">NO</button>
</div>
</div>
</div>
<div id="corpusSearchResults" class="corpus-search-results" hidden></div>
<!-- COVERAGE -->
<div class="corpus-section">
<p class="eyebrow">Coverage</p>
@@ -68,6 +91,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Family Law</h4>
<p>Barneloven, child custody (foreldreansvar), samvær, mediation (mekling), separation and divorce proceedings.</p>
<button class="cat-browse-btn" data-cat="family-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="child-welfare">
<div class="category-card__top">
@@ -76,6 +100,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Child Welfare</h4>
<p>Barnevernloven, omsorgsovertakelse, emergency care orders, foster placement, CPS (barnevernet) case law.</p>
<button class="cat-browse-btn" data-cat="child-welfare" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="labour-law">
<div class="category-card__top">
@@ -84,6 +109,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Labour Law</h4>
<p>Arbeidsmiljøloven, collective agreements (tariffavtaler), Arbeidsretten rulings, dismissal, sick leave obligations.</p>
<button class="cat-browse-btn" data-cat="labour-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="social-welfare">
<div class="category-card__top">
@@ -92,6 +118,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Social Welfare</h4>
<p>NAV guidance on sykepenger, dagpenger, AAP, uføretrygd, alderspensjon, yrkesskade and social assistance.</p>
<button class="cat-browse-btn" data-cat="social-welfare" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="tax-law">
<div class="category-card__top">
@@ -100,6 +127,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Tax Law</h4>
<p>Skatteetaten's Skatte-ABC, binding advance rulings (BFU), Skatteklagenemnda decisions, income and capital tax.</p>
<button class="cat-browse-btn" data-cat="tax-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="administrative-law">
<div class="category-card__top">
@@ -108,6 +136,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Administrative Law</h4>
<p>Sivilombudet reports, Forvaltningsloven, procedural rights, official complaints, Stortinget oversight.</p>
<button class="cat-browse-btn" data-cat="administrative-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="consumer-law">
<div class="category-card__top">
@@ -116,6 +145,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Consumer &amp; Housing</h4>
<p>HTU (rental disputes), Finansklagenemnda, Forbrukertilsynet, Forbrukerrådet, Pakkereisenemnda decisions.</p>
<button class="cat-browse-btn" data-cat="consumer-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="immigration-law">
<div class="category-card__top">
@@ -124,6 +154,7 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Immigration &amp; International</h4>
<p>UNE (Utlendingsnemnda) decisions, ECHR Art. 8 family rights, EMD case law, Hague Convention (cross-border child abduction).</p>
<button class="cat-browse-btn" data-cat="immigration-law" type="button">Browse docs →</button>
</div>
<div class="category-card" data-category="government-documents">
<div class="category-card__top">
@@ -132,6 +163,22 @@ require_once __DIR__ . '/includes/layout.php';
</div>
<h4>Government Documents</h4>
<p>NOUer, Stortingsmeldinger, government white papers and regulatory guidance from Regjeringen.no.</p>
<button class="cat-browse-btn" data-cat="government-documents" type="button">Browse docs →</button>
</div>
</div>
<!-- DRILL-DOWN PANEL -->
<div id="corpusDrillPanel" class="corpus-drill-panel" hidden>
<div class="drill-header">
<div>
<p class="eyebrow" id="drillEyebrow">Category</p>
<h3 id="drillTitle">Documents</h3>
</div>
<button class="drill-close-btn" id="drillCloseBtn" type="button" aria-label="Close">✕</button>
</div>
<div id="drillDocList" class="doc-list"></div>
<div class="doc-list__more-wrap" id="drillMoreWrap" hidden>
<button class="doc-list__more" id="drillMoreBtn" type="button">Load more</button>
</div>
</div>
</div>
@@ -144,6 +191,7 @@ require_once __DIR__ . '/includes/layout.php';
<table class="sources-table" id="sourcesTable">
<thead>
<tr>
<th></th>
<th>Source</th>
<th>Type</th>
<th>Category</th>
@@ -153,7 +201,7 @@ require_once __DIR__ . '/includes/layout.php';
</tr>
</thead>
<tbody id="sourcesTableBody">
-<tr class="sources-skeleton"><td colspan="6">Loading sources…</td></tr>
<tr class="sources-skeleton"><td colspan="7">Loading sources…</td></tr>
</tbody>
</table>
</div>
@@ -280,6 +328,13 @@ require_once __DIR__ . '/includes/layout.php';
<script>
(function () {
'use strict';
// ── Utilities ────────────────────────────────────────────────────────────
function esc(s) {
return String(s ?? '').replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
}
function fmt(n) {
if (n === null || n === undefined) return '—';
return Number(n).toLocaleString('en');
@@ -293,6 +348,17 @@ require_once __DIR__ . '/includes/layout.php';
} catch (e) { return s; }
}
function highlight(text, query) {
if (!query) return esc(text);
const safe = esc(text);
// Matching runs on the escaped text, so HTML-escape the query first, then regex-escape it.
const safeQ = esc(query).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// m comes from the already-escaped text, so wrap it without escaping it again.
return safe.replace(new RegExp(safeQ, 'gi'), m => '<mark>' + m + '</mark>');
}
function setLoaded(el) { el.classList.remove('is-loading'); }
// ── Authority / schedule label maps ─────────────────────────────────────
const authorityLabels = {
case_law: { label: 'Case law', cls: 'badge--teal' },
guidance: { label: 'Guidance', cls: 'badge--amber' },
@@ -305,13 +371,9 @@ require_once __DIR__ . '/includes/layout.php';
};
const scheduleLabels = {
-daily: 'Daily',
-weekly: 'Weekly',
-monthly: 'Monthly',
-manual: 'Manual',
daily: 'Daily', weekly: 'Weekly', monthly: 'Monthly', manual: 'Manual',
};
// Category slug → element id map (for live counts)
const catIds = {
'family-law': 'cat-family-law',
'family_law': 'cat-family-law',
@@ -338,9 +400,20 @@ require_once __DIR__ . '/includes/layout.php';
'procurement-law': 'cat-administrative-law',
};
-function setLoaded(el) {
-el.classList.remove('is-loading');
-}
const catLabels = {
'family-law': 'Family Law',
'child-welfare': 'Child Welfare',
'labour-law': 'Labour Law',
'social-welfare': 'Social Welfare',
'tax-law': 'Tax Law',
'administrative-law': 'Administrative Law',
'consumer-law': 'Consumer & Housing',
'immigration-law': 'Immigration & International',
'government-documents': 'Government Documents',
};
// ── STATS + SOURCES table load ───────────────────────────────────────────
let cachedSources = [];
fetch('/api/corpus-stats.php', { credentials: 'same-origin' })
.then(r => r.json())
@@ -353,12 +426,11 @@ require_once __DIR__ . '/includes/layout.php';
const elSrc = document.querySelector('#statSources .corpus-stat__value');
const elUpd = document.querySelector('#statUpdated .corpus-stat__value');
-if (elChunks) { elChunks.textContent = fmt(s.total_chunks); setLoaded(elChunks); }
-if (elDocs) { elDocs.textContent = fmt(s.total_docs); setLoaded(elDocs); }
if (elChunks) { elChunks.textContent = fmt(s.total_chunks); setLoaded(elChunks); }
if (elDocs) { elDocs.textContent = fmt(s.total_docs); setLoaded(elDocs); }
if (elSrc) { elSrc.textContent = fmt(s.active_sources); setLoaded(elSrc); }
if (elUpd) { elUpd.textContent = fmtDate(s.last_updated); setLoaded(elUpd); }
// Category counts
(s.by_category || []).forEach(row => {
const elId = catIds[row.category];
if (!elId) return;
@@ -368,47 +440,296 @@ require_once __DIR__ . '/includes/layout.php';
el.textContent = fmt(cur + parseInt(row.doc_count, 10));
setLoaded(el);
});
// Zero out remaining loading badges
document.querySelectorAll('.category-card__count.is-loading').forEach(el => {
-el.textContent = '0';
-setLoaded(el);
el.textContent = '0'; setLoaded(el);
});
// Sources table
-const tbody = document.getElementById('sourcesTableBody');
-if (!tbody) return;
-tbody.innerHTML = '';
-(data.sources || []).forEach(src => {
-const auth = authorityLabels[src.authority_type] || { label: src.authority_type || '—', cls: 'badge--muted' };
-const sched = scheduleLabels[src.schedule] || (src.schedule || 'Manual');
-const langFlag = src.language === 'no' ? '🇳🇴' : src.language === 'en' ? '🇬🇧' : (src.language || '—');
-const statusHtml = src.is_active
-? '<span class="status-active">● Active</span>'
-: '<span class="status-inactive">○ Inactive</span>';
-const nameHtml = src.url
-? `<a href="${escHtml(src.url)}" target="_blank" rel="noopener">${escHtml(src.name)}</a>`
-: escHtml(src.name);
-const tr = document.createElement('tr');
-tr.innerHTML = `
-<td class="source-name">${nameHtml}</td>
-<td><span class="source-badge ${escHtml(auth.cls)}">${escHtml(auth.label)}</span></td>
-<td><span class="source-cat">${escHtml(src.category || '—')}</span></td>
-<td>${langFlag}</td>
-<td>${escHtml(sched)}</td>
-<td>${statusHtml}</td>`;
-tbody.appendChild(tr);
-});
cachedSources = data.sources || [];
renderSourcesTable(cachedSources);
})
.catch(() => {
document.querySelectorAll('.corpus-stat__value').forEach(el => {
-el.textContent = '—';
-el.classList.remove('is-loading');
el.textContent = '—'; el.classList.remove('is-loading');
});
});
-function escHtml(s) {
-return String(s ?? '').replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;');
// ── Sources table rendering ───────────────────────────────────────────────
function renderSourcesTable(sources) {
const tbody = document.getElementById('sourcesTableBody');
if (!tbody) return;
tbody.innerHTML = '';
sources.forEach((src, idx) => {
const auth = authorityLabels[src.authority_type] || { label: src.authority_type || '—', cls: 'badge--muted' };
const sched = scheduleLabels[src.schedule] || (src.schedule || 'Manual');
const langFlag = src.language === 'no' ? '🇳🇴' : src.language === 'en' ? '🇬🇧' : (src.language || '—');
const statusHtml = src.is_active
? '<span class="status-active">● Active</span>'
: '<span class="status-inactive">○ Inactive</span>';
const nameHtml = src.url
? `<a href="${esc(src.url)}" target="_blank" rel="noopener">${esc(src.name)}</a>`
: esc(src.name);
const tr = document.createElement('tr');
tr.dataset.idx = idx;
tr.innerHTML = `
<td class="source-expand-cell">
<button class="source-expand-btn" type="button" aria-expanded="false" aria-label="Expand ${esc(src.name)}">▶</button>
</td>
<td class="source-name">${nameHtml}</td>
<td><span class="source-badge ${esc(auth.cls)}">${esc(auth.label)}</span></td>
<td><span class="source-cat">${esc(src.category || '—')}</span></td>
<td>${langFlag}</td>
<td>${esc(sched)}</td>
<td>${statusHtml}</td>`;
tbody.appendChild(tr);
// Expand row (hidden)
const expandTr = document.createElement('tr');
expandTr.className = 'source-expand-row';
expandTr.hidden = true;
expandTr.dataset.name = src.name;
expandTr.innerHTML = `<td colspan="7"><div class="source-expand-inner" id="source-expand-${idx}">
<div class="source-expand-loading">Loading…</div></div></td>`;
tbody.appendChild(expandTr);
// Toggle handler
tr.querySelector('.source-expand-btn').addEventListener('click', function () {
const isOpen = expandTr.hidden === false;
if (isOpen) {
expandTr.hidden = true;
this.textContent = '▶';
this.setAttribute('aria-expanded', 'false');
} else {
expandTr.hidden = false;
this.textContent = '▼';
this.setAttribute('aria-expanded', 'true');
loadSourceExpand(idx, src, `source-expand-${idx}`);
}
});
});
}
function loadSourceExpand(idx, src, containerId) {
const container = document.getElementById(containerId);
if (!container || container.dataset.loaded) return;
container.dataset.loaded = '1';
// Fetch doc count for this source
const qs = new URLSearchParams({ source_name: src.name, limit: 1 });
fetch('/api/corpus-documents.php?' + qs, { credentials: 'same-origin' })
.then(r => r.json())
.then(data => {
const total = data.ok ? data.total : null; // fmt(null) renders '—' instead of "NaN"
container.innerHTML = `
<div class="source-expand-grid">
<div>
<dl class="source-expand-dl">
<dt>Scraper class</dt>
<dd><code>${esc(src.scraper_class || '—')}</code></dd>
<dt>Category</dt>
<dd>${esc(src.category || '—')}</dd>
<dt>Authority type</dt>
<dd>${esc(src.authority_type || '—')}</dd>
<dt>Language</dt>
<dd>${src.language === 'no' ? '🇳🇴 Norwegian' : src.language === 'en' ? '🇬🇧 English' : esc(src.language || '—')}</dd>
<dt>Update schedule</dt>
<dd>${esc(scheduleLabels[src.schedule] || src.schedule || '—')}</dd>
<dt>Documents indexed</dt>
<dd><strong>${fmt(total)}</strong></dd>
</dl>
</div>
<div>
${src.url ? `<p class="source-expand-url"><a href="${esc(src.url)}" target="_blank" rel="noopener">${esc(src.url)}</a></p>` : ''}
${total > 0 ? `<button class="doc-list__more source-browse-btn" data-source="${esc(src.name)}" type="button">Browse ${fmt(total)} documents →</button>` : ''}
</div>
</div>`;
container.querySelectorAll('.source-browse-btn').forEach(btn => {
btn.addEventListener('click', () => openDrillBySource(src.name));
});
})
.catch(() => {
container.innerHTML = `<p class="source-expand-error">Could not load source details.</p>`;
});
}
// ── Category drill-down ───────────────────────────────────────────────────
let drillState = { category: null, sourceName: null, offset: 0, total: 0, limit: 20 };
const drillPanel = document.getElementById('corpusDrillPanel');
const drillDocList = document.getElementById('drillDocList');
const drillTitle = document.getElementById('drillTitle');
const drillEyebrow = document.getElementById('drillEyebrow');
const drillMoreWrap = document.getElementById('drillMoreWrap');
const drillMoreBtn = document.getElementById('drillMoreBtn');
const drillCloseBtn = document.getElementById('drillCloseBtn');
document.querySelectorAll('.cat-browse-btn').forEach(btn => {
btn.addEventListener('click', () => openDrillByCategory(btn.dataset.cat));
});
function openDrillByCategory(cat) {
drillState = { category: cat, sourceName: null, offset: 0, total: 0, limit: 20 };
drillEyebrow.textContent = 'Category';
drillTitle.textContent = catLabels[cat] || cat;
drillDocList.innerHTML = '<p class="drill-loading">Loading documents…</p>';
drillMoreWrap.hidden = true;
drillPanel.hidden = false;
drillPanel.scrollIntoView({ behavior: 'smooth', block: 'start' });
fetchDrillPage(false);
}
function openDrillBySource(sourceName) {
drillState = { category: null, sourceName: sourceName, offset: 0, total: 0, limit: 20 };
drillEyebrow.textContent = 'Source';
drillTitle.textContent = sourceName;
drillDocList.innerHTML = '<p class="drill-loading">Loading documents…</p>';
drillMoreWrap.hidden = true;
drillPanel.hidden = false;
drillPanel.scrollIntoView({ behavior: 'smooth', block: 'start' });
fetchDrillPage(false);
}
function fetchDrillPage(append) {
const qs = new URLSearchParams({ offset: drillState.offset, limit: drillState.limit });
if (drillState.category) qs.set('category', drillState.category);
if (drillState.sourceName) qs.set('source_name', drillState.sourceName);
fetch('/api/corpus-documents.php?' + qs, { credentials: 'same-origin' })
.then(r => r.json())
.then(data => {
if (!data.ok) {
if (!append) drillDocList.innerHTML = '<p class="drill-error">Could not load documents.</p>';
return;
}
drillState.total = data.total;
const docs = data.documents || [];
if (!append) drillDocList.innerHTML = '';
if (docs.length === 0 && !append) {
drillDocList.innerHTML = '<p class="drill-empty">No documents found ' + (drillState.sourceName ? 'for this source' : 'in this category') + '.</p>';
drillMoreWrap.hidden = true;
return;
}
docs.forEach(doc => {
const item = document.createElement('div');
item.className = 'doc-list__item';
const titleHtml = doc.source_url
? `<a href="${esc(doc.source_url)}" target="_blank" rel="noopener" class="doc-list__title">${esc(doc.title || '(Untitled)')}</a>`
: `<span class="doc-list__title">${esc(doc.title || '(Untitled)')}</span>`;
const langFlag = doc.language === 'no' ? '🇳🇴' : doc.language === 'en' ? '🇬🇧' : '';
item.innerHTML = `
<div class="doc-list__info">
${titleHtml}
<div class="doc-list__meta">
<span class="source-cat">${esc(doc.category || '—')}</span>
${langFlag ? `<span>${langFlag}</span>` : ''}
<span class="doc-list__date">${fmtDate(doc.updated_at)}</span>
</div>
</div>
<span class="doc-list__chunks">${fmt(doc.chunk_count)} passages</span>`;
drillDocList.appendChild(item);
});
const loaded = drillState.offset + docs.length;
drillMoreWrap.hidden = loaded >= drillState.total;
drillState.offset = loaded;
})
.catch(() => {
if (!append) drillDocList.innerHTML = '<p class="drill-error">Network error.</p>';
});
}
drillMoreBtn.addEventListener('click', () => fetchDrillPage(true));
drillCloseBtn.addEventListener('click', () => { drillPanel.hidden = true; });
// ── Search bar ────────────────────────────────────────────────────────────
let searchMode = 'hybrid';
let searchLang = 'en';
document.querySelectorAll('.search-modes .mode-pill').forEach(btn => {
btn.addEventListener('click', () => {
document.querySelectorAll('.search-modes .mode-pill').forEach(b => b.classList.remove('is-active'));
btn.classList.add('is-active');
searchMode = btn.dataset.mode;
});
});
document.querySelectorAll('.lang-pills .mode-pill').forEach(btn => {
btn.addEventListener('click', () => {
document.querySelectorAll('.lang-pills .mode-pill').forEach(b => b.classList.remove('is-active'));
btn.classList.add('is-active');
searchLang = btn.dataset.lang;
});
});
const searchInput = document.getElementById('corpusSearchInput');
const searchBtn = document.getElementById('corpusSearchBtn');
const searchResults = document.getElementById('corpusSearchResults');
function runSearch() {
const q = searchInput.value.trim();
if (q.length < 3) {
searchResults.innerHTML = '<p class="search-hint">Enter at least 3 characters.</p>';
searchResults.hidden = false;
return;
}
searchResults.hidden = false;
searchResults.innerHTML = `<p class="search-loading">Searching in <strong>${esc(searchMode)}</strong> mode…</p>`;
searchBtn.disabled = true;
fetch('/api/corpus-search.php', {
method: 'POST',
credentials: 'same-origin',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ query: q, mode: searchMode, language: searchLang, limit: 8 }),
})
.then(r => r.json())
.then(data => {
searchBtn.disabled = false;
if (!data.ok) {
searchResults.innerHTML = `<p class="search-error">Search error: ${esc(data.error?.message || 'Unknown error')}</p>`;
return;
}
const hits = data.hits || [];
if (hits.length === 0) {
searchResults.innerHTML = `<p class="search-empty">No results for <strong>${esc(q)}</strong> in ${esc(data.mode)} mode.</p>`;
return;
}
const modeLabel = { hybrid: 'Hybrid RAG', bm25: 'BM25 keyword', vector: 'Vector semantic' }[data.mode] || data.mode;
let html = `<div class="search-results-header"><span class="eyebrow">${esc(modeLabel)}</span><span class="search-results-count">${hits.length} passage${hits.length !== 1 ? 's' : ''}</span></div>`;
hits.forEach(hit => {
const score = hit.score != null ? `<span class="passage-score">${Math.round(hit.score * 100)}%</span>` : '';
const catAuth = { label: catLabels[hit.category] || hit.category || '—', cls: 'badge--muted' };
const titleHtml = hit.source_url
? `<a href="${esc(hit.source_url)}" target="_blank" rel="noopener" class="passage-card__title">${esc(hit.title || '(Untitled)')}</a>`
: `<span class="passage-card__title">${esc(hit.title || '(Untitled)')}</span>`;
const section = hit.section ? `<span class="passage-section">§ ${esc(hit.section)}</span>` : '';
const excerpt = highlight(hit.excerpt || '', q);
html += `
<div class="passage-card">
<div class="passage-card__meta">
<span class="source-badge ${esc(catAuth.cls)}">${esc(catAuth.label)}</span>
${section}
${score}
</div>
${titleHtml}
<p class="passage-card__excerpt">${excerpt}</p>
</div>`;
});
searchResults.innerHTML = html;
})
.catch(err => {
searchBtn.disabled = false;
searchResults.innerHTML = `<p class="search-error">Network error.</p>`;
});
}
searchBtn.addEventListener('click', runSearch);
searchInput.addEventListener('keydown', e => { if (e.key === 'Enter') runSearch(); });
})();
</script>