Provide a signal-extraction MCP server that converts RSS into deduplicated, enriched news clusters that are easy for agents to use.
/mcpjson_extract(payload, '$.timestamp'). Auto-maintained by SQLite on write. Indexed for O(log n) time-range queries. No write-path code needed.(cluster_id, entity) with index on entity. Populated in upsert_clusters(). SQL-level entity search.(cluster_id, keyword) with index on keyword. Same pattern.sha1(min_article_key) — topic-independent, order-independent, consistent across polling cycles.NEWS_CLUSTER_MAX_AGE_HOURS, default 4h)./api/v1/*) + Keywords panel + entity/keyword drill-down via junction tablesget_latest_events(topic, limit, include_articles)get_events_for_entity(entity, limit, timeframe, include_articles)get_event_summary(event_id, include_articles)detect_emerging_topics(limit, timeframe, topic, around) — returns signal_type (entity/keyword/phrase)get_news_sentiment(entity, timeframe)get_related_recent_entities(subject, timeframe, limit, include_trends)get_feeds() / toggle_feed(feed_url, enabled)get_capabilities()GET / — server info, tools listGET /health — uptime, version hashGET /api/v1/clusters — paginated, filtered by payload_ts SQL indexGET /api/v1/entities — top entities via junction table GROUP BYGET /api/v1/keywords — top keywords via junction table GROUP BYGET /api/v1/clusters/by-entity?entity=X&hours=Y — SQL entity search (NEW)GET /api/v1/clusters/by-keyword?keyword=X&hours=Y — SQL keyword search (NEW)GET /api/v1/sentiment-series — filtered by payload_ts SQL indexGET /api/v1/cluster/{cluster_id} — full detailGET /api/v1/feeds / POST /api/v1/feeds/toggle — feed managementNEWS_REFRESH_INTERVAL_SECONDS (default 300s)get_event_summaryNEWS_RETENTION_DAYS, NEWS_PRUNE_INTERVAL_HOURSCREATE TABLE clusters (
cluster_id TEXT PRIMARY KEY,
topic TEXT NOT NULL,
payload TEXT NOT NULL,
updated_at TEXT NOT NULL, -- row modification time (set on every upsert)
summary_payload TEXT,
summary_updated_at TEXT,
payload_ts GENERATED ALWAYS AS -- indexed event time (auto-maintained)
(json_extract(payload, '$.timestamp')) VIRTUAL
);
CREATE INDEX idx_clusters_payload_ts ON clusters(payload_ts);
CREATE TABLE cluster_entities (
cluster_id TEXT NOT NULL REFERENCES clusters(cluster_id) ON DELETE CASCADE,
entity TEXT NOT NULL, -- lowercased
PRIMARY KEY (cluster_id, entity)
);
CREATE INDEX idx_cluster_entities_entity ON cluster_entities(entity);
CREATE TABLE cluster_keywords (
cluster_id TEXT NOT NULL REFERENCES clusters(cluster_id) ON DELETE CASCADE,
keyword TEXT NOT NULL, -- lowercased
PRIMARY KEY (cluster_id, keyword)
);
CREATE INDEX idx_cluster_keywords_keyword ON cluster_keywords(keyword);
Keywords extracted by the LLM are now first-class search signals:
_cluster_entity_haystack() includes keywords → get_events_for_entity() matches themeskeywords[] fielddetect_emerging_topics() scores keywords with velocity/recency/source-diversity formula (signal_type: "keyword")_collect_local_related() counts keyword co-occurrenceDashboardStore has been eliminated. All of its methods were moved into SQLiteClusterStore (the single data access layer), and the REST API routes now use the shared SQLiteClusterStore instance directly.
All MCP tools (get_events_for_entity, get_news_sentiment, get_latest_events entity mode) now use SQLiteClusterStore.get_clusters_by_entity_or_keyword() which searches via junction-table SQL joins — no row-limit blind spot. The cluster_entities and cluster_keywords junction tables are indexed for O(log n) lookup across any time window.
sanitize_cluster_payload() normalizes timestamp/first_seen/last_updated to YYYY-MM-DDTHH:MM:SS+00:00. If all three missing, falls back to datetime.now().payload_ts auto-extracts from JSON on write. Indexed.payload_ts >= ? in SQL. No JSON parsing for time filtering.scripts/backfill_junction_tables.py populated junction tables from existing payloads. payload_ts was auto-populated by SQLite.What happened: DashboardStore was a thin read-only query layer that wrapped SQLiteClusterStore. The MCP tools (get_events_for_entity, get_news_sentiment, get_latest_events entity mode) did Python-side entity matching by fetching top-N clusters via payload_ts then filtering in Python. Entities in clusters beyond the limit were silently missed.
Fix applied:
DashboardStore was deleted. All its methods are now in SQLiteClusterStore.SQLiteClusterStore.get_clusters_by_entity_or_keyword() — SQL-level junction-table search with no row-limit blind spot.LEFT JOIN on cluster_entities and cluster_keywords with WHERE (ce.entity IN (...) OR ck.keyword IN (...)), which matches both named entities and thematic keywords across any time window.IN) replaced substring matching — more correct, no false positives from partial name matches.