Current version: v0.4.0 — see PROJECT.md for architecture details.
Raw news is useless to agents. Processed news is powerful.
See PROJECT.md for full schema and architecture. Key points:
- payload_ts generated column for indexed time-range queries
- cluster_entities and cluster_keywords junction tables for O(log n) entity/keyword search

| Tool | Status | Notes |
|---|---|---|
| get_latest_events | ✅ | Time-filtered via payload_ts SQL index |
| get_events_for_entity | ⚠️ | MCP tool still uses Python-side entity matching (top-N limit). Dashboard uses SQL junction table. Known design flaw. |
| get_event_summary | ✅ | LLM-written narrative |
| detect_emerging_topics | ✅ | entity/keyword/phrase signal types, velocity scoring |
| get_news_sentiment | ⚠️ | Same Python-side entity matching limitation as get_events_for_entity |
| get_related_recent_entities | ✅ | Co-occurrence + Google Trends blend |
| get_feeds / toggle_feed | ✅ | Feed management |
| detect_emerging_topics(around=...) | ✅ | Scope to entity neighborhood |
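
For the time-filtered tools in the table above, the payload_ts idea can be illustrated with a small sketch. It assumes a hypothetical `clusters` table whose `payload` column holds JSON with a `ts` field in ISO-8601 UTC; the table name, the index name, the JSON layout, and the helper functions are illustrative and not taken from PROJECT.md. It needs SQLite 3.31 or later with the JSON functions available.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a generated, indexed timestamp column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema: only the payload_ts name comes from the README.
        CREATE TABLE clusters (
            cluster_id TEXT PRIMARY KEY,
            payload    TEXT NOT NULL,
            payload_ts TEXT GENERATED ALWAYS AS
                       (json_extract(payload, '$.ts')) VIRTUAL
        );
        CREATE INDEX idx_clusters_payload_ts ON clusters (payload_ts);
        """
    )
    return conn


def insert_cluster(conn: sqlite3.Connection, cluster_id: str, ts: str, headline: str) -> None:
    """Store a cluster; payload_ts is derived by SQLite from the JSON payload."""
    payload = json.dumps({"ts": ts, "headline": headline})
    conn.execute(
        "INSERT INTO clusters (cluster_id, payload) VALUES (?, ?)",
        (cluster_id, payload),
    )


def latest_events(conn: sqlite3.Connection, since: str, until: str) -> list:
    """Return cluster ids with since <= payload_ts < until, newest first."""
    rows = conn.execute(
        "SELECT cluster_id FROM clusters "
        "WHERE payload_ts >= ? AND payload_ts < ? "
        "ORDER BY payload_ts DESC",
        (since, until),
    ).fetchall()
    return [row[0] for row in rows]
```

Because ISO-8601 UTC strings in one consistent format sort lexicographically, the range predicate can be served by the index without parsing each payload in Python.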
SQLiteClusterStore and DashboardStore are parallel copies. Only DashboardStore was updated with junction-table entity search; the MCP tools still use Python-side entity matching with a row limit. Proposed fix: collapse them into a single data access layer.
get_events_for_entity and get_news_sentiment fetch the top-N clusters by time, then filter entities in Python, so entities in clusters beyond the limit are missed. Fix: use the junction-table query get_clusters_by_entity().
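
The difference between the two lookup strategies can be shown with a minimal sketch. The schema, column names, and function signatures below are assumptions made for illustration (only the names cluster_entities and get_clusters_by_entity appear in this README); the real stores may differ.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical schema: clusters plus a cluster_entities junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clusters (
            cluster_id    TEXT PRIMARY KEY,
            ts            TEXT NOT NULL,
            entities_json TEXT NOT NULL
        );
        -- Primary key leads with entity, so lookups by entity use its index.
        CREATE TABLE cluster_entities (
            cluster_id TEXT NOT NULL REFERENCES clusters (cluster_id),
            entity     TEXT NOT NULL,
            PRIMARY KEY (entity, cluster_id)
        );
        """
    )
    return conn


def add_cluster(conn: sqlite3.Connection, cluster_id: str, ts: str, entities: list) -> None:
    """Insert a cluster and mirror its entities into the junction table."""
    normalized = [e.lower() for e in entities]
    conn.execute(
        "INSERT INTO clusters VALUES (?, ?, ?)",
        (cluster_id, ts, json.dumps(normalized)),
    )
    conn.executemany(
        "INSERT INTO cluster_entities (cluster_id, entity) VALUES (?, ?)",
        [(cluster_id, e) for e in normalized],
    )


def python_side_match(conn: sqlite3.Connection, entity: str, limit: int) -> list:
    """Top-N by time first, then filter in Python: older matches are lost."""
    rows = conn.execute(
        "SELECT cluster_id, entities_json FROM clusters ORDER BY ts DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [cid for cid, ents in rows if entity.lower() in json.loads(ents)]


def get_clusters_by_entity(conn: sqlite3.Connection, entity: str, limit: int) -> list:
    """Filter by entity in SQL first, then apply the limit to the matches."""
    rows = conn.execute(
        "SELECT c.cluster_id FROM cluster_entities AS ce "
        "JOIN clusters AS c ON c.cluster_id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.ts DESC LIMIT ?",
        (entity.lower(), limit),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch the limit is applied after the entity filter rather than before it, which is why a match in an older cluster is still returned.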
After deploying junction table schema changes:
docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
For timestamp normalization (already run on live server):
docker exec -it news-mcp python3 scripts/normalize_cluster_timestamps.py
- detect_emerging_topics() results into canonical entity nodes
- get_emerging_entity_graph(timeframe, limit)