News MCP Server — Project Vision & Status

Current version: v0.4.0 — see PROJECT.md for architecture details.

Core Design Principle

Raw news is useless to agents. Processed news is powerful.

✅ Clusters are the unit of truth, not raw articles
✅ 100 articles → 5–10 clusters, with entities, sentiment, importance
✅ SQL-level filtering by time, entity, keyword — no full-table JSON parsing

Architecture (v0.4.0)

See PROJECT.md for full schema and architecture. Key points:

payload_ts generated column for indexed time-range queries
cluster_entities and cluster_keywords junction tables for O(log n) entity/keyword search
MCP tools and Dashboard REST API both query the same SQLite DB
Docker deployment on thinkcenter-2 (192.168.0.200:8506)
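
As a rough illustration of the first two key points, the sketch below builds a toy version of the schema in an in-memory SQLite database. The column layout, the `$.ts` JSON path, the index name, and the helper functions are assumptions made for this example; PROJECT.md has the real schema. It also assumes SQLite 3.31+ (generated columns) with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clusters (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,  -- JSON blob; layout is assumed for this sketch
    -- Generated column: the timestamp is derived from the JSON by SQLite
    -- itself, so it can be indexed and filtered without parsing payloads.
    payload_ts TEXT GENERATED ALWAYS AS (json_extract(payload, '$.ts')) VIRTUAL
);
CREATE INDEX idx_clusters_payload_ts ON clusters(payload_ts);

-- Junction table: one row per (entity, cluster) pair, keyed for entity lookup.
CREATE TABLE cluster_entities (
    cluster_id INTEGER NOT NULL REFERENCES clusters(id),
    entity TEXT NOT NULL,
    PRIMARY KEY (entity, cluster_id)
);
""")

# Placeholder data, not real clusters.
rows = [
    (1, {"ts": "2024-05-01T08:00:00Z", "title": "A"}, ["Acme"]),
    (2, {"ts": "2024-05-02T09:30:00Z", "title": "B"}, ["Acme", "Globex"]),
    (3, {"ts": "2024-05-03T12:00:00Z", "title": "C"}, ["Globex"]),
]
for cluster_id, payload, entities in rows:
    conn.execute(
        "INSERT INTO clusters (id, payload) VALUES (?, ?)",
        (cluster_id, json.dumps(payload)),
    )
    conn.executemany(
        "INSERT INTO cluster_entities (cluster_id, entity) VALUES (?, ?)",
        [(cluster_id, entity) for entity in entities],
    )


def latest_since(conn, since_ts):
    """Time-range filter on the indexed generated column, newest first."""
    cur = conn.execute(
        "SELECT id FROM clusters WHERE payload_ts >= ? ORDER BY payload_ts DESC",
        (since_ts,),
    )
    return [row[0] for row in cur]


def clusters_for_entity(conn, entity):
    """Entity lookup through the junction table's primary-key index."""
    cur = conn.execute(
        "SELECT c.id FROM cluster_entities ce "
        "JOIN clusters c ON c.id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.payload_ts DESC",
        (entity,),
    )
    return [row[0] for row in cur]
```

Both helpers push the filtering into SQL, so neither has to load and parse every payload. Running `EXPLAIN QUERY PLAN` on either statement is one way to check that the expected index is actually used on a real database.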

Tool Surface

| Tool | Status | Notes |
|---|---|---|
| `get_latest_events` | ✅ | Time-filtered via `payload_ts` SQL index |
| `get_events_for_entity` | ⚠️ | MCP tool still uses Python-side entity matching (top-N limit). Dashboard uses SQL junction table. Known design flaw. |
| `get_event_summary` | ✅ | LLM-written narrative |
| `detect_emerging_topics` | ✅ | entity/keyword/phrase signal types, velocity scoring |
| `get_news_sentiment` | ⚠️ | Same Python-side entity matching limitation as `get_events_for_entity` |
| `get_related_recent_entities` | ✅ | Co-occurrence + Google Trends blend |
| `get_feeds` / `toggle_feed` | ✅ | Feed management |
| `detect_emerging_topics(around=...)` | ✅ | Scope to entity neighborhood |

Known Design Issues

Two Stores (see PROJECT.md § "Design Flaw")

SQLiteClusterStore and DashboardStore are parallel copies. Only DashboardStore was updated to use junction-table entity search; the MCP tools still match entities in Python against a row-limited result set. Proposed fix: collapse both stores into a single data access layer.

MCP Tool Entity Search

get_events_for_entity and get_news_sentiment fetch the top-N most recent clusters and then filter by entity in Python. Any cluster that mentions the entity but falls outside that top-N window is silently missed. Fix: query the junction table via get_clusters_by_entity().
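
The difference can be reproduced on a toy database. In the sketch below, the table layout, the sample data, the `entity_search_python_side` helper, and the signature given to `get_clusters_by_entity` are all assumptions for illustration (only the function name comes from this document), and `payload_ts` is a plain column here to keep the example short.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clusters (
    id INTEGER PRIMARY KEY,
    payload_ts TEXT NOT NULL,
    entities_json TEXT NOT NULL  -- entities stored inside the row (assumed)
);
CREATE TABLE cluster_entities (
    cluster_id INTEGER NOT NULL,
    entity TEXT NOT NULL,
    PRIMARY KEY (entity, cluster_id)
);
""")

# Placeholder data: "Acme" appears in the oldest cluster and in cluster 4.
data = {
    1: ("2024-05-01T00:00:00Z", ["Acme"]),
    2: ("2024-05-02T00:00:00Z", ["Globex"]),
    3: ("2024-05-03T00:00:00Z", ["Globex"]),
    4: ("2024-05-04T00:00:00Z", ["Acme", "Globex"]),
    5: ("2024-05-05T00:00:00Z", ["Initech"]),
}
for cluster_id, (ts, entities) in data.items():
    conn.execute(
        "INSERT INTO clusters VALUES (?, ?, ?)",
        (cluster_id, ts, json.dumps(entities)),
    )
    conn.executemany(
        "INSERT INTO cluster_entities VALUES (?, ?)",
        [(cluster_id, entity) for entity in entities],
    )


def entity_search_python_side(conn, entity, limit):
    """Flawed pattern: LIMIT is applied before the entity filter."""
    recent = conn.execute(
        "SELECT id, entities_json FROM clusters "
        "ORDER BY payload_ts DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [cid for cid, raw in recent if entity in json.loads(raw)]


def get_clusters_by_entity(conn, entity, limit):
    """Junction-table pattern: filter by entity first, then apply LIMIT."""
    rows = conn.execute(
        "SELECT c.id FROM cluster_entities ce "
        "JOIN clusters c ON c.id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.payload_ts DESC LIMIT ?",
        (entity, limit),
    )
    return [row[0] for row in rows]


python_side = entity_search_python_side(conn, "Acme", limit=3)
junction = get_clusters_by_entity(conn, "Acme", limit=3)
```

With a limit of 3, the Python-side search only sees clusters 5, 4, and 3, so it finds cluster 4 and misses cluster 1. The junction-table query applies the limit after the entity filter and returns both.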

Backfill Scripts

After deploying junction table schema changes:

docker exec -it news-mcp python3 scripts/backfill_junction_tables.py

For timestamp normalization (already run on live server):

docker exec -it news-mcp python3 scripts/normalize_cluster_timestamps.py

Future Directions (v0.5.0+)

"Emerging entity graph over time"

Collapse detect_emerging_topics() results into canonical entity nodes
Build weighted edges from co-occurrence in recent clusters
Infer communities (story neighborhoods)
Track graph evolution across refresh windows (node momentum, edge strength changes)
Agent tool: get_emerging_entity_graph(timeframe, limit)
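
A minimal sketch of the edge-building and edge-tracking steps, assuming each cluster is reduced to a list of canonical entity names. The function names, the count-based weighting, and the sample windows are hypothetical; nothing here is implemented yet, and community inference is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(clusters):
    """Count how many clusters each entity pair shares.

    `clusters` is an iterable of entity-name lists. Pairs are stored with
    the names in sorted order so (a, b) and (b, a) map to the same edge.
    """
    edges = Counter()
    for entities in clusters:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def edge_deltas(previous, current):
    """Change in edge weight between two refresh windows."""
    keys = set(previous) | set(current)
    return {key: current.get(key, 0) - previous.get(key, 0) for key in keys}


# Placeholder windows, not real data.
window_1 = [["Globex", "Acme"], ["Globex", "Initech"]]
window_2 = [
    ["Globex", "Initech"],
    ["Initech", "Globex", "Umbrella"],
    ["Acme", "Globex"],
]

previous = cooccurrence_edges(window_1)
current = cooccurrence_edges(window_2)
deltas = edge_deltas(previous, current)
```

Here the Globex/Initech edge grows from 1 to 2 between windows, two new edges involving Umbrella appear, and the Acme/Globex edge is unchanged. A real version would likely weight edges by cluster importance or recency rather than raw counts.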
