
News MCP Server — Project Vision & Status

Current version: v0.4.0 — see PROJECT.md for architecture details.

Core Design Principle

Raw news is useless to agents. Processed news is powerful.

  • ✅ Clusters are the unit of truth, not raw articles
  • ✅ Roughly 100 articles collapse into 5–10 clusters, each carrying entities, sentiment, and importance
  • ✅ SQL-level filtering by time, entity, keyword — no full-table JSON parsing

Architecture (v0.4.0)

See PROJECT.md for full schema and architecture. Key points:

  • payload_ts generated column for indexed time-range queries
  • cluster_entities and cluster_keywords junction tables for O(log n) entity/keyword search
  • MCP tools and Dashboard REST API both query the same SQLite DB
  • Docker deployment on thinkcenter-2 (192.168.0.200:8506)
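
The indexing ideas above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's actual schema (see PROJECT.md for that): only payload_ts, cluster_entities, and cluster_keywords are names taken from this document. The clusters table, its columns, the `updated_at` JSON key, and the helper functions are assumptions made for the example. Generated columns need SQLite 3.31 or newer.

```python
import json
import sqlite3

# Hypothetical schema. Only payload_ts, cluster_entities and cluster_keywords
# are names from the document; everything else is assumed for illustration.
SCHEMA = """
CREATE TABLE clusters (
    cluster_id TEXT PRIMARY KEY,
    payload    TEXT NOT NULL,
    payload_ts TEXT GENERATED ALWAYS AS
        (json_extract(payload, '$.updated_at')) VIRTUAL
);
CREATE INDEX idx_clusters_payload_ts ON clusters(payload_ts);

CREATE TABLE cluster_entities (
    cluster_id TEXT NOT NULL REFERENCES clusters(cluster_id),
    entity     TEXT NOT NULL,
    PRIMARY KEY (entity, cluster_id)
);

CREATE TABLE cluster_keywords (
    cluster_id TEXT NOT NULL REFERENCES clusters(cluster_id),
    keyword    TEXT NOT NULL,
    PRIMARY KEY (keyword, cluster_id)
);
"""


def open_store(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_cluster(conn, cluster_id, payload, entities):
    """Store the JSON payload and one junction row per (lowercased) entity."""
    conn.execute(
        "INSERT INTO clusters (cluster_id, payload) VALUES (?, ?)",
        (cluster_id, json.dumps(payload)),
    )
    conn.executemany(
        "INSERT INTO cluster_entities (cluster_id, entity) VALUES (?, ?)",
        [(cluster_id, e.lower()) for e in set(entities)],
    )


def clusters_since(conn, since_iso):
    """Time-range query on the indexed generated column, newest first."""
    rows = conn.execute(
        "SELECT cluster_id FROM clusters "
        "WHERE payload_ts >= ? ORDER BY payload_ts DESC",
        (since_iso,),
    )
    return [r[0] for r in rows]


def clusters_for_entity(conn, entity):
    """Entity lookup through the junction table; no JSON parsing in Python."""
    rows = conn.execute(
        "SELECT c.cluster_id FROM cluster_entities ce "
        "JOIN clusters c ON c.cluster_id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.payload_ts DESC",
        (entity.lower(),),
    )
    return [r[0] for r in rows]
```

The sketch assumes timestamps are stored as ISO 8601 strings in one consistent format, so that string comparison matches chronological order.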

Tool Surface

Tool                               | Status | Notes
get_latest_events                  |        | Time-filtered via payload_ts SQL index
get_events_for_entity              | ⚠️     | MCP tool still uses Python-side entity matching (top-N limit). Dashboard uses SQL junction table. Known design flaw.
get_event_summary                  |        | LLM-written narrative
detect_emerging_topics             |        | entity/keyword/phrase signal types, velocity scoring
get_news_sentiment                 | ⚠️     | Same Python-side entity matching limitation as get_events_for_entity
get_related_recent_entities        |        | Co-occurrence + Google Trends blend
get_feeds / toggle_feed            |        | Feed management
detect_emerging_topics(around=...) |        | Scope to entity neighborhood

Known Design Issues

Two Stores (see PROJECT.md § "Design Flaw")

SQLiteClusterStore and DashboardStore are parallel copies of the same data access logic. Only DashboardStore was updated with junction-table entity search; the MCP tools still use Python-side entity matching with a row limit. Proposed fix: collapse both into a single data access layer.

MCP Tool Entity Search

get_events_for_entity and get_news_sentiment fetch the top-N most recent clusters and then filter by entity in Python. Any cluster that mentions the entity but falls outside those N rows is missed. Fix: use the junction-table query get_clusters_by_entity().
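
The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions: the table layout, the `ts` and `entities` columns, and the function signatures are hypothetical. Only the names cluster_entities and get_clusters_by_entity come from this document.

```python
import json
import sqlite3


def make_db():
    """In-memory database with a hypothetical clusters table and junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clusters (
            cluster_id TEXT PRIMARY KEY,
            ts         TEXT NOT NULL,
            entities   TEXT NOT NULL
        );
        CREATE TABLE cluster_entities (
            cluster_id TEXT NOT NULL,
            entity     TEXT NOT NULL,
            PRIMARY KEY (entity, cluster_id)
        );
    """)
    return conn


def insert_cluster(conn, cluster_id, ts, entities):
    """Write the cluster row plus one junction row per entity."""
    conn.execute(
        "INSERT INTO clusters VALUES (?, ?, ?)",
        (cluster_id, ts, json.dumps(entities)),
    )
    conn.executemany(
        "INSERT INTO cluster_entities VALUES (?, ?)",
        [(cluster_id, e) for e in set(entities)],
    )


def entity_search_python_side(conn, entity, limit):
    """Flawed pattern: take the newest `limit` clusters, then filter in Python."""
    rows = conn.execute(
        "SELECT cluster_id, entities FROM clusters ORDER BY ts DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [cid for cid, ents in rows if entity in json.loads(ents)]


def get_clusters_by_entity(conn, entity, limit):
    """Junction-table pattern: filter by entity in SQL, then apply the limit."""
    rows = conn.execute(
        "SELECT c.cluster_id FROM cluster_entities ce "
        "JOIN clusters c ON c.cluster_id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.ts DESC LIMIT ?",
        (entity, limit),
    ).fetchall()
    return [r[0] for r in rows]


conn = make_db()
insert_cluster(conn, "old", "2024-01-01T00:00:00Z", ["OpenAI"])
insert_cluster(conn, "new1", "2024-01-02T00:00:00Z", ["ECB"])
insert_cluster(conn, "new2", "2024-01-03T00:00:00Z", ["Nvidia"])

print(entity_search_python_side(conn, "OpenAI", limit=2))  # [] (cluster "old" is missed)
print(get_clusters_by_entity(conn, "OpenAI", limit=2))     # ['old']
```

In the first function the limit is applied before the entity filter, so older matches are cut off. In the second the limit is applied after the filter, so it only caps the number of matching results.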

Backfill Scripts

After deploying junction table schema changes:

docker exec -it news-mcp python3 scripts/backfill_junction_tables.py

For timestamp normalization (already run on live server):

docker exec -it news-mcp python3 scripts/normalize_cluster_timestamps.py

Future Directions (v0.5.0+)

"Emerging entity graph over time"

  • Collapse detect_emerging_topics() results into canonical entity nodes
  • Build weighted edges from co-occurrence in recent clusters
  • Infer communities (story neighborhoods)
  • Track graph evolution across refresh windows (node momentum, edge strength changes)
  • Agent tool: get_emerging_entity_graph(timeframe, limit)
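
As a rough illustration of the weighted-edge and community steps listed above, the sketch below counts entity co-occurrence across clusters and groups entities by connected components. Everything here is hypothetical: the function names, the input shape (one entity list per cluster), and the use of connected components as a crude stand-in for community inference. Canonicalizing entity names and tracking changes across refresh windows are out of scope for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(clusters):
    """Count how many clusters each unordered entity pair appears in together.

    `clusters` is an iterable of entity lists, one list per cluster.
    Returns a Counter keyed by alphabetically ordered (a, b) pairs.
    """
    edges = Counter()
    for entities in clusters:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def communities(edges, min_weight=1):
    """Group entities into connected components.

    Only edges with weight >= min_weight are followed, so raising the
    threshold splits loosely connected neighborhoods apart.
    """
    adjacency = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


recent = [
    ["Nvidia", "TSMC"],
    ["Nvidia", "TSMC", "AMD"],
    ["ECB", "Lagarde"],
]
edges = cooccurrence_edges(recent)
print(edges[("Nvidia", "TSMC")])          # 2
print(communities(edges, min_weight=2))   # [{'Nvidia', 'TSMC'}] (set order may vary)
```

Comparing the edge counters from two consecutive refresh windows would give a simple measure of edge strength change, though how the project intends to define momentum is not specified here.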