News MCP Server — Project Vision & Status

Current version: v0.5.0

Core Design Principle

Raw news is useless to agents. Processed news is powerful.

  • Clusters are the unit of truth, not raw articles
  • 100 articles → 5–10 clusters, with entities, sentiment, importance
  • SQL-level filtering by time, entity, keyword — no full-table JSON parsing
  • Three-layer dedup: feed hash → article URL → content hash

What's new in v0.5.0

Content-change detection

Articles that are updated in place at the same URL (e.g. an FT stub reading "More to come..." that is later replaced by the real content) are now detected by comparing content_hash values in seen_articles. Changed articles are re-clustered and re-enriched automatically.
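
A minimal sketch of the comparison described above, assuming SQLite storage and a seen_articles table with url and content_hash columns. The column names, the whitespace normalization, the SHA-256 choice, and the helper names open_db and record_article are illustrative assumptions, not the project's actual schema or code.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """Hash article text after collapsing whitespace (normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical seen_articles layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seen_articles (url TEXT PRIMARY KEY, content_hash TEXT NOT NULL)"
    )
    return conn


def record_article(conn: sqlite3.Connection, url: str, text: str) -> str:
    """Return 'new', 'unchanged', or 'changed' and keep the stored hash current."""
    new_hash = content_hash(text)
    row = conn.execute(
        "SELECT content_hash FROM seen_articles WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO seen_articles (url, content_hash) VALUES (?, ?)",
            (url, new_hash),
        )
        return "new"
    if row[0] == new_hash:
        return "unchanged"
    # Same URL, different content: the caller would re-cluster and re-enrich.
    conn.execute(
        "UPDATE seen_articles SET content_hash = ? WHERE url = ?", (new_hash, url)
    )
    return "changed"
```

In this sketch only a "changed" result would trigger re-clustering; "unchanged" articles are skipped as before.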

Three-layer dedup

  1. Feed hash — skip entire unchanged feeds (O(1))
  2. seen_articles — skip already-processed URLs
  3. Content hash — detect in-place updates, re-process changed articles
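
The order of the three checks can be sketched as below. This is an in-memory illustration only: DedupState, articles_to_process, and the use of SHA-256 are assumptions made for the example, and the real pipeline persists its state rather than holding it in dictionaries.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class DedupState:
    # feed URL -> hash of the last fetched feed body
    feed_hashes: dict[str, str] = field(default_factory=dict)
    # article URL -> content hash recorded when the article was processed
    seen_articles: dict[str, str] = field(default_factory=dict)


def articles_to_process(
    state: DedupState,
    feed_url: str,
    feed_body: str,
    articles: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (article_url, reason) pairs that still need processing."""
    # Layer 1: feed hash. An unchanged feed is skipped with a single lookup.
    feed_hash = sha256_hex(feed_body)
    if state.feed_hashes.get(feed_url) == feed_hash:
        return []
    state.feed_hashes[feed_url] = feed_hash

    todo: list[tuple[str, str]] = []
    for url, text in articles:
        digest = sha256_hex(text)
        previous = state.seen_articles.get(url)
        if previous is None:
            # Layer 2: URL not seen before, so the article is new.
            todo.append((url, "new"))
        elif previous != digest:
            # Layer 3: known URL whose content changed in place.
            todo.append((url, "changed"))
        else:
            continue  # known URL, same content: skip
        state.seen_articles[url] = digest
    return todo
```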

Clustering improvements

  • Title threshold lowered: 0.87 → 0.75
  • Dual-signal tier: title ≥ 0.55 + jaccard ≥ 0.25 → merge
  • All thresholds configurable via dashboard Config page
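
A sketch of how the two tiers might combine into a merge decision. The thresholds come from the list above; everything else is an assumption for illustration: that a title score at or above 0.75 merges on its own, that difflib's ratio stands in for the title similarity metric, and that Jaccard is computed over token sets.

```python
from difflib import SequenceMatcher

TITLE_THRESHOLD = 0.75         # single-signal tier (lowered from 0.87)
DUAL_TITLE_THRESHOLD = 0.55    # dual-signal tier: title part
DUAL_JACCARD_THRESHOLD = 0.25  # dual-signal tier: jaccard part


def title_similarity(a: str, b: str) -> float:
    """Stand-in title metric; the project's actual metric is not specified here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def should_merge(title_sim: float, jaccard_sim: float) -> bool:
    """Merge on a strong title match, or on a weaker title match backed by overlap."""
    if title_sim >= TITLE_THRESHOLD:
        return True
    return (
        title_sim >= DUAL_TITLE_THRESHOLD
        and jaccard_sim >= DUAL_JACCARD_THRESHOLD
    )
```

Under these assumptions, a pair scoring 0.60 on title similarity merges only if the Jaccard overlap also reaches 0.25.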

Dashboard Config page

  • All tunable parameters in one place, grouped by category
  • Inline editing, source tracking (env/api/default)
  • Reset to defaults button
  • REST API: GET/POST /api/v1/config
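
Source tracking could work roughly as sketched below. The parameter names, the NEWS_MCP_ environment prefix, the float-only values, and the precedence order (API override, then environment, then default) are all assumptions for illustration; the actual keys and precedence are defined by the server.

```python
import os
from typing import Any, Mapping, MutableMapping, NamedTuple


class Resolved(NamedTuple):
    value: Any
    source: str  # "api", "env", or "default"


# Hypothetical parameter names; the values mirror the thresholds in this document.
DEFAULTS: dict[str, float] = {
    "title_threshold": 0.75,
    "dual_title_threshold": 0.55,
    "dual_jaccard_threshold": 0.25,
}


def resolve(
    name: str,
    api_overrides: Mapping[str, float],
    environ: Mapping[str, str] = os.environ,
    env_prefix: str = "NEWS_MCP_",  # hypothetical prefix
) -> Resolved:
    """Look up one parameter and report which source supplied it."""
    if name in api_overrides:
        return Resolved(api_overrides[name], "api")
    env_key = env_prefix + name.upper()
    if env_key in environ:
        return Resolved(float(environ[env_key]), "env")
    return Resolved(DEFAULTS[name], "default")


def reset_to_defaults(api_overrides: MutableMapping[str, float]) -> None:
    """Drop all API-set overrides so lookups fall back to env or default values."""
    api_overrides.clear()
```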

Debug tool

  • debug_dedup(url, title?) — MCP tool to inspect dedup decisions, similarity signals

Architecture

See PROJECT.md for full schema and architecture details.

Tool Surface

Tool                          Status   Notes
get_latest_events                      Time-filtered via payload_ts SQL index
get_events_for_entity                  SQL junction-table search
get_event_summary                      LLM-written narrative
detect_emerging_topics                 entity/keyword/phrase signal types
get_news_sentiment                     SQL junction-table search
get_related_recent_entities            Co-occurrence + Google Trends blend
get_feeds / toggle_feed                Feed management
debug_dedup                            Inspect dedup decisions (new in v0.5.0)
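
To illustrate what a time-filtered junction-table search can look like, here is a sketch using SQLite. Only payload_ts is named in this document; the clusters and cluster_entities tables, their columns, the index names, and the helper functions are hypothetical. PROJECT.md holds the real schema.

```python
import sqlite3

# Hypothetical schema: one row per cluster, one junction row per (cluster, entity).
SCHEMA = """
CREATE TABLE clusters (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    payload_ts INTEGER NOT NULL
);
CREATE INDEX idx_clusters_payload_ts ON clusters (payload_ts);
CREATE TABLE cluster_entities (
    cluster_id INTEGER NOT NULL REFERENCES clusters (id),
    entity TEXT NOT NULL,
    PRIMARY KEY (cluster_id, entity)
);
CREATE INDEX idx_cluster_entities_entity ON cluster_entities (entity);
"""


def demo_db() -> sqlite3.Connection:
    """Build a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO clusters (id, title, payload_ts) VALUES (?, ?, ?)",
        [(1, "Rate decision", 1000), (2, "Earnings beat", 2000), (3, "Old story", 10)],
    )
    conn.executemany(
        "INSERT INTO cluster_entities (cluster_id, entity) VALUES (?, ?)",
        [(1, "ECB"), (2, "Acme"), (2, "ECB"), (3, "ECB")],
    )
    return conn


def events_for_entity(
    conn: sqlite3.Connection, entity: str, since_ts: int
) -> list[tuple[int, str]]:
    """Filter by entity and time inside SQL, newest first, without parsing JSON."""
    return conn.execute(
        """
        SELECT c.id, c.title
        FROM clusters AS c
        JOIN cluster_entities AS ce ON ce.cluster_id = c.id
        WHERE ce.entity = ? AND c.payload_ts >= ?
        ORDER BY c.payload_ts DESC
        """,
        (entity, since_ts),
    ).fetchall()
```

The point of the junction table is that both filters run against indexed columns, so no cluster payload has to be loaded and decoded just to decide whether it matches.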

Deployment

Docker on thinkcenter-2 (192.168.0.200:8506):

cd ~/news-mcp && git pull && docker-compose up -d news-mcp

After schema changes, run backfill:

docker exec -it news-mcp python3 scripts/backfill_seen_articles.py