
Project: news-mcp

Goal

Provide a signal-extraction MCP server that converts RSS into deduplicated, enriched news clusters that are easy for agents to use.

Current architecture (v1)

  • FastMCP SSE server mounted at /mcp
  • SQLite cache for clusters + Groq summary caches
  • RSS fetch (breakingthenews.net)
  • v1 dedup via fuzzy title similarity
  • optional Ollama embeddings path for clustering (when NEWS_EMBEDDINGS_ENABLED=true)
  • configurable embedding similarity threshold (NEWS_EMBEDDING_SIMILARITY_THRESHOLD)
  • optional embeddings backfill script for precomputing cluster vectors in SQLite
  • optional merge-analysis script for threshold experiments before any DB rewrite
  • optional merge pass for destructive consolidation after threshold review
  • optional article-dedup cleanup for repeated article variants inside a cluster
  • Groq enrichment (topic/entities/sentiment/keywords)
  • Tools expose semantic queries over cached clusters
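
The v1 dedup step (fuzzy title similarity) could look roughly like the sketch below. It assumes a greedy pass that compares each title against the first title of every existing cluster, using the standard library's difflib. The function names and the 0.8 threshold are illustrative assumptions, not taken from the project code.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized titles are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_titles(titles: list[str], threshold: float = 0.8) -> list[list[str]]:
    # Greedy clustering: attach each title to the first cluster whose
    # representative (its first title) is similar enough, else start a new one.
    clusters: list[list[str]] = []
    for title in titles:
        for cluster in clusters:
            if title_similarity(title, cluster[0]) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```

A greedy pass like this is order-dependent, which is one reason an embeddings path with a tunable threshold is a natural complement.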

MCP tools (current)

  • get_latest_events(topic, limit)
  • get_events_for_entity(entity, limit)
  • get_event_summary(event_id)
  • detect_emerging_topics(limit)
  • get_related_entities(subject, timeframe, limit)
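
As a rough illustration of how a tool such as get_events_for_entity(entity, limit) might query cached clusters, here is a sketch against an in-memory SQLite database. The table name, column names, and return shape are hypothetical, chosen only for the example. The real schema and tool output may differ.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per cluster, entities stored as a JSON list.
    conn.execute(
        "CREATE TABLE clusters ("
        "id TEXT PRIMARY KEY, title TEXT, entities TEXT, updated_at REAL)"
    )


def get_events_for_entity(
    conn: sqlite3.Connection, entity: str, limit: int = 10
) -> list[dict]:
    # Scan clusters newest-first and keep those whose entity list contains
    # the requested entity (case-insensitive exact match).
    wanted = entity.casefold()
    matches: list[dict] = []
    rows = conn.execute(
        "SELECT id, title, entities FROM clusters ORDER BY updated_at DESC"
    )
    for cluster_id, title, entities_json in rows:
        entities = json.loads(entities_json)
        if any(e.casefold() == wanted for e in entities):
            matches.append({"event_id": cluster_id, "title": title})
            if len(matches) >= limit:
                break
    return matches
```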

Refresh & caching

  • Background refresh every NEWS_REFRESH_INTERVAL_SECONDS (default 900s)
  • Feed-hash skipping to avoid redundant RSS+Groq work
  • Cluster TTL (NEWS_CLUSTERS_TTL_HOURS via CLUSTERS_TTL_HOURS)
  • Summary caching for get_event_summary

Definition of “committable”

  • Tests pass offline (dedup/storage unit tests)
  • Server exposes tool surface with valid schemas
  • Caching prevents repeated Groq calls for unchanged clusters
  • Embeddings remain optional: Ollama is tried first when enabled; otherwise the heuristic path stays active
  • Embeddings backfill script exists to precompute vectors for older cluster rows before a server restart
  • Merge-analysis script exists to inspect candidate cluster pairs at multiple thresholds
  • Merge pass exists for destructive consolidation once thresholds look sane
  • Article-dedup cleanup exists for fixing duplicated article records already in SQLite
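
A merge-analysis pass over precomputed cluster vectors might be sketched like this: score every cluster pair by cosine similarity once, then report which pairs would merge at each candidate threshold, without touching the database. The function names and the dict-of-vectors input are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity; zero vectors are treated as dissimilar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_pairs(
    vectors: dict[str, list[float]], thresholds: list[float]
) -> dict[float, list[tuple[str, str]]]:
    # Score each unordered pair once, then bucket by threshold so several
    # thresholds can be compared side by side before any destructive merge.
    scored = [
        (a, b, cosine_similarity(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    ]
    return {
        t: [(a, b) for a, b, score in scored if score >= t]
        for t in thresholds
    }
```

Comparing all pairs is quadratic in the number of clusters, which is usually acceptable for an offline analysis script but may need batching for large caches.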