# Project: news-mcp

## Goal

Provide a signal-extraction MCP server that converts RSS into **deduplicated, enriched news clusters** that are easy for agents to use.

## Current architecture (v1)

- FastMCP SSE server mounted at `/mcp`
- SQLite cache for clusters + Groq summary caches
- RSS fetch (breakingthenews.net)
- v1 dedup via fuzzy title similarity
- optional Ollama embeddings path for clustering (when `NEWS_EMBEDDINGS_ENABLED=true`)
- configurable embedding similarity threshold (`NEWS_EMBEDDING_SIMILARITY_THRESHOLD`)
- optional embeddings backfill script for precomputing cluster vectors in SQLite
- optional merge-analysis script for threshold experiments before any DB rewrite
- optional merge pass for destructive consolidation after threshold review
- optional article-dedup cleanup for repeated article variants inside a cluster
- Groq enrichment (topic/entities/sentiment/keywords)
- Tools expose semantic queries over cached clusters

## MCP tools (current)

- `get_latest_events(topic, limit)`
- `get_events_for_entity(entity, limit)`
- `get_events_for_entity(entity, limit, timeframe)`
- `get_event_summary(event_id)`
- `detect_emerging_topics(limit)`
- `get_related_entities(subject, timeframe, limit)`

## Refresh & caching

- Background refresh every `NEWS_REFRESH_INTERVAL_SECONDS` (default 900s)
- Feed-hash skipping to avoid redundant RSS+Groq work
- Cluster TTL (`NEWS_CLUSTERS_TTL_HOURS` via `CLUSTERS_TTL_HOURS`)
- Summary caching for `get_event_summary`

## Definition of “committable”

- Tests pass offline (dedup/storage unit tests)
- Server exposes tool surface with valid schemas
- Caching prevents repeated Groq calls for unchanged clusters
- Embeddings remain optional: Ollama is tried first when enabled, otherwise the heuristic path stays active
- Embeddings backfill script exists for older cluster rows before the server restart
- Merge-analysis script exists to inspect candidate cluster pairs at multiple thresholds
- Merge pass exists for destructive consolidation once thresholds look sane
- Article-dedup cleanup exists for fixing duplicated article records already in SQLite
- Entity lookup now respects timeframe as the scan window, with limit acting as a cap
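
As an illustration of the v1 dedup step (fuzzy title similarity), here is a minimal sketch using the standard-library `difflib`. The function names, the greedy first-match strategy, and the `0.8` threshold are assumptions made for this example; the project's actual similarity measure and cutoff are not stated here.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized titles are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_titles(titles: list[str], threshold: float = 0.8) -> list[list[str]]:
    # Hypothetical greedy pass: attach each title to the first cluster whose
    # representative (its first member) is similar enough, else open a new one.
    clusters: list[list[str]] = []
    for title in titles:
        for cluster in clusters:
            if title_similarity(title, cluster[0]) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```

Comparing only against each cluster's first title keeps the pass simple and cheap, at the cost of being order-sensitive; that trade-off is one reason an embeddings path can be a useful optional alternative.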
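
Feed-hash skipping, listed under refresh and caching, can be sketched as follows. This is a hedged example: `FeedHashGate` is a hypothetical name, SHA-256 is an assumed choice of hash, and the in-memory dict stands in for whatever storage the server really uses.

```python
import hashlib


class FeedHashGate:
    """Remembers the last content hash seen per feed URL (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_hash: dict[str, str] = {}

    def should_process(self, feed_url: str, body: bytes) -> bool:
        # Hash the raw feed bytes; an unchanged digest means nothing new to parse
        # or enrich, so the expensive RSS+Groq work can be skipped.
        digest = hashlib.sha256(body).hexdigest()
        if self._last_hash.get(feed_url) == digest:
            return False
        self._last_hash[feed_url] = digest
        return True
```

A background refresh loop would call `should_process` once per fetch and return early when it yields `False`.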
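
For the optional embeddings path, the role of `NEWS_EMBEDDING_SIMILARITY_THRESHOLD` can be pictured with the sketch below. It assumes cosine similarity between cluster vectors and a fallback value of `0.85`; both are assumptions for illustration, since neither the metric nor the default is specified here.

```python
import math
import os


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def load_threshold(default: float = 0.85) -> float:
    # Reads the documented env var; the 0.85 fallback is a made-up default.
    return float(os.environ.get("NEWS_EMBEDDING_SIMILARITY_THRESHOLD", default))


def same_cluster(a: list[float], b: list[float], threshold: float) -> bool:
    # Two vectors are merge candidates when their similarity meets the threshold.
    return cosine_similarity(a, b) >= threshold
```

Running a comparison like this at several thresholds before any destructive merge is the kind of experiment the merge-analysis script is described as supporting.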