# 📰 News MCP Server

FastMCP-based MCP server that turns news feeds into **deduplicated, enriched clusters**.

## Quick start

```bash
cd news-mcp
source .venv/bin/activate
pip install -r requirements.txt
./run.sh
```

Default SSE mount (FastMCP):

- `http://127.0.0.1:8506/mcp/sse`

Health:

- `http://127.0.0.1:8506/health`

## What this server provides

- Fetches from one or more configured news feeds (`NEWS_FEED_URL` / `NEWS_FEED_URLS`)
- Deduplicates articles into clusters (v1 fuzzy title similarity)
- Enriches clusters with configurable LLM providers/models (topic/entities/sentiment/keywords)
- Applies a case-insensitive entity blacklist after extraction
- Caches clusters + LLM fields in SQLite

## Tools (MCP)

1) `get_latest_events(topic, limit, include_articles=false)`
   - `topic` is a coarse category: `crypto | macro | regulation | ai | other`
   - when `include_articles=true`, includes `articles[].url` + minimal fields per returned cluster
2) `get_events_for_entity(entity, limit, include_articles=false)`
   - substring, case-insensitive match over extracted `entities`
   - uses a shallow recent scan first, then falls back to a wider historical scan if needed
   - when `include_articles=true`, includes `articles[].url` + minimal fields per returned cluster
3) `get_event_summary(event_id, include_articles=false)`
   - Groq-written compressed narrative for a given `cluster_id`
   - when `include_articles=true`, includes the underlying `articles` list (with `url`) from the stored cluster
4) `detect_emerging_topics(limit)`
   - derives “emerging” signals from recent cached clusters
5) `get_news_sentiment(entity, timeframe)`
   - aggregates sentiment around an entity from cached enriched clusters
6) `get_related_entities(subject, timeframe, limit)`
   - entity-only co-occurrence neighborhood: for a given subject entity, returns related entities with aggregated `count`, `avg_importance`, and `sentiment`

### Entity aliasing

The server keeps a conservative alias map in `config/entity_aliases.json` for
obvious shorthands like `btc -> Bitcoin`, `eth -> Ethereum`, and `ether -> Ethereum`. Keep this map tight; it is meant to reduce false misses, not to rewrite every possible name variant.

## Configuration

See `news-mcp/.env`. Key variables:

- `NEWS_EXTRACT_PROVIDER`, `NEWS_EXTRACT_MODEL`
- `NEWS_SUMMARY_PROVIDER`, `NEWS_SUMMARY_MODEL`
- `GROQ_API_KEY`, `OPENAI_API_KEY`
- `ENTITY_BLACKLIST` (comma-separated, case-insensitive exact entity match)
- `NEWS_PROMPTS_DIR` (override prompt directory)
- `NEWS_ENTITY_ALIASES_FILE` (override entity alias JSON file)
- `NEWS_FEED_URL` (single feed fallback)
- `NEWS_FEED_URLS` (comma-separated feed URLs; overrides `NEWS_FEED_URL`)
- `NEWS_REFRESH_INTERVAL_SECONDS` (default 900)
- `NEWS_BACKGROUND_REFRESH_ON_START` (default true)
- `NEWS_BACKGROUND_REFRESH_ENABLED` (default true)
- `NEWS_CLUSTERS_TTL_HOURS`
- `GROQ_ENRICH_OTHER_ONLY` (default false; set true for cost control)
- `NEWS_EMBEDDINGS_ENABLED` (default false; enables Ollama embeddings for clustering when wired in)
- `OLLAMA_BASE_URL` / `OLLAMA_URL` (default `http://127.0.0.1:11434`)
- `OLLAMA_EMBEDDING_MODEL` (default `nomic-embed-text`)
- `NEWS_EMBEDDING_SIMILARITY_THRESHOLD` (default `0.885`; used when embeddings are enabled)

When embeddings are enabled, news-mcp tries Ollama first and falls back to the existing heuristic clustering path if Ollama is unavailable.

## Live extraction smoke test

Run a standardized, fabricated extraction test against the currently configured provider/model:

```bash
./live_tests.sh
```

The script reads `./.env`, selects OpenAI or Groq based on the configured keys, and checks that the core expected entities are extracted.
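When embeddings are in play, the clustering decision comes down to a cosine-similarity check against `NEWS_EMBEDDING_SIMILARITY_THRESHOLD`. The following is a minimal, illustrative sketch of that check only; it is not the server's actual code, and the function names are hypothetical:

```python
from math import sqrt

# Default for NEWS_EMBEDDING_SIMILARITY_THRESHOLD (see Configuration above).
SIMILARITY_THRESHOLD = 0.885


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_merge(embedding_a: list[float],
                 embedding_b: list[float],
                 threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """An incoming article joins an existing cluster only when its
    embedding clears the similarity threshold (hypothetical helper)."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold
```

At the default `0.885`, near-duplicate headlines merge while loosely related stories stay separate; lowering the threshold merges more aggressively, which is what the merge-analysis script below lets you explore before committing.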
## mcporter examples (all news-mcp calls)

Use your existing config path:

```bash
CONFIG=/home/lucky/.openclaw/workspace/config/mcporter.json
```

Inspect server + tools:

```bash
mcporter --config "$CONFIG" list news --schema
```

### 1) Latest events

```bash
mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=10
mcporter --config "$CONFIG" call news.get_latest_events topic=macro limit=5
```

### 2) Events for an entity

```bash
mcporter --config "$CONFIG" call news.get_events_for_entity entity=Bitcoin limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETH limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETF limit=10
```

### 3) Event summary (by cluster_id)

```bash
# First fetch an event id
mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=1

# Then summarize it
mcporter --config "$CONFIG" call news.get_event_summary event_id=
```

### 4) Emerging topics

```bash
mcporter --config "$CONFIG" call news.detect_emerging_topics limit=10
```

### 5) Sentiment for an entity

```bash
mcporter --config "$CONFIG" call news.get_news_sentiment entity=Bitcoin timeframe=24h
mcporter --config "$CONFIG" call news.get_news_sentiment entity=Ethereum timeframe=72h
```

### 6) Related entities (co-occurrence neighborhood)

```bash
mcporter --config "$CONFIG" call news.get_related_entities subject=iran timeframe=24h limit=8
mcporter --config "$CONFIG" call news.get_related_entities subject="iran war" timeframe=3d limit=8
```

## Blacklist enforcement (optional back-clean)

If you change `ENTITY_BLACKLIST`, existing clusters in `news.sqlite` may still contain entities/keywords that would now be filtered at extraction time.
For one-off cleanup, run:

```bash
./.venv/bin/python scripts/enforce_news_blacklist.py --dry-run --limit 200
./.venv/bin/python scripts/enforce_news_blacklist.py --limit 1000
```

This enforces `ENTITY_BLACKLIST` inside stored clusters by removing matching entries from `payload.entities` and `payload.keywords` and (if needed) setting `payload.topic = "other"`.

## Embeddings backfill (optional)

If `NEWS_EMBEDDINGS_ENABLED=true`, you can precompute cluster embeddings for older rows before restarting the server:

```bash
./.venv/bin/python scripts/backfill_news_embeddings.py --dry-run --limit 200
./.venv/bin/python scripts/backfill_news_embeddings.py --limit 1000
```

This stores a cluster-level `embedding` and `embedding_model` inside the SQLite payload so the Ollama-first clustering path has data ready to use.

## Embedding merge analysis (optional)

To inspect likely cluster merges at different cosine thresholds without writing anything back to the DB:

```bash
./.venv/bin/python scripts/analyze_cluster_embedding_merges.py --thresholds 0.82 0.85 0.88 --limit 200
```

This prints candidate pairs per threshold so you can decide whether a merge script is worth adding next.

## Embedding merge pass (optional, destructive)

After inspecting the analysis output, you can merge clusters above a chosen threshold. Start with a dry run:

```bash
./.venv/bin/python scripts/merge_cluster_embeddings.py --dry-run --threshold 0.90
```

If the groupings look right, run wet:

```bash
./.venv/bin/python scripts/merge_cluster_embeddings.py --threshold 0.90
```

This merges embedding-similar clusters within the same topic and removes the absorbed duplicates from SQLite.

## Article dedup cleanup (optional)

Some stored clusters may contain repeated article entries for the same underlying article id / URL path.
To clean existing rows:

```bash
./.venv/bin/python scripts/dedup_articles_in_clusters.py --dry-run
./.venv/bin/python scripts/dedup_articles_in_clusters.py
```

The live clustering path also deduplicates article entries when new data comes in.
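The dedup idea can be sketched as keeping the first occurrence of each article identity (id when present, otherwise URL) within a cluster. This is an illustrative sketch only; the `articles` dict keys shown here are assumptions, and the real scripts operate on the stored SQLite payloads:

```python
def dedup_articles(articles: list[dict]) -> list[dict]:
    """Drop repeated article entries from a cluster, keeping the first
    occurrence. Identity is the article 'id' when present, otherwise the
    'url' (hypothetical keys; the stored payload schema may differ)."""
    seen: set[str] = set()
    deduped: list[dict] = []
    for article in articles:
        key = str(article.get("id") or article.get("url") or "")
        if key and key in seen:
            continue  # already have this article in the cluster
        if key:
            seen.add(key)
        deduped.append(article)  # keep entries with no usable identity as-is
    return deduped
```

Running this shape of pass over each cluster's `articles` list is all the cleanup script needs to do conceptually; the `--dry-run` flag would just report counts instead of writing the deduplicated list back.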