Lukas Goldschmidt 8b935134d7 news-mcp: add optional ollama embedding clustering 1 month ago
config 57bb07fdd6 Improve entity lookup fallback and docs 1 month ago
news_mcp 8b935134d7 news-mcp: add optional ollama embedding clustering 1 month ago
prompts cdd52b9f1e Refactor news LLM extraction pipeline 1 month ago
scripts faf04d5b2f news-mcp: add script to enforce ENTITY_BLACKLIST in stored clusters 1 month ago
.env.example 8b935134d7 news-mcp: add optional ollama embedding clustering 1 month ago
.gitignore 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
OUTLOOK.md 47a3eaff68 config: add ollama embedding env vars and docs 1 month ago
PROJECT.md 8b935134d7 news-mcp: add optional ollama embedding clustering 1 month ago
README.md 8b935134d7 news-mcp: add optional ollama embedding clustering 1 month ago
killserver.sh a4096b9dfb news-mcp: cleanup feed naming, improve ingestion logs, add mcporter README examples 1 month ago
live_tests.sh cdd52b9f1e Refactor news LLM extraction pipeline 1 month ago
requirements.txt 600fcdbd55 Polish news-mcp docs + add emerging topics and tests 1 month ago
restart.sh 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
run.sh 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
test_embedding_support.py c984d1f589 tests: add embedding support guards for clustering 1 month ago
test_news_mcp.py 64bf700047 Release v0.1.0 1 month ago
tests.sh 600fcdbd55 Polish news-mcp docs + add emerging topics and tests 1 month ago

README.md

📰 News MCP Server

FastMCP-based MCP server that turns news feeds into deduplicated, enriched clusters.

Quick start

cd news-mcp
python3 -m venv .venv   # first run only
source .venv/bin/activate
pip install -r requirements.txt
./run.sh

Default SSE mount (FastMCP):

  • http://127.0.0.1:8506/mcp/sse

Health:

  • http://127.0.0.1:8506/health

What this server provides

  • Fetches from one or more configured news feeds (NEWS_FEED_URL / NEWS_FEED_URLS)
  • Deduplicates articles into clusters (v1: fuzzy title similarity; optional Ollama embeddings, see Configuration)
  • Enriches clusters with a configurable LLM provider and model (topic, entities, sentiment, keywords)
  • Applies a case-insensitive entity blacklist after extraction
  • Caches clusters and their LLM-derived fields in SQLite
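To make "fuzzy title similarity" concrete, here is a minimal sketch that greedily groups articles whose normalized titles are similar according to Python's standard difflib.SequenceMatcher. The 0.8 threshold, the greedy first-match strategy, and the names normalize_title and cluster_by_title are illustrative assumptions, not the server's actual implementation.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse runs of whitespace.
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def cluster_by_title(titles: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Assign each title to the first cluster whose seed title is similar enough."""
    clusters: list[list[str]] = []
    seeds: list[str] = []  # normalized title of each cluster's first article
    for title in titles:
        norm = normalize_title(title)
        for seed, members in zip(seeds, clusters):
            if SequenceMatcher(None, seed, norm).ratio() >= threshold:
                members.append(title)
                break
        else:
            # No similar seed found: this title starts a new cluster.
            seeds.append(norm)
            clusters.append([title])
    return clusters
```

Comparing only against each cluster's seed keeps the pass linear in the number of clusters per article, at the cost of order sensitivity; embedding-based clustering is the more robust alternative the server offers as an option.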

Tools (MCP)

1) get_latest_events(topic, limit, include_articles=false)

  • topic is a coarse category: crypto | macro | regulation | ai | other
  • when include_articles=true, each returned cluster also carries an articles[] list with each article's url and a few minimal fields

2) get_events_for_entity(entity, limit, include_articles=false)

  • substring, case-insensitive match over extracted entities
  • uses a shallow recent scan first, then falls back to a wider historical scan if needed
  • when include_articles=true, each returned cluster also carries an articles[] list with each article's url and a few minimal fields
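The two-stage lookup can be sketched as follows: match the query as a case-insensitive substring of each extracted entity, scan only the newest clusters first, and widen the scan only when that window yields fewer than limit hits. The window sizes (200 and 5000), the assumption that entities are stored as a list of name strings, and the function names are hypothetical.

```python
def mentions(cluster: dict, query: str) -> bool:
    # Case-insensitive substring match against each extracted entity name.
    q = query.casefold()
    return any(q in name.casefold() for name in cluster.get("entities", []))


def find_events_for_entity(clusters: list[dict], entity: str, limit: int,
                           shallow: int = 200, wide: int = 5000) -> list[dict]:
    """`clusters` is ordered newest-first; widen the scan only when the recent window is short."""
    hits: list[dict] = []
    for window in (shallow, wide):
        hits = [c for c in clusters[:window] if mentions(c, entity)]
        # Stop early if we have enough hits or already scanned everything.
        if len(hits) >= limit or window >= len(clusters):
            break
    return hits[:limit]
```

Substring matching is why `entity=ETF` in the mcporter examples also finds clusters tagged with longer names such as "Bitcoin ETF".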

3) get_event_summary(event_id, include_articles=false)

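  • LLM-written condensed narrative for a given cluster_id (event_id is the cluster_id returned by the other tools), produced with the configured NEWS_SUMMARY_PROVIDER / NEWS_SUMMARY_MODEL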
  • when include_articles=true, includes the underlying articles list (with url) from the stored cluster

4) detect_emerging_topics(limit)

  • derives “emerging” signals from recent cached clusters
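The README does not specify how "emerging" is scored. One plausible approach, shown here only as a hypothetical sketch, compares how often each keyword appears in recent clusters against an older baseline window, with add-one smoothing so brand-new keywords still get a finite score. The window split, min_count, and scoring formula are all assumptions.

```python
from collections import Counter


def emerging_keywords(recent: list[list[str]], baseline: list[list[str]],
                      limit: int = 5, min_count: int = 2) -> list[tuple[str, float]]:
    """Score keywords by recent frequency relative to a smoothed baseline frequency."""
    # Count each keyword at most once per cluster, case-insensitively.
    recent_counts = Counter(k for kws in recent for k in {x.casefold() for x in kws})
    base_counts = Counter(k for kws in baseline for k in {x.casefold() for x in kws})
    n_recent, n_base = max(len(recent), 1), len(baseline)
    scored = []
    for keyword, count in recent_counts.items():
        if count < min_count:
            continue  # ignore one-off mentions
        recent_rate = count / n_recent
        # Add-one smoothing so keywords unseen in the baseline get a finite score.
        base_rate = (base_counts[keyword] + 1) / (n_base + 1)
        scored.append((keyword, recent_rate / base_rate))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```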

5) get_news_sentiment(entity, timeframe)

  • aggregates sentiment around an entity from cached enriched clusters
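A minimal sketch of entity sentiment aggregation, assuming each cached cluster exposes a timezone-aware published_at, a list of entity names, and a numeric cluster-level sentiment (all hypothetical field names). It parses timeframes of the form used in the mcporter examples (24h, 72h, 3d) and averages sentiment over matching clusters.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_timeframe(timeframe: str) -> timedelta:
    # Accepts "<N>h" or "<N>d", e.g. "24h", "72h", "3d".
    match = re.fullmatch(r"(\d+)([hd])", timeframe.strip().lower())
    if match is None:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    value, unit = int(match.group(1)), match.group(2)
    return timedelta(hours=value) if unit == "h" else timedelta(days=value)


def entity_sentiment(clusters: list[dict], entity: str, timeframe: str,
                     now: Optional[datetime] = None) -> dict:
    """Average the cluster-level sentiment of recent clusters that mention `entity`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_timeframe(timeframe)
    query = entity.casefold()
    scores = [c["sentiment"] for c in clusters
              if c["published_at"] >= cutoff
              and any(query in name.casefold() for name in c["entities"])]
    return {
        "entity": entity,
        "timeframe": timeframe,
        "count": len(scores),
        # None (not 0.0) when nothing matched, so "neutral" and "no data" stay distinct.
        "avg_sentiment": sum(scores) / len(scores) if scores else None,
    }
```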

6) get_related_entities(subject, timeframe, limit)

  • entity-only co-occurrence neighborhood: for a given subject entity, returns related entities with aggregated count, avg_importance, and sentiment
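The co-occurrence neighborhood can be sketched as a single pass over clusters: keep only clusters that mention the subject, then aggregate every other entity's count, mean importance, and mean cluster sentiment. This assumes (hypothetically) that entities are stored as dicts with name and importance and that sentiment is a per-cluster number; timeframe filtering is omitted for brevity.

```python
from collections import defaultdict


def related_entities(clusters: list[dict], subject: str, limit: int = 8) -> list[dict]:
    """Aggregate entities that co-occur with `subject` in the same cluster."""
    q = subject.casefold()
    stats = defaultdict(lambda: {"count": 0, "importance": 0.0, "sentiment": 0.0})
    for cluster in clusters:
        entities = cluster["entities"]
        if not any(q in e["name"].casefold() for e in entities):
            continue  # subject not mentioned in this cluster
        for e in entities:
            if q in e["name"].casefold():
                continue  # skip the subject itself
            s = stats[e["name"]]
            s["count"] += 1
            s["importance"] += e["importance"]
            s["sentiment"] += cluster["sentiment"]
    rows = [{"entity": name,
             "count": s["count"],
             "avg_importance": s["importance"] / s["count"],
             "sentiment": s["sentiment"] / s["count"]}
            for name, s in stats.items()]
    # Most frequent neighbors first; ties broken by average importance.
    rows.sort(key=lambda r: (-r["count"], -r["avg_importance"]))
    return rows[:limit]
```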

Entity aliasing

The server keeps a conservative alias map in config/entity_aliases.json for obvious shorthands like btc -> Bitcoin, eth -> Ethereum, and ether -> Ethereum. Keep this map tight; it is meant to reduce false misses, not to rewrite every possible name variant.
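A conservative alias lookup might look like the sketch below. It assumes config/entity_aliases.json is a flat JSON object mapping alias to canonical name (the real file layout may differ) and only rewrites an entity when the whole name matches an alias, case-insensitively.

```python
import json


def load_aliases(text: str) -> dict[str, str]:
    # Assumed format: a flat JSON object of alias -> canonical name.
    raw = json.loads(text)
    return {alias.strip().casefold(): canonical for alias, canonical in raw.items()}


def canonicalize(entity: str, aliases: dict[str, str]) -> str:
    # Whole-string alias match only; unknown names pass through unchanged.
    return aliases.get(entity.strip().casefold(), entity)
```

Matching the whole string rather than substrings is what keeps the map tight: "ether" maps to Ethereum, but a name like "Ethena" is never rewritten.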

Configuration

Settings are read from news-mcp/.env (start from .env.example). Key variables:

  • NEWS_EXTRACT_PROVIDER, NEWS_EXTRACT_MODEL
  • NEWS_SUMMARY_PROVIDER, NEWS_SUMMARY_MODEL
  • GROQ_API_KEY, OPENAI_API_KEY
  • ENTITY_BLACKLIST (comma-separated, case-insensitive exact entity match)
  • NEWS_PROMPTS_DIR (override prompt directory)
  • NEWS_ENTITY_ALIASES_FILE (override entity alias JSON file)
  • NEWS_FEED_URL (single feed fallback)
  • NEWS_FEED_URLS (comma-separated feed URLs; overrides NEWS_FEED_URL)
  • NEWS_REFRESH_INTERVAL_SECONDS (default 900)
  • NEWS_BACKGROUND_REFRESH_ON_START (default true)
  • NEWS_BACKGROUND_REFRESH_ENABLED (default true)
  • NEWS_CLUSTERS_TTL_HOURS
  • GROQ_ENRICH_OTHER_ONLY (default false; set true for cost control)
  • NEWS_EMBEDDINGS_ENABLED (default false; enables Ollama embeddings for clustering when wired in)
  • OLLAMA_BASE_URL / OLLAMA_URL (default http://127.0.0.1:11434)
  • OLLAMA_EMBEDDING_MODEL (default nomic-embed-text)
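The precedence and default rules listed above can be captured in a small loader. This is a hypothetical sketch covering a subset of the variables: the NewsConfig class, the accepted truthy spellings, and the OLLAMA_BASE_URL-before-OLLAMA_URL precedence are assumptions rather than the server's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _csv(value: Optional[str]) -> list[str]:
    # Split a comma-separated value, dropping blanks and surrounding spaces.
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def _flag(env: Mapping[str, str], key: str, default: bool) -> bool:
    # Assumed truthy spellings; unset or empty falls back to the default.
    value = (env.get(key) or "").strip().lower()
    return default if not value else value in {"1", "true", "yes", "on"}


@dataclass
class NewsConfig:
    feed_urls: list[str]
    entity_blacklist: set[str]
    refresh_interval_seconds: int
    background_refresh_enabled: bool
    embeddings_enabled: bool
    ollama_base_url: str
    ollama_embedding_model: str


def load_config(env: Mapping[str, str] = os.environ) -> NewsConfig:
    # NEWS_FEED_URLS, when non-empty, overrides the single NEWS_FEED_URL fallback.
    feeds = _csv(env.get("NEWS_FEED_URLS"))
    if not feeds and (env.get("NEWS_FEED_URL") or "").strip():
        feeds = [env["NEWS_FEED_URL"].strip()]
    return NewsConfig(
        feed_urls=feeds,
        # Casefolded once so later lookups are case-insensitive exact matches.
        entity_blacklist={name.casefold() for name in _csv(env.get("ENTITY_BLACKLIST"))},
        refresh_interval_seconds=int(env.get("NEWS_REFRESH_INTERVAL_SECONDS") or 900),
        background_refresh_enabled=_flag(env, "NEWS_BACKGROUND_REFRESH_ENABLED", True),
        embeddings_enabled=_flag(env, "NEWS_EMBEDDINGS_ENABLED", False),
        # Assumed precedence: OLLAMA_BASE_URL first, then OLLAMA_URL.
        ollama_base_url=(env.get("OLLAMA_BASE_URL") or env.get("OLLAMA_URL")
                         or "http://127.0.0.1:11434"),
        ollama_embedding_model=env.get("OLLAMA_EMBEDDING_MODEL") or "nomic-embed-text",
    )
```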

When embeddings are enabled, news-mcp tries Ollama first and falls back to the existing heuristic clustering path if Ollama is unavailable.
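The "Ollama first, heuristic fallback" behavior can be sketched with only the standard library. This calls Ollama's legacy POST /api/embeddings endpoint (request {"model", "prompt"}, response {"embedding": [...]}); whether news-mcp uses that endpoint or the newer /api/embed is not stated here, and the cosine and fuzzy thresholds, greedy grouping, and function names are hypothetical.

```python
import json
import math
import urllib.request
from difflib import SequenceMatcher


def ollama_embed(text: str, base_url: str, model: str, timeout: float = 5.0) -> list[float]:
    # Legacy Ollama endpoint: POST /api/embeddings {"model", "prompt"} -> {"embedding": [...]}.
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/embeddings", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_clusters(items: list, similar) -> list[list[int]]:
    # Put each item (by index) into the first cluster whose seed item it resembles.
    clusters: list[list[int]] = []
    for i in range(len(items)):
        for members in clusters:
            if similar(items[members[0]], items[i]):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_titles(titles: list[str], base_url: str, model: str,
                   emb_threshold: float = 0.85, fuzzy_threshold: float = 0.8):
    """Return (method, clusters of title indices), falling back if Ollama is unavailable."""
    try:
        vectors = [ollama_embed(t, base_url, model) for t in titles]
        return "embeddings", greedy_clusters(vectors, lambda a, b: cosine(a, b) >= emb_threshold)
    except (OSError, ValueError, KeyError):
        # URLError and timeouts are OSError subclasses; bad JSON raises ValueError.
        lowered = [t.lower() for t in titles]
        return "heuristic", greedy_clusters(
            lowered, lambda a, b: SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold)
```

Catching the failure once around the whole batch means a run never mixes embedding-based and heuristic clusters.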

Live extraction smoke test

Run a standardized extraction test on synthetic input against the currently configured provider/model:

./live_tests.sh

The script reads ./.env, selects OpenAI or Groq based on the configured keys, and checks that the core expected entities are extracted.
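The steps the script performs can be mirrored in Python for illustration. live_tests.sh itself is a shell script; in this sketch the minimal .env parser, the rule that OpenAI wins when both keys are set, and the helper names are assumptions, and the real script may also consult NEWS_EXTRACT_PROVIDER.

```python
def parse_dotenv(text: str) -> dict[str, str]:
    # Minimal KEY=VALUE parser: skips blanks and comments, strips optional quotes.
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def pick_provider(env: dict[str, str]) -> str:
    # Assumption: OpenAI is preferred when both keys are present.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GROQ_API_KEY"):
        return "groq"
    raise RuntimeError("set OPENAI_API_KEY or GROQ_API_KEY in .env")


def missing_entities(extracted: list[str], expected: set[str]) -> set[str]:
    # Case-insensitive check that every core expected entity was extracted.
    got = {e.casefold() for e in extracted}
    return {e for e in expected if e.casefold() not in got}
```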

mcporter examples (all news-mcp calls)

Set CONFIG to your mcporter config path, for example:

CONFIG=/home/lucky/.openclaw/workspace/config/mcporter.json

Inspect server + tools:

mcporter --config "$CONFIG" list news --schema

1) Latest events

mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=10
mcporter --config "$CONFIG" call news.get_latest_events topic=macro limit=5

2) Events for an entity

mcporter --config "$CONFIG" call news.get_events_for_entity entity=Bitcoin limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETH limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETF limit=10

3) Event summary (by cluster_id)

# First fetch an event id
mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=1

# Then summarize it
mcporter --config "$CONFIG" call news.get_event_summary event_id=<cluster_id>

4) Emerging topics

mcporter --config "$CONFIG" call news.detect_emerging_topics limit=10

5) Sentiment for an entity

mcporter --config "$CONFIG" call news.get_news_sentiment entity=Bitcoin timeframe=24h
mcporter --config "$CONFIG" call news.get_news_sentiment entity=Ethereum timeframe=72h

6) Related entities (co-occurrence neighborhood)

mcporter --config "$CONFIG" call news.get_related_entities subject=iran timeframe=24h limit=8
mcporter --config "$CONFIG" call news.get_related_entities subject="iran war" timeframe=3d limit=8

Blacklist enforcement (optional back-clean)

If you change ENTITY_BLACKLIST, existing clusters in news.sqlite may still contain entities/keywords that would now be filtered at extraction time.

For one-off cleanup, run:

./.venv/bin/python scripts/enforce_news_blacklist.py --dry-run --limit 200
./.venv/bin/python scripts/enforce_news_blacklist.py --limit 1000

This enforces ENTITY_BLACKLIST inside stored clusters by removing matching entries from payload.entities and payload.keywords and (if needed) setting payload.topic = "other".
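The per-cluster transformation can be sketched as a pure function on the payload dict, which makes a dry run trivial: compute the cleaned payload and report whether anything changed without writing it back. The README does not say when the topic is reset, so the rule below (reset when the topic string is itself blacklisted) is a hypothetical stand-in, as is handling entities as either strings or dicts with a name field.

```python
import copy


def _name(item) -> str:
    # Entities may be plain strings or dicts with a "name" field (both handled here).
    return item["name"] if isinstance(item, dict) else str(item)


def enforce_blacklist(payload: dict, blacklist: set[str]) -> tuple[dict, bool]:
    """Return (cleaned_payload, changed). Matching is case-insensitive and exact."""
    banned = {b.strip().casefold() for b in blacklist}
    cleaned = copy.deepcopy(payload)  # never mutate the stored payload in place
    for field in ("entities", "keywords"):
        if field in cleaned:
            cleaned[field] = [i for i in (cleaned[field] or [])
                              if _name(i).strip().casefold() not in banned]
    # Hypothetical "if needed" rule: demote the topic when it is itself blacklisted.
    if str(cleaned.get("topic", "")).strip().casefold() in banned:
        cleaned["topic"] = "other"
    return cleaned, cleaned != payload
```

In a dry run, a caller would only count or log clusters where changed is true; in a real run, it would also persist cleaned.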