Lukas Goldschmidt 8317f5afd9 docs: document blacklist enforcement script 1 month ago
config 57bb07fdd6 Improve entity lookup fallback and docs 1 month ago
news_mcp 60b02963b7 detect_emerging_topics: related_entities via co-occurrence 1 month ago
prompts cdd52b9f1e Refactor news LLM extraction pipeline 1 month ago
scripts faf04d5b2f news-mcp: add script to enforce ENTITY_BLACKLIST in stored clusters 1 month ago
.env.example 64bf700047 Release v0.1.0 1 month ago
.gitignore 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
OUTLOOK.md 1e5f9c6936 Clean stored news summaries of HTML to match ingestion 1 month ago
PROJECT.md 600fcdbd55 Polish news-mcp docs + add emerging topics and tests 1 month ago
README.md 8317f5afd9 docs: document blacklist enforcement script 1 month ago
killserver.sh a4096b9dfb news-mcp: cleanup feed naming, improve ingestion logs, add mcporter README examples 1 month ago
live_tests.sh cdd52b9f1e Refactor news LLM extraction pipeline 1 month ago
requirements.txt 600fcdbd55 Polish news-mcp docs + add emerging topics and tests 1 month ago
restart.sh 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
run.sh 13f8f1d5ab Initialize news-mcp scaffold 1 month ago
test_news_mcp.py 64bf700047 Release v0.1.0 1 month ago
tests.sh 600fcdbd55 Polish news-mcp docs + add emerging topics and tests 1 month ago

README.md

📰 News MCP Server

FastMCP-based MCP server that turns news feeds into deduplicated, enriched clusters.

Quick start

cd news-mcp
source .venv/bin/activate
pip install -r requirements.txt
./run.sh

Default SSE mount (FastMCP):

  • http://127.0.0.1:8506/mcp/sse

Health:

  • http://127.0.0.1:8506/health

What this server provides

  • Fetches from one or more configured news feeds (NEWS_FEED_URL / NEWS_FEED_URLS)
  • Deduplicates articles into clusters (v1 fuzzy title similarity)
  • Enriches clusters with configurable LLM providers/models (topic/entities/sentiment/keywords)
  • Applies a case-insensitive entity blacklist after extraction
  • Caches clusters + LLM fields in SQLite
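
To make the deduplication step concrete, here is a minimal sketch of clustering by fuzzy title similarity. It is not the server's actual code: the function names, the greedy strategy, the 0.8 threshold, and the use of `difflib.SequenceMatcher` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def cluster_by_title(titles: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering (hypothetical): a title joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for title in titles:
        norm = normalize_title(title)
        for cluster in clusters:
            representative = normalize_title(cluster[0])
            if SequenceMatcher(None, norm, representative).ratio() >= threshold:
                cluster.append(title)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([title])
    return clusters


titles = [
    "Bitcoin ETF sees record inflows",
    "Bitcoin ETF sees record inflows today",
    "Fed holds interest rates steady",
]
clusters = cluster_by_title(titles)
for cluster in clusters:
    print(cluster)
```

With these sample titles, the two Bitcoin ETF headlines land in one cluster and the Fed headline in another. A higher threshold would split near-duplicates more aggressively; a lower one would merge more loosely related stories.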

Tools (MCP)

1) get_latest_events(topic, limit)

  • topic is a coarse category: crypto | macro | regulation | ai | other

Optional boolean:

  • include_articles (default: false) — when true, includes articles[].url + minimal fields per returned cluster.

2) get_events_for_entity(entity, limit)

  • substring, case-insensitive match over extracted entities
  • uses a shallow recent scan first, then falls back to a wider historical scan if needed

Optional boolean:

  • include_articles (default: false) — when true, includes articles[].url + minimal fields per returned cluster.
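
The lookup behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the server's implementation: the cluster shape, the `shallow_window` parameter, and the rule that the wider scan runs only when the shallow scan finds nothing are all hypothetical.

```python
def matches_entity(query: str, entities: list[str]) -> bool:
    """Case-insensitive substring match against a cluster's extracted entities."""
    needle = query.casefold()
    return any(needle in entity.casefold() for entity in entities)


def find_events(query: str, clusters: list[dict], limit: int,
                shallow_window: int = 50) -> list[dict]:
    """clusters is assumed to be ordered newest first."""
    # Shallow scan: only the most recent clusters.
    hits = [c for c in clusters[:shallow_window]
            if matches_entity(query, c["entities"])]
    if not hits:
        # Assumed fallback trigger: nothing found, so scan the full history.
        hits = [c for c in clusters if matches_entity(query, c["entities"])]
    return hits[:limit]


sample_clusters = [
    {"cluster_id": "c3", "entities": ["Federal Reserve"]},
    {"cluster_id": "c2", "entities": ["Ethereum"]},
    {"cluster_id": "c1", "entities": ["Bitcoin ETF", "SEC"]},
]

recent_hit = find_events("eth", sample_clusters, limit=10, shallow_window=2)
fallback_hit = find_events("etf", sample_clusters, limit=10, shallow_window=2)
print([c["cluster_id"] for c in recent_hit])
print([c["cluster_id"] for c in fallback_hit])
```

In the sample data, `eth` is found inside the two-cluster shallow window, while `etf` is only found after the wider scan.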

Entity aliasing

The server keeps a conservative alias map in config/entity_aliases.json for obvious shorthands like btc -> Bitcoin, eth -> Ethereum, and ether -> Ethereum. Keep this map tight; it is meant to reduce false misses, not to rewrite every possible name variant.
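
As an illustration of how such an alias map might be applied, the sketch below resolves a query to a canonical entity name. The JSON layout (a flat object mapping shorthand to canonical name) and the helper names are assumptions; the real `config/entity_aliases.json` may be structured differently.

```python
import json

# Assumed file format: a flat JSON object, shorthand -> canonical name.
ALIASES_JSON = '{"btc": "Bitcoin", "eth": "Ethereum", "ether": "Ethereum"}'


def load_aliases(raw: str) -> dict[str, str]:
    """Parse the alias map and normalize keys for case-insensitive lookup."""
    return {key.casefold(): value for key, value in json.loads(raw).items()}


def resolve_entity(name: str, aliases: dict[str, str]) -> str:
    """Return the canonical name for a known shorthand, else the input unchanged."""
    cleaned = name.strip()
    return aliases.get(cleaned.casefold(), cleaned)


aliases = load_aliases(ALIASES_JSON)
print(resolve_entity("BTC", aliases))
print(resolve_entity("Solana", aliases))
```

Unknown names pass through untouched, which matches the stated goal of reducing false misses without rewriting every variant.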

3) get_event_summary(event_id)

  • Groq-written compressed narrative for the cluster identified by event_id (a cluster_id, e.g. one returned by get_latest_events)

Optional boolean:

  • include_articles (default: false) — when true, includes the underlying articles list (with url) from the stored cluster.

4) detect_emerging_topics(limit)

  • derives “emerging” signals from recent cached clusters
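
One plausible way to derive such signals is to count entity mentions across recent clusters and attach related entities by co-occurrence. The sketch below is only that: the input shape, the ranking by raw mention count, and the output field names are assumptions, not the tool's documented behavior.

```python
from collections import Counter, defaultdict


def detect_emerging(recent_clusters: list[list[str]], limit: int,
                    related_limit: int = 3) -> list[dict]:
    """recent_clusters holds one entity list per recent cluster (assumed shape)."""
    mentions: Counter = Counter()
    cooccurrence: defaultdict = defaultdict(Counter)
    for entities in recent_clusters:
        unique = sorted(set(entities))  # count each entity once per cluster
        mentions.update(unique)
        for entity in unique:
            for other in unique:
                if other != entity:
                    cooccurrence[entity][other] += 1
    return [
        {
            "entity": entity,
            "mentions": count,
            "related_entities": [
                name for name, _ in cooccurrence[entity].most_common(related_limit)
            ],
        }
        for entity, count in mentions.most_common(limit)
    ]


sample = [
    ["Bitcoin", "SEC"],
    ["Bitcoin", "SEC"],
    ["Bitcoin", "Ethereum"],
]
top = detect_emerging(sample, limit=1)
print(top)
```

A production version would likely compare recent counts against a longer baseline so that permanently popular entities do not dominate the list.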

5) get_news_sentiment(entity, timeframe)

  • aggregates sentiment around an entity from cached enriched clusters
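
A rough sketch of this aggregation is shown below. Several details are assumed for illustration: that each cached cluster carries a numeric `sentiment` score and a timezone-aware `timestamp`, that timeframes use the `<hours>h` form seen in the examples (`24h`, `72h`), and that entities are matched by case-insensitive substring.

```python
from datetime import datetime, timedelta, timezone


def parse_timeframe(timeframe: str) -> timedelta:
    """Parse strings such as '24h' or '72h' (only hours are handled here)."""
    if not timeframe.endswith("h") or not timeframe[:-1].isdigit():
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    return timedelta(hours=int(timeframe[:-1]))


def aggregate_sentiment(entity: str, timeframe: str,
                        clusters: list[dict], now: datetime) -> dict:
    """Average the sentiment of clusters that mention the entity in the window."""
    cutoff = now - parse_timeframe(timeframe)
    needle = entity.casefold()
    scores = [
        c["sentiment"]
        for c in clusters
        if c["timestamp"] >= cutoff
        and any(needle in e.casefold() for e in c["entities"])
    ]
    mean = sum(scores) / len(scores) if scores else None
    return {"entity": entity, "count": len(scores), "mean_sentiment": mean}


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
sample = [
    {"entities": ["Bitcoin"], "sentiment": 0.5, "timestamp": now - timedelta(hours=1)},
    {"entities": ["Bitcoin"], "sentiment": -0.25, "timestamp": now - timedelta(hours=2)},
    {"entities": ["Bitcoin"], "sentiment": -1.0, "timestamp": now - timedelta(hours=30)},
    {"entities": ["Ethereum"], "sentiment": 1.0, "timestamp": now - timedelta(hours=1)},
]
result = aggregate_sentiment("bitcoin", "24h", sample, now)
print(result)
```

The 30-hour-old cluster falls outside the 24h window and the Ethereum cluster does not match, so only two scores are averaged.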

Configuration

See news-mcp/.env. Key variables:

  • NEWS_EXTRACT_PROVIDER, NEWS_EXTRACT_MODEL
  • NEWS_SUMMARY_PROVIDER, NEWS_SUMMARY_MODEL
  • GROQ_API_KEY, OPENAI_API_KEY
  • ENTITY_BLACKLIST (comma-separated, case-insensitive exact entity match)
  • NEWS_PROMPTS_DIR (override prompt directory)
  • NEWS_ENTITY_ALIASES_FILE (override entity alias JSON file)
  • NEWS_FEED_URL (single feed fallback)
  • NEWS_FEED_URLS (comma-separated feed URLs; overrides NEWS_FEED_URL)
  • NEWS_REFRESH_INTERVAL_SECONDS (default 900)
  • NEWS_BACKGROUND_REFRESH_ON_START (default true)
  • NEWS_BACKGROUND_REFRESH_ENABLED (default true)
  • NEWS_CLUSTERS_TTL_HOURS
  • GROQ_ENRICH_OTHER_ONLY (default false; set true for cost control)
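
The sketch below shows one way these variables could be read, including the documented precedence of NEWS_FEED_URLS over NEWS_FEED_URL and the documented defaults. The function names, the returned dictionary keys, and the set of accepted truthy strings are assumptions, not the server's actual loader.

```python
import os
from typing import Mapping, Optional


def parse_bool(value: Optional[str], default: bool) -> bool:
    """Assumed truthy spellings: 1, true, yes, on (case-insensitive)."""
    if value is None or not value.strip():
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def split_csv(value: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env: Mapping[str, str]) -> dict:
    feeds = split_csv(env.get("NEWS_FEED_URLS", ""))
    if not feeds:
        # Single-feed fallback when NEWS_FEED_URLS is unset or empty.
        single = env.get("NEWS_FEED_URL", "").strip()
        feeds = [single] if single else []
    return {
        "feed_urls": feeds,
        "refresh_interval_seconds": int(env.get("NEWS_REFRESH_INTERVAL_SECONDS", "900")),
        "refresh_on_start": parse_bool(env.get("NEWS_BACKGROUND_REFRESH_ON_START"), True),
        "refresh_enabled": parse_bool(env.get("NEWS_BACKGROUND_REFRESH_ENABLED"), True),
        "enrich_other_only": parse_bool(env.get("GROQ_ENRICH_OTHER_ONLY"), False),
        "entity_blacklist": {t.casefold() for t in split_csv(env.get("ENTITY_BLACKLIST", ""))},
    }


settings = load_settings({
    "NEWS_FEED_URL": "https://example.com/a.xml",
    "NEWS_FEED_URLS": "https://example.com/b.xml, https://example.com/c.xml",
    "ENTITY_BLACKLIST": "Reuters, CoinDesk",
})
print(settings["feed_urls"])

# In a real process the same function could be called with os.environ.
process_settings = load_settings(os.environ)
```

Passing a plain mapping instead of reading `os.environ` directly keeps the parsing easy to test.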

Live extraction smoke test

Run a standardized extraction test on fabricated input against the currently configured provider/model:

./live_tests.sh

The script reads ./.env, selects OpenAI or Groq based on the configured keys, and checks that the core expected entities are extracted.

mcporter examples (all news-mcp calls)

Use your existing config path:

CONFIG=/home/lucky/.openclaw/workspace/config/mcporter.json

Inspect server + tools:

mcporter --config "$CONFIG" list news --schema

1) Latest events

mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=10
mcporter --config "$CONFIG" call news.get_latest_events topic=macro limit=5

2) Events for an entity

mcporter --config "$CONFIG" call news.get_events_for_entity entity=Bitcoin limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETH limit=10
mcporter --config "$CONFIG" call news.get_events_for_entity entity=ETF limit=10

3) Event summary (by cluster_id)

# First fetch an event id
mcporter --config "$CONFIG" call news.get_latest_events topic=crypto limit=1

# Then summarize it
mcporter --config "$CONFIG" call news.get_event_summary event_id=<cluster_id>

4) Emerging topics

mcporter --config "$CONFIG" call news.detect_emerging_topics limit=10

5) Sentiment for an entity

mcporter --config "$CONFIG" call news.get_news_sentiment entity=Bitcoin timeframe=24h
mcporter --config "$CONFIG" call news.get_news_sentiment entity=Ethereum timeframe=72h

## Blacklist enforcement (optional back-clean)

If you change `ENTITY_BLACKLIST`, existing clusters in `news.sqlite` may still
contain entities/keywords that would now be filtered at extraction time.

For one-off cleanup, run:

./.venv/bin/python scripts/enforce_news_blacklist.py --dry-run --limit 200
./.venv/bin/python scripts/enforce_news_blacklist.py --limit 1000


This enforces `ENTITY_BLACKLIST` inside stored clusters by removing matching
entries from `payload.entities` and `payload.keywords` and (if needed) setting
`payload.topic = "other"`.
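
For intuition, the per-cluster cleanup might look like the sketch below. It is not the contents of `scripts/enforce_news_blacklist.py`: the function name is hypothetical, and the condition for resetting the topic (no entities remain after cleaning) is an assumption, since the text above only says "if needed".

```python
def enforce_blacklist(payload: dict, blacklist: set[str]) -> dict:
    """Return a cleaned copy of a stored cluster payload."""
    blocked = {term.casefold() for term in blacklist}

    def keep(term: str) -> bool:
        # Case-insensitive exact match, as described for ENTITY_BLACKLIST.
        return term.casefold() not in blocked

    cleaned = dict(payload)
    cleaned["entities"] = [e for e in payload.get("entities", []) if keep(e)]
    cleaned["keywords"] = [k for k in payload.get("keywords", []) if keep(k)]
    # Assumption: the topic is reset only when cleaning removed every entity.
    if payload.get("entities") and not cleaned["entities"]:
        cleaned["topic"] = "other"
    return cleaned


partly_cleaned = enforce_blacklist(
    {"topic": "crypto", "entities": ["Bitcoin", "Example Corp"], "keywords": ["bitcoin", "etf"]},
    {"bitcoin"},
)
fully_cleaned = enforce_blacklist(
    {"topic": "crypto", "entities": ["Bitcoin"], "keywords": ["bitcoin"]},
    {"Bitcoin"},
)
print(partly_cleaned)
print(fully_cleaned)
```

Returning a copy rather than mutating the stored payload makes a `--dry-run` style comparison straightforward.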