Current version: v0.3.1 — see RELEASE_NOTES.md for changelog.
Provide structured, deduplicated, topic-aware news signals that an agent can use for reasoning about:
👉 Not a feed reader
👉 Not a headline dump
👉 A signal extraction layer
Raw news is useless to agents. Processed news is powerful.
sources/)
Mix of:
Examples:
Runs periodically (e.g. every few minutes)
Steps:
normalize fields:
Same story appears across many sources.
Cluster articles by similarity:
Methods:
{
  "cluster_id": "...",
  "headline": "Canonical headline",
  "articles": [...],
  "sources": ["Reuters", "Bloomberg"],
  "first_seen": "...",
  "last_updated": "..."
}
👉 This is your core unit of truth, not individual articles
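As a minimal sketch, the cluster record above could be modelled in Python roughly like this. The class name `Cluster`, the `add_article` helper, and the shape of the article dict are hypothetical and chosen only for illustration; timestamps are assumed to be ISO 8601 strings in UTC.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """In-memory mirror of the cluster JSON shown above (hypothetical class)."""
    cluster_id: str
    headline: str
    first_seen: str      # assumed: ISO 8601 string, UTC
    last_updated: str    # assumed: ISO 8601 string, UTC
    articles: list = field(default_factory=list)
    sources: list = field(default_factory=list)

    def add_article(self, article: dict) -> None:
        """Attach an article and keep `sources` and `last_updated` in sync."""
        self.articles.append(article)
        if article["source"] not in self.sources:
            self.sources.append(article["source"])
        # ISO 8601 strings in the same timezone sort chronologically.
        self.last_updated = max(self.last_updated, article["timestamp"])


cluster = Cluster("c1", "Canonical headline",
                  "2024-01-01T00:00:00Z", "2024-01-01T00:00:00Z")
cluster.add_article({"title": "A", "source": "Reuters",
                     "timestamp": "2024-01-01T00:05:00Z"})
cluster.add_article({"title": "B", "source": "Bloomberg",
                     "timestamp": "2024-01-01T00:03:00Z"})
```

Keeping `sources` and `last_updated` on the cluster itself means tools can answer from the cluster without re-reading every article.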
Adds meaning to clusters.
Examples:
👉 Keep this simple in v1 (don’t over-engineer NLP)
Heuristic:
You need short-term memory:
Optional:
Storage options include Qdrant, PostgreSQL, and CouchDB.
Keep tools high-level and semantic
get_latest_events
“What is happening right now?”
Input:
{
  "topic": "crypto",
  "limit": 5,
  "include_articles": false
}
Output:
[
  {
    "headline": "...",
    "summary": "...",
    "entities": ["BTC"],
    "sentiment": "positive",
    "importance": 0.82,
    "sources": ["Reuters", "CoinDesk"],
    "timestamp": "...",
    "articles": [
      {
        "title": "...",
        "url": "...",
        "source": "Reuters",
        "timestamp": "..."
      }
    ]
  }
]
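The input and output above can be sketched as a plain function over in-memory clusters. This is an illustration under assumptions, not the server's actual handler: each cluster is assumed to be a dict carrying a `topic` field (not shown in the payload above) and an ISO 8601 `timestamp`, and storage access is left out entirely.

```python
def get_latest_events(clusters, topic=None, limit=5, include_articles=False):
    """Return the newest clusters for a topic, shaped like the output above.

    Hypothetical in-memory stand-in for the real tool.
    """
    # Assumption: every cluster dict has a "topic" key to filter on.
    matching = [c for c in clusters if topic is None or c.get("topic") == topic]
    # Newest first; ISO 8601 strings in one timezone sort chronologically.
    matching.sort(key=lambda c: c["timestamp"], reverse=True)

    events = []
    for c in matching[:limit]:
        # Drop the article list unless the caller explicitly asks for it.
        event = {k: v for k, v in c.items() if k != "articles"}
        if include_articles:
            event["articles"] = c.get("articles", [])
        events.append(event)
    return events


sample = [
    {"headline": "Old", "topic": "crypto",
     "timestamp": "2024-01-01T00:00:00Z", "articles": [{"title": "a"}]},
    {"headline": "New", "topic": "crypto",
     "timestamp": "2024-01-02T00:00:00Z", "articles": [{"title": "b"}]},
    {"headline": "Other", "topic": "politics",
     "timestamp": "2024-01-03T00:00:00Z", "articles": []},
]
latest = get_latest_events(sample, topic="crypto", limit=5)
```

Leaving `articles` out by default keeps the payload small, so an agent pays for article detail only when it needs it.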
get_events_for_entity
“What’s happening with X?”
{
  "entity": "BTC",
  "include_articles": false
}
👉 filters clusters by entity
Optional:
include_articles to include article title/url/source/timestamp in the payload
get_event_summary
“Explain this event clearly”
{
  "event_id": "cluster_id",
  "include_articles": false
}
Output:
👉 This is where you compress multiple articles into one clean narrative
get_news_sentiment
“What’s the tone around X?”
{
  "entity": "BTC",
  "timeframe": "24h"
}
Output:
{
  "sentiment": "positive",
  "score": 0.64,
  "article_count": 42
}
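One way such a payload could be produced is by averaging per-article sentiment scores and mapping the mean to a label. The sketch below is an assumption, not the documented scoring method: the helper name `aggregate_sentiment`, the score range of -1 to 1, and the 0.1 neutrality threshold are all illustrative.

```python
def aggregate_sentiment(scores, threshold=0.1):
    """Collapse per-article scores (assumed in [-1, 1]) into one payload.

    Hypothetical helper; the threshold is an illustrative choice.
    """
    if not scores:
        return {"sentiment": "neutral", "score": 0.0, "article_count": 0}

    mean = sum(scores) / len(scores)
    if mean > threshold:
        label = "positive"
    elif mean < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return {"sentiment": label, "score": round(mean, 2),
            "article_count": len(scores)}


result = aggregate_sentiment([0.5, 0.75, 0.25])
```

Returning `article_count` alongside the score lets an agent judge how much evidence sits behind the label.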
detect_emerging_topics (very valuable)
“What is gaining attention?”
Output:
[
  {
    "topic": "Ethereum ETF",
    "trend_score": 0.91,
    "related_entities": ["ETH", "BlackRock", "SEC"],
    "count": 8,
    "avg_importance": 0.17
  }
]
get_related_entities
“What entities tend to appear with X?”
{
  "subject": "Iran",
  "timeframe": "24h",
  "limit": 10
}
Output:
[
  {
    "entity": "United States",
    "count": 5,
    "avg_importance": 0.11,
    "sentiment": "negative",
    "score": -0.2
  }
]
👉 entity-only co-occurrence neighborhood for real-time sense-making
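The co-occurrence idea can be sketched by counting which entities share a cluster with the subject. This covers only the `entity` and `count` fields; `avg_importance`, `sentiment`, `score`, and the `timeframe` filter are omitted. The helper name `related_entities` and the per-cluster `entities` list are assumptions for illustration.

```python
from collections import Counter


def related_entities(clusters, subject, limit=10):
    """Count entities that appear in the same cluster as `subject`.

    Hypothetical helper; assumes each cluster dict has an "entities" list.
    """
    counts = Counter()
    for cluster in clusters:
        entities = set(cluster["entities"])  # count each entity once per cluster
        if subject in entities:
            counts.update(entities - {subject})
    return [{"entity": name, "count": n} for name, n in counts.most_common(limit)]


sample_clusters = [
    {"entities": ["Iran", "United States", "Israel"]},
    {"entities": ["Iran", "United States"]},
    {"entities": ["BTC"]},
]
neighbours = related_entities(sample_clusters, "Iran")
```

Counting per cluster rather than per article avoids inflating the numbers when one story is syndicated widely.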
Avoid:
❌ Bad:
get_raw_articles()
👉 This destroys signal quality for agents
Clustering is the unit of truth, not individual articles.
Signal cascade (cheapest first, short-circuit on match):
Each new article is compared against all articles in a candidate cluster; the best signal across all members is used.
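A cascade of this shape might look like the sketch below. The two signals (exact URL match, then title-token Jaccard overlap) and their thresholds are hypothetical stand-ins chosen for illustration, not the project's actual signal list; only the structure (cheapest first, best score across all members, short-circuit on match) follows the description above.

```python
def same_url(a, b):
    """Cheapest signal (hypothetical): identical URL."""
    return 1.0 if a["url"] == b["url"] else 0.0


def title_jaccard(a, b):
    """Pricier signal (hypothetical): Jaccard overlap of lowercase title tokens."""
    ta = set(a["title"].lower().split())
    tb = set(b["title"].lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# (signal, match threshold) pairs, ordered cheapest first. Values are assumptions.
CASCADE = [(same_url, 1.0), (title_jaccard, 0.5)]


def matches_cluster(article, members):
    """Return (matched, signal_name), short-circuiting on the first hit."""
    for signal, threshold in CASCADE:
        # Best score across every member of the candidate cluster.
        best = max((signal(article, m) for m in members), default=0.0)
        if best >= threshold:
            return True, signal.__name__
    return False, None


members = [
    {"url": "u1", "title": "Bitcoin hits new record high"},
    {"url": "u2", "title": "BTC rallies"},
]
hit = matches_cluster({"url": "u3", "title": "Bitcoin hits record high today"}, members)
```

Taking the best score across members, rather than comparing against a single representative, lets an article join a cluster through whichever existing article it resembles most.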
Stable cluster IDs: sha1(topic | min_article_key) — the same set of articles always maps to the same ID regardless of which article arrived first or which polling cycle created the cluster.
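A minimal sketch of that ID scheme, assuming article keys are strings, the minimum is taken lexicographically, and topic and key are joined with a literal `|` before hashing. The function name `stable_cluster_id` is hypothetical.

```python
import hashlib


def stable_cluster_id(topic, article_keys):
    """Derive sha1(topic | min_article_key) as a hex digest.

    Assumptions: string keys, lexicographic minimum, "|" as the separator.
    """
    min_key = min(article_keys)
    return hashlib.sha1(f"{topic}|{min_key}".encode("utf-8")).hexdigest()


id_a = stable_cluster_id("crypto", ["article-b", "article-a"])
id_b = stable_cluster_id("crypto", ["article-a", "article-b"])
```

Because only the minimum key enters the hash, arrival order has no effect on the result.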
Cross-cycle merge: the poller loads recent clusters from the DB (controlled by NEWS_CLUSTER_MAX_AGE_HOURS, default 4h) and seeds them as merge targets before clustering. New articles can merge into clusters from previous polling cycles.
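The seeding step could be sketched as a simple age filter. The sketch filters an in-memory list instead of querying the DB, and it assumes age is measured from a timezone-aware `last_updated` datetime; both are illustrative choices, as is the function name `recent_clusters`.

```python
import os
from datetime import datetime, timedelta, timezone


def recent_clusters(clusters, now=None):
    """Keep clusters young enough to act as merge targets (hypothetical helper)."""
    # Default of 4 hours mirrors the documented NEWS_CLUSTER_MAX_AGE_HOURS default.
    max_age_hours = float(os.environ.get("NEWS_CLUSTER_MAX_AGE_HOURS", "4"))
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    # Assumption: "last_updated" is a timezone-aware datetime.
    return [c for c in clusters if c["last_updated"] >= cutoff]


reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stored = [
    {"cluster_id": "fresh", "last_updated": reference - timedelta(hours=1)},
    {"cluster_id": "stale", "last_updated": reference - timedelta(hours=5)},
]
```

Calling `recent_clusters(stored, now=reference)` with the variable unset keeps only the `fresh` cluster; raising the variable to `6` would keep both.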
Orphan merge: a post-clustering Union-Find pass merges clusters that share article keys, catching cases where articles about the same event didn't match during the main loop.
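The orphan merge can be illustrated with a small Union-Find pass. This is a sketch under assumptions: clusters are represented as a dict from cluster ID to a set of article keys, and the surviving ID is simply the smaller of the two roots; the function name `merge_clusters_sharing_articles` is hypothetical.

```python
def merge_clusters_sharing_articles(clusters):
    """Merge clusters that share any article key.

    `clusters` maps cluster_id -> set of article keys (assumed shape).
    Returns a dict of surviving cluster_id -> merged key set.
    """
    parent = {cid: cid for cid in clusters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as root so the result is deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    owner = {}  # article key -> first cluster seen holding it
    for cid, keys in clusters.items():
        for key in keys:
            if key in owner:
                union(cid, owner[key])
            else:
                owner[key] = cid

    merged = {}
    for cid, keys in clusters.items():
        merged.setdefault(find(cid), set()).update(keys)
    return merged


merged = merge_clusters_sharing_articles({
    "c1": {"a", "b"},
    "c2": {"b", "c"},
    "c3": {"d"},
    "c4": {"c", "e"},
})
```

Here `c1`, `c2`, and `c4` collapse into one cluster even though `c1` and `c4` share no article directly, because Union-Find follows the chain through `c2`.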
Planned runtime order:
NEWS_EMBEDDINGS_ENABLED=true, try Ollama embeddings first
Your MCP should:
This MCP becomes powerful when combined with:
👉 News MCP provides:
causal narratives
Each tool should answer:
“What is happening, and why should I care?”
👉 Only then expose tools
Crypto MCP gives you facts.
News MCP gives you meaning.
But only if you:
btc and trump still works
The first version is now effectively a usable baseline. The remaining work for v0.1.x is mostly polish:
Right now detect_emerging_topics() returns a flat list of emerging topics/entities.
Next-level idea: turn it into an entity graph that an agent can reason over.
Core concept
iran, israel, donald_trump, strait_of_hormuz, etc.)
Over time (the important part)
Suggested output for an eventual agent tool
get_emerging_entity_graph(timeframe, limit) returning:
This needs extra time to become a real usable MCP tool, so it’s intentionally captured here for later execution.
Normalization layer
Wildcard blacklist support
Emerging signal quality
Entity/time tracking and replay (future capability)
The endgame is not just “news search”, but a light narrative memory system:
That should stay in mind while keeping the current implementation simple.