
news-mcp release notes

v0.3.1 — stable cluster IDs, cross-cycle merge, orphan dedup, multi-article signals

Highlights

  • Emerging topics rewrite (detect_emerging_topics): complete rewrite with the following changes:
    • Timeframe parameter ("4h", "24h", "3d", etc.) — controls lookback window instead of always using DEFAULT_LOOKBACK_HOURS
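    • Velocity scoring — splits the window into a recent half and a prior half and computes velocity = (recent + 0.5) / (prior + 0.5). Entities mentioned more often in the recent half than in the prior half score higher than steady-state ones; the +0.5 keeps the ratio finite when the prior count is 0.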
    • Composed trend score — replaces the flat 0.25 + 0.40*imp + 0.08*count formula with a weighted combination of: velocity (35%), recency concentration (25%), source diversity (15%), sustained presence across time buckets (10%), importance (15%)
    • Topic scoping — optional topic parameter filters to a specific category before scoring
    • Entity neighborhood scoping — optional around parameter only returns entities co-occurring with the specified entity (e.g. around="Bitcoin" finds what's emerging in Bitcoin's neighborhood)
    • Richer output — each result now includes velocity, recent_count, prior_count, source_count alongside trend_score and related_entities
  • Multi-article signal comparison: _signals() now compares a new article against ALL articles in a candidate cluster (not just the seed). The best title and jaccard scores across all cluster members are used for matching.
  • Stable cluster IDs: cluster_id = sha1(topic | min_article_key) instead of sha1(topic | seed_title). The same set of articles always maps to the same ID regardless of processing order. This eliminates duplicate clusters for the same event.
  • Cross-cycle merge: the poller loads recent clusters from the DB (controlled by NEWS_CLUSTER_MAX_AGE_HOURS, default 4h) and seeds them as merge targets before clustering. New articles in poll N+1 can merge into clusters created in poll N.
  • Orphan merge: post-clustering Union-Find pass detects and merges clusters that share article keys. Catches cases where articles about the same event didn't match during the main loop (e.g. embeddings temporarily unavailable).
  • Cascade match via _is_match(): unified signal evaluation — cosine → title → jaccard → consensus. Short-circuits on first passing signal. Configurable title_threshold parameter.
  • Cluster embedding updated on merge: when a new article merges into an existing cluster, the cluster's embedding is updated to the new article's vector, improving subsequent embedding-based matching.
  • NEWS_CLUSTER_MAX_AGE_HOURS env var (default 4): controls the cross-cycle merge window. Set to 0 to disable cross-cycle merge.
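A minimal Python sketch of the new emerging-topic scoring, assuming per-entity counts have already been aggregated for the window. The timeframe strings, the velocity formula and the five weights come from these notes; everything else is an assumption for illustration: a DEFAULT_LOOKBACK_HOURS value of 24, support for only "h" and "d" units, the EntityStats fields, reading "recency concentration" as the share of mentions in the recent half, and the way each component is normalised to [0, 1]. It is not the actual detect_emerging_topics code.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Placeholder: the real DEFAULT_LOOKBACK_HOURS value is not stated in these notes.
DEFAULT_LOOKBACK_HOURS = 24
# Assumption: only hour and day units, as in the "4h", "24h", "3d" examples.
_TIMEFRAME_RE = re.compile(r"^\s*(\d+)\s*([hd])\s*$", re.IGNORECASE)

# Component weights listed in the v0.3.1 notes (they sum to 1.0).
WEIGHTS = {"velocity": 0.35, "recency": 0.25, "diversity": 0.15,
           "sustained": 0.10, "importance": 0.15}


def parse_timeframe(tf=None):
    """Turn "4h" / "24h" / "3d" into a timedelta; None or "" uses the default."""
    if not tf:
        return timedelta(hours=DEFAULT_LOOKBACK_HOURS)
    m = _TIMEFRAME_RE.match(tf)
    if m is None:
        raise ValueError(f"unsupported timeframe: {tf!r}")
    n, unit = int(m.group(1)), m.group(2).lower()
    return timedelta(hours=n) if unit == "h" else timedelta(days=n)


def split_window(tf=None, now=None):
    """Return (start, mid, now): [start, mid) is the prior half, [mid, now] the recent half."""
    now = now or datetime.now(timezone.utc)
    span = parse_timeframe(tf)
    start = now - span
    return start, start + span / 2, now


@dataclass
class EntityStats:
    # Hypothetical per-entity aggregates over the window.
    recent_count: int     # mentions in the recent half
    prior_count: int      # mentions in the prior half
    source_count: int     # distinct sources mentioning the entity
    buckets_present: int  # time buckets with at least one mention
    importance: float     # assumed already scaled to [0, 1]


def velocity(recent, prior):
    # Smoothed ratio from the notes; +0.5 avoids division by zero.
    return (recent + 0.5) / (prior + 0.5)


def trend_score(s, total_sources, total_buckets):
    v = velocity(s.recent_count, s.prior_count)
    total = s.recent_count + s.prior_count
    parts = {
        "velocity": v / (1.0 + v),  # assumed squash of (0, inf) into (0, 1)
        "recency": s.recent_count / total if total else 0.0,
        "diversity": min(s.source_count / total_sources, 1.0) if total_sources else 0.0,
        "sustained": min(s.buckets_present / total_buckets, 1.0) if total_buckets else 0.0,
        "importance": s.importance,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)


if __name__ == "__main__":
    surging = EntityStats(9, 1, 5, 3, 0.6)  # 9 recent vs 1 prior mention
    steady = EntityStats(5, 5, 5, 3, 0.6)   # same total volume, flat over time
    print(round(velocity(9, 1), 2))  # 6.33
    print(trend_score(surging, 10, 6) > trend_score(steady, 10, 6))  # True
```

Under these assumptions, an entity going from 1 prior mention to 9 recent ones gets velocity ≈ 6.33 and outranks an entity with the same 10 mentions spread evenly (velocity 1.0), which is the behaviour the velocity component is meant to reward.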
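The cascade match and the all-members comparison can be sketched as below. The signal order (cosine → title → jaccard → consensus), the short-circuit, the configurable title_threshold and the embedding update on merge come from these notes; the other thresholds, difflib.SequenceMatcher as the title metric, the tokenizer, the consensus rule and the dict layout of articles and clusters are assumptions. Treat it as an illustration of the control flow, not the real _is_match() or _signals().

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def tokens(title):
    # Crude tokenizer (assumption): lowercase words longer than two characters.
    return {w for w in title.lower().split() if len(w) > 2}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_title_sim(article, members):
    # Compare against every cluster member, not only the seed; keep the best score.
    t = article["title"].lower()
    return max(SequenceMatcher(None, t, m["title"].lower()).ratio() for m in members)


def best_jaccard(article, members):
    toks = tokens(article["title"])
    return max(jaccard(toks, tokens(m["title"])) for m in members)


def is_match(article, cluster, *, cos_threshold=0.85, title_threshold=0.80,
             jaccard_threshold=0.50, consensus_factor=0.6):
    """Cascade: cosine -> title -> jaccard -> consensus, returning on the first pass."""
    members = cluster["articles"]  # assumed non-empty
    # 1. Embedding cosine, only if both vectors exist (None when Ollama was unavailable).
    a_vec, c_vec = article.get("embedding"), cluster.get("embedding")
    if a_vec and c_vec and cosine(a_vec, c_vec) >= cos_threshold:
        return True
    # 2. Best character-level title similarity across all members.
    title_sim = best_title_sim(article, members)
    if title_sim >= title_threshold:
        return True
    # 3. Best token Jaccard across all members (computed only if step 2 failed).
    jac = best_jaccard(article, members)
    if jac >= jaccard_threshold:
        return True
    # 4. Consensus (assumed rule): both weaker signals are reasonably close to threshold.
    return (title_sim >= consensus_factor * title_threshold
            and jac >= consensus_factor * jaccard_threshold)


def merge_into(cluster, article):
    cluster["articles"].append(article)
    # v0.3.1 behaviour: the cluster embedding becomes the merged article's vector.
    if article.get("embedding"):
        cluster["embedding"] = article["embedding"]
```

Because the title score in this sketch is the maximum over all members, a new article can match through a later member (for example "SEC approves spot Bitcoin ETF") even when it reads quite differently from the seed title.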

Migration notes

  • No database schema changes.
  • Existing cluster IDs will change on the next polling cycle, since the ID is now derived from the minimum article key rather than the seed title (old rows are updated in-place via ON CONFLICT(cluster_id) once the new ID is computed). Transient enrichment cache misses may occur for one cycle.
  • Old duplicate clusters (same event, different IDs) will age out via pruning. To clean them immediately, run the article dedup cleanup script.
  • detect_emerging_topics output shape changed: count is replaced by recent_count + prior_count, and velocity and source_count are new fields. Clients that read the old count field should switch to recent_count, or sum recent_count + prior_count to get the total over the whole window.
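Why the new IDs no longer depend on processing order, and how the orphan-merge pass collapses duplicates, can be illustrated with a small Union-Find. This sketch assumes clusters are dicts holding a topic and a set of article keys, that "|" is a literal separator in the hashed string, and that merged clusters share a topic (the first one seen is kept); stable_cluster_id and merge_orphans are hypothetical names, not the poller's API.

```python
import hashlib


def stable_cluster_id(topic, article_keys):
    # sha1(topic | min_article_key): depends on the set of keys, not their order.
    return hashlib.sha1(f"{topic}|{min(article_keys)}".encode("utf-8")).hexdigest()


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_orphans(clusters):
    """Merge clusters that share at least one article key (transitively)."""
    uf = UnionFind(len(clusters))
    owner = {}  # article key -> index of the first cluster seen holding it
    for i, c in enumerate(clusters):
        for key in c["keys"]:
            if key in owner:
                uf.union(owner[key], i)
            else:
                owner[key] = i
    groups = {}
    for i, c in enumerate(clusters):
        # Assumption: merged clusters share a topic; keep the first one's.
        g = groups.setdefault(uf.find(i), {"topic": c["topic"], "keys": set()})
        g["keys"] |= set(c["keys"])
    for g in groups.values():
        g["id"] = stable_cluster_id(g["topic"], g["keys"])
    return list(groups.values())


if __name__ == "__main__":
    a = {"topic": "crypto", "keys": {"k3", "k1"}}
    b = {"topic": "crypto", "keys": {"k1", "k7"}}  # shares k1 with a
    c = {"topic": "macro", "keys": {"k9"}}
    print(len(merge_orphans([a, b, c])))  # 2
```

Once duplicates are merged, the surviving cluster's ID is recomputed from its full key set, so the same articles end up under the same ID whichever poll created them.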

v0.3.0 — concurrent polling, enrichment retry, all-topics default

Highlights

  • Async concurrent RSS fetching — all feeds are fetched in parallel with asyncio.gather + httpx, bounded by a semaphore (default 10 concurrent). Previously fetching was sequential: ~40 feeds × 2-5 s each added up to roughly 80-200 s per cycle. Now up to 10 feeds are fetched at a time.
  • Concurrent Ollama embeddings — embedding vectors for all articles pre-computed in parallel before the clustering loop (bounded by semaphore, default 4). Previously one-by-one during clustering.
  • Concurrent LLM enrichment — entity extraction / topic classification / sentiment calls run concurrently across all clusters, bounded by per-provider semaphore:
    • openrouter: 2 (free tier)
    • openai: 5
    • groq: 8
    • Override via NEWS_LLM_CONCURRENCY_<PROVIDER> env var
  • Per-cluster retry with backoff — failed LLM calls retry up to 3 times (2s, 4s, 8s backoff) before marking the cluster as failed. Failed clusters are automatically retried on the next polling cycle.
  • Cross-cycle failure recovery: get_failed_enrichment_clusters() queries the DB for clusters with enrichment_failed_at set but below the retry threshold, so transient failures self-heal.
  • LLM provider retries: _call_groq and _call_openai now have the same retry logic as _call_openrouter (2 retries, exponential backoff on 429/500/502/503, empty-response handling).
  • get_latest_events() default changed — omitting topic now returns clusters from all topics instead of defaulting to "crypto". Pass topic="crypto" (or macro/regulation/ai/other) to filter.
  • Configuration — all concurrency limits configurable via env vars; see config.py for NEWS_RSS_MAX_CONCURRENCY, NEWS_OLLAMA_MAX_CONCURRENCY, NEWS_LLM_CONCURRENCY_<PROVIDER>.
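Bounded fan-out with asyncio.gather and a semaphore, as described for RSS fetching, might look like the sketch below. fetch_all, fetch_feed and demo are hypothetical names, the 10 s timeout is an assumption, and max_concurrency stands in for the value read from NEWS_RSS_MAX_CONCURRENCY. The demo uses fake feeds so it runs without network access.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrency=10):
    """Run fetch_one(url) for every URL concurrently, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return await fetch_one(url)

    # Results come back in input order; return_exceptions=True keeps one bad feed
    # from cancelling the rest (its exception is returned in its slot instead).
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)


async def fetch_feed(client, url):
    """Possible fetch_one body for an httpx.AsyncClient created by the caller."""
    resp = await client.get(url, timeout=10.0)
    resp.raise_for_status()
    return resp.text


async def demo(n_feeds=20, limit=4):
    """Fake feeds with a 10 ms 'download', to observe the semaphore bound."""
    state = {"in_flight": 0, "peak": 0}

    async def fake_fetch(url):
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)
        state["in_flight"] -= 1
        if url.endswith("/bad"):
            raise ValueError(url)
        return f"<rss from {url}>"

    urls = [f"https://feed{i}.example/rss" for i in range(n_feeds)]
    urls.append("https://x.example/bad")
    results = await fetch_all(urls, fake_fetch, max_concurrency=limit)
    return results, state["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(peak)  # 4
```

Against real feeds the call would sit inside `async with httpx.AsyncClient() as client:` and pass `lambda u: fetch_feed(client, u)` as fetch_one. Returning exceptions in place rather than raising is a choice made in this sketch so that one dead feed does not abort the whole poll.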
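The per-provider caps and the per-cluster retry schedule (up to 3 retries, after 2 s, 4 s and 8 s) can be combined roughly as follows. The default limits and the NEWS_LLM_CONCURRENCY_<PROVIDER> pattern come from these notes; the upper-cased provider name in the variable, the fallback limit of 2 for unknown providers, treating every exception as retryable, and the function names are assumptions. base_delay is a parameter only so the backoff can be shortened in tests.

```python
import asyncio
import os

# Defaults from the v0.3.0 notes; unknown providers fall back to 2 (assumption).
DEFAULT_LLM_CONCURRENCY = {"openrouter": 2, "openai": 5, "groq": 8}


def provider_limit(provider):
    """NEWS_LLM_CONCURRENCY_<PROVIDER> overrides the default (upper-case name assumed)."""
    raw = os.environ.get(f"NEWS_LLM_CONCURRENCY_{provider.upper()}", "")
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return DEFAULT_LLM_CONCURRENCY.get(provider, 2)


async def enrich_with_retry(call, cluster, sem, retries=3, base_delay=2.0):
    """Return (result, None) on success, or (None, last_error) once retries run out."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            async with sem:  # hold a provider slot only while the call runs
                return await call(cluster), None
        except Exception as exc:  # assumption: every failure is retryable
            last_error = exc
            if attempt < retries:
                # 2 s, 4 s, 8 s with the default base_delay; the slot is already released.
                await asyncio.sleep(base_delay * 2 ** attempt)
    # The caller would record enrichment_failed_at so a later poll retries this cluster.
    return None, last_error


async def enrich_all(call, clusters, provider, base_delay=2.0):
    sem = asyncio.Semaphore(provider_limit(provider))
    return await asyncio.gather(
        *(enrich_with_retry(call, c, sem, base_delay=base_delay) for c in clusters)
    )
```

Sleeping after the semaphore has been released means a cluster waiting out its backoff does not occupy one of the provider's few slots, which matters most with openrouter's limit of 2.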

Migration notes

  • No database schema changes.
  • If you relied on get_latest_events() without a topic argument returning only crypto clusters, pass topic="crypto" explicitly.
  • Concurrency defaults are conservative for free-rate-limit providers. Tune up via env vars if you have paid plans.

v0.2.0 — embedding-aware clustering and richer agent tools

Highlights

  • Optional Ollama embedding path for clustering (NEWS_EMBEDDINGS_ENABLED=true)
  • Configurable Ollama base URL and embedding model
  • Tunable embedding similarity threshold (NEWS_EMBEDDING_SIMILARITY_THRESHOLD)
  • New agent tool: get_related_entities(subject, timeframe, limit)
  • Optional article payloads for get_latest_events, get_events_for_entity, and get_event_summary
  • Improved emerging-topic scoring with co-occurrence and importance weighting
  • Blacklist enforcement back-clean script for stored clusters
  • Embedding backfill script for older clusters
  • Embedding similarity analysis script for threshold tuning
  • Embedding-based merge script with dry-run and wet modes
  • Article dedup cleanup for repeated article variants inside clusters

Notes

  • Ollama embeddings are tried first when enabled; heuristic clustering remains the fallback.
  • The merge script is intentionally destructive and should be preceded by a dry run.
  • The article dedup cleanup script is safe to run after ingestion or on the historical dataset.
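The "embeddings first, heuristic fallback" order can be sketched as follows. NEWS_EMBEDDINGS_ENABLED and NEWS_EMBEDDING_SIMILARITY_THRESHOLD are the documented settings; the 0.85 default, the accepted truthy strings, the embed callable (standing in for an Ollama embedding request that raises OSError when the server is unreachable) and the heuristic callable are placeholders, not the project's actual code.

```python
import math
import os


def env_flag(name, default=False):
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() in {"1", "true", "yes", "on"}


def similarity_threshold(default=0.85):
    # 0.85 is a placeholder default, not the project's documented value.
    try:
        return float(os.environ.get("NEWS_EMBEDDING_SIMILARITY_THRESHOLD", default))
    except ValueError:
        return default


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def same_event(title_a, title_b, embed, heuristic):
    """Try embeddings first when enabled; otherwise, or on failure, use the heuristic."""
    if env_flag("NEWS_EMBEDDINGS_ENABLED"):
        try:
            va, vb = embed(title_a), embed(title_b)
        except OSError:  # e.g. ConnectionError when Ollama is unreachable
            va = vb = None
        if va and vb:
            return cosine(va, vb) >= similarity_threshold()
    return heuristic(title_a, title_b)
```

Falling through to the heuristic on any embedding failure keeps clustering working when Ollama is down, at the cost of the lower-quality matches the embedding path was added to improve.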