Async concurrent RSS fetching — all feeds are fetched in parallel with asyncio.gather + httpx, bounded by a semaphore (default 10 concurrent). Previously feeds were fetched sequentially, so ~40 feeds at 2-5s each took minutes; now up to 10 are in flight at a time.
Concurrent Ollama embeddings — embedding vectors for all articles pre-computed in parallel before the clustering loop (bounded by semaphore, default 4). Previously one-by-one during clustering.
Concurrent LLM enrichment — entity extraction / topic classification / sentiment calls run concurrently across all clusters, bounded by per-provider semaphore:
openrouter: 2 (free tier)
openai: 5
groq: 8
Override via NEWS_LLM_CONCURRENCY_<PROVIDER> env var
Per-cluster retry with backoff — failed LLM calls retry up to 3 times (2s, 4s, 8s backoff) before marking the cluster as failed. Failed clusters are automatically retried on the next polling cycle.
Cross-cycle failure recovery — get_failed_enrichment_clusters() queries the DB for clusters with enrichment_failed_at set but below the retry threshold, so transient failures self-heal.
LLM provider retries — _call_groq and _call_openai now have the same retry logic as _call_openrouter (2 retries, exponential backoff on 429/500/502/503, empty response handling).
get_latest_events() default changed — omitting topic now returns clusters from all topics instead of defaulting to "crypto". Pass topic="crypto" (or macro/regulation/ai/other) to filter.
Configuration — all concurrency limits configurable via env vars; see config.py for NEWS_RSS_MAX_CONCURRENCY, NEWS_OLLAMA_MAX_CONCURRENCY, NEWS_LLM_CONCURRENCY_<PROVIDER>.
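The semaphore-bounded fan-out and per-call retry described in the highlights above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names call_with_retry and gather_bounded are hypothetical, and each job stands in for a feed fetch, an embedding request, or an LLM enrichment call.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 2.0,
) -> T:
    """Run make_call(); on failure retry up to `retries` times, doubling the delay."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: the caller can mark the item as failed
            # Exponential backoff: 2s, 4s, 8s with the default base_delay
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


async def gather_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    limit: int,
    retries: int = 3,
    base_delay: float = 2.0,
) -> list:
    """Run all jobs concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[T]]):
        async with semaphore:
            return await call_with_retry(job, retries, base_delay)

    # return_exceptions=True returns failures as values instead of raising the first one
    return await asyncio.gather(*(run(job) for job in jobs), return_exceptions=True)
```

In this sketch a job keeps its semaphore slot while it waits out a backoff delay, so retries never push the number of in-flight calls above the limit. Entries in the returned list that are exception instances correspond to items that would be recorded as failed and picked up again on a later cycle.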
Migration notes
No database schema changes.
If you relied on get_latest_events() without a topic argument returning only crypto clusters, pass topic="crypto" explicitly.
Concurrency defaults are conservative to suit free-tier rate limits. Raise them via env vars if you are on a paid plan.
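As an illustration of how a per-provider override might be resolved, here is a small sketch. The defaults come from the list above; the helper name llm_concurrency, the upper-casing of the provider name in the variable, the fallback for unlisted providers, and the handling of malformed values are all assumptions, since the real parsing lives in config.py and is not shown here.

```python
import os

# Defaults as listed in these release notes
DEFAULT_LLM_CONCURRENCY = {"openrouter": 2, "openai": 5, "groq": 8}
FALLBACK_CONCURRENCY = 2  # hypothetical default for providers not listed above


def llm_concurrency(provider: str, environ=None) -> int:
    """Return the concurrency limit for a provider, honouring NEWS_LLM_CONCURRENCY_<PROVIDER>."""
    env = os.environ if environ is None else environ
    default = DEFAULT_LLM_CONCURRENCY.get(provider.lower(), FALLBACK_CONCURRENCY)
    raw = env.get(f"NEWS_LLM_CONCURRENCY_{provider.upper()}")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # assumption: malformed overrides are ignored
    # Assumption: zero or negative overrides fall back to the default
    return value if value > 0 else default
```

For example, with NEWS_LLM_CONCURRENCY_OPENAI=20 set in the environment, llm_concurrency("openai") would return 20 in this sketch, while llm_concurrency("groq") would still return 8.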
v0.2.0 — embedding-aware clustering and richer agent tools
Highlights
Optional Ollama embedding path for clustering (NEWS_EMBEDDINGS_ENABLED=true)