@@ -3,7 +3,19 @@

## Goal

Provide a signal-extraction MCP server that converts RSS into **deduplicated, enriched news clusters** that are easy for agents to use.

-## Current architecture (v1)
+## Current architecture (v2)
+- FastMCP SSE server mounted at `/mcp`
+- SQLite cache for clusters + entity metadata + feed state + LLM summary caches
+- Concurrent RSS fetch (`asyncio.gather` + `httpx`, bounded by a semaphore)
+- Composite dedup combining fuzzy title matching, token Jaccard similarity, and Ollama embedding cosine similarity
+- Concurrent Ollama embeddings (pre-computed before the clustering loop)
+- Concurrent LLM enrichment (entity extraction, topic classification, sentiment) with a per-provider semaphore
+- Per-cluster retry with exponential backoff (3 retries at 2s/4s/8s) + cross-cycle failure recovery
+- All concurrency limits configurable via env vars (`NEWS_RSS_MAX_CONCURRENCY`, `NEWS_OLLAMA_MAX_CONCURRENCY`, `NEWS_LLM_CONCURRENCY_<PROVIDER>`)
+- Dashboard REST API (`/api/v1/*`) for clusters, sentiment series, and entity frequencies
+- `get_latest_events()` defaults to all topics (omit `topic` for unfiltered results)
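The bounded concurrent fetch can be sketched as a semaphore wrapped around each fetch, all launched through `asyncio.gather`. This is a minimal sketch, not the server's code: the fallback limit of 8, the helper names `rss_concurrency`, `fetch_all`, and `fake_fetch`, and the example URLs are all hypothetical. In the real server the fetch coroutine would wrap an `httpx.AsyncClient` request; here it is stubbed so the example runs offline.

```python
from __future__ import annotations

import asyncio
import os
from typing import Awaitable, Callable


def rss_concurrency(default: int = 8) -> int:
    """Read NEWS_RSS_MAX_CONCURRENCY; the fallback of 8 is a hypothetical default."""
    raw = os.environ.get("NEWS_RSS_MAX_CONCURRENCY", "")
    return max(1, int(raw)) if raw.isdigit() else default


async def fetch_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int,
) -> list[str | BaseException]:
    """Fetch every URL concurrently with at most `limit` fetches in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with sem:  # later tasks wait here once `limit` fetches are running
            return await fetch(url)

    # return_exceptions=True keeps one broken feed from failing the whole batch
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)


async def demo(limit: int) -> tuple[list[str | BaseException], int]:
    in_flight = 0
    peak = 0

    async def fake_fetch(url: str) -> str:
        # Offline stand-in for an httpx.AsyncClient GET of the feed URL
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"<rss from {url}>"

    urls = [f"https://example.com/feed{i}.xml" for i in range(10)]
    return await fetch_all(urls, fake_fetch, limit), peak


results, peak = asyncio.run(demo(limit=3))
print(peak)  # 3: never more than `limit` fetches in flight
```

`gather` preserves input order, so each result lines up with its feed URL even though the fetches finish out of order.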
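The composite dedup check can be pictured as a weighted blend of three similarity signals. This is a minimal sketch under stated assumptions: `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the server uses, the embedding vectors are assumed to be pre-computed (in the server they would come from Ollama), and `WEIGHTS`, `DUP_THRESHOLD`, and the helper names are hypothetical.

```python
from __future__ import annotations

import math
import re
from difflib import SequenceMatcher

# Hypothetical weights and threshold; not the server's actual values.
WEIGHTS = {"fuzzy": 0.3, "jaccard": 0.2, "cosine": 0.5}
DUP_THRESHOLD = 0.75


def tokens(title: str) -> set[str]:
    """Lower-cased word tokens of a headline."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """len(a & b) / len(a | b), defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def composite_score(t1: str, t2: str, e1: list[float], e2: list[float]) -> float:
    """Weighted blend of fuzzy title ratio, token Jaccard, and embedding cosine."""
    fuzzy = SequenceMatcher(None, t1.lower(), t2.lower()).ratio()
    return (
        WEIGHTS["fuzzy"] * fuzzy
        + WEIGHTS["jaccard"] * jaccard(tokens(t1), tokens(t2))
        + WEIGHTS["cosine"] * cosine(e1, e2)
    )


def is_duplicate(t1: str, t2: str, e1: list[float], e2: list[float]) -> bool:
    return composite_score(t1, t2, e1, e2) >= DUP_THRESHOLD


score = composite_score(
    "Central bank raises interest rates",
    "Central bank hikes interest rates",
    [0.9, 0.1, 0.2],
    [0.88, 0.12, 0.25],
)
print(f"{score:.2f}", score >= DUP_THRESHOLD)
```

Mixing lexical and embedding signals is one way to catch paraphrased headlines while still requiring some surface overlap; the weights above are placeholders to tune against labelled duplicate pairs.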
+
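Per-provider throttling for LLM enrichment could be wired up as one `asyncio.Semaphore` per provider, each sized from `NEWS_LLM_CONCURRENCY_<PROVIDER>`. In this sketch the provider names, the upper-casing of the env-var suffix, and the fallback limit of 4 are assumptions. `ProviderLimiter` and `enrich` are hypothetical names, and `enrich` stands in for the real entity/topic/sentiment call.

```python
from __future__ import annotations

import asyncio
import os

DEFAULT_LLM_CONCURRENCY = 4  # hypothetical fallback when the env var is unset


def provider_limit(provider: str) -> int:
    """Read NEWS_LLM_CONCURRENCY_<PROVIDER>, e.g. NEWS_LLM_CONCURRENCY_GROQ."""
    raw = os.environ.get(f"NEWS_LLM_CONCURRENCY_{provider.upper()}", "")
    return max(1, int(raw)) if raw.isdigit() else DEFAULT_LLM_CONCURRENCY


class ProviderLimiter:
    """Lazily creates one semaphore per provider so providers never block each other."""

    def __init__(self) -> None:
        self._sems: dict[str, asyncio.Semaphore] = {}

    def get(self, provider: str) -> asyncio.Semaphore:
        if provider not in self._sems:
            self._sems[provider] = asyncio.Semaphore(provider_limit(provider))
        return self._sems[provider]


async def enrich(limiter: ProviderLimiter, provider: str, cluster_id: int) -> str:
    # Hypothetical enrichment call; a real one would prompt the provider's LLM.
    async with limiter.get(provider):
        await asyncio.sleep(0.01)
        return f"{provider}:{cluster_id}"


async def main() -> list[str]:
    limiter = ProviderLimiter()
    jobs = [enrich(limiter, p, i) for i, p in enumerate(["groq", "ollama"] * 3)]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping a separate semaphore per provider means a slow or rate-limited provider only queues its own jobs.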
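The per-cluster retry schedule (3 retries, waiting 2s, 4s, then 8s) is a delay of `base_delay * 2**attempt` with `base_delay = 2`. This is a minimal sketch, assuming a hypothetical `with_retry` helper that retries on any exception and accepts an injectable `sleep` so the schedule can be checked without real waiting. Cross-cycle recovery (re-queuing clusters that still fail) is not shown.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retry(
    op: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `op`; on failure wait base_delay * 2**attempt (2s, 4s, 8s) and retry.

    After `retries` failed retries the last exception propagates, so the caller
    can mark the cluster as failed and pick it up again in a later cycle.
    """
    for attempt in range(retries + 1):
        try:
            return await op()
        except Exception:
            if attempt == retries:
                raise
            await sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


async def demo() -> tuple[str, list[float]]:
    delays: list[float] = []
    calls = 0

    async def fake_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the delay instead of actually waiting

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 4:  # fail the first three attempts
            raise RuntimeError("provider timeout")
        return "enriched"

    return await with_retry(flaky, sleep=fake_sleep), delays


print(asyncio.run(demo()))  # ('enriched', [2.0, 4.0, 8.0])
```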
+## Previous: v1 architecture

- FastMCP SSE server mounted at `/mcp`
- SQLite cache for clusters + Groq summary caches
- RSS fetch (breakingthenews.net)