chore: bump version to v0.4.0

Lukas Goldschmidt 1 week ago
parent
commit
0c71a2cc13
1 changed file with 4 additions and 3 deletions

+ 4 - 3
PROJECT.md

@@ -3,11 +3,12 @@
 ## Goal
 Provide a signal-extraction MCP server that converts RSS into **deduplicated, enriched news clusters** that are easy for agents to use.
 
-## Current architecture (v0.3.2)
+## Current architecture (v0.4.0)
 - FastMCP SSE server mounted at `/mcp`
 - SQLite cache for clusters + entity metadata + feed state + LLM summary caches
-- Concurrent RSS fetch (async `asyncio.gather` + `httpx`, bounded semaphore)
-- **Multi-signal clustering**: cosine embedding + fuzzy title + token Jaccard + consensus cascade; compares against ALL cluster articles (not just seed)
+- **payload_ts** — indexed generated column for SQL-level event-time filtering (no JSON parsing at read time)
+- **cluster_entities** and **cluster_keywords** junction tables with indexes for O(log n) entity/keyword search
+- All read paths use SQL-level filtering (no full-table JSON parsing)
 - **Stable cluster IDs**: `sha1(min_article_key)` — topic-independent, order-independent, consistent across polling cycles. The topic is excluded from the hash so that the same article always maps to the same cluster_id regardless of heuristic vs LLM-enriched topic classification.
 - **Cross-cycle merge**: poller seeds clustering with recent DB clusters (configurable `NEWS_CLUSTER_MAX_AGE_HOURS`, default 4h). Existing clusters are re-bucketed by the same heuristic topic function (`normalize_topic_from_title`) that new articles use, ensuring matching works even when the enriched topic drifted.
 - **Orphan merge**: post-clustering Union-Find pass merges clusters sharing article keys
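
The stable-ID and orphan-merge bullets in the hunk above can be illustrated with a minimal Python sketch. It is not taken from the repository: the helper names `cluster_id` and `merge_orphans`, the use of plain strings as article keys, and the choice to represent a cluster as a set of keys are all assumptions made for illustration. Only `sha1(min_article_key)` and the Union-Find pass over shared article keys come from PROJECT.md.

```python
import hashlib
from collections import defaultdict


def cluster_id(article_keys):
    """Hypothetical helper: hash the smallest article key.

    Because only min(article_keys) is hashed, the result does not depend
    on iteration order or on any topic label.
    """
    return hashlib.sha1(min(article_keys).encode("utf-8")).hexdigest()


def merge_orphans(clusters):
    """Hypothetical helper: merge clusters that share any article key.

    `clusters` is a list of sets of article keys. Returns a list of merged
    sets, built with a Union-Find over cluster indexes.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the higher root to the lower one for a deterministic result.
            parent[max(ra, rb)] = min(ra, rb)

    first_owner = {}  # article key -> index of the first cluster containing it
    for idx, keys in enumerate(clusters):
        for key in keys:
            if key in first_owner:
                union(idx, first_owner[key])
            else:
                first_owner[key] = idx

    merged = defaultdict(set)
    for idx, keys in enumerate(clusters):
        merged[find(idx)] |= keys
    return list(merged.values())
```

Under these assumptions, two clusters that both contain the same article key collapse into one set, and recomputing `cluster_id` on the merged set yields the same ID whichever input cluster was seen first.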