fix: force re-enrichment when enriched_at is missing from cluster dict

Articles whose content changed (detected via content_hash comparison) were
not being re-enriched because the cache check in _enrich_one only tested
for the presence of entities and keywords. The pre-seeded cluster dict,
loaded after step 3c had cleared enriched_at in the DB, still carried
entities/keywords, so the cache check passed and the LLM call was skipped.

Fix: require enriched_at to be present in the cache check. If it is
missing, always re-enrich regardless of entities/keywords. A missing
enriched_at is a reliable signal that the cluster's content has changed
and needs fresh enrichment.
Lukas Goldschmidt, 6 days ago
Parent
Commit
8e87822bad
1 file changed, 2 insertions and 1 deletion
+ 2 - 1
news_mcp/jobs/poller.py

@@ -431,7 +431,8 @@ class ClusterPoller:
             return c2, False
 
         # Cache check: entities + keywords already present → skip
-        if (c2.get("entities") or []) and (c2.get("keywords") or []):
+        # Exception: enriched_at missing means content changed → force re-enrich
+        if c2.get("enriched_at") and (c2.get("entities") or []) and (c2.get("keywords") or []):
             self.logger.debug("enrich skip (cached): cluster=%s topic=%s", cluster_id, topic)
             return c2, False
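
The new condition can be read in isolation as a small predicate. The following is a minimal sketch, not the code in poller.py: the function name `should_skip_enrichment` and the sample dicts are hypothetical, and it assumes the cluster is a plain dict whose `enriched_at` is either absent or falsy once the content has changed.

```python
from typing import Any, Mapping


def should_skip_enrichment(cluster: Mapping[str, Any]) -> bool:
    """Return True only when the cached enrichment can be reused.

    Hypothetical helper mirroring the condition in the diff above:
    all three fields must be present and non-empty.
    """
    has_timestamp = bool(cluster.get("enriched_at"))
    has_entities = bool(cluster.get("entities") or [])
    has_keywords = bool(cluster.get("keywords") or [])
    return has_timestamp and has_entities and has_keywords


# Illustrative data only; the field values are made up.
fresh = {
    "enriched_at": "2024-01-01T00:00:00Z",
    "entities": ["ACME"],
    "keywords": ["merger"],
}
# Same entities/keywords, but enriched_at was cleared after a content change.
stale = {"enriched_at": None, "entities": ["ACME"], "keywords": ["merger"]}

if __name__ == "__main__":
    print(should_skip_enrichment(fresh))
    print(should_skip_enrichment(stale))
```

With the old condition the `stale` dict would have counted as cached, since it still carries entities and keywords. Requiring `enriched_at` makes the cleared timestamp alone sufficient to force a fresh LLM call.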