
news-mcp: make entity lookup timeframe-aware

Lukas Goldschmidt 1 month ago
parent commit c68451a1e8
3 changed files with 9 additions and 9 deletions
  1. PROJECT.md (+2 -0)
  2. README.md (+2 -2)
  3. news_mcp/mcp_server_fastmcp.py (+5 -7)

+ 2 - 0
PROJECT.md

@@ -20,6 +20,7 @@ Provide a signal-extraction MCP server that converts RSS into **deduplicated, en
 ## MCP tools (current)
 - `get_latest_events(topic, limit)`
 - `get_events_for_entity(entity, limit)`
+- `get_events_for_entity(entity, limit, timeframe)`
 - `get_event_summary(event_id)`
 - `detect_emerging_topics(limit)`
 - `get_related_entities(subject, timeframe, limit)`
@@ -39,3 +40,4 @@ Provide a signal-extraction MCP server that converts RSS into **deduplicated, en
 - Merge-analysis script exists to inspect candidate cluster pairs at multiple thresholds
 - Merge pass exists for destructive consolidation once thresholds look sane
 - Article-dedup cleanup exists for fixing duplicated article records already in SQLite
+- Entity lookup now respects timeframe as the scan window, with limit acting as a cap

+ 2 - 2
README.md

@@ -30,9 +30,9 @@ Health:
 - `topic` is a coarse category: `crypto | macro | regulation | ai | other`
 - when `include_articles=true`, includes `articles[].url` + minimal fields per returned cluster
 
-2) `get_events_for_entity(entity, limit, include_articles=false)`
+2) `get_events_for_entity(entity, limit, timeframe="24h", include_articles=false)`
 - substring, case-insensitive match over extracted `entities`
-- uses a shallow recent scan first, then falls back to a wider historical scan if needed
+- uses the requested timeframe as the scan window; `limit` is the cap within that window
 - when `include_articles=true`, includes `articles[].url` + minimal fields per returned cluster
 
 3) `get_event_summary(event_id, include_articles=false)`

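The substring matching described above can be sketched as a small standalone function. This is a hypothetical mini-version for illustration only; the server's actual matching lives in the `_match_clusters` helper shown in the code diff below, whose exact cluster shape this sketch assumes (a `dict` with an `entities` list of strings).

```python
def match_clusters(clusters: list[dict], query_terms: set[str]) -> list[dict]:
    # Case-insensitive substring match: a cluster is a hit when any
    # query term appears inside any of its extracted entity strings.
    hits = []
    for cluster in clusters:
        entities = [str(e).lower() for e in cluster.get("entities", [])]
        if any(term in entity for term in query_terms for entity in entities):
            hits.append(cluster)
    return hits
```

For example, `match_clusters([{"entities": ["Bitcoin", "SEC"]}], {"bitcoin"})` returns the cluster, while a query for an entity that appears nowhere returns an empty list.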
+ 5 - 7
news_mcp/mcp_server_fastmcp.py

@@ -108,8 +108,8 @@ async def get_latest_events(topic: str = "crypto", limit: int = 5, include_artic
     return out
 
 
-@mcp.tool(description="What's happening with X? Filter latest clusters by extracted entity substring (case-insensitive).")
-async def get_events_for_entity(entity: str, limit: int = 10, include_articles: bool = False):
+@mcp.tool(description="What's happening with X? Filter clusters by extracted entity substring (case-insensitive) within a timeframe.")
+async def get_events_for_entity(entity: str, limit: int = 10, timeframe: str = "24h", include_articles: bool = False):
     limit = max(1, min(int(limit), 30))
     query = normalize_query(entity).strip().lower()
     if not query:
@@ -124,7 +124,6 @@ async def get_events_for_entity(entity: str, limit: int = 10, include_articles:
     }
     query_terms = {q for q in query_terms if q}
 
-    # Cache-first: search recent clusters across all topics.
     store = SQLiteClusterStore(DB_PATH)
 
     def _match_clusters(clusters: list[dict]) -> list[dict]:
@@ -140,10 +139,9 @@ async def get_events_for_entity(entity: str, limit: int = 10, include_articles:
     clusters = store.get_latest_clusters_all_topics(ttl_hours=CLUSTERS_TTL_HOURS, limit=limit * 5)
     hits = _match_clusters(clusters)
 
-    # If the recent slice misses, broaden the search window before giving up.
-    if not hits:
-        clusters = store.get_latest_clusters_all_topics(ttl_hours=24 * 7, limit=500)
-        hits = _match_clusters(clusters)
+    hours = _parse_timeframe_to_hours(timeframe)
+    clusters = store.get_latest_clusters_all_topics(ttl_hours=hours, limit=max(200, limit * 10))
+    hits = _match_clusters(clusters)
 
     # Compress to tool response shape.
     out = []