feat: max-quality tool descriptions and get_capabilities for v0.5.0

Tool descriptions rewritten for maximum agent utility:
- Every tool now has rich descriptions with clear semantics, parameter
  details (types, defaults, ranges, examples, meanings), usage guidance,
  and output field documentation
- get_events_for_entity: documents SQL-level junction-table search,
  no row-limit blind spot, case-insensitive matching
- get_event_summary: documents all 12 output fields, internal cursor
  convention, default include_articles=true
- detect_emerging_topics: documents signal types, velocity/source_count
  interpretation, timeframe comparison technique
- get_related_recent_entities: documents output schema (MID, scores),
  include_trends flag
- get_news_sentiment: documents score range, averaging behavior
- debug_dedup: kept sparse (diagnostic only)
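
The averaging behavior documented for get_news_sentiment can be illustrated with a minimal sketch. It assumes each matching cluster carries a numeric `sentimentScore` in the range -1 to +1 and that the aggregate score is the plain mean. The function name `aggregate_sentiment`, the ±0.1 label cutoffs, and the neutral result for an empty match set are hypothetical choices made for illustration, not taken from the server code.

```python
from statistics import fmean

# Hypothetical cutoffs: the diff does not state where the label boundaries lie.
POSITIVE_CUTOFF = 0.1
NEGATIVE_CUTOFF = -0.1


def aggregate_sentiment(entity: str, clusters: list[dict]) -> dict:
    """Average per-cluster sentimentScore values and derive a label."""
    scores = [float(c["sentimentScore"]) for c in clusters]
    # Assumption: no matching clusters yields a neutral score of 0.0.
    score = fmean(scores) if scores else 0.0
    if score > POSITIVE_CUTOFF:
        label = "positive"
    elif score < NEGATIVE_CUTOFF:
        label = "negative"
    else:
        label = "neutral"
    # Field names mirror the documented outputs of get_news_sentiment.
    return {
        "entity": entity,
        "sentiment": label,
        "score": score,
        "cluster_count": len(scores),
    }


if __name__ == "__main__":
    sample = [{"sentimentScore": 0.5}, {"sentimentScore": -0.25}]
    print(aggregate_sentiment("Bitcoin", sample))
```

Because the score is a mean, one strongly negative cluster can be offset by several mildly positive ones, so reading `cluster_count` alongside the score helps judge how much weight the aggregate deserves.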

get_capabilities:
- Added version field (v0.5.0)
- Updated purpose to mention three-layer dedup and content-change detection
- Added content-change detection to guidance section
- Recipes: added descriptions, concrete parameter examples, 6th recipe
  (full-investigation)
- Agent tips: rewritten for clarity, added include_articles guidance
  and content-change detection awareness
- Example chains: concrete parameter values instead of placeholders
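
One possible reading of the three-layer dedup (feed hash → article URL → content hash) and the content-change detection mentioned above is sketched below. This is an assumption about how the layers could fit together, not the server's implementation: `DedupState`, `classify_feed`, the SHA-256 digests, and the status strings "new", "duplicate", and "updated" are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def _digest(text: str) -> str:
    """Stable hash of a text payload."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class DedupState:
    # feed_key -> hash of the last fetched feed payload (layer 1)
    feed_hashes: dict[str, str] = field(default_factory=dict)
    # article URL -> hash of the last seen article content (layers 2 and 3)
    content_hashes: dict[str, str] = field(default_factory=dict)


def classify_feed(state: DedupState, feed_key: str, feed_body: str,
                  articles: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (url, status) pairs for the (url, content) articles of one fetch."""
    feed_hash = _digest(feed_body)
    # Layer 1: an unchanged feed payload means nothing needs processing.
    if state.feed_hashes.get(feed_key) == feed_hash:
        return []
    state.feed_hashes[feed_key] = feed_hash

    results = []
    for url, content in articles:
        content_hash = _digest(content)
        previous = state.content_hashes.get(url)
        if previous is None:
            status = "new"          # Layer 2: URL never seen before.
        elif previous == content_hash:
            status = "duplicate"    # Layer 3: same URL, same content.
        else:
            status = "updated"      # Same URL, changed content: re-process.
        state.content_hashes[url] = content_hash
        results.append((url, status))
    return results
```

In this sketch an "updated" status is the point at which a server could re-cluster and re-enrich an article, for example when a stub is fleshed out at the same URL.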

Lukas Goldschmidt, 6 days ago
parent
commit
8f82161c86
1 changed file with 100 additions and 71 deletions

+ 100 - 71
news_mcp/mcp_server_fastmcp.py

@@ -133,87 +133,97 @@ def _tool_card(name: str, description: str, inputs: list[dict], outputs: list[st
 NEWS_TOOL_CARDS = [
     _tool_card(
         "get_feeds",
-        "List all configured RSS feeds with their enabled/disabled status.",
+        "List all configured RSS feeds with their current enabled/disabled status, last fetch item count, and timestamps. Use to discover which feeds are active before investigating content.",
         [],
         ["feeds[]: {feed_key, enabled, last_hash, last_item_count, updated_at}"],
         ["Use this to see which feeds are currently active or disabled."],
     ),
     _tool_card(
         "toggle_feed",
-        "Enable or disable a specific RSS feed by URL.",
+        "Enable or disable a specific RSS feed by URL. Changes take effect on the next background refresh cycle. Returns the updated feed state including the new enabled flag.",
         [
             {"name": "feed_url", "type": "string", "meaning": "the feed URL to toggle"},
             {"name": "enabled", "type": "boolean", "meaning": "true to enable, false to disable"},
         ],
         ["ok", "feed_key", "enabled"],
-        ["Changes take effect on the next refresh cycle."],
+        ["Changes take effect on the next refresh cycle, not immediately."],
     ),
     _tool_card(
         "get_latest_events",
-        "Get the newest deduplicated clusters for a topic or resolved entity-like query.",
+        "Get the newest deduplicated news clusters, optionally filtered by topic. Each cluster is a group of related articles with LLM-extracted entities, thematic keywords, sentiment, importance score, and source list. Use this as the primary entry point for 'what is happening now' queries. Clusters are ordered by recency (freshest first). Set include_articles=true to include the underlying article URLs and titles for attribution.",
         [
-            {"name": "topic", "type": "string", "default": "all topics", "meaning": "coarse category (crypto, macro, regulation, ai, other), entity-like topic, or omit for all topics"},
+            {"name": "topic", "type": "string", "default": "all topics", "meaning": "coarse category: crypto, macro, regulation, ai, other. Omit for all topics."},
             {"name": "limit", "type": "integer", "default": 5, "range": "1-20"},
-            {"name": "include_articles", "type": "boolean", "default": False},
+            {"name": "include_articles", "type": "boolean", "default": False, "meaning": "include article URLs and titles in the response"},
         ],
         ["headline", "summary", "entities", "keywords", "sentiment", "importance", "sources", "timestamp", "articles?"],
-        ["Use when you want the freshest clusters. Each cluster includes both named entities and LLM-curated thematic keywords describing what the story is about."],
+        ["Each cluster includes both named entities (people, places, companies with optional MID/canonical_label) and LLM-curated thematic keywords (what the story is about). Use keywords to understand subject-matter themes beyond named entities."],
     ),
     _tool_card(
         "get_events_for_entity",
-        "Search recent clusters for a person, place, company, theme, or keyword by matching entities and thematic keywords.",
+        "Search recent clusters for a person, place, company, theme, or keyword. Matches against both named entities (e.g. 'Bitcoin', 'Jerome Powell') and thematic keywords (e.g. 'rate cuts', 'AI regulation') using SQL-level junction-table search across the full time window — no row-limit blind spot. Returns full cluster objects with headline, summary, entities, keywords, sentiment, importance, sources, and optional articles. Use this for entity-centered or theme-centered deep dives.",
         [
-            {"name": "entity", "type": "string", "meaning": "entity label, phrase, or keyword to search for"},
-            {"name": "timeframe", "type": "string", "default": "24h", "examples": ["24h", "72h", "3d"]},
+            {"name": "entity", "type": "string", "meaning": "entity label, phrase, or keyword to search for. Case-insensitive."},
+            {"name": "timeframe", "type": "string", "default": "24h", "examples": ["4h", "24h", "72h", "3d", "7d"], "meaning": "lookback window. Suffix with h (hours) or d (days)."},
             {"name": "limit", "type": "integer", "default": 10, "range": "1-30"},
-            {"name": "include_articles", "type": "boolean", "default": False},
+            {"name": "include_articles", "type": "boolean", "default": False, "meaning": "include article URLs and titles"},
         ],
         ["headline", "summary", "entities", "keywords", "sentiment", "importance", "sources", "timestamp", "articles?"],
-        ["Matches against both named entities and thematic keywords. Use this for an entity-centered or theme-centered deep dive."],
+        ["Matches both named entities and thematic keywords. Use timeframe to control lookback. Results are ordered by recency."],
     ),
     _tool_card(
         "get_event_summary",
-        "Produce a concise LLM-written explanation for one cluster and key facts.",
+        "Produce a rich, LLM-written narrative for a single cluster by its cluster_id. Returns the headline, merged summary, key facts, entities, keywords, related entities, related keywords, topic, sentiment, importance score, and the full article list (included by default). This is the primary tool for full cluster drill-down. The cluster_id is an internal cursor — do not surface it in user-facing prose unless explicitly requested.",
         [
-            {"name": "event_id", "type": "string", "meaning": "cluster_id; do not surface in user-facing prose"},
-            {"name": "include_articles", "type": "boolean", "default": True},
+            {"name": "event_id", "type": "string", "meaning": "cluster_id from a previous tool call. Internal cursor — do not show to users."},
+            {"name": "include_articles", "type": "boolean", "default": True, "meaning": "include the underlying articles list (URLs, titles, sources, timestamps)"},
         ],
         ["headline", "mergedSummary", "keyFacts", "sources", "entities", "keywords", "related_entities", "related_keywords", "topic", "sentiment", "importance", "articles"],
-        ["Rich cluster drill-down. Returns LLM summary + cluster metadata + articles. Defaults to include articles."],
+        ["Rich cluster drill-down. Returns LLM summary + cluster metadata + articles. Defaults to include articles. Use after get_latest_events or get_events_for_entity to get full context on a specific cluster."],
     ),
     _tool_card(
         "detect_emerging_topics",
-        "Surface emerging entities, thematic keywords, and phrases that are accelerating in the recent window.",
+        "Surface emerging entities, thematic keywords, and headline phrases that are accelerating in the recent window. Each result includes a trend_score, velocity (acceleration), recent_count, prior_count, source_count, related_entities, related_keywords, and signal_type. Signal types: entity (named entity, highest confidence), keyword (thematic descriptor), phrase (headline bigram). High velocity + high source_count = strong signal. Use timeframe to distinguish what's hot right now (4h) vs persistently trending (3d). Use around= to scope to a specific entity's neighborhood.",
         [
             {"name": "limit", "type": "integer", "default": 10, "range": "1-20"},
-            {"name": "timeframe", "type": "string", "default": "24h", "examples": ["4h", "24h", "3d"]},
-            {"name": "topic", "type": "string", "default": "all topics", "examples": ["crypto", "macro", "regulation", "ai", "other"]},
-            {"name": "around", "type": "string", "default": "none", "meaning": "entity to scope results to its neighborhood (e.g. \"Bitcoin\")"},
+            {"name": "timeframe", "type": "string", "default": "24h", "examples": ["4h", "24h", "3d"], "meaning": "lookback window for velocity calculation"},
+            {"name": "topic", "type": "string", "default": "all topics", "examples": ["crypto", "macro", "regulation", "ai", "other"], "meaning": "scope to a specific category"},
+            {"name": "around", "type": "string", "default": "none", "meaning": "entity to scope results to its neighborhood (e.g. 'Bitcoin', 'Fed')"},
         ],
         ["topic", "trend_score", "velocity", "recent_count", "prior_count", "source_count", "related_entities", "related_keywords", "signal_type"],
-        ["Use timeframe to control lookback, topic to scope to a category, around to find what's emerging near a specific entity. Signal types: entity (named entity), keyword (thematic descriptor), phrase (headline bigram). Check velocity and source_count to distinguish real spikes from noise."],
+        ["Use timeframe to control lookback, topic to scope to a category, around to find what's emerging near a specific entity. Check velocity and source_count to distinguish real spikes from noise. Compare results at different timeframes (e.g. 4h vs 3d) to distinguish hot-right-now from persistently trending."],
     ),
     _tool_card(
         "get_news_sentiment",
-        "Estimate sentiment around an entity or keyword over a lookback window.",
+        "Estimate aggregate sentiment around an entity or keyword over a lookback window. Matches clusters via both named entities and thematic keywords using SQL-level search. Returns the sentiment label (positive/negative/neutral), numeric score (-1 to +1), and the number of matching clusters. Use after locating a cluster set or entity neighborhood to gauge overall tone.",
         [
-            {"name": "entity", "type": "string", "meaning": "entity label, phrase, or keyword to analyze"},
-            {"name": "timeframe", "type": "string", "default": "24h"},
+            {"name": "entity", "type": "string", "meaning": "entity label, phrase, or keyword to analyze. Case-insensitive."},
+            {"name": "timeframe", "type": "string", "default": "24h", "examples": ["24h", "72h", "3d"]},
         ],
         ["entity", "sentiment", "score", "cluster_count"],
-        ["Matches clusters by entities and keywords. Use after locating a cluster set or entity neighborhood."],
+        ["Matches clusters by entities and keywords. Use after locating a cluster set or entity neighborhood. Score is the average sentimentScore across matching clusters."],
     ),
     _tool_card(
         "get_related_recent_entities",
-        "Find entities and thematic keywords commonly co-occurring with a subject in recent clusters, optionally blended with Google Trends suggestions.",
+        "Find entities and thematic keywords commonly co-occurring with a subject in recent clusters, optionally blended with Google Trends suggestions. Returns related entities with normalized labels, canonical labels, MID (Wikidata ID when available), source counts, and co-occurrence scores. Use this to drill from a subject into its neighborhood — then feed the strongest related entities into get_events_for_entity for deeper investigation.",
         [
-            {"name": "subject", "type": "string", "meaning": "canonical entity or subject phrase"},
-            {"name": "timeframe", "type": "string", "default": "72h"},
+            {"name": "subject", "type": "string", "meaning": "canonical entity or subject phrase (e.g. 'Iran', 'Bitcoin', 'AI regulation')"},
+            {"name": "timeframe", "type": "string", "default": "72h", "examples": ["24h", "72h", "3d"]},
             {"name": "limit", "type": "integer", "default": 10, "range": "1-25"},
-            {"name": "include_trends", "type": "boolean", "default": True},
+            {"name": "include_trends", "type": "boolean", "default": True, "meaning": "blend local co-occurrence with Google Trends suggestions for broader coverage"},
         ],
         ["subject", "related[].normalized", "related[].canonical_label", "related[].mid", "related[].sources", "related[].scores"],
-        ["Use this to drill from a subject into related entities and themes, then feed results into get_events_for_entity."],
+        ["Use this to drill from a subject into related entities and themes, then feed results into get_events_for_entity. Set include_trends=false for local-only co-occurrence."],
+    ),
+    _tool_card(
+        "debug_dedup",
+        "Inspect dedup status for an article URL. Returns whether the article is in seen_articles, its article_key, cluster_id, and (if title provided) similarity signals against the top-10 most similar existing clusters including match decisions and active thresholds.",
+        [
+            {"name": "url", "type": "string", "meaning": "article URL to inspect"},
+            {"name": "title", "type": "string", "default": "none", "meaning": "article title for similarity signal computation"},
+        ],
+        ["seen", "article_key", "cluster_id", "first_seen", "stored_url", "similarity_signals?", "title_threshold", "jaccard_threshold"],
+        ["Diagnostic tool. Use to understand why an article was or was not deduplicated."],
     ),
 ]
 
@@ -221,53 +231,70 @@ NEWS_TOOL_CARDS = [
 NEWS_COMPOSITION_RECIPES = [
     {
         "name": "fresh-news-tail",
+        "description": "Get the latest news clusters for a topic or across all topics.",
         "steps": [
-            "get_latest_events(topic=...)",
+            "get_latest_events(topic=..., limit=5-10)",
             "optionally get_event_summary(event_id=...) for the strongest cluster",
         ],
-        "notes": ["Best for a quick tail of what is happening now. Omit topic for all topics, or pass crypto/macro/regulation/ai/other to filter."]
+        "notes": ["Omit topic for all topics. Use include_articles=true when you need source attribution."],
     },
     {
         "name": "entity-deep-dive",
+        "description": "Full investigation of an entity: clusters, sentiment, and narrative summary.",
         "steps": [
-            "get_events_for_entity(entity=...)",
-            "get_event_summary(event_id=...)",
-            "get_news_sentiment(entity=..., timeframe=...)",
+            "get_events_for_entity(entity=..., timeframe='24h', limit=10)",
+            "get_news_sentiment(entity=..., timeframe='24h')",
+            "get_event_summary(event_id=...) for the strongest cluster",
         ],
-        "notes": ["Prefer canonical entity labels when you have them; the server normalizes for you."],
+        "notes": ["Prefer canonical entity labels; the server normalizes common aliases. Increase timeframe to '3d' or '7d' for broader coverage."],
     },
     {
         "name": "subject-neighborhood",
+        "description": "Expand from a subject to its related entities and themes, then investigate the strongest connections.",
         "steps": [
-            "get_related_recent_entities(subject=...)",
-            "for each strong related entity, call get_events_for_entity(entity=...)",
+            "get_related_recent_entities(subject=..., include_trends=true)",
+            "for each strong related entity, call get_events_for_entity(entity=...) to get clusters",
         ],
-        "notes": ["Use this when you want a graph-like expansion around a subject."]
+        "notes": ["Use this when you want a graph-like expansion around a subject. Filter related entities by source_count for quality."],
     },
     {
         "name": "emerging-signal",
+        "description": "Find what's emerging and investigate the top signals.",
+        "steps": [
+            "detect_emerging_topics(limit=10, timeframe='24h')",
+            "choose a high-velocity entity or keyword from results",
+            "get_events_for_entity(entity=..., timeframe='24h')",
+            "get_news_sentiment(entity=..., timeframe='24h')",
+        ],
+        "notes": ["Use timeframe='4h' for what's hot right now, '3d' for weekly trends. Check velocity and source_count to distinguish real spikes from noise. Use around= to scope to a specific entity's neighborhood."],
+    },
+    {
+        "name": "full-investigation",
+        "description": "Complete pipeline: emerging topics → entity drill-down → sentiment → neighborhood scouting.",
         "steps": [
-            "detect_emerging_topics(limit=..., timeframe=..., topic=..., around=...)",
-            "choose a topic/entity from the results",
-            "get_events_for_entity(entity=...)",
-            "get_news_sentiment(entity=...)",
+            "detect_emerging_topics(limit=20, timeframe='3d')",
+            "pick an emerging entity/keyword; note its related_entities and related_keywords",
+            "get_event_summary(event_id=...) on the top cluster for full context including articles",
+            "get_news_sentiment(entity=...) to gauge tone around the emerging topic",
+            "detect_emerging_topics(around=<entity>, timeframe='4h') to scout its neighborhood",
         ],
-        "notes": ["Use timeframe to control lookback (e.g. \"4h\" for what's hot right now, \"3d\" for weekly trends), topic to scope to a category, around to find what's emerging near a specific entity. Check velocity and source_count to distinguish real spikes from noise."],
+        "notes": ["This is the most comprehensive recipe. Use when building a full picture of an emerging story."],
     },
 ]
 
 
 NEWS_AGENT_TIPS = [
-    "If you need a fast answer, start with get_latest_events, then summarize the strongest cluster with get_event_summary.",
-    "If a user asks about a person/place/company/theme, use get_events_for_entity before broadening to get_related_recent_entities.",
-    "Treat cluster_id as an internal cursor, not user-facing output; use it only for follow-up tool calls.",
-    "When describing clusters, keep sources and timestamps visible so the user can assess recency and provenance.",
-    "Prefer a short chain of tools over many parallel calls unless you are building a neighborhood map or comparison table.",
-    "For tricky names, rely on the server's resolver instead of inventing alias rules in the client.",
-    "Use detect_emerging_topics with timeframe=\"4h\" for what's hot right now, timeframe=\"3d\" for weekly trends. Use topic= to scope to a category, around= to find what's emerging near a specific entity. Check velocity to distinguish accelerating signals from steady-state ones. Filter by signal_type to focus on entities, keywords, or phrases. Each result also includes related_keywords for thematic context.",
-    "get_event_summary returns a rich result: headline, mergedSummary, keyFacts, entities, keywords, related_entities, related_keywords, topic, sentiment, importance, and articles (included by default). Use it for full cluster drill-down.",
-    "Each cluster contains both entities (named entities with identity resolution) and keywords (thematic descriptors). Use keywords to understand what a story is about beyond the named entities.",
-    "Use detect_emerging_topics with multiple timeframes (e.g. 4h vs 3d) and compare results to distinguish what's hot right now vs what's persistently trending. related_keywords help identify thematic neighborhoods.",
+    "Start with get_latest_events for a quick tail of what's happening. Use get_event_summary for full context on a specific cluster.",
+    "For entity/theme questions, use get_events_for_entity first, then broaden with get_related_recent_entities.",
+    "Treat cluster_id as an internal cursor for follow-up calls — do not surface it in user-facing prose unless explicitly requested.",
+    "Always preserve sources and timestamps when summarizing results so users can assess recency and provenance.",
+    "Prefer canonical entity labels when available; the server normalizes common aliases and resolves MIDs.",
+    "Use detect_emerging_topics with multiple timeframes (4h vs 3d) to distinguish hot-right-now from persistently trending. Filter by signal_type=entity for highest-confidence signals.",
+    "get_event_summary returns the richest single-cluster view: mergedSummary, keyFacts, entities, keywords, related_entities, related_keywords, and articles. Use it for full drill-down.",
+    "Each cluster has both entities (named entities with identity resolution) and keywords (thematic descriptors). Keywords capture what a story is about beyond the named entities.",
+    "For sentiment analysis, use get_news_sentiment after locating clusters. The score is averaged across all matching clusters in the timeframe.",
+    "Use include_articles=true when you need source attribution, article URLs, or timestamps for individual articles within a cluster.",
+    "The server detects in-place article content updates (e.g. stubs that get fleshed out) and automatically re-clusters and re-enriches them.",
 ]
 
 
@@ -275,47 +302,47 @@ NEWS_EXAMPLE_CHAINS = [
     {
         "task": "What is happening now?",
         "chain": [
-            "get_latest_events(topic=...)",
-            "get_event_summary(event_id=...) if one cluster looks important",
+            "get_latest_events(topic='macro', limit=5)",
+            "get_event_summary(event_id=<cluster_id>) for the most important cluster",
         ],
     },
     {
         "task": "Deep dive on an entity",
         "chain": [
-            "get_events_for_entity(entity=..., timeframe=...)",
-            "get_news_sentiment(entity=..., timeframe=...)",
-            "get_event_summary(event_id=...) for the strongest cluster",
+            "get_events_for_entity(entity='Bitcoin', timeframe='24h', limit=10)",
+            "get_news_sentiment(entity='Bitcoin', timeframe='24h')",
+            "get_event_summary(event_id=<cluster_id>) for the strongest cluster",
         ],
     },
     {
         "task": "Broaden from a subject",
         "chain": [
-            "get_related_recent_entities(subject=..., include_trends=true)",
-            "get_events_for_entity(entity=...) for the strongest related entities",
+            "get_related_recent_entities(subject='Iran', include_trends=true)",
+            "get_events_for_entity(entity=<top_related_entity>, timeframe='72h')",
         ],
     },
     {
         "task": "Find what is emerging",
         "chain": [
-            "detect_emerging_topics(limit=..., timeframe=..., topic=..., around=...) with optional scoping",
-            "get_events_for_entity(entity=...) on one or two emerging terms",
+            "detect_emerging_topics(limit=10, timeframe='24h')",
+            "get_events_for_entity(entity=<emerging_entity>, timeframe='24h')",
         ],
     },
     {
         "task": "What's heating up around a specific entity",
         "chain": [
-            "detect_emerging_topics(around=\"<entity>\", timeframe=\"4h\")",
-            "get_events_for_entity(entity=...) on the top emerging neighbor",
+            "detect_emerging_topics(around='Fed', timeframe='4h')",
+            "get_events_for_entity(entity=<top_emerging_neighbor>, timeframe='24h')",
         ],
     },
     {
         "task": "Full investigation pipeline",
         "chain": [
-            "detect_emerging_topics(limit=20, timeframe=\"3d\")",
-            "pick an emerging entity/keyword and note its related_entities and related_keywords",
-            "get_event_summary(event_id=...) on the top cluster for full context including articles",
-            "get_news_sentiment(entity=...) to gauge tone around the emerging topic",
-            "detect_emerging_topics(around=<entity>, timeframe=\"4h\") to scout its neighborhood",
+            "detect_emerging_topics(limit=20, timeframe='3d')",
+            "pick an emerging entity; note its related_entities and related_keywords",
+            "get_event_summary(event_id=<cluster_id>) for full context including articles",
+            "get_news_sentiment(entity=<entity>) to gauge tone",
+            "detect_emerging_topics(around=<entity>, timeframe='4h') to scout its neighborhood",
         ],
     },
 ]
@@ -1134,7 +1161,8 @@ async def get_capabilities():
     return {
         "server": {
             "name": "news-mcp",
-            "purpose": "Recent news clusters with entities and thematic keywords, entity/keyword drill-down, sentiment, emerging topics, and related-entity expansion.",
+            "version": "v0.5.0",
+            "purpose": "Deduplicated, enriched news clusters from RSS feeds. Three-layer dedup (feed hash → article URL → content hash). LLM enrichment with entities, keywords, sentiment, summaries. Content-change detection for in-place article updates.",
             "output_conventions": {
                 "cluster_ids": "Do not surface cluster_id in user-facing prose unless explicitly requested; treat it as internal navigation metadata.",
                 "sources": "Always preserve and display sources when summarizing a cluster or entity result.",
@@ -1152,6 +1180,7 @@ async def get_capabilities():
             "When presenting results to users, summarize the cluster; avoid exposing internal IDs unless they are needed for follow-up tool calls.",
             "For emerging topics, use detect_emerging_topics with timeframe and around parameters. Signal types: entity (named entity, highest quality), keyword (thematic descriptor), phrase (headline bigram). High velocity + high source_count = strong signal.",
             "get_events_for_entity and get_news_sentiment match both entities and thematic keywords — use keywords when the subject is a theme rather than a named entity.",
+            "The server detects in-place article content updates (e.g. stubs that get fleshed out at the same URL) and automatically re-clusters and re-enriches them.",
         ],
     }
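
The "Deep dive on an entity" chain from the diff above could be driven from client code roughly as follows. This is a sketch under stated assumptions: `call_tool` is a hypothetical callable supplied by the caller (for example a thin wrapper around an MCP client session), tool results are assumed to arrive as plain Python lists and dicts, each cluster is assumed to expose a `cluster_id` key, and "strongest" is interpreted as the highest `importance` value.

```python
from typing import Any, Callable, Optional

# Hypothetical signature: (tool_name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict], Any]


def entity_deep_dive(call_tool: ToolCaller, entity: str,
                     timeframe: str = "24h", limit: int = 10) -> dict:
    """Run get_events_for_entity, get_news_sentiment, then get_event_summary."""
    clusters = call_tool(
        "get_events_for_entity",
        {"entity": entity, "timeframe": timeframe, "limit": limit},
    )
    sentiment = call_tool(
        "get_news_sentiment",
        {"entity": entity, "timeframe": timeframe},
    )

    summary: Optional[dict] = None
    if clusters:
        # Assumption: the strongest cluster is the one with the highest importance.
        strongest = max(clusters, key=lambda c: c.get("importance", 0))
        # cluster_id is an internal cursor: pass it along, do not show it to users.
        summary = call_tool("get_event_summary", {"event_id": strongest["cluster_id"]})

    return {"clusters": clusters, "sentiment": sentiment, "summary": summary}
```

Passing the caller in as a parameter keeps the chain independent of any particular client library and makes it easy to exercise with a stub in tests.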