extract_entities.prompt 4.2 KB

Input cluster JSON:
{cluster_json}
You MUST extract a news signal from the headline AND summary. Return STRICT JSON only.
Task:
1) infer the best top-level topic (crypto, macro, regulation, ai, other)
2) extract concise ENTITIES (proper nouns only)
3) assign sentiment (positive/negative/neutral) + score (-1.0 to 1.0)
4) provide short KEYWORDS (thematic tags, 1-2 words, NOT proper nouns)
=== ENTITY RULES (strict) ===
- ONLY specific named people, places, organizations, titles, products, tickers. 1-5 words.
- Examples of entities: "Donald Trump", "Federal Reserve", "Bitcoin", "SEC", "ECB", "Iran", "Gaza", "Nvidia", "Apple", "ChatGPT", "Binance", "Jerome Powell", "BTC", "ETH", "Ethereum", "OPEC+", "H100", "Blackwell"
- Examples of NON-entities (these are THEMES/CONCEPTS → express them as KEYWORDS, subject to the KEYWORD RULES below):
"inflation", "interest rates", "rates", "euro", "dollar", "oil", "gold", "war", "election", "regulation", "sanctions", "tariffs", "AI", "crypto", "ETF", "monetary policy", "fiscal policy", "trade war", "supply chain", "recession", "growth", "employment", "unemployment", "GDP", "CPI", "PPI", "US", "United States", "EU", "Europe", "China", "eurozone", "oil prices", "stock market", "bond yields"
- Do NOT include common nouns, abstract concepts, or thematic terms, even if finance/crypto related.
- Do NOT include adjectives alone ("strict", "new", "record", "major") or generic nouns ("package", "plan", "deal", "bill", "act", "law", "case", "trial", "verdict", "ruling", "decision", "meeting", "summit", "talks").
=== KEYWORD RULES (strict) ===
- Each keyword MUST be 1-2 words. PREFER 2-word phrases. Avoid single words unless they are established standalone concepts (e.g. "inflation" and "sanctions" are ok alone).
- Keywords are THEMATIC TAGS: abstract concepts, policy areas, event types, topics.
- Good keywords (2-word phrases preferred): "interest rates", "monetary policy", "securities law", "airstrikes", "missile sites", "regional escalation", "trade war", "supply chain", "recession risk", "inflation data", "ETF inflows", "institutional demand", "price surge", "AI chips", "earnings beat", "revenue growth", "chip demand", "rate cut", "eurozone inflation", "deposit rate", "monetary easing", "production cuts", "oil prices", "global supply", "demand concerns", "high-risk systems", "compliance requirements", "criminal conviction", "hush money", "falsifying records", "historic verdict", "guilty verdict", "stimulus package", "infrastructure spending", "property sector"
- Bad keywords: proper nouns (these go in entities), SINGLE generic words ("unregistered", "securities", "ETFs", "inflows", "strict", "rules", "package", "economy", "oil", "prices", "cuts", "demand", "growth", "beat", "report", "data", "concerns"), verb phrases ("warns Iran", "hikes rates", "cuts rates", "sues Binance"), full headline fragments, anything over 2 words.
- Return 2-4 keywords. Fewer is better than bad ones.
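The mechanical parts of the keyword rules (word count, list length, the listed generic single words) can be checked after the model replies. The following is a minimal sketch, not part of the prompt itself; the function name `check_keywords` and the constant `GENERIC_SINGLE_WORDS` are hypothetical, and the word set is copied from the "Bad keywords" rule above.

```python
# Single generic words the prompt lists as bad keywords (lowercased).
GENERIC_SINGLE_WORDS = {
    "unregistered", "securities", "etfs", "inflows", "strict", "rules",
    "package", "economy", "oil", "prices", "cuts", "demand", "growth",
    "beat", "report", "data", "concerns",
}

def check_keywords(keywords):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # "Return 2-4 keywords": only the upper bound is enforced here, because
    # the prompt also says fewer is better than bad ones.
    if len(keywords) > 4:
        problems.append(f"too many keywords: {len(keywords)} (max 4)")
    for kw in keywords:
        words = kw.split()
        if not 1 <= len(words) <= 2:
            problems.append(f"{kw!r} must be 1-2 words")
        elif len(words) == 1 and kw.lower() in GENERIC_SINGLE_WORDS:
            problems.append(f"{kw!r} is a generic single word")
    return problems
```

This check cannot tell whether a keyword is a proper noun or a verb phrase; those rules still depend on the model's judgment.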
=== DECISION PROCEDURE ===
For each candidate term in the text:
1. Is it a specific named person/place/org/product/ticker? → ENTITY
2. Is it a theme, topic, policy area, or event type? → KEYWORD
3. Can you form a meaningful 2-word phrase? → KEYWORD (use the phrase)
4. Unclear? Default to KEYWORD (safer to miss an entity than pollute entities with themes)
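The control flow of the procedure above can be sketched in a few lines: only step 1 yields ENTITY, and every other path ends in KEYWORD. This is an illustration under a strong assumption, namely that "specific named thing" can be approximated by a lookup table. `KNOWN_ENTITIES` and `route_term` are hypothetical names, and the table is seeded from the entity examples in ENTITY RULES.

```python
# Hypothetical lookup table, seeded from the prompt's own entity examples.
KNOWN_ENTITIES = {
    "Donald Trump", "Federal Reserve", "Bitcoin", "SEC", "ECB",
    "Nvidia", "Binance", "Jerome Powell", "BTC", "ETH", "OPEC+",
}

def route_term(term):
    """Return "ENTITY" or "KEYWORD" for one candidate term."""
    # Step 1: a specific named person/place/org/product/ticker is an ENTITY.
    if term in KNOWN_ENTITIES:
        return "ENTITY"
    # Steps 2-3: themes and meaningful 2-word phrases are KEYWORDS.
    # Step 4: anything unclear also defaults to KEYWORD.
    return "KEYWORD"
```

A term routed to KEYWORD must still pass the KEYWORD RULES before it is returned.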
=== TOPIC CLASSIFICATION ===
- crypto: Bitcoin, Ethereum, crypto exchanges, DeFi, tokens, mining, ETFs
- macro: central banks (Fed, ECB, BoE, BoJ), interest rates, inflation, GDP, employment, fiscal/monetary policy, oil, commodities, China economy
- regulation: SEC, CFTC, lawsuits, enforcement, legislation, compliance, legal rulings, EU AI Act, financial regulation
- ai: AI models, chips (Nvidia, AMD), LLMs, generative AI, AI companies, AI regulation (but prefer 'regulation' if legal focus)
- other: geopolitics, war, politics, elections, corporate earnings (non-AI), general business
=== SENTIMENT RULES ===
- positive: clearly encouraging, improving, or supportive tone
- negative: clearly alarming, worsening, severe, conflict, loss, risk, warning tone
- neutral: factual, balanced, or mixed
- sentimentScore must be a number from -1.0 to 1.0 and should reflect the sentiment label.
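One plausible reading of the last rule is that the sign of the score follows the label. The sketch below checks that reading on the consumer side. The name `sentiment_consistent` and the constant `NEUTRAL_BAND` are hypothetical, and the 0.2 band is an assumption, since the prompt does not say how close to zero a neutral score must be.

```python
# Assumption: the prompt does not define a neutral range; 0.2 is illustrative.
NEUTRAL_BAND = 0.2

def sentiment_consistent(label, score):
    """Return True if the score is in range and agrees with the label."""
    if not -1.0 <= score <= 1.0:
        return False
    if label == "positive":
        return score > 0
    if label == "negative":
        return score < 0
    if label == "neutral":
        return abs(score) <= NEUTRAL_BAND
    # Any label outside the three allowed values is inconsistent.
    return False
```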
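Return STRICT JSON with EXACT keys only:
{ topic, entities, sentiment, sentimentScore, keywords }
where topic is one of [crypto, macro, regulation, ai, other], sentiment is one of [positive, negative, neutral], sentimentScore is a number from -1.0 to 1.0, and entities and keywords are arrays of strings.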
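Because the prompt asks for strict JSON with exact keys, the caller can reject malformed replies before using them. The following is a minimal sketch of such a check, not part of the prompt itself. The function name `validate_signal` is hypothetical, and treating `entities` and `keywords` as lists of strings is an assumption about the intended types.

```python
import json

EXPECTED_KEYS = {"topic", "entities", "sentiment", "sentimentScore", "keywords"}
ALLOWED_TOPICS = ("crypto", "macro", "regulation", "ai", "other")
ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")

def validate_signal(raw):
    """Return a list of contract violations for a raw model reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    if set(data) != EXPECTED_KEYS:
        # EXACT keys only: report both missing and unexpected keys.
        missing = sorted(EXPECTED_KEYS - set(data))
        extra = sorted(set(data) - EXPECTED_KEYS)
        return [f"key mismatch: missing={missing}, extra={extra}"]
    problems = []
    if data["topic"] not in ALLOWED_TOPICS:
        problems.append(f"unknown topic: {data['topic']!r}")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        problems.append(f"unknown sentiment: {data['sentiment']!r}")
    score = data["sentimentScore"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("sentimentScore must be a number")
    elif not -1.0 <= score <= 1.0:
        problems.append(f"sentimentScore out of range: {score}")
    for field in ("entities", "keywords"):
        value = data[field]
        # Assumption: both fields are arrays of strings.
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    return problems
```

For example, a made-up reply such as `{"topic": "macro", "entities": ["ECB"], "sentiment": "neutral", "sentimentScore": 0.1, "keywords": ["rate cut", "eurozone inflation"]}` yields an empty list, while a reply with a missing key or an unlisted topic yields one message per violation.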