
Docs: add future plan for emerging entity graph over time

Lukas Goldschmidt, 1 month ago
parent
commit 980c2b8996
2 changed files with 41 additions and 0 deletions
1. OUTLOOK.md (+28 -0)
2. PROJECT.md (+13 -0)

OUTLOOK.md (+28 -0)

@@ -475,6 +475,34 @@ The first version is now effectively a usable baseline. The remaining work for v
 
 ## Where v0.2.0 should lead
 
+### Future plan (worth building slowly): “Emerging entity graph over time”
+Right now `detect_emerging_topics()` returns a flat list of emerging *topics/entities*.
+Next-level idea: turn it into an **entity graph** that an agent can reason over.
+
+**Core concept**
+- Collapse/group results into canonical entity nodes (e.g. `iran`, `israel`, `donald_trump`, `strait_of_hormuz`)
+- Build weighted edges from co-occurrence in recent clusters:
+  - edge weight ~ frequency/co-occurrence strength
+  - node weight ~ trend_score + count (+ optional avg_importance)
+- Infer communities (graph grouping) so related nodes form stable “story neighborhoods”
+
+**Over time (the important part)**
+- Each refresh window produces a snapshot of the graph
+- Store snapshots / deltas to observe:
+  - rising/falling node weights (“momentum”)
+  - strengthening/weakening relations
+  - emerging communities and topic shifts
+
+**Suggested output for an eventual agent tool**
+- `get_emerging_entity_graph(timeframe, limit)` returning:
+  - grouped communities
+  - top nodes + weights
+  - top relations + direction (optional)
+  - summary of “what changed since last snapshot”
+
+This needs more time before it can become a usable MCP tool, so it is captured here for later implementation.
+
+
 1. **Normalization layer**
 
    * canonicalize acronyms and entity variants before storage / querying
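A minimal Python sketch of the core concept in the OUTLOOK.md plan (canonical nodes, co-occurrence edges, communities). It assumes `detect_emerging_topics()` yields dicts with `entity`, `trend_score` and `count` keys and that each recent cluster can be reduced to a list of entity names; both shapes, the `ALIASES` table, and the helper names `canonicalize`, `build_entity_graph` and `communities` are hypothetical. Node weight follows the plan's `trend_score + count`; edge weight counts the clusters in which two nodes co-occur. For grouping it swaps in plain connected components over edges above a weight threshold, a deliberately simple stand-in for a real community-detection algorithm such as Louvain or label propagation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical alias table; a real normalization layer would be far richer.
ALIASES = {"trump": "donald_trump", "hormuz": "strait_of_hormuz"}


def canonicalize(name: str) -> str:
    """Lowercase and snake_case a name, then map known aliases to one node id."""
    key = "_".join(name.lower().split())
    return ALIASES.get(key, key)


def build_entity_graph(emerging, clusters):
    """Return (nodes, edges) built from emerging entities and recent clusters.

    emerging: dicts like {"entity": str, "trend_score": float, "count": int} (assumed shape)
    clusters: one list of entity names per recent cluster (assumed shape)
    """
    nodes = defaultdict(float)
    for item in emerging:
        # Node weight ~ trend_score + count, summed over variants of the same entity.
        nodes[canonicalize(item["entity"])] += item["trend_score"] + item["count"]

    edges = defaultdict(int)
    for cluster in clusters:
        # Only emerging entities become nodes; sorting gives each pair a stable key.
        ids = sorted({canonicalize(e) for e in cluster} & set(nodes))
        for a, b in combinations(ids, 2):
            edges[(a, b)] += 1  # edge weight = number of clusters containing both
    return dict(nodes), dict(edges)


def communities(nodes, edges, min_weight=2):
    """Connected components over edges with weight >= min_weight (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    # Largest groups first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    emerging = [
        {"entity": "Iran", "trend_score": 3.0, "count": 5},
        {"entity": "Israel", "trend_score": 2.0, "count": 4},
        {"entity": "Trump", "trend_score": 1.5, "count": 3},
        {"entity": "Strait of Hormuz", "trend_score": 2.5, "count": 2},
    ]
    clusters = [
        ["Iran", "Israel"],
        ["Iran", "Strait of Hormuz"],
        ["Iran", "Israel", "Trump"],
        ["Iran", "Strait of Hormuz"],
    ]
    nodes, edges = build_entity_graph(emerging, clusters)
    print(communities(nodes, edges))
    # → [['iran', 'israel', 'strait_of_hormuz'], ['donald_trump']]
```

With a threshold of 2, a pair must co-occur in at least two clusters before it can join a community, which keeps one-off mentions from merging unrelated stories.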

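The "over time" part of the OUTLOOK.md plan reduces to comparing consecutive snapshots. The sketch below is an assumption-heavy illustration: the `Snapshot` dataclass and `diff_snapshots` are hypothetical names, node and edge weights are plain numbers keyed by node id and by sorted `(a, b)` pairs, and communities are compared by exact membership, so a community that gains one node shows up as one new and one vanished community. A real version would probably match communities by overlap (for example Jaccard similarity) instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Graph state for one refresh window (hypothetical structure)."""
    nodes: dict[str, float]                # node id -> weight
    edges: dict[tuple[str, str], float]    # sorted (a, b) pair -> weight
    communities: list[frozenset[str]] = field(default_factory=list)


def diff_snapshots(prev: Snapshot, curr: Snapshot, min_change: float = 0.0) -> dict:
    """Report node momentum, edge strength changes and community turnover."""
    # Missing entries count as weight 0, so brand-new nodes/edges appear as rising.
    momentum = {
        n: curr.nodes.get(n, 0.0) - prev.nodes.get(n, 0.0)
        for n in prev.nodes.keys() | curr.nodes.keys()
    }
    edge_delta = {
        e: curr.edges.get(e, 0.0) - prev.edges.get(e, 0.0)
        for e in prev.edges.keys() | curr.edges.keys()
    }
    prev_comms, curr_comms = set(prev.communities), set(curr.communities)
    return {
        # Biggest movers first; ties broken by node id for stable output.
        "rising": sorted((n for n, d in momentum.items() if d > min_change),
                         key=lambda n: (-momentum[n], n)),
        "falling": sorted((n for n, d in momentum.items() if d < -min_change),
                          key=lambda n: (momentum[n], n)),
        "strengthening": sorted(e for e, d in edge_delta.items() if d > min_change),
        "weakening": sorted(e for e, d in edge_delta.items() if d < -min_change),
        # Exact-membership comparison: any change to a community counts as new + gone.
        "new_communities": sorted(sorted(c) for c in curr_comms - prev_comms),
        "gone_communities": sorted(sorted(c) for c in prev_comms - curr_comms),
    }


if __name__ == "__main__":
    prev = Snapshot({"iran": 8.0, "israel": 6.0}, {("iran", "israel"): 2},
                    [frozenset({"iran", "israel"})])
    curr = Snapshot({"iran": 10.0, "israel": 5.0, "strait_of_hormuz": 4.5},
                    {("iran", "israel"): 1, ("iran", "strait_of_hormuz"): 2},
                    [frozenset({"iran", "israel", "strait_of_hormuz"})])
    print(diff_snapshots(prev, curr)["rising"])  # → ['strait_of_hormuz', 'iran']
```

Storing only these deltas per refresh window, rather than full graphs, is one way to keep history cheap while still answering "what changed since last snapshot".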
PROJECT.md (+13 -0)

@@ -26,6 +26,19 @@ Provide a signal-extraction MCP server that converts RSS into **deduplicated, en
 - `get_related_entities(subject, timeframe, limit)`
 
 ## Refresh & caching
+
+## Future work (planned): entity graph over time
+Instead of treating `detect_emerging_topics()` as a flat list, we want a higher-level representation:
+
+- Convert emerging topic/entity co-occurrence signals into a **weighted entity graph**
+- Group the graph into **communities** (story neighborhoods)
+- Track **time evolution** across refresh windows:
+  - node “momentum” (trend_score/count changes)
+  - edge strength changes (relation tightening/weakening)
+  - community emergence/disappearance
+
+Eventual agent tool shape: `get_emerging_entity_graph(timeframe, limit)`.
+
 - Background refresh every `NEWS_REFRESH_INTERVAL_SECONDS` (default 900s)
 - Feed-hash skipping to avoid redundant RSS+Groq work
 - Cluster TTL (`NEWS_CLUSTERS_TTL_HOURS` via `CLUSTERS_TTL_HOURS`)
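One possible response shape for the eventual `get_emerging_entity_graph(timeframe, limit)` tool, sketched as Python `TypedDict`s. The field names, the `shape_response` helper, and the choice to express "what changed since last snapshot" as a list of human-readable strings are all hypothetical; `limit` is assumed to cap both the node list and the relation list, and `direction` is left as `None` because the plan marks it optional.

```python
from __future__ import annotations

from typing import Optional, TypedDict


class Relation(TypedDict):
    source: str
    target: str
    weight: float
    direction: Optional[str]  # optional in the plan; None means undirected


class EntityGraphResponse(TypedDict):
    timeframe: str
    communities: list[list[str]]        # grouped "story neighborhoods"
    top_nodes: list[tuple[str, float]]  # (node id, weight), strongest first
    top_relations: list[Relation]
    changes_since_last: list[str]       # human-readable summary lines


def shape_response(timeframe, limit, nodes, edges, communities, changes) -> EntityGraphResponse:
    """Trim a computed graph to the `limit` strongest nodes and relations (hypothetical helper)."""
    top_nodes = sorted(nodes.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    keep = {n for n, _ in top_nodes}
    # Keep only relations whose endpoints both survived the node cut.
    top_relations = [
        Relation(source=a, target=b, weight=w, direction=None)
        for (a, b), w in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
        if a in keep and b in keep
    ][:limit]
    return EntityGraphResponse(
        timeframe=timeframe,
        # Drop trimmed nodes from each community, and drop communities left empty.
        communities=[[n for n in c if n in keep] for c in communities if keep & set(c)],
        top_nodes=top_nodes,
        top_relations=top_relations,
        changes_since_last=list(changes),
    )


if __name__ == "__main__":
    nodes = {"iran": 8.0, "israel": 6.0, "donald_trump": 4.5, "strait_of_hormuz": 4.5}
    edges = {("iran", "israel"): 2, ("iran", "strait_of_hormuz"): 2, ("donald_trump", "iran"): 1}
    groups = [["iran", "israel", "strait_of_hormuz"], ["donald_trump"]]
    resp = shape_response("24h", 3, nodes, edges, groups, ["strait_of_hormuz is rising"])
    print(resp["top_nodes"])  # → [('iran', 8.0), ('israel', 6.0), ('donald_trump', 4.5)]
```

Trimming nodes first and then filtering relations keeps the response internally consistent: every relation endpoint also appears in `top_nodes`.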