|
|
@@ -72,6 +72,43 @@ docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
|
|
|
- Timestamps in cluster payloads are normalized to ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
|
|
|
- **Single data directory**: default `DATA_DIR` is repo-root `./data/` — used by both `run.sh` and `docker-compose up`. No env override needed.
|
|
|
|
|
|
+## Version Hash
|
|
|
+
|
|
|
+Every running server exposes a deterministic content hash via its health endpoints. Use it to prove to an agent that the live container was restarted with new code.
|
|
|
+
|
|
|
+**How it works:** At import time the server walks all `.py` files under `news_mcp/`, sorts them by path, computes the SHA-256 of their concatenated contents, and keeps the first 9 hex characters of the digest. There is no git dependency, so it works identically in Docker and native runs.
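The steps above can be sketched in Python roughly as follows. This is an illustration, not the server's actual code: `content_hash` and its parameters are hypothetical names, and the sketch assumes files are ordered by their path relative to the package root and hashed as raw bytes with no separators. If the real implementation differs in any of these details, the sketch will not reproduce the live hash.

```python
import hashlib
from pathlib import Path


def content_hash(package_dir: str, length: int = 9) -> str:
    """Return the first `length` hex chars of SHA-256 over all .py files.

    Hypothetical helper: ordering and byte handling are assumptions.
    """
    root = Path(package_dir)
    # Sort by POSIX-style relative path so the result does not depend on
    # the order in which the filesystem happens to list directory entries.
    files = sorted(
        root.rglob("*.py"),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        # Feeding each file to update() is equivalent to hashing the
        # concatenation of all file contents in sorted order.
        digest.update(path.read_bytes())
    return digest.hexdigest()[:length]
```

Calling `content_hash("news_mcp")` from the repo root would return a 9-character hex string that changes whenever any `.py` file under that directory changes.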
|
|
|
+
|
|
|
+**Where to find it:**
|
|
|
+- `GET /health` → `{"status":"ok","uptime":...,"version":"624993d5f"}`
|
|
|
+- `GET /api/v1/health` → `{...,"version":"624993d5f",...}`
|
|
|
+
|
|
|
+**Workflow to verify a restart:**
|
|
|
+
|
|
|
+When you tell an agent "I restarted the server" and they're unsure, have them curl the live server and compare the reported `version` against the output of `./version-hash.sh`:
|
|
|
+
|
|
|
+```
|
|
|
+# Agent: check what the live server currently reports
|
|
|
+curl -s http://192.168.0.200:8506/health
|
|
|
+
|
|
|
+# Agent: check what this codebase would produce
|
|
|
+bash version-hash.sh
|
|
|
+```
|
|
|
+
|
|
|
+If the two hashes match, the container is running this exact code. If they differ, the container is on a different version. Report both hashes to the user — they can then confirm or investigate.
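The comparison can also be scripted. The sketch below is a hypothetical helper (`compare_versions` is not part of the repo) that takes the raw body returned by `/health` and the output of `version-hash.sh`, and returns both hashes together with a match flag so that both can be reported. It assumes only that the response is JSON with a top-level `version` field, as in the examples above.

```python
import json


def compare_versions(health_body: str, local_hash: str) -> dict:
    """Compare the `version` field of a /health response with a local hash.

    Hypothetical helper: returns both hashes so they can be reported as-is.
    """
    # A missing "version" key yields None, which never matches a real hash.
    live_hash = json.loads(health_body).get("version")
    # Script output normally ends with a newline; strip it before comparing.
    local = local_hash.strip()
    return {"live": live_hash, "local": local, "match": live_hash == local}
```

For example, passing the output of the `curl` command as `health_body` and the output of `bash version-hash.sh` as `local_hash` gives a dictionary whose `match` entry is `True` only when the container is running this checkout's code.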
|
|
|
+
|
|
|
+**Shell script:** `./version-hash.sh` in the repo root computes the same hash the server would use for the current codebase. Run it locally to predict what hash a freshly-built container would report:
|
|
|
+ ```
|
|
|
+ $ bash version-hash.sh
|
|
|
+ 624993d5f
|
|
|
+ ```
|
|
|
+ If the script and the live server return different hashes, the live container is running different code than this checkout.
|
|
|
+
|
|
|
+**Design rationale:** A content hash beats a git commit hash for this purpose because:
|
|
|
+ - `.git/` is excluded from Docker images — git-based hashes always return `"unknown"` in containers
|
|
|
+ - A content hash changes on any file edit without requiring a git commit first
|
|
|
+ - It is perfectly reproducible — same files always produce the same hash, on any machine
|
|
|
+ - No build step, no version file to maintain, no CI pipeline dependency
|
|
|
+
|
|
|
## Timestamp Contract
|
|
|
- `payload_ts` SQL column (VIRTUAL GENERATED) is the ONLY way to filter by event time. Use `WHERE payload_ts >= ?` in SQL. Never parse JSON timestamps in Python for time ranges.
|
|
|
- `payload.timestamp` in JSON is guaranteed `YYYY-MM-DDTHH:MM:SS+00:00` at write time (enforced by `sanitize_cluster_payload()`).
|