version hash

Lukas Goldschmidt 1 week ago
parent
commit
5190469475
3 changed files with 69 additions and 14 deletions
  1. 37 0
      AGENTS.md
  2. 20 14
      news_mcp/mcp_server_fastmcp.py
  3. 12 0
      version-hash.sh

+ 37 - 0
AGENTS.md

@@ -72,6 +72,43 @@ docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
 - Timestamps in cluster payloads are normalized to ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
 - **Single data directory**: default `DATA_DIR` is repo-root `./data/` — used by both `run.sh` and `docker-compose up`. No env override needed.
 
+## Version Hash
+
+Every running server exposes a deterministic content hash via its health endpoints. Use it to prove to an agent that the live container was restarted with new code.
+
+**How it works:** At import time the server walks all `.py` files under `news_mcp/`, sorts them by path, and computes SHA-256 of their concatenated contents, taking the first 9 hex characters. No git dependency — works identically in Docker and native runs.
+
+**Where to find it:**
+- `GET /health` → `{"status":"ok","uptime":...,"version":"624993d5f"}`
+- `GET /api/v1/health` → `{...,"version":"624993d5f",...}`
+
+**Workflow to verify a restart:**
+
+When you tell an agent "I restarted the server" and they're unsure, have them curl the live server and compare the hash against `./version-hash.sh`:
+
+```
+# Agent: check what the live server currently reports
+curl -s http://192.168.0.200:8506/health
+
+# Agent: check what this codebase would produce
+bash version-hash.sh
+```
+
+If the two hashes match, the container is running this exact code. If they differ, the container is on a different version. Report both hashes to the user — they can then confirm or investigate.
+
+**Shell script:** `./version-hash.sh` in the repo root computes the same hash the server would use for the current codebase. Run it locally to predict what hash a freshly-built container would report:
+```
+$ bash version-hash.sh
+624993d5f
+```
+If the script and the live server return different hashes, the live container is running different code than this checkout.
+
+**Design rationale:** A content hash beats a git commit hash for this purpose because:
+ - `.git/` is excluded from Docker images — git-based hashes always return `"unknown"` in containers
+ - A content hash changes on any file edit without requiring a git commit first
+ - It is perfectly reproducible — same files always produce the same hash, on any machine
+ - No build step, no version file to maintain, no CI pipeline dependency
+
 ## Timestamp Contract
 - `payload_ts` SQL column (VIRTUAL GENERATED) is the ONLY way to filter by event time. Use `WHERE payload_ts >= ?` in SQL. Never parse JSON timestamps in Python for time ranges.
 - `payload.timestamp` in JSON is guaranteed `YYYY-MM-DDTHH:MM:SS+00:00` at write time (enforced by `sanitize_cluster_payload()`).
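
The restart check described in the Version Hash section can also be scripted. A minimal Python sketch, assuming the `/health` payload has the shape shown in AGENTS.md and that the local hash comes from `bash version-hash.sh`; the helper name `hashes_match` and the sample values are hypothetical and not part of this commit:

```python
import json


def hashes_match(health_body: str, local_hash: str) -> bool:
    """Return True when the live server reports the same hash as the checkout.

    health_body is the raw JSON text returned by GET /health;
    local_hash is the output of `bash version-hash.sh`.
    """
    live_hash = json.loads(health_body).get("version")
    # A missing "version" key counts as a mismatch, not a match.
    return live_hash is not None and live_hash == local_hash.strip()


# Sample payload in the documented shape (the uptime value is made up).
sample = '{"status": "ok", "uptime": 12.5, "version": "624993d5f"}'
print(hashes_match(sample, "624993d5f\n"))
print(hashes_match(sample, "000000000"))
```

In practice the body would come from the live server, for example via `urllib.request.urlopen(...)` against the `/health` URL; both hashes should still be reported to the user when they differ.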

+ 20 - 14
news_mcp/mcp_server_fastmcp.py

@@ -1,8 +1,8 @@
 from __future__ import annotations
 
 import asyncio
+import hashlib
 import logging
-import subprocess
 
 import math
 import re
@@ -43,19 +43,25 @@ logging.basicConfig(
 
 _PROCESS_STARTED_AT = time.monotonic()
 
-_REPO_ROOT = Path(__file__).resolve().parent
-try:
-    _VERSION_HASH = (
-        subprocess.check_output(
-            ["git", "rev-parse", "--short=9", "HEAD"],
-            cwd=str(_REPO_ROOT),
-            stderr=subprocess.DEVNULL,
-        )
-        .decode()
-        .strip()
-    )
-except Exception:
-    _VERSION_HASH = "unknown"
+_PACKAGE_DIR = Path(__file__).resolve().parent
+
+
+def _compute_version_hash() -> str:
+    """SHA-256 of all .py files under news_mcp/, sorted by POSIX path string.
+
+    Deterministic across machines and environments; no git dependency.
+    Paths sort as plain strings, not component-wise as Path objects do.
+    """
+    hasher = hashlib.sha256()
+    for f in sorted(_PACKAGE_DIR.rglob("*.py"), key=lambda p: p.as_posix()):
+        try:
+            hasher.update(f.read_bytes())
+        except OSError:
+            continue
+    return hasher.hexdigest()[:9]
+
+
+_VERSION_HASH = _compute_version_hash()
 
 mcp = FastMCP(
     "news-mcp",

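The properties claimed for the content hash can be tried out on a throwaway directory. This is a standalone sketch of the same technique, not the server's code: `content_hash` is a hypothetical helper that takes the directory as a parameter, and the file names are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(root: Path) -> str:
    """First 9 hex chars of SHA-256 over all .py files under root, in sorted order."""
    hasher = hashlib.sha256()
    for path in sorted(root.rglob("*.py"), key=lambda p: p.as_posix()):
        hasher.update(path.read_bytes())
    return hasher.hexdigest()[:9]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("x = 1\n")
    (root / "b.py").write_text("y = 2\n")

    before = content_hash(root)
    unchanged = content_hash(root)          # same files, so the same hash
    (root / "b.py").write_text("y = 3\n")   # any content edit changes the hash
    after = content_hash(root)

print(before == unchanged, before != after)
```

Only file contents feed the hash, so a rename that leaves the sorted order unchanged would not alter it.
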
+ 12 - 0
version-hash.sh

@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+# Compute a deterministic content hash of all news_mcp source files.
+# Produces the same value the server exposes at GET /health via _VERSION_HASH.
+set -euo pipefail
+
+cd "$(dirname "$0")"  # always run from repo root
+
+find news_mcp -name '*.py' -type f -print0 \
+  | LC_ALL=C sort -z \
+  | xargs -0 cat \
+  | sha256sum \
+  | cut -c1-9
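
The script and the server agree only if both concatenate the files in the same order, so the sort order is worth checking. A small sketch with hypothetical file names, showing that sorting `Path` objects (compared component by component) can differ from sorting the same paths as plain strings:

```python
from pathlib import PurePosixPath

# Hypothetical layout: a module next to a package whose name is its prefix.
names = ["news_mcp/util.py", "news_mcp/util/helpers.py"]

# pathlib compares paths component by component: "util" sorts before "util.py".
as_paths = [str(p) for p in sorted(PurePosixPath(n) for n in names)]

# A plain string sort compares character by character: "." sorts before "/".
as_strings = sorted(names)

print(as_paths)
print(as_strings)
```

A locale-aware `sort` in the shell can produce yet another order, so pinning both sides to a byte-wise string sort is the safer choice.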