
unified db path

Lukas Goldschmidt committed 1 week ago
commit b00c95465c
2 changed files with 10 additions and 9 deletions
  1. AGENTS.md (+9 -8)
  2. news_mcp/config.py (+1 -1)

+ 9 - 8
AGENTS.md

@@ -10,7 +10,7 @@ This project spans two machines. **Always check which machine you're operating o
 | **IP** | 192.168.0.249 | 192.168.0.200 |
 | **Projects dir** | `/home/lucky/.openclaw/workspace/` | `/home/lucky/` |
 | **This repo** | `/home/lucky/.openclaw/workspace/news-mcp/` | `/home/lucky/news-mcp/` |
-| **DB path** | `news_mcp/data/news.sqlite` (usually empty/stale dev copy) | `/app/data/news.sqlite` inside Docker container (host bind-mount: `news_mcp/data/news.sqlite`) |
+| **DB path** | `data/news.sqlite` (host-side: repo-root `data/`) | `/app/data/news.sqlite` inside Docker container (host bind-mount: `data/news.sqlite`) |
 | **Server URL** | localhost:8506 | `http://192.168.0.200:8506` |
 
 - The terminal prompt **always shows the machine name** (e.g. `lucky@thinkcenter-2`, `lucky@latitude`).
@@ -18,15 +18,16 @@ This project spans two machines. **Always check which machine you're operating o
 - When the user says "the live server", "thinkcenter-2", or "remote", they mean 192.168.0.200.
 - The live server runs in Docker (`docker-compose up -d news-mcp`).
 - ssh into live: `ssh lucky@192.168.0.200`
-- The live DB is at `/app/data/news.sqlite` in the container, which bind-mounts from `news_mcp/data/news.sqlite` on the host FS (relative to repo root).
+- The live DB is at `/app/data/news.sqlite` in the container, which bind-mounts from `data/news.sqlite` on the host FS (relative to repo root).
+- **Local and Docker now use the same default path** — `./data/news.sqlite` — so `run.sh` and `docker-compose up` share the same database. The `AGENTS.md` section below ("Docker/DB path oddity") no longer applies on this machine.
 - **Do NOT run maintenance/backfill scripts against the dev DB** — it's empty/stale. Either point explicitly to the live DB or tell the user to run it.
-- **Docker/DB path oddity:** docker-compose sets `NEWS_MCP_DB_PATH=./data/news.sqlite` (relative to `working_dir=/app`), so the container's DB is at `/app/data/news.sqlite`. But the host `.env` does NOT override this, so running the same script on the host resolves `DB_PATH` to the config default (`news_mcp/data/news.sqlite`) — a different, usually empty file. The docker-compose env vars only apply inside the container.
+
 
 ## Local Environment
 - Source the repo-local `.venv` first when it exists.
 - Prefer `./tests.sh` for offline verification and `./live_tests.sh` only for provider-backed smoke checks.
 - Use `./run.sh` to start the server locally; it resolves the repo root and prefers the local Uvicorn binary.
-- The local dev copy has its own separate DB — treat it as empty/stale unless explicitly working with it.
+- The local `data/` directory contains the same DB the Docker container uses — `run.sh` and `docker-compose up` converge on `./data/news.sqlite`.
 
 ## Repo Map
 - `news_mcp/mcp_server_fastmcp.py`: MCP tool surface, startup refresh, pruning, HTTP health endpoints, REST API.
@@ -59,9 +60,8 @@ docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
 ## Docker / Live Server Details
 - `docker-compose.yml` mounts `./:/app` with `working_dir: /app`
 - Data dir and DB path both hardcoded in docker-compose env: `NEWS_MCP_DB_PATH: ./data/news.sqlite`
-- Target DB on live server: `/app/data/news.sqlite` in container → `news_mcp/data/news.sqlite` on host
-- Backfill script: `scripts/normalize_cluster_timestamps.py` — always run with explicit `--db` or set `NEWS_MCP_DB_PATH`
-- The dev DB at `news_mcp/data/news.sqlite` is a separate empty file — never confuse it with the live DB
+- Target DB on live server: `/app/data/news.sqlite` in container → `data/news.sqlite` on host
+- Backfill script: `scripts/normalize_cluster_timestamps.py` — always run with explicit `--db` or set `NEWS_MCP_DB_PATH` (default now matches `./data/news.sqlite`)
 
 ## Current Contracts
 - Clusters are the unit of truth, not raw articles.
@@ -70,6 +70,7 @@ docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
 - Entity aliasing is intentionally conservative; keep `config/entity_aliases.json` tight.
 - `include_articles=true` should keep responses compact and only return minimal article fields.
 - Timestamps in cluster payloads are normalized to ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
+- **Single data directory**: default `DATA_DIR` is repo-root `./data/` — used by both `run.sh` and `docker-compose up`. No env override needed.
 
 ## Timestamp Contract
 - `payload_ts` SQL column (VIRTUAL GENERATED) is the ONLY way to filter by event time. Use `WHERE payload_ts >= ?` in SQL. Never parse JSON timestamps in Python for time ranges.
@@ -86,6 +87,6 @@ docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
 
 ## Known Pitfalls
 - `cwd=__file__` in subprocess calls fails — it's a file path, not a directory. Use `Path(__file__).resolve().parent` or `str(Path(__file__).parent)`.
-- DB file at `news_mcp/data/news.sqlite` in the dev repo is empty (4096 bytes, no tables). The real data lives on the live server in Docker.
+- The dev DB at `data/news.sqlite` was scp-copied from the live server — treat it as a snapshot, not the live DB. Never run destructive scripts against it without explicit direction.
 - Indentation in patched Python files is fragile — always verify with `py_compile.compile(path, doraise=True)` after editing.
 - `updated_at` in the DB is row modification time (set to `datetime.now()` on every upsert), NOT event time. Event time lives in `payload.timestamp`. Always filter by `payload.timestamp` in Python, never by `updated_at` in SQL, for time-range queries.
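The AGENTS.md changes claim that `run.sh` on the host and `docker-compose up` now converge on one file. The sketch below checks that claim with pure path arithmetic. It assumes three things: the repo is bind-mounted at `/app` (as `./:/app` in `docker-compose.yml` suggests), the host checkout is `/home/lucky/news-mcp`, and a relative `NEWS_MCP_DB_PATH` is resolved against the container's `working_dir`. The helper names `old_default_db`, `new_default_db`, `compose_db` and `to_host` are hypothetical and exist only for this illustration.

```python
from pathlib import PurePosixPath

def old_default_db(repo_root: PurePosixPath) -> PurePosixPath:
    # Before this commit: DATA_DIR defaulted to the package dir, i.e. news_mcp/data
    return repo_root / "news_mcp" / "data" / "news.sqlite"

def new_default_db(repo_root: PurePosixPath) -> PurePosixPath:
    # After this commit: DATA_DIR defaults to <repo root>/data
    return repo_root / "data" / "news.sqlite"

def compose_db(working_dir: PurePosixPath,
               env_value: str = "./data/news.sqlite") -> PurePosixPath:
    # docker-compose sets a *relative* NEWS_MCP_DB_PATH; a relative path is
    # opened relative to the process cwd, which is working_dir (/app)
    p = PurePosixPath(env_value)
    return p if p.is_absolute() else working_dir / p

def to_host(container_path: PurePosixPath, host_root: PurePosixPath,
            mount_point: PurePosixPath = PurePosixPath("/app")) -> PurePosixPath:
    # Map a container path back through the assumed ./:/app bind mount
    return host_root / container_path.relative_to(mount_point)
```

Under these assumptions, `compose_db(PurePosixPath("/app"))` equals `new_default_db(PurePosixPath("/app"))` but not `old_default_db(...)`. Mapping that path back through the bind mount gives `/home/lucky/news-mcp/data/news.sqlite`, which is the host-side default that `run.sh` now uses.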

+ 1 - 1
news_mcp/config.py

@@ -6,7 +6,7 @@ from dotenv import load_dotenv
 _HERE = Path(__file__).resolve().parent.parent
 load_dotenv(_HERE / ".env")
 
-DATA_DIR = Path(os.getenv("NEWS_MCP_DATA_DIR", Path(__file__).resolve().parent / "data"))
+DATA_DIR = Path(os.getenv("NEWS_MCP_DATA_DIR", _HERE / "data"))
 DATA_DIR.mkdir(parents=True, exist_ok=True)
 DB_PATH = Path(os.getenv("NEWS_MCP_DB_PATH", str(DATA_DIR / "news.sqlite")))
 PROMPTS_DIR = Path(os.getenv("NEWS_PROMPTS_DIR", str(_HERE / "prompts")))
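The one-line change in `config.py` moves the fallback for `DATA_DIR` from the package directory (`news_mcp/data`) to `_HERE / "data"`, where `_HERE` is the repo root. The precedence can be sketched as a pure function, with the environment passed in as a mapping. This is a hypothetical restatement: `repo_root_from` and `resolve_db_path` are illustrative names, not functions in the repo. The sketch also deliberately omits the module's `load_dotenv` call, its `.resolve()` call, and its `mkdir` side effect. In the real module, values from `.env` reach `os.environ` via `load_dotenv`, so passing `os.environ` here should approximate it.

```python
from pathlib import Path
from typing import Mapping

def repo_root_from(config_file: Path) -> Path:
    # config.py sits in news_mcp/, so two .parent hops reach the repo root
    # (mirrors _HERE; the real module also calls .resolve(), skipped here)
    return Path(config_file).parent.parent

def resolve_db_path(repo_root: Path, env: Mapping[str, str]) -> Path:
    # NEWS_MCP_DATA_DIR overrides the data dir; default is <repo root>/data
    data_dir = Path(env.get("NEWS_MCP_DATA_DIR", repo_root / "data"))
    # NEWS_MCP_DB_PATH overrides the whole DB path; default is <data_dir>/news.sqlite
    return Path(env.get("NEWS_MCP_DB_PATH", str(data_dir / "news.sqlite")))
```

With an empty environment and the repo at `/home/lucky/news-mcp`, this yields `/home/lucky/news-mcp/data/news.sqlite`. Note that a relative override such as the compose value `./data/news.sqlite` stays relative, so the file that is actually opened depends on the working directory of the process that opens it.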