@@ -1,9 +1,31 @@
# news-mcp
+## Two-Machine Workflow
+
+This project spans two machines. **Always check which machine you're operating on.**
+
+| | Latitude (dev) | ThinkCenter-2 (live) |
+|---|---|---|
+| **Hostname** | latitude | thinkcenter-2 |
+| **IP** | 192.168.0.249 | 192.168.0.200 |
+| **Projects dir** | `/home/lucky/.openclaw/workspace/` | `/home/lucky/` |
+| **This repo** | `/home/lucky/.openclaw/workspace/news-mcp/` | `/home/lucky/news-mcp/` |
+| **DB path** | `news_mcp/data/news.sqlite` (usually empty/stale dev copy) | `/app/data/news.sqlite` inside Docker container (host bind-mount: `news_mcp/data/news.sqlite`) |
+| **Server URL** | `http://localhost:8506` | `http://192.168.0.200:8506` |
+
+- The terminal prompt **always shows the machine name** (e.g. `lucky@thinkcenter-2`, `lucky@latitude`).
+- Pasted commands include the prompt; **read it** to identify which machine they ran on.
+- When the user says "the live server", "thinkcenter-2", or "remote", they mean ThinkCenter-2 (192.168.0.200).
+- The live server runs in Docker (`docker-compose up -d news-mcp`).
+- To SSH into the live server: `ssh lucky@192.168.0.200`
+- The live DB is at `/app/data/news.sqlite` in the container, which is bind-mounted from `news_mcp/data/news.sqlite` on the host filesystem (relative to the repo root).
+- **Do NOT run maintenance/backfill scripts against the dev DB**; it is empty/stale. Either point the script explicitly at the live DB or ask the user to run it on the live server.
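The "check which machine" rule above can be enforced in code rather than by eye. A minimal sketch, assuming the short hostname reported by `socket.gethostname()` matches the **Hostname** row of the table; `LIVE_HOSTNAME`, `is_live_host`, and `require_live_host` are hypothetical names, not part of this repo:

```python
import socket
from typing import Optional

# Assumption: matches the "Hostname" row for ThinkCenter-2 in the table.
LIVE_HOSTNAME = "thinkcenter-2"


def is_live_host(hostname: str) -> bool:
    """Return True when the hostname belongs to the live machine."""
    # Compare only the short name, case-insensitively, so that a value
    # such as "ThinkCenter-2.lan" still counts as the live host.
    return hostname.split(".")[0].lower() == LIVE_HOSTNAME


def require_live_host(hostname: Optional[str] = None) -> None:
    """Abort before a maintenance script touches the empty dev DB."""
    current = hostname if hostname is not None else socket.gethostname()
    if not is_live_host(current):
        raise SystemExit(
            f"Refusing to run on {current!r}: expected {LIVE_HOSTNAME!r}."
        )
```

This guard is meant for scripts started on the host. Inside a Docker container `socket.gethostname()` normally returns the container's own hostname, so a script run there would need a different signal, such as an environment variable.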
+
## Local Environment
- Source the repo-local `.venv` first when it exists.
- Prefer `./tests.sh` for offline verification and `./live_tests.sh` only for provider-backed smoke checks.
- Use `./run.sh` to start the server locally; it resolves the repo root and prefers the local Uvicorn binary.
+- The local dev copy has its own separate DB; treat it as empty/stale unless you are explicitly working with it.
## Repo Map
- `news_mcp/mcp_server_fastmcp.py`: MCP tool surface, startup refresh, pruning, and HTTP health endpoints.
@@ -14,15 +36,30 @@
- `news_mcp/trends_resolution.py` and `news_mcp/related_entities.py`: local Google Trends-based entity resolution and neighborhood lookup.
- `news_mcp/config.py`: env-driven defaults and file paths.
+## Docker / Live Server Details
+- `docker-compose.yml` mounts `./:/app` with `working_dir: /app`.
+- The data dir and DB path are both hardcoded in the docker-compose env: `NEWS_MCP_DB_PATH: ./data/news.sqlite`.
+- Target DB on the live server: `/app/data/news.sqlite` in the container, which maps to `news_mcp/data/news.sqlite` on the host.
+- Backfill script: `scripts/normalize_cluster_timestamps.py`. Always run it with an explicit `--db` or with `NEWS_MCP_DB_PATH` set.
+- The dev DB at `news_mcp/data/news.sqlite` is a separate, empty file; never confuse it with the live DB.
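The "explicit `--db` or `NEWS_MCP_DB_PATH`" rule above amounts to a precedence order with no silent default. A minimal sketch of that order, assuming nothing about how `scripts/normalize_cluster_timestamps.py` actually parses its arguments; `resolve_db_path` is a hypothetical helper:

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_db_path(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the DB path from --db, then NEWS_MCP_DB_PATH; never guess."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Resolve the target DB.")
    parser.add_argument("--db", help="explicit path to the SQLite file")
    args = parser.parse_args(argv)

    # --db wins over the environment variable when both are present.
    raw = args.db or env.get("NEWS_MCP_DB_PATH")
    if not raw:
        # No implicit default: a fallback could silently hit the dev DB.
        raise SystemExit("Pass --db or set NEWS_MCP_DB_PATH.")
    return Path(raw)
```

Failing loudly when neither source is set is the design choice here: a built-in default path is exactly how a script ends up running against the empty dev copy.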
+
## Current Contracts
- Clusters are the unit of truth, not raw articles.
- `NEWS_DEFAULT_LOOKBACK_HOURS` controls read freshness only.
- `NEWS_PRUNING_ENABLED`, `NEWS_RETENTION_DAYS`, and `NEWS_PRUNE_INTERVAL_HOURS` control physical deletion.
- Entity aliasing is intentionally conservative; keep `config/entity_aliases.json` tight.
- `include_articles=true` should keep responses compact and return only minimal article fields.
+- Timestamps in cluster payloads are normalized to ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
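The timestamp format in the last contract can be illustrated with the standard library. This is a sketch of the target format only, not the body of `sanitize_cluster_payload()`, which is not shown here; `to_iso_utc` is a hypothetical helper, and treating naive datetimes as UTC is an assumption:

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(value: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SS+00:00."""
    if value.tzinfo is None:
        # Assumption: naive values are already expressed in UTC.
        value = value.replace(tzinfo=timezone.utc)
    # Convert to UTC and drop microseconds to match the documented shape.
    return value.astimezone(timezone.utc).replace(microsecond=0).isoformat()


# A +02:00 local time is shifted to UTC.
local = datetime(2024, 5, 1, 14, 30, 15, tzinfo=timezone(timedelta(hours=2)))
print(to_iso_utc(local))  # 2024-05-01T12:30:15+00:00
```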
## Editing Rules
- Keep changes aligned with the docs in `README.md`, `PROJECT.md`, and `OUTLOOK.md`.
- Prefer narrow fixes over contract changes unless the user explicitly asks to expand behavior.
- Do not run destructive maintenance scripts without a dry run first.
- If a change touches storage or pruning, verify it against a temp DB or isolated test fixture rather than the live database.
+- When writing infrastructure/MCP code that will run on the live server, account for the Docker context (paths, env vars, mount points).
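One way to follow the temp-DB rule above is to snapshot a database into a throwaway file and run the change against the copy. A minimal sketch using the standard-library `sqlite3` backup API; `copy_to_temp_db` and the `news_mcp_test_` prefix are hypothetical, and the repo's own tests may use a different fixture mechanism:

```python
import sqlite3
import tempfile
from pathlib import Path


def copy_to_temp_db(source: Path) -> Path:
    """Copy a SQLite DB into a fresh temp file and return the copy's path."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="news_mcp_test_"))
    target = tmp_dir / source.name
    # Open the source read-only so the original cannot be modified here.
    src = sqlite3.connect(source.resolve().as_uri() + "?mode=ro", uri=True)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # SQLite's online backup API copies every page
    finally:
        dst.close()
        src.close()
    return target
```

The caller owns the temp directory and should delete it once the check is done.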
+
+## Known Pitfalls
+- `cwd=__file__` in subprocess calls fails because `__file__` is a file path, not a directory. Use `Path(__file__).resolve().parent` or `str(Path(__file__).parent)`.
+- The DB file at `news_mcp/data/news.sqlite` in the dev repo is empty (4096 bytes, no tables). The real data lives on the live server in Docker.
+- Indentation in patched Python files is fragile; always verify with `py_compile.compile(path, doraise=True)` after editing.
+- `updated_at` in the DB is row modification time (set to `datetime.now()` on every upsert), NOT event time. Event time lives in `payload.timestamp`. For time-range queries, always filter by `payload.timestamp` in Python, never by `updated_at` in SQL.
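The last pitfall can be sketched as a Python-side filter. This assumes each row's payload is stored as JSON text with a `timestamp` field in the normalized `YYYY-MM-DDTHH:MM:SS+00:00` form, and that `start` and `end` are timezone-aware; `filter_by_event_time` is a hypothetical helper, and the actual storage layout may differ:

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List


def filter_by_event_time(
    payloads: Iterable[str], start: datetime, end: datetime
) -> List[dict]:
    """Keep payloads whose payload.timestamp falls in [start, end)."""
    kept = []
    for raw in payloads:
        payload = json.loads(raw)
        stamp = payload.get("timestamp")
        if not stamp:
            continue  # no event time recorded; skip rather than guess
        # Event time comes from the payload, never from updated_at.
        event_time = datetime.fromisoformat(stamp)
        if start <= event_time < end:
            kept.append(payload)
    return kept


# Example window: all of 1 May 2024, in UTC.
window_start = datetime(2024, 5, 1, tzinfo=timezone.utc)
window_end = datetime(2024, 5, 2, tzinfo=timezone.utc)
```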