# news-mcp

## Two-Machine Workflow

This project spans two machines. **Always check which machine you're operating on.**

| | Latitude (dev) | ThinkCenter-2 (live) |
|---|---|---|
| **Hostname** | latitude | thinkcenter-2 |
| **IP** | 192.168.0.249 | 192.168.0.200 |
| **Projects dir** | `/home/lucky/.openclaw/workspace/` | `/home/lucky/` |
| **This repo** | `/home/lucky/.openclaw/workspace/news-mcp/` | `/home/lucky/news-mcp/` |
| **DB path** | `news_mcp/data/news.sqlite` (usually empty/stale dev copy) | `/app/data/news.sqlite` inside Docker container (host bind-mount: `news_mcp/data/news.sqlite`) |
| **Server URL** | `localhost:8506` | `http://192.168.0.200:8506` |

- The terminal prompt **always shows the machine name** (e.g. `lucky@thinkcenter-2`, `lucky@latitude`).
- When commands are pasted, they include the prompt. **Read it** to know which machine.
- When the user says "the live server", "thinkcenter-2", or "remote", they mean 192.168.0.200.
- The live server runs in Docker (`docker-compose up -d news-mcp`).
- SSH into live: `ssh lucky@192.168.0.200`
- The live DB is at `/app/data/news.sqlite` in the container, which bind-mounts from `news_mcp/data/news.sqlite` on the host FS (relative to repo root).
- **Do NOT run maintenance/backfill scripts against the dev DB**: it's empty/stale. Either point explicitly to the live DB or tell the user to run it.

## Local Environment

- Source the repo-local `.venv` first when it exists.
- Prefer `./tests.sh` for offline verification and `./live_tests.sh` only for provider-backed smoke checks.
- Use `./run.sh` to start the server locally; it resolves the repo root and prefers the local Uvicorn binary.
- The local dev copy has its own separate DB. Treat it as empty/stale unless explicitly working with it.

## Repo Map

- `news_mcp/mcp_server_fastmcp.py`: MCP tool surface, startup refresh, pruning, and HTTP health endpoints.
- `news_mcp/jobs/poller.py`: feed refresh loop, clustering, enrichment, and cache writes.
- `news_mcp/storage/sqlite_store.py`: SQLite schema, cluster/entity metadata, feed hashes, and prune state.
- `news_mcp/dedup/cluster.py`: topic bucketing and the current fuzzy/embedding clustering path.
- `news_mcp/enrichment/llm_enrich.py`: LLM extraction/summarization and blacklist filtering.
- `news_mcp/trends_resolution.py` and `news_mcp/related_entities.py`: local Google Trends-based entity resolution and neighborhood lookup.
- `news_mcp/config.py`: env-driven defaults and file paths.

## Docker / Live Server Details

- `docker-compose.yml` mounts `./:/app` with `working_dir: /app`.
- Data dir and DB path are both hardcoded in the docker-compose env: `NEWS_MCP_DB_PATH: ./data/news.sqlite`.
- Target DB on the live server: `/app/data/news.sqlite` in container → `news_mcp/data/news.sqlite` on host.
- Backfill script: `scripts/normalize_cluster_timestamps.py`. Always run it with an explicit `--db` or with `NEWS_MCP_DB_PATH` set.
- The dev DB at `news_mcp/data/news.sqlite` is a separate empty file; never confuse it with the live DB.

## Current Contracts

- Clusters are the unit of truth, not raw articles.
- `NEWS_DEFAULT_LOOKBACK_HOURS` controls read freshness only.
- `NEWS_PRUNING_ENABLED`, `NEWS_RETENTION_DAYS`, and `NEWS_PRUNE_INTERVAL_HOURS` control physical deletion.
- Entity aliasing is intentionally conservative; keep `config/entity_aliases.json` tight.
- `include_articles=true` should keep responses compact and only return minimal article fields.
- Timestamps in cluster payloads are normalized to ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.

## Editing Rules

- Keep changes aligned with the docs in `README.md`, `PROJECT.md`, and `OUTLOOK.md`.
- Prefer narrow fixes over contract changes unless the user explicitly asks to expand behavior.
- Do not run destructive maintenance scripts without a dry run first.
- If a change touches storage or pruning, verify it against a temp DB or isolated test fixture rather than the live database.
- When writing infrastructure/MCP code that will run on the live server, think about the Docker context (paths, env vars, mount points).

## Known Pitfalls

- `cwd=__file__` in subprocess calls fails: it's a file path, not a directory. Use `Path(__file__).resolve().parent` or `str(Path(__file__).parent)`.
- DB file at `news_mcp/data/news.sqlite` in the dev repo is empty (4096 bytes, no tables). The real data lives on the live server in Docker.
- Indentation in patched Python files is fragile. Always verify with `py_compile.compile(path, doraise=True)` after editing.
- `updated_at` in the DB is row modification time (set to `datetime.now()` on every upsert), NOT event time. Event time lives in `payload.timestamp`. Always filter by `payload.timestamp` in Python, never by `updated_at` in SQL, for time-range queries.
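
The `cwd=__file__` pitfall can be illustrated with a minimal sketch. The helper name `run_beside` is hypothetical and not part of this repo; it only shows the pattern of resolving the script's directory before handing it to `subprocess.run`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_beside(script_path: str, args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `args` with cwd set to the directory that contains `script_path`.

    `run_beside` is a hypothetical helper for illustration only.
    """
    # Passing the file path itself as cwd fails because it is not a
    # directory, so resolve the file and take its parent instead.
    workdir = Path(script_path).resolve().parent
    return subprocess.run(
        list(args),
        cwd=str(workdir),
        capture_output=True,
        text=True,
        check=True,
    )
```

From inside a script, a call would look like `run_beside(__file__, [sys.executable, "-c", "import os; print(os.getcwd())"])`, which should print the script's own directory.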
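
The post-edit indentation check mentioned above can be wrapped in a small helper. This is a sketch only; `compiles_cleanly` is a hypothetical name, and the only real API it relies on is `py_compile.compile(..., doraise=True)`.

```python
import py_compile
from pathlib import Path


def compiles_cleanly(path: Path) -> bool:
    """Return True if `path` byte-compiles, False on syntax/indentation errors.

    Hypothetical helper. Note that a successful compile writes a .pyc file
    under __pycache__ next to the source as a side effect.
    """
    try:
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError as exc:
        # exc.msg carries the underlying SyntaxError/IndentationError text.
        print(f"{path}: {exc.msg}")
        return False
    return True
```

Running this on every patched file before restarting the server catches a broken indent early, instead of at import time inside the container.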
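
The `updated_at` versus `payload.timestamp` rule can be sketched as follows. The table name `clusters` and a JSON text column named `payload` are assumptions made for illustration; check `news_mcp/storage/sqlite_store.py` for the real schema before adapting this.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def clusters_since(
    conn: sqlite3.Connection, hours: float, now: Optional[datetime] = None
) -> List[dict]:
    """Return payloads whose event time falls within the last `hours`.

    Assumes a hypothetical `clusters` table with a JSON `payload` column.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    recent = []
    # Deliberately no WHERE clause on updated_at: that column is reset on
    # every upsert and says nothing about when the event happened.
    for (raw,) in conn.execute("SELECT payload FROM clusters"):
        payload = json.loads(raw)
        ts = payload.get("timestamp")
        if not ts:
            continue
        event_time = datetime.fromisoformat(ts)
        if event_time.tzinfo is None:
            # Assumption: naive timestamps are treated as UTC.
            event_time = event_time.replace(tzinfo=timezone.utc)
        if event_time >= cutoff:
            recent.append(payload)
    return recent
```

Filtering in Python means a full scan of the table, which is the trade-off the rule accepts in exchange for correct event-time semantics.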