This project spans two machines. Always check which machine you're operating on.
| | Latitude (dev) | ThinkCenter-2 (live) |
|---|---|---|
| Hostname | `latitude` | `thinkcenter-2` |
| IP | `192.168.0.249` | `192.168.0.200` |
| Projects dir | `/home/lucky/.openclaw/workspace/` | `/home/lucky/` |
| This repo | `/home/lucky/.openclaw/workspace/news-mcp/` | `/home/lucky/news-mcp/` |
| DB path | `news_mcp/data/news.sqlite` (usually empty/stale dev copy) | `/app/data/news.sqlite` inside Docker container (host bind-mount: `news_mcp/data/news.sqlite`) |
| Server URL | `localhost:8506` | `http://192.168.0.200:8506` |
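
A quick way to check which machine a script is running on is to compare the hostname against the table above. The following is a minimal sketch, not part of the repo: `KNOWN_MACHINES` and `machine_role` are hypothetical names, and only the two hostnames come from the table.

```python
import socket

# Hostnames are taken from the table above; the mapping itself is illustrative.
KNOWN_MACHINES = {
    "latitude": "dev",
    "thinkcenter-2": "live",
}


def machine_role(hostname: str) -> str:
    """Return 'dev' or 'live' for a known hostname, otherwise 'unknown'."""
    # Drop any domain suffix and normalize case before the lookup.
    short_name = hostname.split(".")[0].lower()
    return KNOWN_MACHINES.get(short_name, "unknown")


if __name__ == "__main__":
    host = socket.gethostname()
    print(f"{host}: {machine_role(host)}")
```

Treating an unrecognized hostname as `unknown` rather than defaulting to `dev` keeps a script from silently assuming it is safe to write.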
- The user is `lucky` on both machines (`lucky@thinkcenter-2`, `lucky@latitude`).
- The live server runs in Docker (`docker-compose up -d news-mcp`).
- Reach the live machine with `ssh lucky@192.168.0.200`.
- The live DB is `/app/data/news.sqlite` in the container, which bind-mounts from `news_mcp/data/news.sqlite` on the host FS (relative to repo root).
- docker-compose sets `NEWS_MCP_DB_PATH=./data/news.sqlite` (relative to `working_dir=/app`), so the container's DB is at `/app/data/news.sqlite`. The host `.env` does NOT override this, so running the same script on the host resolves `DB_PATH` to the config default (`news_mcp/data/news.sqlite`), which is a different, usually empty file. The docker-compose env vars only apply inside the container.

- Use `.venv` first when it exists.
- Use `./tests.sh` for offline verification and `./live_tests.sh` only for provider-backed smoke checks.
- Use `./run.sh` to start the server locally; it resolves the repo root and prefers the local Uvicorn binary.

- `news_mcp/mcp_server_fastmcp.py`: MCP tool surface, startup refresh, pruning, HTTP health endpoints, REST API.
- `news_mcp/jobs/poller.py`: feed refresh loop, clustering, enrichment, and cache writes.
- `news_mcp/storage/sqlite_store.py`: SQLite schema (`payload_ts`, junction tables), upsert with junction population, SQL-level read methods. Single data access layer for MCP tools.
- `news_mcp/dashboard/dashboard_store.py`: read-only query layer for the dashboard REST API. Wraps `SQLiteClusterStore` and adds junction-table entity/keyword search. NOTE: this store duplicates methods from `sqlite_store`; see Design Flaw in `PROJECT.md`.
- `news_mcp/dedup/cluster.py`: topic bucketing and fuzzy/embedding clustering.
- `news_mcp/enrichment/llm_enrich.py`: LLM extraction/summarization and blacklist filtering.
- `news_mcp/trends_resolution.py` and `news_mcp/related_entities.py`: entity resolution and neighborhood lookup.
- `news_mcp/config.py`: env-driven defaults and file paths.

Time filtering: Always use the `payload_ts >= ?` SQL filter. Never parse JSON timestamps in Python for time ranges.
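
The time-filtering rule can be sketched as follows. This is a minimal illustration under stated assumptions: the table name `clusters`, the helper names, and the plain `TEXT` column are hypothetical (the real schema is not shown here, and the demo uses an ordinary column only so it runs on any SQLite build). It relies on the fact that timestamps in the fixed `YYYY-MM-DDTHH:MM:SS+00:00` format sort correctly as strings.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def cutoff_iso(hours: int, now: datetime) -> str:
    """Format the lookback cutoff as YYYY-MM-DDTHH:MM:SS+00:00."""
    cutoff = (now - timedelta(hours=hours)).astimezone(timezone.utc)
    return cutoff.replace(microsecond=0).isoformat()


def recent_cluster_ids(conn: sqlite3.Connection, hours: int, now: datetime) -> list:
    """Filter by event time in SQL instead of parsing JSON in Python."""
    rows = conn.execute(
        "SELECT cluster_id FROM clusters WHERE payload_ts >= ? ORDER BY payload_ts",
        (cutoff_iso(hours, now),),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list:
    # Hypothetical stand-in table; only the payload_ts column name is from the notes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clusters (cluster_id TEXT PRIMARY KEY, payload_ts TEXT)")
    conn.executemany(
        "INSERT INTO clusters VALUES (?, ?)",
        [
            ("old", "2024-01-01T00:00:00+00:00"),
            ("new", "2024-01-02T11:00:00+00:00"),
        ],
    )
    now = datetime(2024, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
    return recent_cluster_ids(conn, 24, now)
```

With a 24-hour lookback from 2024-01-02 12:00 UTC, `demo()` returns only the `new` row.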
Entity/keyword search: Use junction tables:
- `cluster_entities` for entity search: `JOIN cluster_entities ce ON c.cluster_id = ce.cluster_id WHERE ce.entity = ?`
- `cluster_keywords` for keyword search: `JOIN cluster_keywords ck ON c.cluster_id = ck.cluster_id WHERE ck.keyword = ?`

Backfill: After schema changes, run `scripts/backfill_junction_tables.py` in the Docker container:

    docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
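
The junction-table entity lookup described above can be sketched against an in-memory database. The parent table name `clusters`, the column types, and the sample rows are assumptions for illustration; only the `cluster_entities` join and its column names come from the notes.

```python
import sqlite3


def clusters_for_entity(conn: sqlite3.Connection, entity: str) -> list:
    """Find clusters mentioning an entity via the junction table."""
    rows = conn.execute(
        "SELECT c.cluster_id FROM clusters c "
        "JOIN cluster_entities ce ON c.cluster_id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.cluster_id",
        (entity,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list:
    # Hypothetical minimal schema; the real tables carry more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clusters (cluster_id TEXT PRIMARY KEY);
        CREATE TABLE cluster_entities (cluster_id TEXT, entity TEXT);
        INSERT INTO clusters VALUES ('a'), ('b'), ('c');
        INSERT INTO cluster_entities VALUES
            ('a', 'OpenAI'), ('b', 'NASA'), ('c', 'OpenAI');
        """
    )
    return clusters_for_entity(conn, "OpenAI")
```

The keyword lookup would follow the same shape with `cluster_keywords` and `ck.keyword`.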
`SQLiteClusterStore` and `DashboardStore` are parallel copies. Only `DashboardStore` was updated with junction-table entity search. MCP tools (`get_events_for_entity`, `get_news_sentiment`) still use `SQLiteClusterStore`, which matches entities in Python over a row-limited result (top 200) and therefore misses entities that appear only in older clusters. See `PROJECT.md` for the full analysis and proposed fix.
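
Why a row limit hides older matches can be shown with a toy example. This is a sketch only: the function names and sample data are hypothetical, and it assumes the limited query returns the newest rows first, which the notes do not state.

```python
def match_with_row_limit(clusters: list, entity: str, limit: int) -> list:
    """Python-side matching over only the newest `limit` rows."""
    newest = sorted(clusters, key=lambda c: c["ts"], reverse=True)[:limit]
    return [c["id"] for c in newest if entity in c["entities"]]


def match_all(clusters: list, entity: str) -> list:
    """Matching over every row, which is what a junction-table join achieves."""
    ordered = sorted(clusters, key=lambda c: c["ts"], reverse=True)
    return [c["id"] for c in ordered if entity in c["entities"]]


# Hypothetical sample: the only 'OpenAI' cluster is the oldest row.
SAMPLE = [
    {"id": "c3", "ts": "2024-01-03T00:00:00+00:00", "entities": ["NASA"]},
    {"id": "c2", "ts": "2024-01-02T00:00:00+00:00", "entities": ["NASA"]},
    {"id": "c1", "ts": "2024-01-01T00:00:00+00:00", "entities": ["OpenAI"]},
]
```

Here `match_with_row_limit(SAMPLE, "OpenAI", 2)` finds nothing, while `match_all(SAMPLE, "OpenAI")` finds `c1`.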
- `docker-compose.yml` mounts `./:/app` with `working_dir: /app`
- `NEWS_MCP_DB_PATH: ./data/news.sqlite`
- `/app/data/news.sqlite` in container → `news_mcp/data/news.sqlite` on host
- `scripts/normalize_cluster_timestamps.py`: always run with explicit `--db` or set `NEWS_MCP_DB_PATH`
- `news_mcp/data/news.sqlite` is a separate empty file; never confuse it with the live DB

- `NEWS_DEFAULT_LOOKBACK_HOURS` controls read freshness only.
- `NEWS_PRUNING_ENABLED`, `NEWS_RETENTION_DAYS`, and `NEWS_PRUNE_INTERVAL_HOURS` control physical deletion.
- Keep `config/entity_aliases.json` tight.
- `include_articles=true` should keep responses compact and only return minimal article fields.

- Timestamps are normalized to one format (`YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
- The `payload_ts` SQL column (VIRTUAL GENERATED) is the ONLY way to filter by event time. Use `WHERE payload_ts >= ?` in SQL. Never parse JSON timestamps in Python for time ranges.
- `payload.timestamp` in JSON is guaranteed to be `YYYY-MM-DDTHH:MM:SS+00:00` at write time (enforced by `sanitize_cluster_payload()`).
- `updated_at` in the DB is the row modification time, NOT the event time. Never use it for time-range queries.

- Project docs: `README.md`, `PROJECT.md`, and `OUTLOOK.md`.

- `cwd=__file__` in subprocess calls fails because it is a file path, not a directory. Use `Path(__file__).resolve().parent` or `str(Path(__file__).parent)`.
- `news_mcp/data/news.sqlite` in the dev repo is empty (4096 bytes, no tables). The real data lives on the live server in Docker.
- Run `py_compile.compile(path, doraise=True)` after editing.
- `updated_at` in the DB is the row modification time (set to `datetime.now()` on every upsert), NOT the event time. Event time lives in `payload.timestamp` and is exposed as the `payload_ts` column. For time-range queries, always filter on `payload_ts` in SQL, never on `updated_at`.
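
For the rule above about running scripts with an explicit `--db` or `NEWS_MCP_DB_PATH`, a script can refuse to fall back to a default path at all. The sketch below is an assumption about how such a guard might look, not the repo's actual argument handling; `resolve_db_path` and `main` are hypothetical names.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_db_path(cli_db: Optional[str], env: Mapping[str, str]) -> Path:
    """Prefer --db, then NEWS_MCP_DB_PATH; refuse to guess otherwise."""
    chosen = cli_db or env.get("NEWS_MCP_DB_PATH")
    if not chosen:
        raise SystemExit(
            "Pass --db or set NEWS_MCP_DB_PATH; refusing to use a default path."
        )
    return Path(chosen).expanduser().resolve()


def main(argv: Optional[Sequence[str]] = None) -> Path:
    parser = argparse.ArgumentParser(description="Hypothetical maintenance script")
    parser.add_argument("--db", default=None, help="Path to the SQLite database")
    args = parser.parse_args(argv)
    db_path = resolve_db_path(args.db, os.environ)
    # Printing the resolved path makes it obvious which file will be touched.
    print(f"Using database: {db_path}")
    return db_path


if __name__ == "__main__":
    main()
```

Failing loudly when neither source is set avoids silently operating on the empty `news_mcp/data/news.sqlite` file when a script is run outside the container.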