This project spans two machines. Always check which machine you're operating on.
| | Latitude (dev) | ThinkCenter-2 (live) |
|---|---|---|
| Hostname | latitude | thinkcenter-2 |
| IP | 192.168.0.249 | 192.168.0.200 |
| Projects dir | /home/lucky/.openclaw/workspace/ | /home/lucky/ |
| This repo | /home/lucky/.openclaw/workspace/news-mcp/ | /home/lucky/news-mcp/ |
| DB path | data/news.sqlite (host-side: repo-root data/) | /app/data/news.sqlite inside Docker container (host bind-mount: data/news.sqlite) |
| Server URL | localhost:8506 | http://192.168.0.200:8506 |

- `lucky@thinkcenter-2`, `lucky@latitude`).
- `docker-compose up -d news-mcp`).
- `ssh lucky@192.168.0.200`
- `/app/data/news.sqlite` in the container, which bind-mounts from `data/news.sqlite` on the host FS (relative to repo root).
- `./data/news.sqlite`, so `run.sh` and `docker-compose up` share the same database. The AGENTS.md section below ("Docker/DB path oddity") no longer applies on this machine.

- `.venv` first when it exists.
- `./tests.sh` for offline verification and `./live_tests.sh` only for provider-backed smoke checks.
- `./run.sh` to start the server locally; it resolves the repo root and prefers the local Uvicorn binary.
- The `data/` directory contains the same DB the Docker container uses: `run.sh` and `docker-compose up` converge on `./data/news.sqlite`.

- `news_mcp/mcp_server_fastmcp.py`: MCP tool surface, startup refresh, pruning, HTTP health endpoints, REST API.
- `news_mcp/jobs/poller.py`: feed refresh loop, clustering, enrichment, and cache writes.
- `news_mcp/storage/sqlite_store.py`: SQLite schema (`payload_ts`, junction tables), upsert with junction population, SQL-level read methods. Single data access layer for MCP tools.
- `news_mcp/dashboard/dashboard_store.py`: Read-only query layer for dashboard REST API. Wraps `SQLiteClusterStore`. Added junction-table entity/keyword search. NOTE: this store duplicates methods from `sqlite_store`; see Design Flaw in PROJECT.md.
- `news_mcp/dedup/cluster.py`: topic bucketing and fuzzy/embedding clustering.
- `news_mcp/enrichment/llm_enrich.py`: LLM extraction/summarization and blacklist filtering.
- `news_mcp/trends_resolution.py` and `news_mcp/related_entities.py`: entity resolution and neighborhood lookup.
- `news_mcp/config.py`: env-driven defaults and file paths.

Time filtering: Always use a `payload_ts >= ?` SQL filter. Never parse JSON timestamps in Python for time ranges.
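
The time-filtering rule can be illustrated with a minimal sketch against an in-memory database. The table name `clusters`, the `payload` column, and the helper `clusters_since` are assumptions made for illustration; only `payload_ts`, `cluster_id`, and the timestamp format come from these notes. It assumes a SQLite build with generated-column and JSON support.

```python
import json
import sqlite3

# Hypothetical minimal schema: the real one lives in sqlite_store.py and may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE clusters (
        cluster_id TEXT PRIMARY KEY,
        payload    TEXT NOT NULL,
        payload_ts TEXT GENERATED ALWAYS AS (json_extract(payload, '$.timestamp')) VIRTUAL
    )
    """
)

sample = [
    ("old", {"timestamp": "2024-01-01T00:00:00+00:00"}),
    ("new", {"timestamp": "2024-06-01T12:00:00+00:00"}),
]
# Generated columns cannot be written directly, so only the real columns are listed.
conn.executemany(
    "INSERT INTO clusters (cluster_id, payload) VALUES (?, ?)",
    [(cid, json.dumps(payload)) for cid, payload in sample],
)


def clusters_since(conn, cutoff_iso):
    """Return cluster ids whose event time is at or after cutoff_iso (filtered in SQL)."""
    cur = conn.execute(
        "SELECT cluster_id FROM clusters WHERE payload_ts >= ? ORDER BY payload_ts",
        (cutoff_iso,),
    )
    return [row[0] for row in cur.fetchall()]


recent = clusters_since(conn, "2024-03-01T00:00:00+00:00")
```

Because every stored timestamp shares the same fixed-width `+00:00` format, plain string comparison in SQL orders them chronologically, which is what makes the `>=` filter safe here.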
Entity/keyword search: Use junction tables:
- `cluster_entities` for entity search: `JOIN cluster_entities ce ON c.cluster_id = ce.cluster_id WHERE ce.entity = ?`
- `cluster_keywords` for keyword search: `JOIN cluster_keywords ck ON c.cluster_id = ck.cluster_id WHERE ck.keyword = ?`

Backfill: After schema changes, run `scripts/backfill_junction_tables.py` in the Docker container:
docker exec -it news-mcp python3 scripts/backfill_junction_tables.py
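
The junction-table lookups described above might look like the following sketch. The `clusters` table, its `title` column, the sample rows, and the helper names are hypothetical; the join conditions mirror the ones listed in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal tables; the real schema is defined in sqlite_store.py.
    CREATE TABLE clusters (cluster_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE cluster_entities (cluster_id TEXT, entity TEXT);
    CREATE TABLE cluster_keywords (cluster_id TEXT, keyword TEXT);

    INSERT INTO clusters VALUES ('c1', 'First'), ('c2', 'Second'), ('c3', 'Third');
    INSERT INTO cluster_entities VALUES ('c1', 'ACME'), ('c2', 'Globex'), ('c3', 'ACME');
    INSERT INTO cluster_keywords VALUES ('c1', 'merger'), ('c2', 'merger'), ('c3', 'lawsuit');
    """
)


def clusters_for_entity(conn, entity):
    """Exact-match entity search through the cluster_entities junction table."""
    cur = conn.execute(
        "SELECT c.cluster_id FROM clusters c "
        "JOIN cluster_entities ce ON c.cluster_id = ce.cluster_id "
        "WHERE ce.entity = ? ORDER BY c.cluster_id",
        (entity,),
    )
    return [row[0] for row in cur.fetchall()]


def clusters_for_keyword(conn, keyword):
    """Exact-match keyword search through the cluster_keywords junction table."""
    cur = conn.execute(
        "SELECT c.cluster_id FROM clusters c "
        "JOIN cluster_keywords ck ON c.cluster_id = ck.cluster_id "
        "WHERE ck.keyword = ? ORDER BY c.cluster_id",
        (keyword,),
    )
    return [row[0] for row in cur.fetchall()]


acme_clusters = clusters_for_entity(conn, "ACME")
merger_clusters = clusters_for_keyword(conn, "merger")
```

Doing the match in SQL means every row is considered, rather than only whatever subset was loaded into Python first.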
`SQLiteClusterStore` and `DashboardStore` are parallel copies. Only `DashboardStore` was updated with junction-table entity search. MCP tools (`get_events_for_entity`, `get_news_sentiment`) still use `SQLiteClusterStore`'s Python-side entity matching, which applies a row limit (top 200) and therefore misses entities in older clusters. See PROJECT.md for full analysis and proposed fix.

- `docker-compose.yml` mounts `./:/app` with `working_dir: /app`
- `NEWS_MCP_DB_PATH: ./data/news.sqlite`
- `/app/data/news.sqlite` in container → `data/news.sqlite` on host
- `scripts/normalize_cluster_timestamps.py`: always run with explicit `--db` or set `NEWS_MCP_DB_PATH` (default now matches `./data/news.sqlite`)

- `NEWS_DEFAULT_LOOKBACK_HOURS` controls read freshness only.
- `NEWS_PRUNING_ENABLED`, `NEWS_RETENTION_DAYS`, and `NEWS_PRUNE_INTERVAL_HOURS` control physical deletion.
- `config/entity_aliases.json` tight.
- `include_articles=true` should keep responses compact and only return minimal article fields.
- `YYYY-MM-DDTHH:MM:SS+00:00`) at write time in `sanitize_cluster_payload()`.
- `DATA_DIR` is repo-root `./data/`, used by both `run.sh` and `docker-compose up`. No env override needed.

- `payload_ts` SQL column (VIRTUAL GENERATED) is the ONLY way to filter by event time. Use `WHERE payload_ts >= ?` in SQL. Never parse JSON timestamps in Python for time ranges.
- `payload.timestamp` in JSON is guaranteed `YYYY-MM-DDTHH:MM:SS+00:00` at write time (enforced by `sanitize_cluster_payload()`).
- `updated_at` in the DB = row modification time, NOT event time. Never use for time-range queries.

- `README.md`, `PROJECT.md`, and `OUTLOOK.md`.

- `cwd=__file__` in subprocess calls fails: it's a file path, not a directory. Use `Path(__file__).resolve().parent` or `str(Path(__file__).parent)`.
- `data/news.sqlite` was scp-copied from the live server; treat it as a snapshot, not the live DB. Never run destructive scripts against it without explicit direction.
- `py_compile.compile(path, doraise=True)` after editing.
- `updated_at` in the DB is row modification time (set to `datetime.now()` on every upsert), NOT event time. Event time lives in `payload.timestamp`, which SQL exposes as the `payload_ts` column. For time-range queries, always filter on `payload_ts` in SQL, never on `updated_at`.
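
The `cwd` and `py_compile` notes above can be sketched together. A throwaway script in a temporary directory stands in for a real module here (the file name `hello.py` is hypothetical); inside an actual module, the directory would come from `Path(__file__).resolve().parent`.

```python
import py_compile
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a file that was just edited.
workdir = Path(tempfile.mkdtemp())
script = workdir / "hello.py"
script.write_text("print('ok')\n")

# Syntax-check after editing; doraise=True raises py_compile.PyCompileError on failure
# instead of only printing the error.
py_compile.compile(str(script), doraise=True)

# cwd must be a directory, so pass the file's parent rather than the file path itself.
script_dir = script.resolve().parent
result = subprocess.run(
    [sys.executable, script.name],
    cwd=script_dir,
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
```

Passing the file path itself as `cwd` makes the child process fail to start, since the operating system cannot change into a non-directory.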