
Atlas-MCP Project Plan (Agent View)

Context from the manifest

  • Atlas is the only semantic intelligence layer; it resolves entities, mediates ontologies, and expands graphs strictly through expand(entity, constraints, depth) workflows.
  • Atlas consumes input from domain-specific MCPs (news-mcp in the current ticket), resolves entities, and computes enrichment datasets. Resolution comes first; enrichment is secondary and may vary over time.
  • Ontologies remain data; Atlas maps between external sources, the canonical layer, and the domain ontologies without embedding domain-specific logic.
  • We should be ontology-first: model the representations before chasing enrichment details.
  • The derived layer is the domain-specific representation used by facts-mcp, news-mcp, and similar applications; enrichment is the dataset that feeds it.
  • Facts-mcp is a useful cautionary reference: the authoritative truth layer should stay small and explicit, while Atlas remains the semantic interpreter and never turns into a general fact store.

Today’s mission

  1. Baseline service
    • Stand up a FastAPI/Uvicorn app in app/main.py with the /health route that reports status, uptime, and FastMCP registration placeholders on port 8550.
    • Keep the web layer minimal so future MCPs call Atlas as the sole semantic brain, consistent with the manifest’s separation rule.
  2. Operational scripts
    • scripts/run.sh: launch uvicorn (and FastMCP registration) with sensible logging on port 8550.
    • scripts/killserver.sh: stop lingering uvicorn processes, report if stale instances were found, and exit cleanly.
    • scripts/restart.sh: call killserver.sh then run.sh so restarts stay sequential.
    • scripts/tests.sh: probe /health and verify the expected contract before moving on to richer tests.
  3. Documentation lineage
    • README.md (human-facing) summarizes the architecture, today’s goals, folder layout, and the news/virtuoso collaboration strategy.
    • PROJECT.md (this file) tracks agent priorities and reminders about the manifest’s hard rules.
    • Provenance & model configuration: any LLM used for classification must be configurable via .env, and provenance emitted by Atlas must explicitly identify both the model and the provider (e.g., provider=openai, model=gpt-5.4-nano) so downstream consumers can audit reproducibility.
  4. Dependencies & housekeeping
    • requirements.txt lists FastAPI, uvicorn, fastmcp, rdflib, httpx, and any enrichment helpers we’ll need in the canonical layer.
  • .gitignore covers Python artifacts, FastAPI logs, and typical OS noise.
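The baseline service above can be sketched as a minimal app/main.py. This is a hedged illustration, not the final implementation: the `health_payload` helper and the `fastmcp_registered` default are assumptions, and FastAPI is imported defensively so the module stays importable before dependencies are installed.

```python
# Hypothetical sketch of app/main.py for the baseline Atlas service.
import time

START_TIME = time.monotonic()

def health_payload(fastmcp_registered: bool = False) -> dict:
    """Build the /health response body; real service discovery is still a TODO."""
    return {
        "status": "ok",
        "uptime_seconds": round(time.monotonic() - START_TIME, 3),
        # TODO: wire real FastMCP registration discovery (placeholder for now)
        "fastmcp_registered": fastmcp_registered,
    }

try:
    from fastapi import FastAPI

    app = FastAPI(title="atlas-mcp")

    @app.get("/health")
    def health() -> dict:
        return health_payload()
except ImportError:  # keeps the sketch importable before requirements.txt is installed
    app = None
```

Under this sketch, `scripts/run.sh` would launch it with `uvicorn app.main:app --port 8550`.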

Immediate placeholders

  • The /health route should respond with a status field (value "ok"), plus uptime_seconds and fastmcp_registered fields, with TODOs for wiring real service discovery.
  • Keep TODO comments in the code pointing to entity resolution, ontology mapping, and enrichment so the manifest’s strict responsibilities stay visible.
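The /health contract described above can be verified by scripts/tests.sh with a small Python probe. A minimal sketch, assuming the placeholder field names and the 8550 port from this plan; the function names here are illustrative:

```python
import json
import urllib.request

# Field names assumed from the placeholder contract in this plan.
REQUIRED_FIELDS = {"status", "uptime_seconds", "fastmcp_registered"}

def check_health_contract(body: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the probe passed."""
    problems = []
    missing = REQUIRED_FIELDS - body.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if body.get("status") != "ok":
        problems.append(f"unexpected status: {body.get('status')!r}")
    return problems

def probe(url: str = "http://127.0.0.1:8550/health") -> list[str]:
    """Fetch /health and check it against the expected placeholder contract."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return check_health_contract(json.load(resp))
```

This keeps the contract check separate from the network call, so richer tests can reuse `check_health_contract` later.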

Follow-on goals

  • Build modules for news-mcp bindings that leave Atlas as the interpreter, with news-mcp defining relevance/constraints while Atlas owns canonicalization and enrichment.
  • Consider a Virtuoso-backed cache/knowledge-graph layer for resolved entities that carry MID, Wikidata ID, and source provenance.
  • Atlas should not need a second database yet; only virtuoso-mcp talks to the underlying Virtuoso instance. Atlas must read/write triples exclusively through virtuoso-mcp, never directly.
  • Add tests simulating news-mcp entity requests to assert Atlas returns canonical IDs plus a derived subgraph that flows into Virtuoso.
  • Even stubs should be tested; the shape of the contract matters before the implementation details are complete.
  • If a known alias is resolved a hundred times, Atlas should reuse the stored mapping instead of asking upstream services again.
  • Add a dedicated integration test layer for resolution and enrichment once the graph clients exist.
  • If we port resolution in-house later, keep the external resolver behind a thin adapter so the internal contract does not change.
  • Document expand(entity, constraints, depth) expectations, starting with rdflib-based stubs and SPARQL placeholders for future enrichment work.
  • Keep the implementation precise: no enrichment in news-mcp, no graph execution in Atlas, and no semantic interpretation in Virtuoso.
  • Add maintenance routines (scripts or cron jobs) to re-check entities with missing source data (especially missing Wikidata), and to supersede stale claims without bloating the schema.
  • Next major Atlas refinement: transition atlas_id itself to a uniform opaque identifier format, and switch RDF node IRIs to opaque, collision-safe hash IRIs (Entity, Claim, Identifier_, etc.), keeping semantics exclusively in triples and not encoded in identifier strings.
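The expand(entity, constraints, depth) expectations above can start from a contract-only stub. A minimal sketch under stated assumptions: the `ExpandRequest` dataclass, the `atlas:` IDs, and the returned triple are all illustrative, and the real version would build an rdflib.Graph and issue SPARQL through virtuoso-mcp rather than returning hard-coded triples.

```python
# Hypothetical expand() contract stub; identifiers and predicates are illustrative.
from dataclasses import dataclass, field

@dataclass
class ExpandRequest:
    entity: str                                       # canonical atlas_id of the seed entity
    constraints: dict = field(default_factory=dict)   # e.g. {"type": "ex:Person"}
    depth: int = 1                                    # hop limit for the derived subgraph

def expand(req: ExpandRequest) -> list[tuple[str, str, str]]:
    """Return a derived subgraph as (subject, predicate, object) triples.

    Real implementation: populate an rdflib.Graph via a SPARQL CONSTRUCT
    executed through virtuoso-mcp (never against Virtuoso directly).
    This stub only fixes the shape of the contract.
    """
    if req.depth < 1:
        return []
    # TODO: entity resolution, ontology mapping, enrichment (manifest rules).
    return [(req.entity, "rdf:type", req.constraints.get("type", "owl:Thing"))]
```

Even this stub is testable, which matches the rule above that the contract shape matters before the implementation is complete.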