Atlas is the only semantic intelligence layer; it resolves entities, mediates ontologies, and expands graphs strictly through expand(entity, constraints, depth) workflows.
Atlas consumes input from domain-specific MCPs (news-mcp in the current ticket), resolves entities, and computes enrichment datasets. Resolution comes first; enrichment is secondary and may vary over time.
Ontologies remain data; Atlas maps between external sources, the canonical layer, and the domain ontologies without embedding domain-specific logic.
We should be ontology-first: model the representations before chasing enrichment details.
The derived layer is the domain-specific representation used by facts-mcp, news-mcp, and similar applications; enrichment is the dataset that feeds it.
Facts-mcp is a useful cautionary reference: the authoritative truth layer should stay small and explicit, while Atlas remains the semantic interpreter and never turns into a general fact store.
Today’s mission
Baseline service
Stand up a FastAPI/Uvicorn app in app/main.py that serves a /health route reporting status, uptime, and FastMCP registration placeholders on port 8550.
Keep the web layer minimal so future MCPs call Atlas as the sole semantic brain, consistent with the manifest’s separation rule.
Operational scripts
scripts/run.sh: launch uvicorn (and FastMCP registration) with sensible logging on port 8550.
scripts/killserver.sh: stop lingering uvicorn processes, report if stale instances were found, and exit cleanly.
scripts/restart.sh: call killserver.sh then run.sh so restarts stay sequential.
scripts/tests.sh: probe /health and verify the expected contract before moving on to richer tests.
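The contract check behind scripts/tests.sh can be sketched in Python before richer tests exist. The field names match the /health placeholder described below; the URL and port come from run.sh, and the exact payload shape remains an assumption until the route lands:

```python
import json
import urllib.request

# Fields the /health contract is expected to carry (see "Immediate placeholders").
EXPECTED_FIELDS = {"status", "uptime_seconds", "fastmcp_registered"}

def check_health_payload(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the payload passes."""
    problems = []
    missing = EXPECTED_FIELDS - payload.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if payload.get("status") != "ok":
        problems.append(f"unexpected status: {payload.get('status')!r}")
    return problems

def probe(url: str = "http://127.0.0.1:8550/health") -> list[str]:
    """Fetch /health from a running instance and validate it against the contract."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return check_health_payload(json.load(resp))
```

scripts/tests.sh can shell out to `probe()` (or an equivalent curl pipeline) and fail fast on a non-empty violation list.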
Documentation lineage
README.md (human-facing) summarizes the architecture, today’s goals, folder layout, and the news/virtuoso collaboration strategy.
PROJECT.md (this file) tracks agent priorities and reminders about the manifest’s hard rules.
Provenance & model configuration: any LLM used for classification must be configurable via .env, and provenance emitted by Atlas must explicitly identify both the model and the provider (e.g., provider=openai, model=gpt-5.4-nano) so downstream consumers can audit reproducibility.
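A minimal sketch of that rule, assuming .env is loaded into the process environment; the variable names ATLAS_LLM_PROVIDER and ATLAS_LLM_MODEL are placeholders, not settled configuration keys:

```python
import os

def provenance_from_env(env: dict = None) -> dict:
    """Build the provenance block Atlas emits alongside any LLM classification.

    Both provider and model must be explicitly configured; failing loudly here
    is what keeps downstream consumers able to audit reproducibility.
    """
    env = os.environ if env is None else env
    provider = env.get("ATLAS_LLM_PROVIDER")  # placeholder variable name
    model = env.get("ATLAS_LLM_MODEL")        # placeholder variable name
    if not provider or not model:
        raise RuntimeError("LLM provider and model must be configured via .env")
    return {"provider": provider, "model": model}
```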
Dependencies & housekeeping
requirements.txt lists FastAPI, uvicorn, fastmcp, rdflib, httpx, and any enrichment helpers we’ll need in the canonical layer.
.gitignore covers Python artifacts, FastAPI logs, and typical OS noise.
Immediate placeholders
The /health route should respond with a JSON body whose status field is "ok", plus uptime_seconds and fastmcp_registered fields, with TODOs for wiring real service discovery.
Keep TODO comments in the code pointing to entity resolution, ontology mapping, and enrichment so the manifest’s strict responsibilities stay visible.
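The placeholder payload can be built by a pure function that the FastAPI route in app/main.py then exposes via `@app.get("/health")`; keeping it pure makes the contract trivially testable. A minimal sketch (the fastmcp_registered default is a stand-in until real service discovery is wired):

```python
import time

START_TIME = time.monotonic()  # process start reference for uptime_seconds

def health_payload(fastmcp_registered: bool = False) -> dict:
    """Placeholder /health body; see PROJECT.md for the contract fields."""
    # TODO: replace the fastmcp_registered placeholder with real service discovery.
    return {
        "status": "ok",
        "uptime_seconds": round(time.monotonic() - START_TIME, 3),
        "fastmcp_registered": fastmcp_registered,
    }
```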
Follow-on goals
Build modules for news-mcp bindings that leave Atlas as the interpreter, with news-mcp defining relevance/constraints while Atlas owns canonicalization and enrichment.
Consider a Virtuoso-backed cache/knowledge-graph layer for resolved entities that carry MID, Wikidata ID, and source provenance.
Atlas should not need a second database yet; only virtuoso-mcp talks to the underlying Virtuoso instance. Atlas must read/write triples exclusively through virtuoso-mcp, never directly.
Add tests simulating news-mcp entity requests to assert Atlas returns canonical IDs plus a derived subgraph that flows into Virtuoso.
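Even before the graph clients exist, that test can run against a stub. The AtlasStub interface, the canned IDs, and the subgraph shape below are all illustrative assumptions about the eventual contract, not its final form:

```python
class AtlasStub:
    """Stands in for Atlas until the real resolver and graph clients exist."""

    def resolve(self, surface_form: str) -> dict:
        # Canned canonical IDs; the real implementation consults upstream sources.
        known = {"NASA": {"atlas_id": "atlas:0001", "wikidata_id": "Q23548"}}
        entity = known.get(surface_form)
        if entity is None:
            raise KeyError(surface_form)
        # The derived subgraph is what would flow into Virtuoso via virtuoso-mcp.
        subgraph = [(entity["atlas_id"], "owl:sameAs", f"wd:{entity['wikidata_id']}")]
        return {"entity": entity, "subgraph": subgraph}

def test_news_mcp_entity_request():
    result = AtlasStub().resolve("NASA")
    assert result["entity"]["atlas_id"].startswith("atlas:")
    assert result["subgraph"], "derived subgraph must not be empty"
```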
Even stubs should be tested; the shape of the contract matters before the implementation details are complete.
If a known alias is resolved a hundred times, Atlas should reuse the stored mapping instead of asking upstream services again.
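That reuse rule can be sketched as a thin caching wrapper; the shape of the upstream callable is an assumption about the eventual resolver interface:

```python
class CachingResolver:
    """Wraps an upstream resolver so repeated aliases hit the stored mapping."""

    def __init__(self, upstream):
        self.upstream = upstream          # callable: alias -> canonical ID
        self.cache: dict[str, str] = {}   # stored alias mappings
        self.upstream_calls = 0           # visible for tests and monitoring

    def resolve(self, alias: str) -> str:
        if alias not in self.cache:
            self.upstream_calls += 1
            self.cache[alias] = self.upstream(alias)
        return self.cache[alias]
```

A hundred resolutions of the same alias cost exactly one upstream call; in practice the cache would be backed by the Virtuoso-resident mappings rather than an in-process dict.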
Add a dedicated integration test layer for resolution and enrichment once the graph clients exist.
If we port resolution in-house later, keep the external resolver behind a thin adapter so the internal contract does not change.
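A sketch of that adapter boundary, assuming a hypothetical external client with a `lookup` method; only the internal `Resolver` contract is meant to be stable:

```python
from typing import Protocol

class Resolver(Protocol):
    """The internal contract Atlas codes against; stays fixed across backends."""
    def resolve(self, alias: str) -> str: ...

class ExternalResolverAdapter:
    """Thin adapter over today's external resolver (client API is an assumption)."""
    def __init__(self, client):
        self.client = client
    def resolve(self, alias: str) -> str:
        # Translate the external call into the internal contract's shape.
        return self.client.lookup(alias)

class InHouseResolver:
    """A future in-house port only has to satisfy the same contract."""
    def __init__(self, table: dict[str, str]):
        self.table = table
    def resolve(self, alias: str) -> str:
        return self.table[alias]
```

Swapping backends then becomes a constructor change, with no ripple into callers.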
Document expand(entity, constraints, depth) expectations, starting with rdflib-based stubs and SPARQL placeholders for future enrichment work.
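A first stub of that signature, using plain triples so the contract shape is visible before rdflib and SPARQL are wired in; the `atlas:` predicate prefix and the echo-the-constraints behavior are placeholders, not semantics:

```python
def expand(entity: str, constraints: dict, depth: int) -> list[tuple[str, str, str]]:
    """Stub for expand(entity, constraints, depth).

    Returns plain (subject, predicate, object) triples for now; an rdflib.Graph
    populated by a real SPARQL query through virtuoso-mcp replaces this once
    enrichment work starts. Depth is honored only as an on/off switch here.
    """
    if depth < 1:
        return []
    # Placeholder expansion: echo the entity with its constraints as triples.
    return [(entity, f"atlas:{key}", str(value)) for key, value in constraints.items()]
```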
Keep the division of responsibilities precise: no enrichment in news-mcp, no graph execution in Atlas, and no semantic interpretation in Virtuoso.
Add maintenance routines (script/cron) to re-check entities with missing source data (especially missing Wikidata), and to supersede stale claims without bloating the schema.
Next major Atlas refinement: transition atlas_id itself to a uniform opaque identifier format, and switch RDF node IRIs to opaque, collision-safe hash IRIs (Entity, Claim, Identifier_, etc.), keeping semantics exclusively in triples and not encoded in identifier strings.
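One way to mint those opaque, collision-safe IRIs is a content hash over the node kind plus its identifying parts; the base IRI and truncation length below are assumptions, and the unit separator guards against ambiguous concatenations:

```python
import hashlib

BASE = "https://atlas.example/id/"  # base IRI is an assumption, not a decision

def opaque_iri(kind: str, *parts: str) -> str:
    """Derive a semantics-free hash IRI for Entity/Claim/Identifier nodes.

    The hash input joins kind and parts with a unit separator so that
    ("a", "bc") and ("ab", "c") never collide. All meaning stays in triples;
    the identifier string carries none.
    """
    digest = hashlib.sha256("\x1f".join((kind, *parts)).encode("utf-8")).hexdigest()
    return f"{BASE}{kind}/{digest[:32]}"
```

The same inputs always yield the same IRI, so re-resolution is idempotent, while nothing about the source or semantics leaks into the identifier itself.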