
Atlas v0.0.1 claim lifecycle and docs

Lukas Goldschmidt 1 month ago
commit
9c72d7b1ad

+ 10 - 0
.env.example

@@ -0,0 +1,10 @@
+# Atlas MCP sample environment configuration
+ATLAS_VIRTUOSO_MCP_SSE_URL=http://192.168.0.249:8501/mcp/sse
+ATLAS_VIRTUOSO_MCP_TIMEOUT=10
+ATLAS_VIRTUOSO_MCP_SSE_READ_TIMEOUT=300
+ATLAS_GRAPH_IRI=http://world.eu.org/atlas_data#
+ATLAS_PREFIX_IRI=http://world.eu.org/atlas_ontology#
+ATLAS_ENTITY_ALIASES_FILE=./config/entity_aliases.json
+GROQ_API_KEY=
+OPENAI_API_KEY=
+ATLAS_WIKIDATA_USER_AGENT=Atlas/1.0 (contact: lukas.goldschmidt+atlas@googlemail.com)

+ 228 - 0
ATLAS_ONTOLOGY.md

@@ -0,0 +1,228 @@
+# Atlas Internal Ontology Draft
+
+This file sketches the **internal canonical ontology** for Atlas.
+
+Atlas is not a facts store and not a domain app. It is the semantic resolver and enricher that normalizes entities into a stable internal model, then persists those mappings and related graph data in Virtuoso via the MCP server.
+
+The ontology below is deliberately small at first. It is meant to support:
+- repeated entity resolution without re-querying external services
+- provenance tracking
+- canonical labeling
+- external identifier mapping
+- future enrichment outputs
+
+## Core idea
+
+A single real-world thing may have many aliases and many external identifiers.
+Atlas should resolve all of those into one canonical internal entity record, then attach graph evidence around it.
+
+Example:
+- input alias: `Trump`
+- canonical entity: a single Atlas entity node
+- external identifiers: Wikidata ID, MID, provider-specific IDs
+- provenance: where the mapping came from
+- derived representation: the bundle consumed by facts-mcp / news-mcp
+
+## Proposed classes
+
+### `atlas:Entity`
+The canonical internal entity node.
+
+**Purpose**
+- the stable internal identity for one real-world referent
+
+**Typical fields**
+- `atlas:entityId`
+- `atlas:canonicalLabel`
+- `atlas:entityType`
+- `atlas:createdAt`
+- `atlas:updatedAt`
+
+---
+
+### `atlas:Alias`
+A surface form, nickname, variant label, or query label that can resolve to an entity.
+
+**Typical fields**
+- `atlas:aliasLabel`
+- `atlas:aliasLanguage`
+- `atlas:aliasSource`
+
+---
+
+### `atlas:ExternalIdentifier`
+An identifier from another system.
+
+**Examples**
+- Wikidata QID
+- Google Knowledge Graph MID
+- provider-specific ids
+
+**Typical fields**
+- `atlas:identifierValue`
+- `atlas:identifierSource`
+- `atlas:identifierType`
+
+---
+
+### `atlas:Provenance`
+Where a mapping or claim came from.
+
+**Typical fields**
+- `atlas:provenanceSource`
+- `atlas:retrievedAt`
+- `atlas:retrievalMethod`
+- `atlas:confidence`
+
+---
+
+### `atlas:ResolvedMapping`
+A record that says: this alias or external identifier points to this canonical entity.
+
+**Typical fields**
+- `atlas:sourceRef`
+- `atlas:targetEntity`
+- `atlas:provenance`
+- `atlas:status`
+
+---
+
+### `atlas:EnrichmentDataset`
+A computed set of related entities, relations, and evidence produced by Atlas.
+
+This is a **function result**, not the final domain-facing bundle.
+
+**Typical fields**
+- `atlas:seedEntity`
+- `atlas:relatedEntity`
+- `atlas:relatedRelation`
+- `atlas:queryContext`
+- `atlas:generatedAt`
+
+---
+
+### `atlas:EntityType`
+Canonical type nodes owned by Atlas.
+
+**Purpose**
+- represent the stable internal class (Person, Organization, Instrument, etc.)
+- map external type labels/URIs onto Atlas types (via `owl:sameAs`/`skos:exactMatch` style links)
+
+### `atlas:ExternalType`
+Raw type evidence from sources such as Google Trends, Wikidata, etc.
+
+**Purpose**
+- capture the literal strings we received (e.g. "46th U.S. President")
+- keep provenance about where/when we saw them
+- allow later mapping to canonical Atlas types
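+
+As a sketch, one piece of external type evidence might be stored like this (node names are illustrative; the predicates come from the draft lists in this file):
+
+```turtle
+atlas_data:exttype_trends_46th_us_president a atlas:ExternalType ;
+  atlas:externalTypeLabel "46th U.S. President" ;
+  atlas:hasProvenance atlas_data:prov_google_trends_2026_04_03 .
+```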
+
+### `atlas:DomainProjection`
+The conceptual bundle consumed by domain apps.
+
+This is the representation a domain-specific app uses after Atlas has resolved and enriched the entity.
+
+**Typical fields**
+- `atlas:projectionFor`
+- `atlas:sourceEntity`
+- `atlas:projectionPayload`
+- `atlas:projectionContext`
+
+## Key predicates
+
+### Identity and naming
+- `atlas:canonicalLabel`
+- `atlas:aliasLabel`
+- `atlas:entityType` (literal fallback when canonical type is unknown)
+
+### Type system
+- `atlas:hasCanonicalType` (Entity → EntityType)
+- `atlas:hasExternalType` (Entity → ExternalType)
+- `atlas:externalTypeLabel`
+- `atlas:equivalentType` / `owl:sameAs` links to external ontologies
+
+### Mapping
+- `atlas:hasAlias`
+- `atlas:hasExternalIdentifier`
+- `atlas:resolvedTo`
+- `atlas:preferredIdentifier`
+
+### Provenance
+- `atlas:hasProvenance`
+- `atlas:provenanceSource`
+- `atlas:retrievedAt`
+- `atlas:confidence`
+
+### Enrichment
+- `atlas:hasEnrichment`
+- `atlas:relatedEntity`
+- `atlas:relatedRelation`
+- `atlas:enrichmentDepth`
+
+### Domain projection
+- `atlas:hasDomainProjection`
+- `atlas:projectionFor`
+- `atlas:projectionPayload`
+
+## Minimal resolution flow
+
+1. Receive alias/text.
+2. Check Virtuoso for an existing mapping.
+3. If found, return the canonical entity.
+4. If not found, query upstream resolution sources.
+5. Normalize the result into the Atlas ontology.
+6. Store the mapping and provenance in Virtuoso.
+7. Return the resolved entity.
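+
+Step 2 could be a lookup along these lines (a sketch built from the draft predicates above; the stored shape may differ):
+
+```sparql
+PREFIX atlas: <http://world.eu.org/atlas_ontology#>
+
+SELECT ?entity WHERE {
+  ?alias a atlas:Alias ;
+         atlas:aliasLabel "Trump" ;
+         atlas:resolvedTo ?entity .
+}
+LIMIT 1
+```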
+
+## Minimal enrichment flow
+
+1. Receive canonical entity.
+2. Compute a related-entity dataset.
+3. Attach constraints, provenance, and depth.
+4. Return the enrichment dataset.
+5. Optionally build a domain projection from the result.
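+
+A hypothetical serialization of the resulting dataset (field names mirror the `atlas:EnrichmentDataset` draft; all values are invented):
+
+```json
+{
+  "seed_entity": "atlas_data:entity_trump",
+  "query_context": "related-entities",
+  "generated_at": "2026-04-03T18:00:05Z",
+  "enrichment_depth": 1,
+  "related": [
+    {"entity": "atlas_data:entity_example", "relation": "atlas:relatedRelation"}
+  ]
+}
+```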
+
+## Draft Turtle sketch
+
+```turtle
+@prefix atlas: <http://world.eu.org/atlas_ontology#> .
+@prefix atlas_data: <http://world.eu.org/atlas_data#> .
+@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
+
+atlas_data:entity_trump a atlas:Entity ;
+  atlas:canonicalLabel "Donald Trump" ;
+  atlas:hasCanonicalType atlas_data:type_person ;
+  atlas:entityType "person" ;  # literal fallback from resolver
+  atlas:preferredIdentifier atlas_data:ext_wikidata_Q22686 ;
+  atlas:hasAlias atlas_data:alias_trump ;
+  atlas:hasProvenance atlas_data:prov_resolve_google-trends-2026-04-03 .
+
+atlas_data:type_person a atlas:EntityType ;
+  atlas:canonicalLabel "Person" ;
+  atlas:equivalentType <http://schema.org/Person> .
+
+atlas_data:alias_trump a atlas:Alias ;
+  atlas:aliasLabel "Trump" ;
+  atlas:aliasSource "query" .
+
+atlas_data:ext_wikidata_Q22686 a atlas:ExternalIdentifier ;
+  atlas:identifierValue "Q22686" ;
+  atlas:identifierSource "wikidata" ;
+  atlas:identifierType "wikidata-qid" .
+
+atlas_data:prov_resolve_google-trends-2026-04-03 a atlas:Provenance ;
+  atlas:provenanceSource "google-trends" ;
+  atlas:retrievalMethod "entity-resolution" ;
+  atlas:confidence "0.93"^^xsd:decimal .
+```
+
+## Open questions
+
+- Should `atlas:Entity` be one node per referent, with aliases and IDs attached as properties, or should aliases and identifiers be fully separate nodes?
+- Should the canonical label be unique or only preferred?
+- Which fields are required for the first cache hit path?
+- How much of the enrichment dataset should be persisted versus computed on demand?
+- What is the smallest useful domain projection for facts-mcp and news-mcp?
+
+## Working rule
+
+If a property is unclear, keep the ontology small and make the implementation prove it later.

+ 145 - 0
CLAIM_TRIPLE_MAPPING.md

@@ -0,0 +1,145 @@
+# Atlas Claim → Triple Mapping v1
+
+This document defines deterministic RDF emission from claim objects.
+
+## Prefixes
+- `atlas:` → `http://world.eu.org/atlas_ontology#`
+- `atlas_data:` → `http://world.eu.org/atlas_data#`
+- `xsd:` → `http://www.w3.org/2001/XMLSchema#` (used for typed literals below)
+
+---
+
+## 1) Core rule
+
+Each claim yields:
+1. one domain triple (subject-predicate-object)
+2. one claim node (`atlas:Claim`) carrying metadata
+3. one provenance node linked to the claim
+
+This ensures provenance is attached to *the specific statement*.
+
+---
+
+## 2) Claim node shape
+
+```turtle
+atlas_data:claim_<id> a atlas:Claim ;
+  atlas:claimSubject <subject> ;
+  atlas:claimPredicate <predicate> ;
+  atlas:claimObject <object-or-literal> ;
+  atlas:claimLayer "raw|derived" ;
+  atlas:hasProvenance atlas_data:prov_<id> .
+```
+
+### Provenance node
+
+```turtle
+atlas_data:prov_<id> a atlas:Provenance ;
+  atlas:provenanceSource "wikidata" ;
+  atlas:retrievalMethod "wbsearchentities+entitydata" ;
+  atlas:confidence "0.99"^^xsd:decimal ;
+  atlas:retrievedAt "2026-04-03T18:00:00Z"^^xsd:dateTime .
+```
+
+---
+
+## 3) Mapping table
+
+## 3.1 Identifier claim
+
+Input claim:
+```json
+{
+  "predicate": "atlas:hasIdentifier",
+  "object": {"kind": "identifier", "id_type": "atlas:WikidataQID", "value": "Q22686"}
+}
+```
+
+Triples:
+```turtle
+atlas_data:entity_trump atlas:hasIdentifier atlas_data:ident_q22686 .
+
+atlas_data:ident_q22686 a atlas:Identifier ;
+  atlas:identifierType atlas:WikidataQID ;
+  atlas:identifierValue "Q22686" .
+```
+
++ claim/provenance nodes
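+
+Spelled out, that claim/provenance pair could follow the section 2 shapes (node ids are placeholders):
+
+```turtle
+atlas_data:claim_ident_q22686 a atlas:Claim ;
+  atlas:claimSubject atlas_data:entity_trump ;
+  atlas:claimPredicate atlas:hasIdentifier ;
+  atlas:claimObject atlas_data:ident_q22686 ;
+  atlas:claimLayer "raw" ;
+  atlas:hasProvenance atlas_data:prov_ident_q22686 .
+
+atlas_data:prov_ident_q22686 a atlas:Provenance ;
+  atlas:provenanceSource "wikidata" ;
+  atlas:confidence "0.99"^^xsd:decimal .
+```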
+
+---
+
+## 3.2 External type claim
+
+```turtle
+atlas_data:entity_trump atlas:hasExternalType atlas:WikidataType_Q5 .
+```
+
++ claim/provenance nodes
+
+---
+
+## 3.3 Canonical type claim (derived)
+
+```turtle
+atlas_data:entity_trump atlas:hasCanonicalType atlas:Person .
+```
+
++ claim/provenance nodes (source might be `wikidata`, `groq-llm`, or combined adjudication)
+
+---
+
+## 3.4 Alias claim
+
+```turtle
+atlas_data:entity_trump atlas:hasAlias atlas_data:alias_trump .
+atlas_data:alias_trump a atlas:Alias ;
+  atlas:aliasLabel "Trump" ;
+  atlas:resolvedTo atlas_data:entity_trump .
+```
+
++ claim/provenance nodes
+
+---
+
+## 3.5 Description claim
+
+```turtle
+atlas_data:entity_trump atlas:canonicalDescription "45th and 47th U.S. President" .
+```
+
++ claim/provenance nodes
+
+---
+
+## 4) Read-back policy
+
+When reading from store:
+- reconstruct normal response from canonical triples
+- reconstruct debug claims from `atlas:Claim` + linked provenance nodes
+- never infer provenance if claim metadata is missing
+
+---
+
+## 5) Write policy
+
+- Batch writes preferred (`batch_insert`) to reduce call overhead.
+- One entity resolution write should include:
+  - entity node
+  - identifier nodes
+  - alias nodes
+  - claim nodes
+  - provenance nodes
+- Idempotency key should be derived from `(entity_id, claim_id)`.
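+
+A single batched write might carry a payload like this sketch (the `batch_insert` argument shape is an assumption here, not a fixed contract):
+
+```json
+{
+  "graph": "http://world.eu.org/atlas_data#",
+  "idempotency_key": "entity_trump:clm_raw_mid_1",
+  "triples": [
+    "atlas_data:entity_trump atlas:hasIdentifier atlas_data:ident_q22686 .",
+    "atlas_data:ident_q22686 a atlas:Identifier ."
+  ]
+}
+```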
+
+---
+
+## 6) Minimal first storage set
+
+For first iteration, write:
+- canonical label
+- canonical type
+- identifiers (MID + QID where available)
+- alias used in resolution
+- claim/provenance for each of the above
+- needs_curation flag
+
+Add enrichment-derived claims in later phase.

+ 508 - 0
MANIFEST.md

@@ -0,0 +1,508 @@
+# 🧠 MCP Knowledge Graph Architecture Manifest (Final)
+
+## 0. Purpose
+
+Define a **strict, minimal, and general architecture** for MCP systems that operate on knowledge graphs.
+
+This architecture separates:
+
+* graph execution
+* semantic intelligence (enrichment)
+* domain-specific operations
+
+It is **deliberately rigid** to ensure:
+
+* clarity for agents
+* composability of systems
+* long-term scalability
+
+---
+
+# 1. Core Principles
+
+### 1.1 Absolute Separation
+
+Each layer has one responsibility:
+
+* execution
+* intelligence
+* application
+
+These MUST NOT overlap.
+
+---
+
+### 1.2 Enrichment is Abstract
+
+Enrichment is NOT:
+
+* geography-specific
+* ontology-specific
+* domain-specific
+
+Enrichment IS:
+
+> **the discovery of related entities under defined constraints**
+
+---
+
+### 1.3 Ontologies are Data
+
+* Ontologies are inputs
+* Ontologies are not hardcoded logic
+* Ontologies can vary per system
+
+---
+
+### 1.4 Everything is a Graph
+
+* entities
+* relations
+* ontologies
+* queries
+
+---
+
+# 2. System Architecture
+
+```text
+[ Domain-Specific MCPs ]
+            ↓
+        [ Atlas MCP ]
+            ↓
+     [ Virtuoso MCP ]
+```
+
+---
+
+# 3. Layer Definitions
+
+---
+
+## 3.1 Virtuoso MCP
+
+### Role
+
+> **Graph execution layer**
+
+---
+
+### Responsibilities
+
+* store RDF triples
+* execute SPARQL queries
+* provide graph primitives
+
+---
+
+### Capabilities
+
+* traversal by predicate
+* node description
+* relation enumeration
+* path queries
+* prefix handling (technical)
+* named graph management
+
+---
+
+### MUST NOT
+
+* interpret meaning
+* resolve entities
+* enrich data
+* depend on external sources
+
+---
+
+### Philosophy
+
+> “Executes graph operations without understanding them.”
+
+---
+
+## 3.2 Atlas MCP
+
+### Role
+
+> **Semantic intelligence layer**
+
+Atlas is the **only layer allowed to interpret and enrich data**.
+
+---
+
+# 4. Enrichment (Formal Definition)
+
+Enrichment is defined as:
+
+> **The controlled expansion of an entity into a set of related entities based on structural, semantic, or constraint-based relationships.**
+
+---
+
+## 4.1 Enrichment Dimensions
+
+Atlas MUST support enrichment along arbitrary axes:
+
+### Structural
+
+* hierarchy (parent, child)
+* adjacency
+* graph distance
+
+---
+
+### Semantic
+
+* type similarity
+  (e.g. “musicians”)
+
+* shared attributes
+  (e.g. “born 1965”)
+
+---
+
+### Constraint-based
+
+* filtered relations
+* query-defined sets
+
+---
+
+## 4.2 Examples (All Equivalent Forms of Enrichment)
+
+* “Find parent regions of Kumasi”
+* “Find musicians born in 1965 in Ghana”
+* “Find entities with same profession as X”
+* “Find all siblings in a pedigree graph”
+* “Find all servers connected via health routes”
+
+👉 These are NOT different features
+👉 They are **instances of the same abstraction**
+
+---
+
+## 4.3 Core Enrichment Operation
+
+```text
+expand(entity, constraints, depth) → subgraph
+```
+
+Where:
+
+* `entity` = starting node
+* `constraints` = predicates, types, filters
+* `depth` = traversal boundary
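+
+For example, "find parent regions of Kumasi" could be phrased as the following call sketch (argument encoding and the `atlas:parentRegion` predicate are illustrative, not fixed by this manifest):
+
+```json
+{
+  "entity": "atlas_data:entity_kumasi",
+  "constraints": {
+    "predicates": ["atlas:parentRegion"],
+    "types": [],
+    "filters": {}
+  },
+  "depth": 2
+}
+```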
+
+---
+
+## 4.4 Key Rule
+
+> Atlas provides the mechanism of enrichment
+> Domain-specific MCPs define what is relevant
+
+---
+
+# 5. Ontology Handling
+
+---
+
+## 5.1 Ontology Layers
+
+Atlas operates on three ontology types:
+
+### External Ontologies
+
+* Wikidata
+* other knowledge graphs
+
+---
+
+### Canonical Ontology (REQUIRED)
+
+Atlas MUST define a **minimal internal schema**
+
+Purpose:
+
+* normalize external data
+* provide stable structure
+
+---
+
+### Domain Ontologies
+
+Defined by domain-specific MCPs:
+
+* facts-mcp
+* garden-mcp
+* news-mcp
+* etc.
+
+---
+
+## 5.2 Core Rule
+
+> Atlas translates between ontologies but owns none of them
+
+---
+
+## 5.3 Ontology Mapping
+
+Atlas MUST:
+
+* ingest ontologies
+* map external → canonical
+* optionally map canonical → domain
+
+Atlas MUST NOT:
+
+* assume domain semantics
+* enforce domain schemas
+
+---
+
+# 6. Atlas Responsibilities
+
+### 6.1 Entity Resolution
+
+* text → entity identifiers
+
+---
+
+### 6.2 Graph Acquisition
+
+* retrieve data from:
+
+  * external knowledge graphs
+  * internal graph store
+
+---
+
+### 6.3 Ontology Mediation
+
+* align different schemas
+* normalize representations
+
+---
+
+### 6.4 Enrichment Execution
+
+* perform expansion queries
+* apply constraints
+* return structured subgraphs
+
+---
+
+### 6.5 Persistence
+
+* store results via Virtuoso MCP
+
+---
+
+### 6.6 Data Layers (MANDATORY)
+
+Atlas MUST maintain:
+
+#### Raw Layer
+
+* source-aligned data
+* full provenance
+
+#### Canonical Layer
+
+* normalized structure
+
+#### Derived Layer
+
+* enrichment results
+* computed relations
+
+---
+
+### Philosophy
+
+> “Atlas turns graphs into usable knowledge.”
+
+---
+
+## 3.3 Domain-Specific MCPs
+
+### Definition
+
+> A **domain-specific MCP** is any MCP that:
+
+* serves a specific purpose
+* defines its own ontology
+* uses Atlas and/or Virtuoso to operate on graph data
+
+---
+
+### Examples
+
+* Facts MCP
+* Garden MCP
+* News MCP
+
+---
+
+### Role
+
+> **Application and decision layer**
+
+---
+
+### Responsibilities
+
+* define domain ontology
+* define relevance and constraints
+* invoke Atlas for enrichment
+* store/query via Virtuoso MCP
+
+---
+
+### Capabilities
+
+* tagging
+* classification
+* domain-specific inference
+* state representation
+
+---
+
+### MUST NOT
+
+* resolve entities independently
+* perform enrichment logic outside Atlas
+* duplicate ontology mapping logic
+
+---
+
+### Key Rule
+
+> Domain-specific MCPs define *what to ask*
+> Atlas defines *how to answer*
+
+---
+
+# 7. Facts MCP (Clarified)
+
+### Role
+
+> **Domain-specific MCP for authoritative state**
+
+---
+
+### Responsibilities
+
+* define “current truth” in its domain
+* maintain state using its ontology
+* query enriched data via Atlas
+
+---
+
+### Example
+
+```text
+:ServerA :isOperational true
+```
+
+---
+
+### Key Rule
+
+> “Fact” is a domain concept, not a system concept
+
+---
+
+# 8. Prefix Handling
+
+| Layer                | Responsibility                     |
+| -------------------- | ---------------------------------- |
+| Virtuoso MCP         | technical prefix resolution        |
+| Atlas MCP            | namespace interpretation & mapping |
+| Domain-Specific MCPs | domain-specific prefixes           |
+
+---
+
+# 9. Data Flow
+
+```text
+Input (text / entity / query)
+        ↓
+Domain-Specific MCP
+        ↓
+Atlas MCP
+    - resolve
+    - enrich
+    - map ontologies
+        ↓
+Virtuoso MCP
+    - execute
+    - persist
+        ↓
+Domain-Specific MCP
+    - apply domain logic
+```
+
+---
+
+# 10. Hard Rules
+
+### Rule 1
+
+Virtuoso MCP MUST remain purely operational
+
+---
+
+### Rule 2
+
+Atlas MCP is the ONLY semantic intelligence layer
+
+---
+
+### Rule 3
+
+All enrichment MUST go through Atlas
+
+---
+
+### Rule 4
+
+All ontology mapping MUST go through Atlas
+
+---
+
+### Rule 5
+
+Domain-specific MCPs MUST NOT duplicate enrichment logic
+
+---
+
+### Rule 6
+
+Domain-specific MCPs define relevance, not mechanisms
+
+---
+
+# 11. Final Model
+
+```text
+Virtuoso MCP  → executes graph operations
+Atlas MCP     → performs semantic expansion
+Domain-Specific MCPs → define purpose and meaning
+```
+
+---
+
+# 12. Final Statement
+
+> Build systems where:
+>
+> * the graph layer executes
+> * the semantic layer expands
+> * the domain-specific layer decides
+>
+> and no layer crosses its boundary.
+
+---

+ 42 - 0
PROJECT.md

@@ -0,0 +1,42 @@
+# Atlas-MCP Project Plan (Agent View)
+
+## Context from the manifest
+* Atlas is the **only** semantic intelligence layer; it resolves entities, mediates ontologies, and expands graphs strictly through `expand(entity, constraints, depth)` workflows.
+* Atlas consumes input from domain-specific MCPs (news-mcp in the current ticket), resolves entities, and computes enrichment datasets. Resolution comes first; enrichment is secondary and may vary over time.
+* Ontologies remain data; Atlas maps between external sources, the canonical layer, and the domain ontologies without embedding domain-specific logic.
+* We should be ontology-first: model the representations before chasing enrichment details.
+* The derived layer is the domain-specific representation used by facts-mcp, news-mcp, and similar applications; enrichment is the dataset that feeds it.
+* Facts-mcp is a useful cautionary reference: the authoritative truth layer should stay small and explicit, while Atlas remains the semantic interpreter and never turns into a general fact store.
+
+## Today’s mission
+1. **Baseline service**
+   * Stand up a FastAPI/Uvicorn app in `app/main.py` with the `/health` route that reports status, uptime, and FastMCP registration placeholders on port `8550`.
+   * Keep the web layer minimal so future MCPs call Atlas as the sole semantic brain, consistent with the manifest’s separation rule.
+2. **Operational scripts**
+   * `scripts/run.sh`: launch uvicorn (and FastMCP registration) with sensible logging on port `8550`.
+   * `scripts/killserver.sh`: stop lingering uvicorn processes, report if stale instances were found, and exit cleanly.
+   * `scripts/restart.sh`: call `killserver.sh` then `run.sh` so restarts stay sequential.
+   * `scripts/tests.sh`: probe `/health` and verify the expected contract before moving on to richer tests.
+3. **Documentation lineage**
+   * `README.md` (human-facing) summarizes the architecture, today’s goals, folder layout, and the news/virtuoso collaboration strategy.
+   * `PROJECT.md` (this file) tracks agent priorities and reminders about the manifest’s hard rules.
+4. **Dependencies & housekeeping**
+   * `requirements.txt` lists FastAPI, uvicorn, fastmcp, rdflib, httpx, and any enrichment helpers we’ll need in the canonical layer.
+   * `gitignore` covers Python artifacts, FastAPI logs, and typical OS noise.
+
+## Immediate placeholders
+* The `/health` route should respond with `{"status": "ok"}`, `uptime_seconds`, and `fastmcp_registered` fields, with TODOs for wiring real service discovery.
+* Keep TODO comments in the code pointing to entity resolution, ontology mapping, and enrichment so the manifest’s strict responsibilities stay visible.
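+
+On the wire, the placeholder contract above could look like this (values illustrative):
+
+```json
+{
+  "status": "ok",
+  "uptime_seconds": 12.4,
+  "fastmcp_registered": false
+}
+```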
+
+## Follow-on goals
+* Build modules for news-mcp bindings that leave Atlas as the interpreter, with news-mcp defining relevance/constraints while Atlas owns canonicalization and enrichment.
+* Consider a Virtuoso-backed cache/knowledge-graph layer for resolved entities that carry MID, Wikidata ID, and source provenance.
+* Atlas should not need a second database yet; **only** virtuoso-mcp talks to the underlying Virtuoso instance. Atlas must read/write triples exclusively through virtuoso-mcp, never directly.
+* Add tests simulating news-mcp entity requests to assert Atlas returns canonical IDs plus a derived subgraph that flows into Virtuoso.
+* Even stubs should be tested; the shape of the contract matters before the implementation details are complete.
+* If a known alias is resolved a hundred times, Atlas should reuse the stored mapping instead of asking upstream services again.
+* Add a dedicated integration test layer for resolution and enrichment once the graph clients exist.
+* If we port resolution in-house later, keep the external resolver behind a thin adapter so the internal contract does not change.
+* Document `expand(entity, constraints, depth)` expectations, starting with rdflib-based stubs and SPARQL placeholders for future enrichment work.
+* Keep the implementation precise: no enrichment in news-mcp, no graph execution in Atlas, and no semantic interpretation in Virtuoso.
+* Add maintenance routines (script/cron) to re-check entities with missing source data (especially missing Wikidata), and to supersede stale claims without bloating the schema.

+ 63 - 0
README.md

@@ -0,0 +1,63 @@
+# Atlas MVP
+
+Atlas-MCP implements the semantic intelligence tier for the existing MCP stack. It follows the manifest’s mandate: Atlas is the only layer that resolves and enriches entities. For now, Atlas has exactly two public responsibilities: entity resolution and enrichment. The facts-mcp docs reinforce the same design pressure: keep the authoritative truth layer small, canonical, and explicit; Atlas should not blur into that role, but instead cooperate with it through clean graph contracts.
+
+## Today’s goals
+1. Bootstrap the FastMCP + FastAPI service with the basic `/health` route and deployment scripts so the runtime mirrors our other MCP servers.
+2. Capture the goals for news-mcp integration: entity resolution and enrichment only for now; trend discovery, persistence, and caching remain design concerns but are not first-cut Atlas tools.
+3. Document the folder layout, plans, and dependencies so future contributors can extend it safely.
+
+## Architecture snapshot
+* **FastMCP** powers the service boundary and service registration.
+* **FastAPI + Uvicorn** provide the HTTP interface (including `/health`) on port `8550` by default.
+* **Household scripts** (`run.sh`, `killserver.sh`, `restart.sh`) mirror the operational pattern from other MCP projects.
+* **Ontologies and enrichment**: Atlas ingests external definitions, normalizes them into the canonical schema, runs `expand(entity, constraints, depth)` workflows, and emits resolved/enriched representations. Persistence and caching will come later.
+* **Type intelligence**: resolution now feeds a pipeline of:
+  1. local cache,
+  2. Virtuoso cache,
+  3. Google Trends evidence,
+  4. Wikidata `wbsearchentities` + `Special:EntityData` lookups (direct HTTP, with a proper user-agent/contact),
+  5. optional LLM classification (Groq `meta-llama/llama-4-scout-17b-16e-instruct` by default, falling back to OpenAI `gpt-4o-mini`) with optional caller-provided context,
+  6. and finally the manual curation flag if everything fails.
+* **News-mcp / Virtuoso-mcp collaboration**: news-mcp asks Atlas to resolve text into canonical IDs; Atlas performs enrichment. Trends may assist resolution, but the resolution logic belongs in Atlas, not news-mcp. When Atlas eventually stores or recalls triples it must do so **via virtuoso-mcp only**; direct connections to the underlying Virtuoso instance are off-limits.
+
+## Folder layout sketch
+```
+atlas-mcp/
+├── README.md
+├── PROJECT.md
+├── requirements.txt
+├── gitignore
+├── app/
+│   ├── __init__.py
+│   └── main.py
+├── scripts/
+│   ├── run.sh
+│   ├── killserver.sh
+│   ├── restart.sh
+│   └── tests.sh
+└── MANIFEST.md ← existing architecture manifest (reference)
+```
+
+## v0.0.1 status
+Atlas v0.0.1 is the first stable slice of the entity-resolution pipeline:
+- MCP-first `resolve_entity` / `enrich_entity`
+- automatic persistence hook inside resolution
+- active-claims-only normal responses
+- debug responses with raw/derived claims and Turtle dump
+- claim-level provenance and Wikidata lookup
+
+See `RELEASE_NOTES_v0.0.1.md` for the full summary.
+
+## Next steps
+* Keep the `/health` endpoint as a minimal service check on port `8550`.
+* Wire in configuration placeholders for news-mcp and virtuoso-mcp credentials so Atlas can resolve entities and enrich them.
+* Continue tightening claim lifecycle handling and store/read roundtrips via virtuoso-mcp.
+
+## Ontology + graph URIs
+- Atlas’ current ontology file lives in `ontology/atlas.ttl`; load that into Protege to inspect the classes/predicates.
+- The ontology uses the `atlas:` prefix for `http://world.eu.org/atlas_ontology#`, and stored data should use `atlas_data:` for `http://world.eu.org/atlas_data#`.
+- Override them via `.env` (`ATLAS_PREFIX_IRI`, `ATLAS_GRAPH_IRI`) if you need a different URI, but keep it consistent across Protege, the ontology file, and the data graph.
+
+## Quick mcporter check
+After restarting Atlas, you can test the server with your local config file like this:
+
+```bash
+mcporter --config "$CONFIG" call atlas.resolve_entity subject=Trump context="Short snippet for disambiguation"
+```
+
+If your config uses a different server name, swap `atlas` for that name. The optional `context` argument is forwarded to the LLM classifier when needed. The `/health` route is separate and should be checked with HTTP, not mcporter.

+ 30 - 0
RELEASE_NOTES_v0.0.1.md

@@ -0,0 +1,30 @@
+# Atlas v0.0.1 Release Notes
+
+## Summary
+Atlas v0.0.1 establishes the initial entity-resolution and claim-model pipeline.
+
+## Highlights
+- MCP-first Atlas service with `resolve_entity` and `enrich_entity`
+- Automatic persistence hook inside resolution flow (background / fail-soft)
+- Canonical entity model with:
+  - canonical label
+  - canonical description
+  - canonical type
+  - external identifiers
+  - claim-level provenance
+  - curation flag
+- Wikidata lookup with HTTP user-agent and fail-soft behavior
+- Debug mode with:
+  - raw claims
+  - derived claims
+  - Turtle export
+  - optional debug file dump
+- Claim lifecycle support:
+  - `active`
+  - `superseded`
+- SPARQL snippet for entity + claims + provenance retrieval
+
+## Notes
+- Normal responses are active-claims-only.
+- Debug responses may include superseded claims for audit visibility.
+- Claim and provenance structure is designed to stay stable for later write/read roundtrips via virtuoso-mcp.

+ 127 - 0
RESPONSE_SCHEMA.md

@@ -0,0 +1,127 @@
+# Atlas Response Schema v1
+
+This file defines the canonical response contract for `resolve_entity`.
+
+## Design goals
+- One coherent entity record across Raw / Canonical / Derived layers
+- Claim-level provenance (no floating provenance blobs)
+- Same data model for normal + debug output, with debug as superset
+
+---
+
+## 1) Normal response (default)
+
+```json
+{
+  "entity": {
+    "entity_id": "atlas:mid:/m/0cqt90",
+    "canonical_label": "Donald Trump",
+    "canonical_description": "45th and 47th U.S. President",
+    "canonical_type": "atlas:Person",
+    "needs_curation": false,
+    "identifiers": [
+      {"type": "atlas:Mid", "value": "/m/0cqt90"},
+      {"type": "atlas:WikidataQID", "value": "Q22686"}
+    ]
+  },
+  "summary": {
+    "raw_claim_count": 5,
+    "derived_claim_count": 1,
+    "sources": ["google-trends", "wikidata", "groq-llm"]
+  }
+}
+```
+
+### Notes
+- Normal response is compact and consumer-friendly.
+- No giant payload blobs by default.
+- Identifiers and canonical type are always easy to access.
+
+---
+
+## 2) Debug response (`debug=true`)
+
+Debug mode returns the normal response plus:
+
+```json
+{
+  "debug": {
+    "raw_claims": [
+      {
+        "claim_id": "clm_raw_mid_1",
+        "layer": "raw",
+        "subject": "atlas:mid:/m/0cqt90",
+        "predicate": "atlas:hasIdentifier",
+        "object": {"kind": "identifier", "id_type": "atlas:Mid", "value": "/m/0cqt90"},
+        "provenance": {
+          "source": "google-trends",
+          "method": "trends-resolution",
+          "confidence": 0.90,
+          "retrieved_at": "2026-04-03T18:00:00Z"
+        }
+      }
+    ],
+    "derived_claims": [
+      {
+        "claim_id": "clm_drv_type_1",
+        "layer": "derived",
+        "subject": "atlas:mid:/m/0cqt90",
+        "predicate": "atlas:hasCanonicalType",
+        "object": {"kind": "type", "value": "atlas:Person"},
+        "provenance": {
+          "source": "wikidata+llm",
+          "method": "type-adjudication",
+          "confidence": 0.97,
+          "retrieved_at": "2026-04-03T18:00:02Z"
+        }
+      }
+    ],
+    "source_payloads": {
+      "g_trends_payload": {},
+      "wikidata_payload": {},
+      "llm_payload": {}
+    },
+    "turtle": "...",
+    "turtle_path": "/tmp/atlas-debug/trump.ttl"
+  }
+}
+```
+
+### Notes
+- Debug is strictly a superset of normal.
+- Provenance belongs to each claim.
+- Payload snapshots are debug-only.
+
+---
+
+## 3) Layer interpretation
+
+- **Raw layer**: source-aligned facts (MIDs, QIDs, external type claims, labels)
+- **Canonical layer**: Atlas normalized entity fields (canonical label/type/description)
+- **Derived layer**: computed claims (e.g., canonical type adjudication, enrichment links)
+
+All three layers must align around the same `entity_id`.
+
+---
+
+## 4) Field policy
+
+### Required in normal mode
+- `entity.entity_id`
+- `entity.canonical_label`
+- `entity.canonical_type`
+- `entity.needs_curation`
+- `entity.identifiers[]`
+
+### Required in debug mode
+- `debug.raw_claims[]`
+- `debug.derived_claims[]`
+- `debug.source_payloads`
+- `debug.turtle`
+
+---
+
+## 5) Backward compatibility
+
+Current implementation fields (`atlas_id`, `entity_type`, etc.) may remain temporarily,
+but target output should migrate to this schema to avoid ambiguity and drift.

+ 72 - 0
SPARQL_SNIPPETS.md

@@ -0,0 +1,72 @@
+# SPARQL Snippets (Atlas)
+
+Purpose: keep stable, reusable queries close to the project so we do not drift.
+
+## 1) Entity + all attached claims + claim provenance
+
+Use this to read one coherent entity record from the graph.
+
+```sparql
+PREFIX atlas: <http://world.eu.org/atlas_ontology#>
+
+SELECT ?entity ?label ?claim ?pred ?objIri ?objLit ?layer ?prov ?src ?method ?conf ?ts
+WHERE {
+  ?entity a atlas:Entity ;
+          atlas:canonicalLabel ?label ;
+          atlas:hasClaim ?claim .
+
+  ?claim atlas:claimSubjectIri ?entity ;
+         atlas:claimPredicate ?pred ;
+         atlas:claimLayer ?layer .
+
+  OPTIONAL { ?claim atlas:claimObjectIri ?objIri . }
+  OPTIONAL { ?claim atlas:claimObjectLiteral ?objLit . }
+
+  OPTIONAL {
+    ?claim atlas:hasProvenance ?prov .
+    ?prov atlas:provenanceSource ?src .
+    OPTIONAL { ?prov atlas:retrievalMethod ?method . }
+    OPTIONAL { ?prov atlas:confidence ?conf . }
+    OPTIONAL { ?prov atlas:retrievedAt ?ts . }
+  }
+}
+ORDER BY ?entity ?claim
+```
+
+## 2) Same query filtered to one entity node
+
+Replace `atlas_data:entity_atlas_mid__m_0cqt90` with the target entity URI.
+
+```sparql
+PREFIX atlas: <http://world.eu.org/atlas_ontology#>
+PREFIX atlas_data: <http://world.eu.org/atlas_data#>
+
+SELECT ?entity ?label ?claim ?pred ?objIri ?objLit ?layer ?prov ?src ?method ?conf ?ts
+WHERE {
+  VALUES ?entity { atlas_data:entity_atlas_mid__m_0cqt90 }
+
+  ?entity a atlas:Entity ;
+          atlas:canonicalLabel ?label ;
+          atlas:hasClaim ?claim .
+
+  ?claim atlas:claimSubjectIri ?entity ;
+         atlas:claimPredicate ?pred ;
+         atlas:claimLayer ?layer .
+
+  OPTIONAL { ?claim atlas:claimObjectIri ?objIri . }
+  OPTIONAL { ?claim atlas:claimObjectLiteral ?objLit . }
+
+  OPTIONAL {
+    ?claim atlas:hasProvenance ?prov .
+    ?prov atlas:provenanceSource ?src .
+    OPTIONAL { ?prov atlas:retrievalMethod ?method . }
+    OPTIONAL { ?prov atlas:confidence ?conf . }
+    OPTIONAL { ?prov atlas:retrievedAt ?ts . }
+  }
+}
+ORDER BY ?claim
+```
+
+## Notes
+- Keep this file as the canonical place for read/query snippets.
+- If claim schema changes, update this file in the same commit.
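Hand-editing the `VALUES` clause invites typos. A minimal sketch, assuming only the standard library and reusing the example entity node from snippet 2, that templates the query instead:

```python
QUERY_TEMPLATE = """\
PREFIX atlas: <http://world.eu.org/atlas_ontology#>

SELECT ?claim ?pred ?layer
WHERE {{
  VALUES ?entity {{ <{entity_iri}> }}
  ?entity a atlas:Entity ;
          atlas:hasClaim ?claim .
  ?claim atlas:claimPredicate ?pred ;
         atlas:claimLayer ?layer .
}}
ORDER BY ?claim
"""


def build_entity_claims_query(entity_iri: str) -> str:
    # Fail fast on obviously malformed IRIs instead of emitting broken SPARQL.
    if not entity_iri.startswith("http"):
        raise ValueError(f"expected an absolute IRI, got {entity_iri!r}")
    return QUERY_TEMPLATE.format(entity_iri=entity_iri)


query = build_entity_claims_query(
    "http://world.eu.org/atlas_data#entity_atlas_mid__m_0cqt90"
)
```

The template deliberately keeps the projection small; extend it with the provenance `OPTIONAL` blocks from snippet 1 as needed.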

+ 1 - 0
app/__init__.py

@@ -0,0 +1 @@
+"""Atlas-MCP application package."""

+ 130 - 0
app/atlas.py

@@ -0,0 +1,130 @@
+"""Atlas semantic core for entity resolution and enrichment."""
+
+from __future__ import annotations
+
+from app.cache import EntityCache
+from app.entity_normalize import normalize_entity
+from app.models import (
+    AtlasAlias,
+    AtlasEntity,
+    AtlasEnrichmentDataset,
+    AtlasIdentifier,
+    AtlasProvenance,
+)
+from app.trends_resolution import resolve_entity_via_trends
+from app.type_classifier import TypeClassification, classify_entity_type
+from app.storage_service import AtlasStorageService
+from app.virtuoso_store import VirtuosoEntityStore
+from app.wikidata_lookup import lookup_wikidata
+
+_entity_cache = EntityCache(max_entries=512)
+_virtuoso_store = VirtuosoEntityStore(max_cache_entries=256)
+_storage = AtlasStorageService()
+
+
+async def resolve_entity(subject: str, context: str | None = None) -> AtlasEntity:
+    normalized = normalize_entity(subject)
+    token = normalized.strip().lower()
+    cached = _entity_cache.get(token)
+    if cached is not None:
+        return cached
+
+    virt_hit = await _virtuoso_store.lookup(token)
+    if virt_hit is not None:
+        _entity_cache.store(virt_hit, extra_tokens=[subject, normalized])
+        return virt_hit
+
+    resolution = resolve_entity_via_trends(subject)
+    classification = await classify_entity_type(subject, resolution, context)
+    wikidata = await lookup_wikidata(subject)
+    entity = _entity_from_resolution(subject, resolution, classification, wikidata)
+    _entity_cache.store(entity, extra_tokens=[subject, normalized])
+    try:
+        await _storage.write_entity(entity)
+    except Exception:
+        # Persistence is best-effort: resolution must still succeed even when
+        # the Virtuoso write path is unavailable.
+        pass
+    return entity
+
+
+def _entity_from_resolution(subject: str, resolution: dict, classification: TypeClassification, wikidata: dict | None = None) -> AtlasEntity:
+    canonical_label = (
+        resolution.get("canonical_label")
+        or resolution.get("normalized")
+        or subject.strip()
+    )
+    atlas_id = resolution.get("mid")
+    if atlas_id:
+        atlas_id = f"atlas:mid:{atlas_id.strip()}"
+    else:
+        slug = canonical_label.strip().lower().replace(" ", "-") or "entity"
+        atlas_id = f"atlas:{slug}"
+
+    identifiers = []
+    mid = resolution.get("mid")
+    if mid:
+        identifiers.append(
+            AtlasIdentifier(value=mid, source="google", identifier_type="mid")
+        )
+    if wikidata and wikidata.get("qid"):
+        identifiers.append(
+            AtlasIdentifier(value=wikidata["qid"], source="wikidata", identifier_type="qid")
+        )
+
+    provenance = [
+        AtlasProvenance(
+            source=resolution.get("source") or "resolver",
+            retrieval_method="trends-resolution",
+            confidence=0.9 if resolution.get("mid") else 0.3,
+            retrieved_at=resolution.get("resolved_at"),
+        )
+    ]
+    if classification.provenance:
+        provenance.append(classification.provenance)
+    if wikidata and wikidata.get("qid"):
+        provenance.append(
+            AtlasProvenance(
+                source="wikidata",
+                retrieval_method="wbsearchentities + entitydata",
+                confidence=0.99,
+                retrieved_at=wikidata.get("retrieved_at"),
+            )
+        )
+
+    canonical_type = (
+        classification.canonical_type
+        or resolution.get("type")
+        or "unknown"
+    )
+
+    payload = dict(resolution)
+    if wikidata:
+        payload["wikidata"] = {
+            "status": "ok",
+            "qid": wikidata.get("qid"),
+            "label": wikidata.get("label"),
+            "description": wikidata.get("description"),
+            "retrieved_at": wikidata.get("retrieved_at"),
+        }
+    else:
+        payload["wikidata"] = {"status": "missing"}
+
+    return AtlasEntity(
+        atlas_id=atlas_id,
+        canonical_label=canonical_label,
+        canonical_description=(wikidata or {}).get("description"),
+        entity_type=canonical_type,
+        aliases=[AtlasAlias(label=subject.strip() or canonical_label)],
+        identifiers=identifiers,
+        provenance=provenance,
+        raw_payload=payload,
+        needs_curation=classification.needs_curation,
+    )
+
+
+def enrich_entity(entity: AtlasEntity, constraints=None, depth: int = 1) -> AtlasEnrichmentDataset:
+    return AtlasEnrichmentDataset(
+        seed_entity=entity,
+        related_entities=[],
+        query_context=constraints or {},
+        depth=depth,
+    )

+ 44 - 0
app/cache.py

@@ -0,0 +1,44 @@
+"""Local caches for Atlas."""
+
+from __future__ import annotations
+
+from collections import OrderedDict
+from typing import Iterable, Optional
+
+from app.models import AtlasEntity
+
+
+class EntityCache:
+    def __init__(self, max_entries: int = 512):
+        self.max_entries = max_entries
+        self._data: OrderedDict[str, AtlasEntity] = OrderedDict()
+
+    def _normalize_token(self, token: str | None) -> str:
+        return str(token or "").strip().lower()
+
+    def get(self, token: str | None) -> Optional[AtlasEntity]:
+        key = self._normalize_token(token)
+        if not key:
+            return None
+        entity = self._data.get(key)
+        if entity is not None:
+            self._data.move_to_end(key)
+        return entity
+
+    def store(self, entity: AtlasEntity, extra_tokens: Iterable[str] | None = None) -> None:
+        tokens = set(extra_tokens or [])
+        tokens.add(entity.canonical_label)
+        tokens.add(entity.atlas_id)
+        for alias in entity.aliases:
+            tokens.add(alias.label)
+        mid = entity.raw_payload.get("mid") if isinstance(entity.raw_payload, dict) else None
+        if mid:
+            tokens.add(mid)
+        for token in tokens:
+            key = self._normalize_token(token)
+            if not key:
+                continue
+            self._data[key] = entity
+            self._data.move_to_end(key)
+        while len(self._data) > self.max_entries:
+            self._data.popitem(last=False)

+ 117 - 0
app/claims.py

@@ -0,0 +1,117 @@
+"""Claim extraction helpers for Atlas layered outputs."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from app.models import AtlasEntity, AtlasProvenance
+
+
+def _prov_to_dict(p: AtlasProvenance | None) -> dict[str, Any] | None:
+    if p is None:
+        return None
+    return {
+        "source": p.source,
+        "method": p.retrieval_method,
+        "confidence": p.confidence,
+        "retrieved_at": p.retrieved_at,
+    }
+
+
+def _pick_provenance(entity: AtlasEntity, *, source_hint: str | None = None, method_hint: str | None = None) -> AtlasProvenance | None:
+    if not entity.provenance:
+        return None
+    if method_hint:
+        for p in entity.provenance:
+            if p.retrieval_method == method_hint:
+                return p
+    if source_hint:
+        for p in entity.provenance:
+            if p.source == source_hint:
+                return p
+    return entity.provenance[0]
+
+
+def _id_type_resource(identifier_type: str) -> str:
+    if identifier_type == "mid":
+        return "atlas:Mid"
+    if identifier_type == "qid":
+        return "atlas:WikidataQID"
+    return f"atlas:{identifier_type}"
+
+
+def build_claim_sets(entity: AtlasEntity) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+    raw_claims: list[dict[str, Any]] = []
+    derived_claims: list[dict[str, Any]] = []
+
+    for ident in entity.identifiers:
+        prov = _pick_provenance(entity, source_hint=ident.source)
+        raw_claims.append(
+            {
+                "claim_id": f"clm_raw_ident_{ident.identifier_type}_{ident.value}",
+                "layer": "raw",
+                "subject": entity.atlas_id,
+                "predicate": "atlas:hasIdentifier",
+                "object": {
+                    "kind": "identifier",
+                    "id_type": _id_type_resource(ident.identifier_type),
+                    "value": ident.value,
+                },
+                "provenance": _prov_to_dict(prov),
+            }
+        )
+
+    for alias in entity.aliases:
+        raw_claims.append(
+            {
+                "claim_id": f"clm_raw_alias_{alias.label}",
+                "layer": "raw",
+                "subject": entity.atlas_id,
+                "predicate": "atlas:hasAlias",
+                "object": {"kind": "alias", "value": alias.label},
+                "provenance": _prov_to_dict(_pick_provenance(entity, method_hint="trends-resolution")),
+            }
+        )
+
+    wd = entity.raw_payload.get("wikidata") or {}
+    if wd.get("status") == "ok":
+        derived_claims.append(
+            {
+                "claim_id": "clm_drv_wikidata_type",
+                "layer": "derived",
+                "subject": entity.atlas_id,
+                "predicate": "atlas:hasExternalType",
+                "object": {"kind": "external_type", "value": "atlas:WikidataType_Q5", "qid": wd.get("qid")},
+                "provenance": {
+                    "source": "wikidata",
+                    "method": "wbsearchentities + entitydata",
+                    "confidence": 0.99,
+                    "retrieved_at": wd.get("retrieved_at"),
+                },
+            }
+        )
+    else:
+        raw_claims.append(
+            {
+                "claim_id": "clm_raw_wikidata_missing",
+                "layer": "raw",
+                "subject": entity.atlas_id,
+                "predicate": "atlas:wikidataLookupStatus",
+                "object": {"kind": "literal", "value": wd.get("status", "missing")},
+                "provenance": _prov_to_dict(_pick_provenance(entity, method_hint="trends-resolution")),
+            }
+        )
+
+    type_prov = _pick_provenance(entity, method_hint="type-classification")
+    derived_claims.append(
+        {
+            "claim_id": "clm_drv_canonical_type",
+            "layer": "derived",
+            "subject": entity.atlas_id,
+            "predicate": "atlas:hasCanonicalType",
+            "object": {"kind": "type", "value": f"atlas:{entity.entity_type}"},
+            "provenance": _prov_to_dict(type_prov),
+        }
+    )
+
+    return raw_claims, derived_claims

+ 17 - 0
app/config.py

@@ -0,0 +1,17 @@
+"""Atlas configuration helpers."""
+
+import os
+from pathlib import Path
+
+from dotenv import load_dotenv
+
+PROJECT_ROOT = Path(__file__).resolve().parent.parent
+load_dotenv(PROJECT_ROOT / ".env")
+
+CONFIG_DIR = PROJECT_ROOT / "config"
+CONFIG_DIR.mkdir(parents=True, exist_ok=True)
+
+ENTITY_ALIASES_FILE = Path(os.getenv("ATLAS_ENTITY_ALIASES_FILE", CONFIG_DIR / "entity_aliases.json"))
+ENTITY_ALIASES_FILE.parent.mkdir(parents=True, exist_ok=True)
+if not ENTITY_ALIASES_FILE.exists():
+    ENTITY_ALIASES_FILE.write_text("{}\n", encoding="utf-8")

+ 50 - 0
app/entity_normalize.py

@@ -0,0 +1,50 @@
+"""Entity normalization helpers reused from news-mcp."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Iterable
+
+from .config import ENTITY_ALIASES_FILE
+
+
+def _alias_map() -> dict[str, str]:
+    path = Path(ENTITY_ALIASES_FILE)
+    if not path.exists():
+        return {}
+    try:
+        raw = json.loads(path.read_text(encoding="utf-8"))
+    except Exception:
+        return {}
+    out: dict[str, str] = {}
+    if isinstance(raw, dict):
+        for k, v in raw.items():
+            if k and v:
+                out[str(k).strip().lower()] = str(v).strip()
+    return out
+
+
+def _lookup_alias(key: str) -> str | None:
+    return _alias_map().get(key)
+
+
+def normalize_entity(value: str) -> str:
+    key = str(value).strip().lower()
+    if not key:
+        return ""
+    return _lookup_alias(key) or str(value).strip()
+
+
+def normalize_entities(values: Iterable[str]) -> list[str]:
+    out: list[str] = []
+    seen: set[str] = set()
+    for value in values or []:
+        norm = normalize_entity(value)
+        key = norm.lower()
+        if not norm or key in seen:
+            continue
+        seen.add(key)
+        out.append(norm)
+    return out
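A quick illustration of the dedupe semantics of `normalize_entities` (case-insensitive, order-preserving), with the JSON alias file replaced by an inlined, hypothetical alias map:

```python
ALIASES = {"potus": "President of the United States"}  # hypothetical alias entry


def normalize(value: str) -> str:
    key = str(value).strip().lower()
    if not key:
        return ""
    return ALIASES.get(key) or str(value).strip()


def normalize_all(values) -> list[str]:
    out: list[str] = []
    seen: set[str] = set()
    for value in values:
        norm = normalize(value)
        key = norm.lower()
        if not norm or key in seen:
            continue  # drop empties and case-insensitive duplicates
        seen.add(key)
        out.append(norm)
    return out


result = normalize_all(["POTUS", "potus ", "Trump", "trump"])
```

Both `POTUS` variants collapse onto the alias target, and the second `trump` is dropped because dedupe keys on the lowercase normalized form.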

+ 31 - 0
app/main.py

@@ -0,0 +1,31 @@
+"""FastAPI entrypoint for Atlas-MCP."""
+
+from __future__ import annotations
+
+from datetime import datetime, timezone
+from typing import Dict
+
+from fastapi import FastAPI
+
+from .mcp_server import mcp
+
+START_TIME = datetime.now(timezone.utc)
+
+app = FastAPI(
+    title="Atlas-MCP",
+    description="Semantic intelligence layer for entity resolution and enrichment.",
+    version="0.1.0",
+)
+app.mount("/mcp", mcp.sse_app())
+
+
+@app.get("/health", tags=["liveness"])
+async def health() -> Dict[str, object]:
+    now = datetime.now(timezone.utc)
+    uptime_seconds = (now - START_TIME).total_seconds()
+    return {
+        "status": "ok",
+        "uptime_seconds": round(uptime_seconds, 2),
+        "fastmcp_registered": True,  # mcp.sse_app() is mounted at /mcp above
+        "tools": ["resolve_entity", "enrich_entity"],
+    }

+ 86 - 0
app/mcp_server.py

@@ -0,0 +1,86 @@
+"""FastMCP transport for Atlas tools."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from mcp.server.fastmcp import FastMCP
+from mcp.server.transport_security import TransportSecuritySettings
+
+from .atlas import enrich_entity, resolve_entity
+from .claims import build_claim_sets
+from .triple_export import entity_to_turtle
+
+mcp = FastMCP(
+    "atlas",
+    transport_security=TransportSecuritySettings(
+        enable_dns_rebinding_protection=False
+    ),
+)
+
+
+@mcp.tool(name="resolve_entity", description="Resolve a subject string to a canonical Atlas entity.")
+async def resolve_entity_tool(subject: str, context: str | None = None, debug: bool = False, debug_path: str | None = None):
+    entity = await resolve_entity(subject, context)
+    result = {
+        "atlas_id": entity.atlas_id,
+        "canonical_label": entity.canonical_label,
+        "canonical_description": entity.canonical_description,
+        "entity_type": entity.entity_type,
+        "needs_curation": entity.needs_curation,
+        "aliases": [alias.label for alias in entity.aliases],
+        "identifiers": [
+            {
+                "value": identifier.value,
+                "source": identifier.source,
+                "identifier_type": identifier.identifier_type,
+            }
+            for identifier in entity.identifiers
+        ],
+        "provenance": [
+            {
+                "source": provenance.source,
+                "retrieval_method": provenance.retrieval_method,
+                "confidence": provenance.confidence,
+                "retrieved_at": provenance.retrieved_at,
+            }
+            for provenance in entity.provenance
+        ],
+        "g_trends_payload": {k: v for k, v in entity.raw_payload.items() if k != "wikidata"},
+        "wikidata_payload": entity.raw_payload.get("wikidata"),
+    }
+    if debug:
+        raw_claims, derived_claims = build_claim_sets(entity)
+        turtle = entity_to_turtle(entity)
+        result["raw_claims"] = raw_claims
+        result["derived_claims"] = derived_claims
+        result["turtle"] = turtle
+        if debug_path:
+            path = Path(debug_path)
+            path.parent.mkdir(parents=True, exist_ok=True)
+            path.write_text(turtle, encoding="utf-8")
+            result["turtle_path"] = str(path)
+    return result
+
+
+@mcp.tool(name="enrich_entity", description="Enrich a canonical Atlas entity.")
+async def enrich_entity_tool(subject: str, depth: int = 1, context: str | None = None):
+    entity = await resolve_entity(subject, context)
+    result = enrich_entity(entity, depth=depth)
+    return {
+        "seed": {
+            "atlas_id": result.seed_entity.atlas_id,
+            "canonical_label": result.seed_entity.canonical_label,
+        },
+        "related_entities": [
+            {
+                "atlas_id": item.atlas_id,
+                "canonical_label": item.canonical_label,
+                "entity_type": item.entity_type,
+            }
+            for item in result.related_entities
+        ],
+        "query_context": result.query_context,
+        "depth": result.depth,
+    }
+

+ 47 - 0
app/models.py

@@ -0,0 +1,47 @@
+"""Atlas internal data models."""
+
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
+
+
+@dataclass
+class AtlasIdentifier:
+    value: str
+    source: str
+    identifier_type: str
+
+
+@dataclass
+class AtlasAlias:
+    label: str
+    language: str = "und"
+    source: str = "query"
+
+
+@dataclass
+class AtlasProvenance:
+    source: str
+    retrieval_method: str
+    confidence: float = 0.0
+    retrieved_at: Optional[str] = None
+
+
+@dataclass
+class AtlasEntity:
+    atlas_id: str
+    canonical_label: str
+    canonical_description: str | None = None
+    entity_type: str = "unknown"
+    aliases: List[AtlasAlias] = field(default_factory=list)
+    identifiers: List[AtlasIdentifier] = field(default_factory=list)
+    provenance: List[AtlasProvenance] = field(default_factory=list)
+    raw_payload: Dict[str, Any] = field(default_factory=dict)
+    needs_curation: bool = False
+
+
+@dataclass
+class AtlasEnrichmentDataset:
+    seed_entity: AtlasEntity
+    related_entities: List[AtlasEntity] = field(default_factory=list)
+    query_context: Dict[str, Any] = field(default_factory=dict)
+    depth: int = 1

+ 132 - 0
app/storage_service.py

@@ -0,0 +1,132 @@
+"""Atlas persistence/read service via virtuoso-mcp (MCP transport)."""
+
+from __future__ import annotations
+
+import os
+from typing import Any, Awaitable, Callable
+
+from mcp import ClientSession
+from mcp.client.sse import sse_client
+
+from app.models import AtlasEntity
+from app.triple_export import entity_to_turtle
+
+ATLAS_GRAPH_IRI = os.getenv("ATLAS_GRAPH_IRI", "http://world.eu.org/atlas_data#")
+VIRTUOSO_MCP_SSE_URL = os.getenv("ATLAS_VIRTUOSO_MCP_SSE_URL", "http://192.168.0.249:8501/mcp/sse")
+VIRTUOSO_MCP_TIMEOUT = float(os.getenv("ATLAS_VIRTUOSO_MCP_TIMEOUT", "10"))
+VIRTUOSO_MCP_SSE_READ_TIMEOUT = float(os.getenv("ATLAS_VIRTUOSO_MCP_SSE_READ_TIMEOUT", str(60 * 5)))
+
+CallToolFn = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]
+
+
+def _safe_fragment(value: str) -> str:
+    value = (value or "").strip().lower()
+    out = []
+    for ch in value:
+        if ch.isalnum() or ch in ["_", "-"]:
+            out.append(ch)
+        else:
+            out.append("_")
+    frag = "".join(out).strip("_")
+    return frag or "entity"
+
+
+def entity_iri(entity_id: str) -> str:
+    return f"http://world.eu.org/atlas_data#entity_{_safe_fragment(entity_id)}"
+
+
+class AtlasStorageService:
+    def __init__(self, call_tool: CallToolFn | None = None):
+        self._call_tool_override = call_tool
+
+    async def _call_tool(self, tool_name: str, payload: dict[str, Any]) -> dict[str, Any]:
+        if self._call_tool_override:
+            return await self._call_tool_override(tool_name, payload)
+
+        async with sse_client(
+            VIRTUOSO_MCP_SSE_URL,
+            timeout=VIRTUOSO_MCP_TIMEOUT,
+            sse_read_timeout=VIRTUOSO_MCP_SSE_READ_TIMEOUT,
+        ) as (read_stream, write_stream):
+            async with ClientSession(read_stream, write_stream) as session:
+                await session.initialize()
+                result = await session.call_tool(tool_name, payload)
+                if result.isError:
+                    raise RuntimeError(f"Tool {tool_name} failed: {result.content}")
+                if isinstance(result.structuredContent, dict):
+                    return result.structuredContent
+                return {"content": result.content}
+
+    async def write_entity(self, entity: AtlasEntity) -> dict[str, Any]:
+        ttl = entity_to_turtle(entity)
+        try:
+            result = await self._call_tool(
+                "batch_insert",
+                {
+                    "ttl": ttl,
+                    "graph": ATLAS_GRAPH_IRI,
+                },
+            )
+            return {
+                "status": "ok",
+                "graph": ATLAS_GRAPH_IRI,
+                "entity_id": entity.atlas_id,
+                "result": result,
+            }
+        except Exception as exc:
+            return {
+                "status": "unfinished",
+                "message": "Persistence path not fully available yet",
+                "error": str(exc),
+                "entity_id": entity.atlas_id,
+            }
+
+    async def read_entity_claims(self, entity_id: str, include_superseded: bool = False) -> dict[str, Any]:
+        iri = entity_iri(entity_id)
+        status_filter = "" if include_superseded else 'FILTER(?status = "active")'
+        query = f"""
+PREFIX atlas: <http://world.eu.org/atlas_ontology#>
+SELECT ?entity ?label ?claim ?pred ?objIri ?objLit ?layer ?status ?prov ?src ?method ?conf ?ts
+WHERE {{
+  VALUES ?entity {{ <{iri}> }}
+
+  ?entity a atlas:Entity ;
+          atlas:canonicalLabel ?label ;
+          atlas:hasClaim ?claim .
+
+  ?claim atlas:claimSubjectIri ?entity ;
+         atlas:claimPredicate ?pred ;
+         atlas:claimLayer ?layer ;
+         atlas:claimStatus ?status .
+
+  OPTIONAL {{ ?claim atlas:claimObjectIri ?objIri . }}
+  OPTIONAL {{ ?claim atlas:claimObjectLiteral ?objLit . }}
+
+  OPTIONAL {{
+    ?claim atlas:hasProvenance ?prov .
+    ?prov atlas:provenanceSource ?src .
+    OPTIONAL {{ ?prov atlas:retrievalMethod ?method . }}
+    OPTIONAL {{ ?prov atlas:confidence ?conf . }}
+    OPTIONAL {{ ?prov atlas:retrievedAt ?ts . }}
+  }}
+
+  {status_filter}
+}}
+ORDER BY ?claim
+"""
+        try:
+            result = await self._call_tool("sparql_query", {"query": query})
+            return {
+                "status": "ok",
+                "entity_id": entity_id,
+                "query": query,
+                "result": result,
+            }
+        except Exception as exc:
+            return {
+                "status": "unfinished",
+                "message": "Read path not fully available yet",
+                "error": str(exc),
+                "entity_id": entity_id,
+                "query": query,
+            }
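The `_safe_fragment`/`entity_iri` pair deterministically maps an `atlas_id` onto a graph IRI; a standalone copy of that logic shows the mapping for a MID-based id:

```python
def safe_fragment(value: str) -> str:
    # Lowercase, keep [a-z0-9_-], replace everything else with "_".
    value = (value or "").strip().lower()
    out = [ch if ch.isalnum() or ch in "_-" else "_" for ch in value]
    frag = "".join(out).strip("_")
    return frag or "entity"


def entity_iri(entity_id: str) -> str:
    return f"http://world.eu.org/atlas_data#entity_{safe_fragment(entity_id)}"


iri = entity_iri("atlas:mid:/m/0cqt90")
```

The `:` and `/` characters each become `_`, which is why the SPARQL snippets address this entity as `atlas_data:entity_atlas_mid__m_0cqt90`.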

+ 110 - 0
app/trends_resolution.py

@@ -0,0 +1,110 @@
+"""Google Trends-backed entity resolution borrowed from news-mcp."""
+
+from __future__ import annotations
+
+import json
+from datetime import datetime, timezone
+from functools import lru_cache
+from typing import Any
+from urllib.parse import quote
+
+import httpx
+
+from .entity_normalize import normalize_entity
+
+
+class GoogleTrendsError(RuntimeError):
+    pass
+
+
+class GoogleTrendsProvider:
+    _SUGGESTIONS_URL = "https://trends.google.com/trends/api/autocomplete/"
+
+    def __init__(self, *, hl: str = "en-US", tz: int = 120, timeout: float = 10.0):
+        self.hl = hl
+        self.tz = tz
+        self.timeout = timeout
+        self._headers = {
+            "User-Agent": (
+                "Mozilla/5.0 (X11; Linux x86_64) "
+                "AppleWebKit/537.36 (KHTML, like Gecko) "
+                "Chrome/135.0.0.0 Safari/537.36"
+            ),
+            "Accept": "application/json,text/javascript,*/*;q=0.1",
+        }
+
+    def suggestions(self, keyword: str) -> list[dict[str, Any]]:
+        url = self._SUGGESTIONS_URL + quote(keyword)
+        params = {"hl": self.hl, "tz": str(self.tz)}
+        response = httpx.get(
+            url,
+            params=params,
+            headers=self._headers,
+            timeout=self.timeout,
+            follow_redirects=True,
+        )
+        response.raise_for_status()
+        text = response.text.strip()
+        if text.startswith(")]}',"):
+            # Trends prefixes its JSON with an anti-XSSI guard; strip it first.
+            text = text[5:]
+        payload = json.loads(text)
+        default = payload.get("default") if isinstance(payload, dict) else None
+        topics = default.get("topics") if isinstance(default, dict) else None
+        return topics if isinstance(topics, list) else []
+
+
+@lru_cache(maxsize=1)
+def _provider() -> GoogleTrendsProvider | None:
+    try:
+        return GoogleTrendsProvider()
+    except Exception:
+        return None
+
+
+def _resolved_at() -> str:
+    return datetime.now(timezone.utc).isoformat()
+
+
+@lru_cache(maxsize=1024)
+def resolve_entity_via_trends(subject: str) -> dict[str, Any]:
+    normalized = normalize_entity(subject)
+    if not normalized:
+        return {
+            "raw": subject,
+            "normalized": "",
+            "canonical_label": "",
+            "mid": None,
+            "type": None,
+            "candidates": [],
+            "source": "empty",
+            "resolved_at": _resolved_at(),
+        }
+
+    provider = _provider()
+    if provider is not None:
+        try:
+            suggestions = provider.suggestions(normalized)
+            best = suggestions[0] if suggestions else None
+            return {
+                "raw": subject,
+                "normalized": normalized,
+                "canonical_label": best.get("title") if best else normalized,
+                "mid": best.get("mid") if best else None,
+                "type": best.get("type") if best else None,
+                "candidates": suggestions,
+                "source": "google-trends",
+                "resolved_at": _resolved_at(),
+            }
+        except Exception:
+            pass
+
+    return {
+        "raw": subject,
+        "normalized": normalized,
+        "canonical_label": normalized,
+        "mid": None,
+        "type": None,
+        "candidates": [],
+        "source": "fallback",
+        "resolved_at": _resolved_at(),
+    }
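Google's autocomplete endpoint prefixes its JSON body with `)]}',` as an anti-XSSI guard. The stripping step in `suggestions` can be checked offline against a canned body (fabricated here, not real Trends output):

```python
import json

# Fabricated example body, shaped like the Trends autocomplete response.
RAW = ")]}',\n" + json.dumps(
    {"default": {"topics": [
        {"mid": "/m/0cqt90", "title": "Donald Trump", "type": "45th U.S. President"}
    ]}}
)


def parse_trends_body(text: str) -> list[dict]:
    text = text.strip()
    if text.startswith(")]}',"):
        text = text[5:]  # drop the 5-character anti-XSSI guard before parsing
    payload = json.loads(text)
    default = payload.get("default") if isinstance(payload, dict) else None
    topics = default.get("topics") if isinstance(default, dict) else None
    return topics if isinstance(topics, list) else []


topics = parse_trends_body(RAW)
```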

+ 183 - 0
app/triple_export.py

@@ -0,0 +1,183 @@
+"""Serialize resolved Atlas entities to Turtle for inspection or write-path preparation."""
+
+from __future__ import annotations
+
+from app.models import AtlasEntity, AtlasProvenance
+
+PREFIXES = """@prefix atlas: <http://world.eu.org/atlas_ontology#> .
+@prefix atlas_data: <http://world.eu.org/atlas_data#> .
+@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+
+"""
+
+
+def _safe_fragment(value: str) -> str:
+    value = (value or "").strip().lower()
+    out = []
+    for ch in value:
+        if ch.isalnum() or ch in ["_", "-"]:
+            out.append(ch)
+        else:
+            out.append("_")
+    frag = "".join(out).strip("_")
+    return frag or "entity"
+
+
+def _entity_node(entity: AtlasEntity) -> str:
+    return f"atlas_data:entity_{_safe_fragment(entity.atlas_id)}"
+
+
+def _alias_node(alias_label: str) -> str:
+    return f"atlas_data:alias_{_safe_fragment(alias_label)}"
+
+
+def _identifier_node(identifier_value: str) -> str:
+    return f"atlas_data:ident_{_safe_fragment(identifier_value)}"
+
+
+def _provenance_node(source: str, retrieved_at: str | None, retrieval_method: str) -> str:
+    parts = [source, retrieval_method, retrieved_at or ""]
+    return f"atlas_data:prov_{_safe_fragment('_'.join(parts))}"
+
+
+def _type_assertion_node(entity: AtlasEntity, source: str) -> str:
+    return f"atlas_data:typeassert_{_safe_fragment(entity.atlas_id)}_{_safe_fragment(source)}"
+
+
+def _literal(text: str) -> str:
+    # Escape backslashes, quotes, and line breaks so the value stays a valid
+    # single-line Turtle string literal.
+    return (
+        text.replace("\\", "\\\\")
+        .replace('"', '\\"')
+        .replace("\n", "\\n")
+        .replace("\r", "\\r")
+    )
+
+
+def _identifier_type_resource(identifier_type: str) -> str:
+    kind = _safe_fragment(identifier_type)
+    if kind == "mid":
+        return "atlas:Mid"
+    if kind in {"qid", "wikidata_qid", "wikidataqid"}:
+        return "atlas:WikidataQID"
+    return f"atlas:{kind.capitalize()}"
+
+
+def _pick_provenance(entity: AtlasEntity, source_hint: str | None = None, method_hint: str | None = None) -> AtlasProvenance | None:
+    if not entity.provenance:
+        return None
+    if method_hint:
+        for p in entity.provenance:
+            if p.retrieval_method == method_hint:
+                return p
+    if source_hint:
+        for p in entity.provenance:
+            if p.source == source_hint:
+                return p
+    return entity.provenance[0]
+
+
+def entity_to_turtle(entity: AtlasEntity) -> str:
+    lines: list[str] = [PREFIXES]
+    subject = _entity_node(entity)
+
+    claim_nodes = [f"atlas_data:claim_ident_{_safe_fragment(i.value)}" for i in entity.identifiers]
+    if entity.entity_type and entity.entity_type != "unknown":
+        claim_nodes.append(f"atlas_data:claim_type_{_safe_fragment(entity.atlas_id)}")
+
+    lines.append(f"{subject} a atlas:Entity ;")
+    lines.append(f'  atlas:canonicalLabel "{_literal(entity.canonical_label)}" ;')
+    if entity.canonical_description:
+        lines.append(f'  atlas:canonicalDescription "{_literal(entity.canonical_description)}" ;')
+    if entity.entity_type and entity.entity_type != "unknown":
+        lines.append(f"  atlas:hasCanonicalType atlas:{_safe_fragment(entity.entity_type).capitalize()} ;")
+    for alias in entity.aliases:
+        lines.append(f"  atlas:hasAlias {_alias_node(alias.label)} ;")
+    for ident in entity.identifiers:
+        lines.append(f"  atlas:hasIdentifier {_identifier_node(ident.value)} ;")
+    for claim_node in claim_nodes:
+        lines.append(f"  atlas:hasClaim {claim_node} ;")
+    lines.append(f"  atlas:needsCuration {'true' if entity.needs_curation else 'false'} .")
+    lines.append("")
+
+    for alias in entity.aliases:
+        alias_node = _alias_node(alias.label)
+        lines.append(f"{alias_node} a atlas:Alias ;")
+        lines.append(f'  atlas:aliasLabel "{_literal(alias.label)}" ;')
+        lines.append(f"  atlas:resolvedTo {subject} .")
+        lines.append("")
+
+    for ident in entity.identifiers:
+        ident_node = _identifier_node(ident.value)
+        lines.append(f"{ident_node} a atlas:Identifier ;")
+        lines.append(f'  atlas:identifierValue "{_literal(ident.value)}" ;')
+        lines.append(f'  atlas:identifierSource "{_literal(ident.source)}" ;')
+        lines.append(f"  atlas:identifierType {_identifier_type_resource(ident.identifier_type)} ;")
+        prov = _pick_provenance(entity, source_hint=ident.source)
+        if prov:
+            lines.append(f"  atlas:hasIdentifierProvenance {_provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)} .")
+        else:
+            lines[-1] = lines[-1].rstrip(" ;") + " ."
+        lines.append("")
+
+    for prov in entity.provenance:
+        prov_node = _provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)
+        lines.append(f"{prov_node} a atlas:Provenance ;")
+        lines.append(f'  atlas:provenanceSource "{_literal(prov.source)}" ;')
+        lines.append(f'  atlas:retrievalMethod "{_literal(prov.retrieval_method)}" ;')
+        lines.append(f'  atlas:confidence "{prov.confidence}"^^xsd:decimal ;')
+        if prov.retrieved_at:
+            lines.append(f'  atlas:retrievedAt "{_literal(prov.retrieved_at)}"^^xsd:dateTime .')
+        else:
+            lines[-1] = lines[-1].rstrip(" ;") + " ."
+        lines.append("")
+
+    wd = entity.raw_payload.get("wikidata") or {}
+    # WikidataType_Q5 means "human", so only assert it for Person entities.
+    if wd.get("status") == "ok" and entity.entity_type == "Person":
+        typeassert_node = _type_assertion_node(entity, "wikidata")
+        lines.append(f"{typeassert_node} a atlas:TypeAssertion ;")
+        lines.append("  atlas:assertedType atlas:WikidataType_Q5 ;")
+        prov = _pick_provenance(entity, source_hint="wikidata")
+        if prov:
+            lines.append(f"  atlas:hasAssertionProvenance {_provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)} ;")
+        lines.append('  atlas:assertionReason "wikidata instance-of" .')
+        lines.append("")
+
+    if entity.entity_type and entity.entity_type != "unknown":
+        typeassert_node = _type_assertion_node(entity, "canonical")
+        lines.append(f"{typeassert_node} a atlas:TypeAssertion ;")
+        frag = _safe_fragment(entity.entity_type)
+        lines.append(f"  atlas:assertedType atlas:{frag[:1].upper() + frag[1:]} ;")
+        prov = _pick_provenance(entity, method_hint="type-classification")
+        if prov:
+            lines.append(f"  atlas:hasAssertionProvenance {_provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)} ;")
+        lines.append('  atlas:assertionReason "canonical type adjudication" .')
+        lines.append("")
+
+    # Claim nodes with explicit claim-object semantics
+    for ident in entity.identifiers:
+        claim_node = f"atlas_data:claim_ident_{_safe_fragment(ident.value)}"
+        ident_node = _identifier_node(ident.value)
+        prov = _pick_provenance(entity, source_hint=ident.source)
+        lines.append(f"{claim_node} a atlas:Claim ;")
+        lines.append(f"  atlas:claimSubjectIri {subject} ;")
+        lines.append('  atlas:claimPredicate "atlas:hasIdentifier" ;')
+        lines.append(f"  atlas:claimObjectIri {ident_node} ;")
+        lines.append('  atlas:claimLayer "raw" ;')
+        lines.append('  atlas:claimStatus "active" ;')
+        if prov:
+            lines.append(f"  atlas:hasProvenance {_provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)} .")
+        else:
+            lines[-1] = lines[-1].rstrip(" ;") + " ."
+        lines.append("")
+
+    if entity.entity_type and entity.entity_type != "unknown":
+        claim_node = f"atlas_data:claim_type_{_safe_fragment(entity.atlas_id)}"
+        prov = _pick_provenance(entity, method_hint="type-classification")
+        lines.append(f"{claim_node} a atlas:Claim ;")
+        lines.append(f"  atlas:claimSubjectIri {subject} ;")
+        lines.append('  atlas:claimPredicate "atlas:hasCanonicalType" ;')
+        frag = _safe_fragment(entity.entity_type)
+        lines.append(f"  atlas:claimObjectIri atlas:{frag[:1].upper() + frag[1:]} ;")
+        lines.append('  atlas:claimLayer "derived" ;')
+        lines.append('  atlas:claimStatus "active" ;')
+        if prov:
+            lines.append(f"  atlas:hasProvenance {_provenance_node(prov.source, prov.retrieved_at, prov.retrieval_method)} .")
+        else:
+            lines[-1] = lines[-1].rstrip(" ;") + " ."
+        lines.append("")
+
+    return "\n".join(lines).strip() + "\n"
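The trailing-terminator trick used throughout the serializer (`lines[-1].rstrip(" ;") + " ."`) is easy to misread; a standalone sketch of the same mechanics (hypothetical `close_block` helper, not part of this commit):

```python
def close_block(lines: list[str]) -> list[str]:
    """Rewrite the last predicate line's ' ;' separator into the ' .' that
    ends a Turtle subject block, as the serializer does for optional tails."""
    out = list(lines)
    # rstrip(" ;") removes any trailing run of spaces and semicolons,
    # so the quoted literal itself is left untouched.
    out[-1] = out[-1].rstrip(" ;") + " ."
    return out

stanza = [
    "atlas_data:x a atlas:Claim ;",
    '  atlas:claimStatus "active" ;',
]
print(close_block(stanza)[-1])  # prints the line terminated by " ."
```

This is why each optional branch can safely "patch" the previous line instead of tracking whether a tail property will follow.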

+ 206 - 0
app/type_classifier.py

@@ -0,0 +1,206 @@
+"""Canonical type classification pipeline for Atlas entities."""
+
+from __future__ import annotations
+
+import json
+import os
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Optional
+
+import httpx
+
+from app.models import AtlasProvenance
+
+CANONICAL_TYPES = [
+    "Person",
+    "Organization",
+    "Location",
+    "CreativeWork",
+    "Event",
+    "Product",
+    "Other",
+]
+
+WIKIDATA_CLASS_MAP = {
+    "Q5": "Person",
+    "Q43229": "Organization",
+    "Q17334923": "Location",  # human settlement
+    "Q515": "Location",  # city
+    "Q82794": "Location",  # geographic region
+    "Q16521": "Other",  # taxon; not in CANONICAL_TYPES, so fold into Other
+    "Q571": "CreativeWork",
+    "Q11424": "CreativeWork",  # film
+    "Q49848": "CreativeWork",  # album
+    "Q1656682": "Event",
+    "Q191067": "Product",
+}
+
+OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+OPENAI_MODEL = os.getenv("ATLAS_OPENAI_MODEL", os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
+GROQ_API_KEY = os.getenv("GROQ_API_KEY")
+GROQ_MODEL = os.getenv(
+    "ATLAS_GROQ_MODEL",
+    os.getenv("GROQ_MODEL", "meta-llama/llama-4-scout-17b-16e-instruct"),
+)
+
+
+@dataclass
+class TypeClassification:
+    canonical_type: Optional[str]
+    provenance: Optional[AtlasProvenance]
+    needs_curation: bool
+
+
+async def classify_entity_type(subject: str, resolution: dict, context: Optional[str]) -> TypeClassification:
+    label = resolution.get("canonical_label") or subject
+    wikidata_hit = await _classify_via_wikidata(label)
+    if wikidata_hit is not None:
+        return wikidata_hit
+
+    llm_hit = await _classify_via_llm(subject, resolution, context)
+    if llm_hit is not None:
+        return llm_hit
+
+    return TypeClassification(canonical_type=None, provenance=None, needs_curation=True)
+
+
+async def _classify_via_wikidata(label: str) -> Optional[TypeClassification]:
+    search_params = {
+        "action": "wbsearchentities",
+        "search": label,
+        "language": "en",
+        "limit": 1,
+        "format": "json",
+    }
+    try:
+        async with httpx.AsyncClient(timeout=8) as client:
+            search_resp = await client.get("https://www.wikidata.org/w/api.php", params=search_params)
+            search_resp.raise_for_status()
+            search_data = search_resp.json()
+            if not search_data.get("search"):
+                return None
+            entity_id = search_data["search"][0].get("id")
+            if not entity_id:
+                return None
+            data_resp = await client.get(
+                f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json",
+                params={"flavor": "dump"},
+            )
+            data_resp.raise_for_status()
+            data_payload = data_resp.json()
+            entities = data_payload.get("entities", {})
+            entity_block = entities.get(entity_id)
+            if not entity_block:
+                return None
+            claims = entity_block.get("claims", {})
+            p31 = claims.get("P31", [])
+            for claim in p31:
+                mainsnak = claim.get("mainsnak", {})
+                datavalue = mainsnak.get("datavalue", {})
+                value = datavalue.get("value", {})
+                wid = value.get("id")
+                canonical = WIKIDATA_CLASS_MAP.get(wid)
+                if canonical:
+                    prov = AtlasProvenance(
+                        source="wikidata",
+                        retrieval_method="type-classification",
+                        confidence=0.97,
+                        retrieved_at=datetime.now(timezone.utc).isoformat(),
+                    )
+                    return TypeClassification(canonical_type=canonical, provenance=prov, needs_curation=False)
+    except Exception:
+        return None
+    return None
+
+
+async def _classify_via_llm(subject: str, resolution: dict, context: Optional[str]) -> Optional[TypeClassification]:
+    provider = None
+    if GROQ_API_KEY:
+        provider = "groq"
+    elif OPENAI_API_KEY:
+        provider = "openai"
+    if provider is None:
+        return None
+
+    prompt = _build_llm_prompt(subject, resolution, context)
+    payload = {
+        "model": GROQ_MODEL if provider == "groq" else OPENAI_MODEL,
+        "messages": [
+            {
+                "role": "system",
+                "content": (
+                    "You classify named entities into canonical Atlas types. "
+                    "Valid types: Person, Organization, Location, CreativeWork, Event, Product, Other. "
+                    "Respond with JSON: {\"type\": <type>, \"confidence\": <0-1>, \"reason\": <short>}"
+                ),
+            },
+            {"role": "user", "content": prompt},
+        ],
+        "temperature": 0,
+    }
+    headers = {"Content-Type": "application/json"}
+    url = "https://api.groq.com/openai/v1/chat/completions"
+    if provider == "groq":
+        headers["Authorization"] = f"Bearer {GROQ_API_KEY}"
+    else:
+        headers["Authorization"] = f"Bearer {OPENAI_API_KEY}"
+        url = "https://api.openai.com/v1/chat/completions"
+    try:
+        async with httpx.AsyncClient(timeout=15) as client:
+            resp = await client.post(url, json=payload, headers=headers)
+            resp.raise_for_status()
+            data = resp.json()
+            choice = data.get("choices", [{}])[0]
+            message = choice.get("message", {})
+            content = message.get("content")
+            if not content:
+                return None
+            parsed = _parse_llm_json(content)
+            if not parsed:
+                return None
+            canonical_type = parsed.get("type")
+            confidence = float(parsed.get("confidence", 0))
+            if canonical_type not in CANONICAL_TYPES:
+                return None
+            needs_curation = confidence < 0.6
+            prov = AtlasProvenance(
+                source=f"{provider}-llm",
+                retrieval_method="type-classification",
+                confidence=confidence,
+                retrieved_at=datetime.now(timezone.utc).isoformat(),
+            )
+            return TypeClassification(canonical_type=canonical_type, provenance=prov, needs_curation=needs_curation)
+    except Exception:
+        return None
+
+    return None
+
+
+def _build_llm_prompt(subject: str, resolution: dict, context: Optional[str]) -> str:
+    raw_type = resolution.get("type") or resolution.get("raw_type") or ""
+    candidates = resolution.get("candidates") or []
+    candidate_titles = ", ".join(sorted({c.get("title") for c in candidates if c.get("title")}))
+    parts = [
+        f"Subject: {subject}",
+        f"Canonical label: {resolution.get('canonical_label')}",
+        f"Raw type hints: {raw_type}",
+        f"Candidates: {candidate_titles}",
+    ]
+    if context:
+        parts.append(f"Context: {context}")
+    parts.append(f"Return JSON with keys type/confidence/reason. Types allowed: {', '.join(CANONICAL_TYPES)}")
+    return "\n".join(parts)
+
+
+def _parse_llm_json(text: str) -> Optional[dict]:
+    text = text.strip()
+    if text.startswith("```") and text.endswith("```"):
+        inner = text.strip("`")
+        if inner.lower().startswith("json"):
+            inner = inner[4:]
+        text = inner
+    try:
+        return json.loads(text)
+    except Exception:
+        return None
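Because LLM replies often arrive wrapped in a Markdown fence, `_parse_llm_json` tolerates ```` ```json ```` blocks; the same logic in isolation (standalone `parse_fenced_json`, mirroring the function in the diff):

```python
import json
from typing import Optional

def parse_fenced_json(text: str) -> Optional[dict]:
    """Parse a JSON object that may be wrapped in a Markdown code fence."""
    text = text.strip()
    if text.startswith("```") and text.endswith("```"):
        inner = text.strip("`")          # drop the backtick runs on both ends
        if inner.lower().startswith("json"):
            inner = inner[4:]            # drop the "json" language tag
        text = inner
    try:
        return json.loads(text)
    except Exception:
        return None

reply = '```json\n{"type": "Person", "confidence": 0.92, "reason": "head of state"}\n```'
print(parse_fenced_json(reply))
```

Anything that is neither bare JSON nor a fenced JSON block comes back as `None`, which the classifier treats as "no usable answer".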

+ 145 - 0
app/virtuoso_store.py

@@ -0,0 +1,145 @@
+"""Virtuoso MCP bridge for cached entity lookups."""
+
+from __future__ import annotations
+
+import json
+import os
+from collections import OrderedDict
+from typing import Optional
+
+from mcp import ClientSession
+from mcp.client.sse import sse_client
+
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+
+VIRTUOSO_MCP_SSE_URL = os.getenv("ATLAS_VIRTUOSO_MCP_SSE_URL", "http://192.168.0.249:8501/mcp/sse")
+VIRTUOSO_MCP_TIMEOUT = float(os.getenv("ATLAS_VIRTUOSO_MCP_TIMEOUT", "10"))
+VIRTUOSO_MCP_SSE_READ_TIMEOUT = float(os.getenv("ATLAS_VIRTUOSO_MCP_SSE_READ_TIMEOUT", str(60 * 5)))
+ATLAS_GRAPH_IRI = os.getenv("ATLAS_GRAPH_IRI", "http://world.eu.org/atlas_data#")
+PREFIX_ATLAS = os.getenv("ATLAS_PREFIX_IRI", "http://world.eu.org/atlas_ontology#")
+
+
+class VirtuosoEntityStore:
+    def __init__(self, max_cache_entries: int = 256):
+        self.max_cache_entries = max_cache_entries
+        self._cache: OrderedDict[str, AtlasEntity] = OrderedDict()
+
+    def _cache_key(self, token: str) -> str:
+        return str(token or "").strip().lower()
+
+    def _cache_get(self, token: str) -> Optional[AtlasEntity]:
+        key = self._cache_key(token)
+        if not key:
+            return None
+        hit = self._cache.get(key)
+        if hit is not None:
+            self._cache.move_to_end(key)
+        return hit
+
+    def _cache_set(self, token: str, entity: AtlasEntity) -> None:
+        key = self._cache_key(token)
+        if not key:
+            return
+        self._cache[key] = entity
+        self._cache.move_to_end(key)
+        while len(self._cache) > self.max_cache_entries:
+            self._cache.popitem(last=False)
+
+    async def lookup(self, token: str) -> Optional[AtlasEntity]:
+        cached = self._cache_get(token)
+        if cached is not None:
+            return cached
+        entity = await self._lookup_remote(token)
+        if entity is not None:
+            self._cache_set(token, entity)
+        return entity
+
+    async def _lookup_remote(self, token: str) -> Optional[AtlasEntity]:
+        literal = token.strip().lower()
+        if not literal or not VIRTUOSO_MCP_SSE_URL:
+            return None
+        query = _build_sparql_query(literal)
+        try:
+            async with sse_client(
+                VIRTUOSO_MCP_SSE_URL,
+                timeout=VIRTUOSO_MCP_TIMEOUT,
+                sse_read_timeout=VIRTUOSO_MCP_SSE_READ_TIMEOUT,
+            ) as (read_stream, write_stream):
+                async with ClientSession(read_stream, write_stream) as session:
+                    await session.initialize()
+                    result = await session.call_tool("sparql_query", {"query": query})
+                    if result.isError:
+                        return None
+                    payload = result.structuredContent or _content_to_json(result.content)
+                    if not isinstance(payload, dict):
+                        return None
+                    bindings = (
+                        payload.get("results", {})
+                        .get("bindings", [])
+                        if isinstance(payload.get("results"), dict)
+                        else []
+                    )
+                    if not bindings:
+                        return None
+                    return _entity_from_binding(bindings[0])
+        except Exception:
+            return None
+
+
+def _content_to_json(content):
+    if not content:
+        return None
+    first = content[0]
+    text = getattr(first, "text", None)
+    if not text:
+        return None
+    try:
+        return json.loads(text)
+    except Exception:
+        return None
+
+
+def _build_sparql_query(literal: str) -> str:
+    esc = literal.replace("\\", "\\\\").replace("\"", "\\\"")
+    return f"""
+PREFIX atlas: <{PREFIX_ATLAS}>
+SELECT ?entity ?label ?type ?mid WHERE {{
+  GRAPH <{ATLAS_GRAPH_IRI}> {{
+    ?entity atlas:canonicalLabel ?label .
+    OPTIONAL {{ ?entity atlas:entityType ?type . }}
+    OPTIONAL {{
+      ?entity atlas:hasIdentifier ?identifier .
+      ?identifier atlas:identifierType atlas:Mid .
+      ?identifier atlas:identifierValue ?mid .
+    }}
+  }}
+  FILTER(LCASE(?label) = \"{esc}\")
+}}
+LIMIT 1
+"""
+
+
+def _entity_from_binding(binding: dict) -> AtlasEntity:
+    label = binding.get("label", {}).get("value", "")
+    entity_uri = binding.get("entity", {}).get("value", "")
+    entity_type = binding.get("type", {}).get("value", "unknown")
+    mid = binding.get("mid", {}).get("value")
+    identifiers = []
+    if mid:
+        identifiers.append(AtlasIdentifier(value=mid, source="virtuoso", identifier_type="mid"))
+    provenance = [
+        AtlasProvenance(
+            source="virtuoso-cache",
+            retrieval_method="sparql",
+            confidence=0.95,
+        )
+    ]
+    return AtlasEntity(
+        atlas_id=entity_uri or f"atlas:{label.strip().lower().replace(' ', '-')}",
+        canonical_label=label or entity_uri,
+        entity_type=entity_type or "unknown",
+        aliases=[AtlasAlias(label=label or entity_uri)],
+        identifiers=identifiers,
+        provenance=provenance,
+        raw_payload={"source": "virtuoso", "binding": binding},
+    )
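The cache half of `VirtuosoEntityStore` is a plain `OrderedDict` LRU; a stripped-down sketch of the same mechanics (hypothetical `TinyLRU` with string values instead of `AtlasEntity`):

```python
from collections import OrderedDict

class TinyLRU:
    """Minimal LRU cache: lowercased keys, move-to-end on access,
    oldest entry evicted once the size cap is exceeded."""
    def __init__(self, max_entries: int = 2):
        self.max_entries = max_entries
        self._cache: OrderedDict[str, str] = OrderedDict()

    def get(self, token: str):
        key = token.strip().lower()
        hit = self._cache.get(key)
        if hit is not None:
            self._cache.move_to_end(key)  # mark as most recently used
        return hit

    def set(self, token: str, value: str) -> None:
        key = token.strip().lower()
        self._cache[key] = value
        self._cache.move_to_end(key)
        while len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used

lru = TinyLRU(max_entries=2)
lru.set("Trump", "atlas:donald_trump")
lru.set("Merz", "atlas:friedrich_merz")
lru.get("trump")                 # touch: "trump" becomes most recent
lru.set("ECB", "atlas:ecb")      # evicts "merz", not the just-touched "trump"
print(lru.get("merz"), lru.get("trump"))
```

The `move_to_end` on read is what makes this least-recently-*used* rather than least-recently-*written*.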

+ 60 - 0
app/wikidata_lookup.py

@@ -0,0 +1,60 @@
+"""Direct Wikidata lookup helpers."""
+
+from __future__ import annotations
+
+import os
+from datetime import datetime, timezone
+from typing import Any, Optional
+
+import httpx
+
+WIKIDATA_TIMEOUT = float(os.getenv("ATLAS_WIKIDATA_TIMEOUT", "10"))
+WIKIDATA_USER_AGENT = os.getenv(
+    "ATLAS_WIKIDATA_USER_AGENT",
+    "Atlas/1.0 (contact: lukas.goldschmidt+atlas@googlemail.com)",
+)
+
+
+async def lookup_wikidata(subject: str) -> Optional[dict[str, Any]]:
+    term = (subject or "").strip()
+    if not term:
+        return None
+
+    async with httpx.AsyncClient(timeout=WIKIDATA_TIMEOUT, follow_redirects=True) as client:
+        search = await client.get(
+            "https://www.wikidata.org/w/api.php",
+            params={
+                "action": "wbsearchentities",
+                "search": term,
+                "language": "en",
+                "format": "json",
+                "limit": 1,
+            },
+            headers={"Accept": "application/json", "User-Agent": WIKIDATA_USER_AGENT},
+        )
+        if search.status_code >= 400:
+            return None
+        payload = search.json()
+        results = payload.get("search") or []
+        if not results:
+            return None
+        top = results[0]
+        qid = top.get("id")
+        if not qid:
+            return None
+        entity = await client.get(
+            f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
+            params={"flavor": "dump"},
+            headers={"Accept": "application/json", "User-Agent": WIKIDATA_USER_AGENT},
+        )
+        if entity.status_code >= 400:
+            return None
+        entity_payload = entity.json()
+        return {
+            "qid": qid,
+            "label": top.get("label") or term,
+            "description": top.get("description"),
+            "entity": entity_payload.get("entities", {}).get(qid, {}),
+            "source": "wikidata",
+            "retrieved_at": datetime.now(timezone.utc).isoformat(),
+        }
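Both Wikidata paths walk the same EntityData shape (`claims` → `P31` → `mainsnak.datavalue.value.id`); a minimal extraction sketch (hypothetical helper, payload shape assumed from the Wikibase JSON format):

```python
def instance_of_qids(entity: dict) -> list[str]:
    """Collect the P31 ("instance of") target QIDs from a Wikidata
    EntityData entity block, defensively handling missing keys."""
    qids = []
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        qid = value.get("id")
        if qid:
            qids.append(qid)
    return qids

sample = {"claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}}
print(instance_of_qids(sample))  # ['Q5']
```

The classifier stops at the first QID it can map; this helper shows the full list, which is useful when debugging why a mapping missed.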

+ 17 - 0
config/entity_aliases.json

@@ -0,0 +1,17 @@
+{
+  "btc": "Bitcoin",
+  "bitcoin": "Bitcoin",
+  "eth": "Ethereum",
+  "ether": "Ethereum",
+  "ethereum": "Ethereum",
+  "fed": "Federal Reserve",
+  "federal reserve": "Federal Reserve",
+  "ecb": "European Central Bank",
+  "european central bank": "European Central Bank",
+  "eu": "European Union",
+  "european union": "European Union",
+  "trump": "Donald Trump",
+  "donald trump": "Donald Trump",
+  "merz": "Friedrich Merz",
+  "friedrich merz": "Friedrich Merz"
+}
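The alias table maps lowercase surface forms to canonical labels; a lookup sketch (abridged inline copy of the table, hypothetical `canonicalize` helper):

```python
import json

# Abridged copy of config/entity_aliases.json for illustration.
ALIASES = json.loads("""{
  "btc": "Bitcoin",
  "fed": "Federal Reserve",
  "trump": "Donald Trump"
}""")

def canonicalize(token: str) -> str:
    # Keys are stored lowercase, so normalize the incoming surface form first;
    # unknown tokens pass through unchanged for downstream resolution.
    return ALIASES.get(token.strip().lower(), token)

print(canonicalize("  Trump "), canonicalize("BTC"), canonicalize("NATO"))
```

Unknown tokens fall through to the Wikidata/LLM resolution path rather than failing here.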

+ 64 - 0
examples/examples.ttl

@@ -0,0 +1,64 @@
+@prefix atlas: <http://world.eu.org/atlas_ontology#> .
+@prefix atlas_data: <http://world.eu.org/atlas_data#> .
+@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
+
+### Joe Biden
+
+atlas_data:joe_biden a atlas:Entity ;
+  atlas:canonicalLabel "Joe Biden" ;
+  atlas:canonicalDescription "46th president of the United States" ;
+  atlas:hasCanonicalType atlas:Person ;
+  atlas:hasExternalType atlas:WikidataType_Q5 ;
+  atlas:hasAlias atlas_data:alias_joe_biden ;
+  atlas:hasIdentifier atlas_data:mid_m012gx2 , atlas_data:wikidata_Q6279 ;
+  atlas:hasTypeAssertion atlas_data:typeassert_joe_biden_wikidata ;
+  atlas:hasClaim atlas_data:claim_joe_biden_mid ;
+  atlas:needsCuration false .
+
+atlas_data:alias_joe_biden a atlas:Alias ;
+  atlas:aliasLabel "Joe Biden" ;
+  atlas:resolvedTo atlas_data:joe_biden .
+
+atlas_data:mid_m012gx2 a atlas:Identifier ;
+  atlas:identifierValue "/m/012gx2" ;
+  atlas:identifierSource "google" ;
+  atlas:identifierType atlas:Mid ;
+  atlas:hasIdentifierProvenance atlas_data:prov_trends_joe_biden_2026_04_03 .
+
+atlas_data:wikidata_Q6279 a atlas:Identifier ;
+  atlas:identifierValue "Q6279" ;
+  atlas:identifierSource "wikidata" ;
+  atlas:identifierType atlas:WikidataQID ;
+  atlas:hasIdentifierProvenance atlas_data:prov_wikidata_joe_biden_2026_04_03 .
+
+atlas_data:typeassert_joe_biden_wikidata a atlas:TypeAssertion ;
+  atlas:assertedType atlas:WikidataType_Q5 ;
+  atlas:hasAssertionProvenance atlas_data:prov_wikidata_joe_biden_2026_04_03 ;
+  atlas:assertionConfidence "0.99"^^xsd:decimal .
+
+atlas_data:prov_trends_joe_biden_2026_04_03 a atlas:Provenance ;
+  atlas:provenanceSource "google-trends" ;
+  atlas:retrievalMethod "trends-resolution" ;
+  atlas:confidence "0.9"^^xsd:decimal ;
+  atlas:retrievedAt "2026-04-03T15:23:36.322743+00:00"^^xsd:dateTime .
+
+atlas_data:prov_wikidata_joe_biden_2026_04_03 a atlas:Provenance ;
+  atlas:provenanceSource "wikidata" ;
+  atlas:retrievalMethod "wbsearchentities + entitydata" ;
+  atlas:confidence "0.99"^^xsd:decimal ;
+  atlas:retrievedAt "2026-04-03T15:30:00+00:00"^^xsd:dateTime .
+
+atlas_data:claim_joe_biden_mid a atlas:Claim ;
+  atlas:claimSubjectIri atlas_data:joe_biden ;
+  atlas:claimPredicate "atlas:hasIdentifier" ;
+  atlas:claimObjectIri atlas_data:mid_m012gx2 ;
+  atlas:claimLayer "raw" ;
+  atlas:claimStatus "active" ;
+  atlas:supersedes atlas_data:claim_joe_biden_mid_old ;
+  atlas:hasProvenance atlas_data:prov_trends_joe_biden_2026_04_03 .
+
+atlas_data:claim_joe_biden_mid_old a atlas:Claim ;
+  atlas:claimSubjectIri atlas_data:joe_biden ;
+  atlas:claimPredicate "atlas:hasIdentifier" ;
+  atlas:claimObjectLiteral "(missing MID at first resolution)" ;
+  atlas:claimLayer "raw" ;
+  atlas:claimStatus "superseded" ;
+  atlas:hasProvenance atlas_data:prov_trends_joe_biden_2026_04_03 .
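The two claims above illustrate the supersession lifecycle: a superseded claim stays in the graph, and consumers keep only active claims that nothing else supersedes. A minimal filter sketch (hypothetical dict shapes, not the Atlas models):

```python
claims = [
    {"id": "claim_joe_biden_mid", "status": "active",
     "supersedes": "claim_joe_biden_mid_old"},
    {"id": "claim_joe_biden_mid_old", "status": "superseded",
     "supersedes": None},
]

def active_claims(claims: list[dict]) -> list[dict]:
    # A claim is live if it is marked active AND no other claim supersedes it;
    # the second check guards claims whose status was never flipped.
    superseded = {c["supersedes"] for c in claims if c.get("supersedes")}
    return [c for c in claims if c["status"] == "active" and c["id"] not in superseded]

print([c["id"] for c in active_claims(claims)])  # ['claim_joe_biden_mid']
```

Superseded claims are never deleted, so the full revision history of a mapping stays queryable.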

+ 12 - 0
.gitignore

@@ -0,0 +1,12 @@
+.venv/
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+*.log
+*.sqlite3
+.env
+.env.*
+.DS_Store
+.idea/
+.vscode/

+ 61 - 0
killserver.sh

@@ -0,0 +1,61 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+PID_FILE="logs/server.pid"
+LEGACY_PID_FILE="server.pid"
+PORT="${MCP_PORT:-${PORT:-8550}}"
+
+echo "[killserver] Checking for running atlas-mcp instances..."
+
+for pidfile in "$PID_FILE" "$LEGACY_PID_FILE"; do
+  if [[ -f "$pidfile" ]]; then
+    PID="$(cat "$pidfile" 2>/dev/null || true)"
+    if [[ -n "${PID:-}" ]] && kill -0 "$PID" 2>/dev/null; then
+      echo "[killserver] Stopping PID from pidfile: $PID"
+      kill "$PID" || true
+      sleep 0.5
+      if kill -0 "$PID" 2>/dev/null; then
+        echo "[killserver] PID $PID still alive, sending SIGKILL"
+        kill -9 "$PID" || true
+      fi
+    else
+      echo "[killserver] Stale or empty pidfile, removing."
+    fi
+    rm -f "$pidfile"
+  fi
+done
+
+STRAY_PIDS="$(ps -ef | grep -E 'uvicorn[[:space:]]+app\.main:app' | grep -v grep | awk '{print $2}' || true)"
+if [[ -n "${STRAY_PIDS:-}" ]]; then
+  echo "[killserver] Killing stray uvicorn PIDs: $STRAY_PIDS"
+  for p in $STRAY_PIDS; do
+    kill "$p" || true
+  done
+  sleep 0.5
+  for p in $STRAY_PIDS; do
+    if kill -0 "$p" 2>/dev/null; then
+      kill -9 "$p" || true
+    fi
+  done
+fi
+
+if command -v lsof >/dev/null 2>&1; then
+  PORT_PIDS="$(lsof -ti tcp:"$PORT" || true)"
+  if [[ -n "${PORT_PIDS:-}" ]]; then
+    echo "[killserver] Port $PORT still in use by: $PORT_PIDS"
+    for p in $PORT_PIDS; do
+      kill "$p" || true
+    done
+    sleep 0.5
+    for p in $PORT_PIDS; do
+      if kill -0 "$p" 2>/dev/null; then
+        kill -9 "$p" || true
+      fi
+    done
+  fi
+fi
+
+echo "[killserver] Done."

+ 341 - 0
ontology/atlas.ttl

@@ -0,0 +1,341 @@
+@prefix atlas: <http://world.eu.org/atlas_ontology#> .
+@prefix atlas_data: <http://world.eu.org/atlas_data#> .
+@prefix owl:   <http://www.w3.org/2002/07/owl#> .
+@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
+@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
+@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
+@prefix schema: <http://schema.org/> .
+@prefix wd:    <http://www.wikidata.org/entity/> .
+
+atlas:Ontology a owl:Ontology ;
+  rdfs:label "Atlas Internal Ontology" ;
+  rdfs:comment "Canonical ontology for entity resolution, type adjudication, and enrichment in Atlas." .
+
+### Core classes
+
+atlas:Entity a owl:Class ;
+  rdfs:label "Entity" ;
+  rdfs:comment "Canonical Atlas entity representing one real-world referent." .
+
+atlas:EntityType a owl:Class ;
+  rdfs:label "Entity Type" ;
+  rdfs:comment "Canonical internal type class owned by Atlas." .
+
+atlas:ExternalType a owl:Class ;
+  rdfs:label "External Type" ;
+  rdfs:comment "Type evidence from external sources such as Wikidata or LLM classification." .
+
+atlas:IdentifierType a owl:Class ;
+  rdfs:label "Identifier Type" ;
+  rdfs:comment "A canonical class for identifier schemes like MID or QID." .
+
+atlas:Mid a owl:Class ;
+  rdfs:subClassOf atlas:IdentifierType ;
+  rdfs:label "MID" .
+
+atlas:WikidataQID a owl:Class ;
+  rdfs:subClassOf atlas:IdentifierType ;
+  rdfs:label "Wikidata QID" .
+
+atlas:Alias a owl:Class ;
+  rdfs:label "Alias" ;
+  rdfs:comment "Surface form or alternative label that may resolve to an entity." .
+
+atlas:Identifier a owl:Class ;
+  rdfs:label "Identifier" ;
+  rdfs:comment "External identifier such as MID, Wikidata QID, etc." .
+
+atlas:Provenance a owl:Class ;
+  rdfs:label "Provenance" ;
+  rdfs:comment "Source and method metadata for a claim or mapping." .
+
+atlas:TypeAssertion a owl:Class ;
+  rdfs:label "Type Assertion" ;
+  rdfs:comment "A recorded statement that a source suggested a type for an entity." .
+
+atlas:Claim a owl:Class ;
+  rdfs:label "Claim" ;
+  rdfs:comment "Atomic statement object with claim-level provenance." .
+
+
+atlas:DomainProjection a owl:Class ;
+  rdfs:label "Domain Projection" ;
+  rdfs:comment "Domain-facing conceptual bundle derived from resolution + enrichment, if we choose to persist it later." .
+
+### Object properties
+
+atlas:hasCanonicalType a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:EntityType ;
+  rdfs:label "has canonical type" .
+
+atlas:hasExternalType a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:ExternalType ;
+  rdfs:label "has external type" .
+
+atlas:hasAlias a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:Alias ;
+  rdfs:label "has alias" .
+
+atlas:hasIdentifier a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:Identifier ;
+  rdfs:label "has identifier" .
+
+atlas:hasProvenance a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:Provenance ;
+  rdfs:label "has provenance" .
+
+atlas:hasIdentifierProvenance a owl:ObjectProperty ;
+  rdfs:domain atlas:Identifier ;
+  rdfs:range atlas:Provenance ;
+  rdfs:label "identifier provenance" .
+
+atlas:derivedFromProvenance a owl:ObjectProperty ;
+  rdfs:domain atlas:TypeAssertion ;
+  rdfs:range atlas:Provenance ;
+  rdfs:label "derived from provenance" .
+
+atlas:hasAssertionProvenance a owl:ObjectProperty ;
+  rdfs:domain atlas:TypeAssertion ;
+  rdfs:range atlas:Provenance ;
+  rdfs:label "assertion provenance" .
+
+atlas:hasTypeAssertion a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:TypeAssertion ;
+  rdfs:label "has type assertion" .
+
+atlas:hasClaim a owl:ObjectProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range atlas:Claim ;
+  rdfs:label "has claim" .
+
+atlas:claimSubjectIri a owl:ObjectProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range atlas:Entity ;
+  rdfs:label "claim subject iri" .
+
+atlas:assertedType a owl:ObjectProperty ;
+  rdfs:domain atlas:TypeAssertion ;
+  rdfs:range atlas:ExternalType ;
+  rdfs:label "asserted type" .
+
+atlas:needsCuration a owl:DatatypeProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range xsd:boolean ;
+  rdfs:label "needs curation" .
+
+atlas:projectionFor a owl:ObjectProperty ;
+  rdfs:domain atlas:DomainProjection ;
+  rdfs:range atlas:Entity ;
+  rdfs:label "projection for" .
+
+atlas:resolvedTo a owl:ObjectProperty ;
+  rdfs:domain atlas:Alias ;
+  rdfs:range atlas:Entity ;
+  rdfs:label "resolved to" .
+
+atlas:sameAsExternalType a owl:ObjectProperty ;
+  rdfs:domain atlas:EntityType ;
+  rdfs:range owl:Class ;
+  rdfs:label "same as external type" .
+
+### Datatype properties
+
+atlas:canonicalLabel a owl:DatatypeProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range xsd:string ;
+  rdfs:label "canonical label" .
+
+atlas:canonicalDescription a owl:DatatypeProperty ;
+  rdfs:domain atlas:Entity ;
+  rdfs:range xsd:string ;
+  rdfs:label "canonical description" .
+
+atlas:entityTypeLabel a owl:DatatypeProperty ;
+  rdfs:domain atlas:EntityType ;
+  rdfs:range xsd:string ;
+  rdfs:label "entity type label" .
+
+atlas:externalTypeLabel a owl:DatatypeProperty ;
+  rdfs:domain atlas:ExternalType ;
+  rdfs:range xsd:string ;
+  rdfs:label "external type label" .
+
+atlas:aliasLabel a owl:DatatypeProperty ;
+  rdfs:domain atlas:Alias ;
+  rdfs:range xsd:string ;
+  rdfs:label "alias label" .
+
+atlas:identifierValue a owl:DatatypeProperty ;
+  rdfs:domain atlas:Identifier ;
+  rdfs:range xsd:string ;
+  rdfs:label "identifier value" .
+
+atlas:identifierSource a owl:DatatypeProperty ;
+  rdfs:domain atlas:Identifier ;
+  rdfs:range xsd:string ;
+  rdfs:label "identifier source" .
+
+atlas:identifierType a owl:ObjectProperty ;
+  rdfs:domain atlas:Identifier ;
+  rdfs:range atlas:IdentifierType ;
+  rdfs:label "identifier type" .
+
+atlas:provenanceSource a owl:DatatypeProperty ;
+  rdfs:domain atlas:Provenance ;
+  rdfs:range xsd:string ;
+  rdfs:label "provenance source" .
+
+atlas:retrievalMethod a owl:DatatypeProperty ;
+  rdfs:domain atlas:Provenance ;
+  rdfs:range xsd:string ;
+  rdfs:label "retrieval method" .
+
+atlas:retrievedAt a owl:DatatypeProperty ;
+  rdfs:domain atlas:Provenance ;
+  rdfs:range xsd:dateTime ;
+  rdfs:label "retrieved at" .
+
+atlas:confidence a owl:DatatypeProperty ;
+  rdfs:domain atlas:Provenance ;
+  rdfs:range xsd:decimal ;
+  rdfs:label "confidence" .
+
+atlas:claimPredicate a owl:DatatypeProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range xsd:string ;
+  rdfs:label "claim predicate" .
+
+atlas:claimObjectIri a owl:ObjectProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range owl:Thing ;
+  rdfs:label "claim object iri" .
+
+atlas:claimObjectLiteral a owl:DatatypeProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range xsd:string ;
+  rdfs:label "claim object literal" .
+
+atlas:claimLayer a owl:DatatypeProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range xsd:string ;
+  rdfs:label "claim layer" .
+
+atlas:claimStatus a owl:DatatypeProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range xsd:string ;
+  rdfs:label "claim status" .
+
+atlas:supersedes a owl:ObjectProperty ;
+  rdfs:domain atlas:Claim ;
+  rdfs:range atlas:Claim ;
+  rdfs:label "supersedes" .
+
+atlas:assertionReason a owl:DatatypeProperty ;
+  rdfs:domain atlas:TypeAssertion ;
+  rdfs:range xsd:string ;
+  rdfs:label "assertion reason" .
+
+atlas:assertionConfidence a owl:DatatypeProperty ;
+  rdfs:domain atlas:TypeAssertion ;
+  rdfs:range xsd:decimal ;
+  rdfs:label "assertion confidence" .
+
+atlas:CurateFlag a owl:Class ;
+  rdfs:label "Curation Flag" ;
+  rdfs:comment "Marker object flagging an entity or claim for human curation." .
+
+atlas:needsCurationFlag a owl:DatatypeProperty ;
+  rdfs:domain atlas:CurateFlag ;
+  rdfs:range xsd:boolean ;
+  rdfs:label "needs curation flag" .
+
+atlas:curationReason a owl:DatatypeProperty ;
+  rdfs:domain atlas:CurateFlag ;
+  rdfs:range xsd:string ;
+  rdfs:label "curation reason" .
+
+atlas:generatedAt a owl:DatatypeProperty ;
+  rdfs:domain atlas:DomainProjection ;
+  rdfs:range xsd:dateTime ;
+  rdfs:label "generated at" .
+
+atlas:projectionPayload a owl:DatatypeProperty ;
+  rdfs:domain atlas:DomainProjection ;
+  rdfs:range xsd:string ;
+  rdfs:label "projection payload" .
+
+atlas:projectionContext a owl:DatatypeProperty ;
+  rdfs:domain atlas:DomainProjection ;
+  rdfs:range xsd:string ;
+  rdfs:label "projection context" .
+
+### Initial canonical type catalog
+
+atlas:Person a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Person" ;
+  owl:equivalentClass schema:Person ;
+  owl:equivalentClass wd:Q5 .
+
+atlas:Organization a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Organization" ;
+  owl:equivalentClass schema:Organization ;
+  owl:equivalentClass wd:Q43229 .
+
+atlas:Location a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Location" ;
+  owl:equivalentClass schema:Place ;
+  owl:equivalentClass wd:Q17334923 .
+
+atlas:CreativeWork a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Creative Work" ;
+  owl:equivalentClass schema:CreativeWork ;
+  owl:equivalentClass wd:Q17537576 .
+
+atlas:Event a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Event" ;
+  owl:equivalentClass schema:Event ;
+  owl:equivalentClass wd:Q1656682 .
+
+atlas:Product a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Product" ;
+  owl:equivalentClass schema:Product ;
+  owl:equivalentClass wd:Q2424752 .
+
+atlas:Other a owl:Class ;
+  rdfs:subClassOf atlas:EntityType ;
+  rdfs:label "Other" .
+
+### Initial known external type nodes
+
+atlas:WikidataType_Q5 a atlas:ExternalType ;
+  atlas:externalTypeLabel "human" ;
+  atlas:provenanceSource "wikidata" .
+
+atlas:WikidataType_Q43229 a atlas:ExternalType ;
+  atlas:externalTypeLabel "organization" ;
+  atlas:provenanceSource "wikidata" .
+
+atlas:WikidataType_Q17334923 a atlas:ExternalType ;
+  atlas:externalTypeLabel "human settlement" ;
+  atlas:provenanceSource "wikidata" .
+
+atlas:WikidataType_Q17537576 a atlas:ExternalType ;
+  atlas:externalTypeLabel "creative work" ;
+  atlas:provenanceSource "wikidata" .
+
+atlas:WikidataType_Q1656682 a atlas:ExternalType ;
+  atlas:externalTypeLabel "event" ;
+  atlas:provenanceSource "wikidata" .
+
+atlas:WikidataType_Q2424752 a atlas:ExternalType ;
+  atlas:externalTypeLabel "product" ;
+  atlas:provenanceSource "wikidata" .
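To make the claim-lifecycle vocabulary above concrete, here is an illustrative Turtle sketch of a superseded claim and its active replacement. The claim IRIs and values are invented for this example; only the property names come from the ontology.

```turtle
@prefix atlas: <http://world.eu.org/atlas_ontology#> .

# A derived claim that has since been replaced.
atlas:claim_001 a atlas:Claim ;
  atlas:claimPredicate "atlas:hasCanonicalType" ;
  atlas:claimObjectLiteral "Organization" ;
  atlas:claimLayer "derived" ;
  atlas:claimStatus "superseded" .

# The currently active claim, pointing back at what it replaced.
atlas:claim_002 a atlas:Claim ;
  atlas:claimPredicate "atlas:hasCanonicalType" ;
  atlas:claimObjectLiteral "Person" ;
  atlas:claimLayer "derived" ;
  atlas:claimStatus "active" ;
  atlas:supersedes atlas:claim_001 .
```

Readers query only `"active"` claims by default; the `atlas:supersedes` chain preserves the history.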

+ 9 - 0
requirements.txt

@@ -0,0 +1,9 @@
+fastapi>=0.110.0
+uvicorn[standard]>=0.23.0
+fastmcp>=0.5.0
+httpx>=0.28.1
+rdflib>=7.6.0
+pydantic>=2.5.0
+pytest>=8.0.0
+mcp>=1.27.0
+python-dotenv>=1.0.0

+ 8 - 0
restart.sh

@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+./killserver.sh
+./run.sh

+ 40 - 0
run.sh

@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+if [[ -f .env ]]; then
+  set -a
+  # shellcheck source=/dev/null
+  source .env
+  set +a
+fi
+
+if [[ -f .venv/bin/activate ]]; then
+  # shellcheck source=/dev/null
+  source .venv/bin/activate
+fi
+
+PORT="${MCP_PORT:-${PORT:-8550}}"
+HOST="${MCP_HOST:-0.0.0.0}"
+LOG_DIR="logs"
+mkdir -p "$LOG_DIR"
+PID_FILE="$LOG_DIR/server.pid"
+LOG_FILE="$LOG_DIR/server.log"
+
+if [[ -f "$PID_FILE" ]]; then
+  PID="$(cat "$PID_FILE" 2>/dev/null || true)"
+  if [[ -n "${PID:-}" ]] && kill -0 "$PID" 2>/dev/null; then
+    echo "Atlas already running (PID $PID)."
+    exit 1
+  else
+    echo "Removing stale pidfile."
+    rm -f "$PID_FILE"
+  fi
+fi
+
+nohup python3 -m uvicorn app.main:app --host "$HOST" --port "$PORT" >"$LOG_FILE" 2>&1 &
+PID=$!
+echo "$PID" >"$PID_FILE"
+echo "Atlas started (PID $PID). Logs: $LOG_FILE"

+ 78 - 0
tests.sh

@@ -0,0 +1,78 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+PORT="${MCP_PORT:-${PORT:-8550}}"
+URL="${ATLAS_HEALTH_URL:-http://127.0.0.1:${PORT}/health}"
+ENTITY_SUBJECT="${ATLAS_TEST_SUBJECT:-Joe Biden}"
+DEBUG_PATH="${ATLAS_TEST_DEBUG_PATH:-/tmp/atlas-debug/${ENTITY_SUBJECT// /_}.ttl}"
+
+log() {
+  printf '\n[%s] %s\n' "$(date +'%H:%M:%S')" "$1"
+}
+
+if ! command -v curl >/dev/null 2>&1; then
+  echo "curl is required for tests.sh"
+  exit 1
+fi
+
+if ! command -v jq >/dev/null 2>&1; then
+  echo "jq is required for tests.sh"
+  exit 1
+fi
+
+if ! command -v mcporter >/dev/null 2>&1; then
+  echo "mcporter is required for tests.sh"
+  exit 1
+fi
+
+log "1) Checking Atlas health endpoint"
+health_response="$(curl -fsS "$URL")"
+echo "$health_response" | jq .
+
+if ! printf '%s' "$health_response" | jq -e '.status == "ok"' >/dev/null; then
+  echo "Expected health status ok"
+  exit 1
+fi
+
+if ! printf '%s' "$health_response" | jq -e 'has("tools")' >/dev/null; then
+  echo "Expected tools in health response"
+  exit 1
+fi
+
+if [[ -n "${CONFIG:-}" ]]; then
+  log "2) Resolving a known entity via mcporter"
+  resolve_out="$({ mcporter --config "$CONFIG" call atlas.resolve_entity subject="$ENTITY_SUBJECT" debug=true debug_path="$DEBUG_PATH"; } 2>&1)"
+  printf '%s\n' "$resolve_out"
+
+  if ! printf '%s' "$resolve_out" | grep -q '"canonical_label"'; then
+    echo "Expected canonical_label in resolve output"
+    exit 1
+  fi
+
+  if ! printf '%s' "$resolve_out" | grep -q '"wikidata_payload"'; then
+    echo "Expected wikidata_payload in resolve output"
+    exit 1
+  fi
+
+  if ! printf '%s' "$resolve_out" | grep -q '"raw_claims"'; then
+    echo "Expected raw_claims in debug output"
+    exit 1
+  fi
+
+  if ! printf '%s' "$resolve_out" | grep -q '"derived_claims"'; then
+    echo "Expected derived_claims in debug output"
+    exit 1
+  fi
+
+  if [[ -f "$DEBUG_PATH" ]]; then
+    log "3) Debug Turtle dumped to $DEBUG_PATH"
+    head -n 40 "$DEBUG_PATH"
+  else
+    echo "Expected debug Turtle at $DEBUG_PATH"
+    exit 1
+  fi
+else
+  log "2) CONFIG not set, skipping mcporter resolve smoke test"
+fi
+
+log "All Atlas checks passed."

+ 6 - 0
tests/conftest.py

@@ -0,0 +1,6 @@
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parent.parent
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))

+ 106 - 0
tests/test_atlas_contracts.py

@@ -0,0 +1,106 @@
+import pytest
+
+import app.atlas as atlas_module
+from app.atlas import enrich_entity, resolve_entity
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+from app.type_classifier import TypeClassification
+
+
+@pytest.mark.anyio
+async def test_resolve_entity_returns_canonical_structure():
+    entity = await resolve_entity("Trump")
+
+    assert entity.atlas_id.startswith("atlas:")
+    assert entity.canonical_label
+    assert entity.aliases[0].label.lower() in ("trump", "donald trump")
+    assert entity.provenance
+    assert entity.raw_payload["raw"] == "Trump"
+
+
+@pytest.mark.anyio
+async def test_enrich_entity_returns_dataset_shape():
+    entity = await resolve_entity("Trump")
+    result = enrich_entity(entity, constraints={"type": "person"}, depth=2)
+
+    assert result.seed_entity.atlas_id == entity.atlas_id
+    assert result.query_context == {"type": "person"}
+    assert result.depth == 2
+    assert result.related_entities == []
+
+
+def test_internal_models_support_identity_and_provenance():
+    entity = AtlasEntity(
+        atlas_id="atlas:donald-trump",
+        canonical_label="Donald Trump",
+        entity_type="person",
+        aliases=[AtlasAlias(label="Trump")],
+        identifiers=[AtlasIdentifier(value="Q22686", source="wikidata", identifier_type="wikidata-qid")],
+        provenance=[AtlasProvenance(source="google-trends", retrieval_method="entity-resolution", confidence=0.93)],
+    )
+
+    assert entity.atlas_id == "atlas:donald-trump"
+    assert entity.aliases[0].label == "Trump"
+    assert entity.identifiers[0].value == "Q22686"
+    assert entity.provenance[0].source == "google-trends"
+
+
+@pytest.mark.anyio
+async def test_resolve_entity_passes_context_to_classifier(monkeypatch):
+    captured = {}
+
+    async def fake_classifier(subject, resolution, context):
+        captured["context"] = context
+        return TypeClassification(canonical_type="Person", provenance=None, needs_curation=False)
+
+    def fake_trends(subject):
+        return {
+            "canonical_label": subject,
+            "normalized": subject,
+            "mid": None,
+            "type": "Person",
+            "source": "resolver",
+            "resolved_at": "2026-04-03T00:00:00Z",
+            "candidates": [],
+            "raw": subject,
+        }
+
+    writes = []
+
+    async def fake_write(entity):
+        writes.append(entity)
+        return {"status": "ok"}
+
+    monkeypatch.setattr("app.atlas.classify_entity_type", fake_classifier)
+    monkeypatch.setattr("app.atlas.resolve_entity_via_trends", fake_trends)
+    monkeypatch.setattr(atlas_module._storage, "write_entity", fake_write)
+
+    entity = await resolve_entity("Sample", context="news paragraph")
+
+    assert captured["context"] == "news paragraph"
+    assert entity.entity_type == "Person"
+    assert writes and writes[0].canonical_label == "Sample"
+
+
+@pytest.mark.anyio
+async def test_resolve_entity_marks_needs_curation(monkeypatch):
+    async def fake_classifier(subject, resolution, context):
+        return TypeClassification(canonical_type=None, provenance=None, needs_curation=True)
+
+    def fake_trends(subject):
+        return {
+            "canonical_label": subject,
+            "normalized": subject,
+            "mid": None,
+            "type": "Unknown",
+            "source": "resolver",
+            "resolved_at": "2026-04-03T00:00:00Z",
+            "candidates": [],
+            "raw": subject,
+        }
+
+    async def fake_write(entity):
+        # Keep the test hermetic: stub the storage write, as the context test above does.
+        return {"status": "ok"}
+
+    monkeypatch.setattr("app.atlas.classify_entity_type", fake_classifier)
+    monkeypatch.setattr("app.atlas.resolve_entity_via_trends", fake_trends)
+    monkeypatch.setattr(atlas_module._storage, "write_entity", fake_write)
+
+    entity = await resolve_entity("Mysterious")
+
+    assert entity.needs_curation is True

+ 33 - 0
tests/test_claims.py

@@ -0,0 +1,33 @@
+from app.claims import build_claim_sets
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+
+
+def test_build_claim_sets_attaches_provenance_per_claim():
+    entity = AtlasEntity(
+        atlas_id="atlas:mid:/m/0cqt90",
+        canonical_label="Donald Trump",
+        canonical_description="45th and 47th U.S. President",
+        entity_type="Person",
+        aliases=[AtlasAlias(label="Donald Trump")],
+        identifiers=[
+            AtlasIdentifier(value="/m/0cqt90", source="google", identifier_type="mid"),
+            AtlasIdentifier(value="Q22686", source="wikidata", identifier_type="qid"),
+        ],
+        provenance=[
+            AtlasProvenance(source="google", retrieval_method="trends-resolution", confidence=0.9, retrieved_at="2026-04-03T00:00:00Z"),
+            AtlasProvenance(source="wikidata", retrieval_method="wbsearchentities + entitydata", confidence=0.99, retrieved_at="2026-04-03T00:00:01Z"),
+            AtlasProvenance(source="openai-llm", retrieval_method="type-classification", confidence=1.0, retrieved_at="2026-04-03T00:00:02Z"),
+        ],
+        raw_payload={"wikidata": {"status": "ok", "qid": "Q22686", "retrieved_at": "2026-04-03T00:00:01Z"}},
+        needs_curation=False,
+    )
+
+    raw_claims, derived_claims = build_claim_sets(entity)
+
+    mid_claim = next(c for c in raw_claims if c["object"].get("value") == "/m/0cqt90")
+    qid_claim = next(c for c in raw_claims if c["object"].get("value") == "Q22686")
+    canonical_type = next(c for c in derived_claims if c["predicate"] == "atlas:hasCanonicalType")
+
+    assert mid_claim["provenance"]["source"] == "google"
+    assert qid_claim["provenance"]["source"] == "wikidata"
+    assert canonical_type["provenance"]["method"] == "type-classification"

+ 22 - 0
tests/test_debug_export_file.py

@@ -0,0 +1,22 @@
+from pathlib import Path
+
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+from app.triple_export import entity_to_turtle
+
+
+def test_debug_turtle_can_be_written_to_file(tmp_path: Path):
+    entity = AtlasEntity(
+        atlas_id="atlas:mid:/m/012gx2",
+        canonical_label="Joe Biden",
+        entity_type="Person",
+        aliases=[AtlasAlias(label="Joe Biden")],
+        identifiers=[AtlasIdentifier(value="/m/012gx2", source="google", identifier_type="mid")],
+        provenance=[AtlasProvenance(source="google-trends", retrieval_method="trends-resolution", confidence=0.9, retrieved_at="2026-04-03T17:33:21.651528+00:00")],
+        needs_curation=False,
+    )
+    turtle = entity_to_turtle(entity)
+    out = tmp_path / "joe_biden.ttl"
+    out.write_text(turtle, encoding="utf-8")
+
+    assert out.exists()
+    assert 'atlas:canonicalLabel "Joe Biden"' in out.read_text(encoding="utf-8")

+ 77 - 0
tests/test_storage_service.py

@@ -0,0 +1,77 @@
+import pytest
+
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+from app.storage_service import AtlasStorageService, entity_iri
+
+
+@pytest.mark.anyio
+async def test_write_entity_uses_batch_insert():
+    calls = []
+
+    async def fake_call(tool, payload):
+        calls.append((tool, payload))
+        return {"ok": True}
+
+    svc = AtlasStorageService(call_tool=fake_call)
+    entity = AtlasEntity(
+        atlas_id="atlas:mid:/m/0cqt90",
+        canonical_label="Donald Trump",
+        canonical_description="45th and 47th U.S. President",
+        entity_type="Person",
+        aliases=[AtlasAlias(label="Donald Trump")],
+        identifiers=[AtlasIdentifier(value="/m/0cqt90", source="google", identifier_type="mid")],
+        provenance=[AtlasProvenance(source="google", retrieval_method="trends-resolution", confidence=0.9)],
+    )
+
+    result = await svc.write_entity(entity)
+
+    assert result["status"] == "ok"
+    assert calls[0][0] == "batch_insert"
+    assert "ttl" in calls[0][1]
+
+
+@pytest.mark.anyio
+async def test_read_entity_claims_uses_sparql_query():
+    calls = []
+
+    async def fake_call(tool, payload):
+        calls.append((tool, payload))
+        return {"results": {"bindings": []}}
+
+    svc = AtlasStorageService(call_tool=fake_call)
+    result = await svc.read_entity_claims("atlas:mid:/m/0cqt90")
+
+    assert result["status"] == "ok"
+    assert calls[0][0] == "sparql_query"
+    assert entity_iri("atlas:mid:/m/0cqt90") in calls[0][1]["query"]
+    assert 'FILTER(?status = "active")' in calls[0][1]["query"]
+
+
+@pytest.mark.anyio
+async def test_read_entity_claims_include_superseded_removes_filter():
+    calls = []
+
+    async def fake_call(tool, payload):
+        calls.append((tool, payload))
+        return {"results": {"bindings": []}}
+
+    svc = AtlasStorageService(call_tool=fake_call)
+    result = await svc.read_entity_claims("atlas:mid:/m/0cqt90", include_superseded=True)
+
+    assert result["status"] == "ok"
+    assert calls[0][0] == "sparql_query"
+    assert 'FILTER(?status = "active")' not in calls[0][1]["query"]
+
+
+@pytest.mark.anyio
+async def test_write_entity_unfinished_on_failure():
+    async def fake_call(tool, payload):
+        raise RuntimeError("backend down")
+
+    svc = AtlasStorageService(call_tool=fake_call)
+    entity = AtlasEntity(atlas_id="atlas:x", canonical_label="X")
+
+    result = await svc.write_entity(entity)
+
+    assert result["status"] == "unfinished"
+    assert "backend down" in result["error"]

+ 24 - 0
tests/test_triple_export.py

@@ -0,0 +1,24 @@
+from app.models import AtlasAlias, AtlasEntity, AtlasIdentifier, AtlasProvenance
+from app.triple_export import entity_to_turtle
+
+
+def test_entity_to_turtle_contains_expected_triples():
+    entity = AtlasEntity(
+        atlas_id="atlas:mid:/m/012gx2",
+        canonical_label="Joe Biden",
+        entity_type="Person",
+        aliases=[AtlasAlias(label="Joe Biden")],
+        identifiers=[AtlasIdentifier(value="/m/012gx2", source="google", identifier_type="mid")],
+        provenance=[AtlasProvenance(source="google-trends", retrieval_method="trends-resolution", confidence=0.9, retrieved_at="2026-04-03T17:33:21.651528+00:00")],
+        needs_curation=False,
+    )
+    ttl = entity_to_turtle(entity)
+
+    assert '@prefix atlas: <http://world.eu.org/atlas_ontology#> .' in ttl
+    assert 'atlas:canonicalLabel "Joe Biden"' in ttl
+    assert 'atlas:hasCanonicalType atlas:Person' in ttl
+    assert 'atlas:hasIdentifier' in ttl
+    assert 'atlas:needsCuration false' in ttl
+    assert 'a atlas:Claim' in ttl
+    assert 'atlas:claimPredicate "atlas:hasIdentifier"' in ttl
+    assert 'atlas:claimStatus "active"' in ttl