RESPONSE_SCHEMA.md

Atlas Response Schema v2

This file defines the canonical response contract for resolve_entity.

Design goals

  • One coherent entity record across the Raw / Canonical / Derived layers
  • Claim-level provenance (no floating provenance blobs)
  • One data model for normal and debug output, with debug as a superset of normal

1) Normal response (default)

{
  "entity": {
    "entity_id": "atlas:1c7ce7c18db59332",
    "canonical_label": "Donald Trump",
    "canonical_description": "45th and 47th U.S. President",
    "canonical_type": "atlas:Person",
    "needs_curation": false,
    "identifiers": [
      {"type": "atlas:Mid", "value": "/m/0cqt90"},
      {"type": "atlas:WikidataQID", "value": "Q22686"}
    ]
  },
  "active_claims": [
    {
      "claim_id": "clm_raw_ident_mid_/m/0cqt90",
      "layer": "raw",
      "status": "active",
      "subject": "atlas:1c7ce7c18db59332",
      "predicate": "atlas:hasIdentifier",
      "object": {"kind": "identifier", "id_type": "atlas:Mid", "value": "/m/0cqt90"},
      "provenance": {
        "source": "google-trends",
        "method": "trends-resolution",
        "confidence": 0.9,
        "retrieved_at": "2026-04-03T18:00:00Z"
      }
    }
  ],
  "summary": {
    "raw_claim_count": 5,
    "derived_claim_count": 1,
    "sources": ["google-trends", "wikidata", "groq-llm"]
  }
}

Notes

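  • The normal response is compact and meant for direct consumption.
  • Source payload snapshots are omitted by default; they appear only in debug mode.
  • Identifiers and canonical type sit at fixed paths: entity.identifiers[] and entity.canonical_type.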
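A minimal sketch of how a consumer might read these fields. The helper name find_identifier is hypothetical and not part of Atlas; the sample data is a trimmed copy of the example response.

```python
from typing import Optional


def find_identifier(response: dict, id_type: str) -> Optional[str]:
    """Return the first identifier value of the given type, or None if absent."""
    for ident in response["entity"]["identifiers"]:
        if ident["type"] == id_type:
            return ident["value"]
    return None


# Trimmed copy of the example normal response.
response = {
    "entity": {
        "entity_id": "atlas:1c7ce7c18db59332",
        "canonical_type": "atlas:Person",
        "identifiers": [
            {"type": "atlas:Mid", "value": "/m/0cqt90"},
            {"type": "atlas:WikidataQID", "value": "Q22686"},
        ],
    }
}

qid = find_identifier(response, "atlas:WikidataQID")
canonical_type = response["entity"]["canonical_type"]
```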

2) Debug response (debug=true)

Debug mode returns the normal response plus a top-level debug object:

{
  "debug": {
    "raw_claims": [
      {
        "claim_id": "clm_raw_mid_1",
        "layer": "raw",
        "subject": "atlas:1c7ce7c18db59332",
        "predicate": "atlas:hasIdentifier",
        "object": {"kind": "identifier", "id_type": "atlas:Mid", "value": "/m/0cqt90"},
        "provenance": {
          "source": "google-trends",
          "method": "trends-resolution",
          "confidence": 0.90,
          "retrieved_at": "2026-04-03T18:00:00Z"
        }
      }
    ],
    "derived_claims": [
      {
        "claim_id": "clm_drv_type_1",
        "layer": "derived",
        "subject": "atlas:1c7ce7c18db59332",
        "predicate": "atlas:hasCanonicalType",
        "object": {"kind": "type", "value": "atlas:Person"},
        "provenance": {
          "source": "wikidata+llm",
          "method": "type-adjudication",
          "confidence": 0.97,
          "retrieved_at": "2026-04-03T18:00:02Z"
        }
      }
    ],
    "source_payloads": {
      "g_trends_payload": {},
      "wikidata_payload": {},
      "llm_payload": {}
    },
    "turtle": "...",
    "turtle_path": "/tmp/atlas-debug/trump.ttl"
  }
}

Notes

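  • A debug response is a strict superset of the normal response: every normal field is still present.
  • Provenance is attached to each claim, never to the response as a whole.
  • Source payload snapshots (debug.source_payloads) appear only in debug mode.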
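The superset rule can be sketched as follows. Both function names are hypothetical; the only assumption taken from this document is that debug data lives under a single top-level debug key.

```python
def build_debug_response(normal: dict, debug_block: dict) -> dict:
    """Return a new response: every normal key unchanged, plus a 'debug' key."""
    if "debug" in normal:
        raise ValueError("normal response must not already contain 'debug'")
    return {**normal, "debug": debug_block}


def is_superset_of_normal(debug_response: dict, normal: dict) -> bool:
    """True if every top-level normal field appears unchanged in the debug response."""
    return all(
        key in debug_response and debug_response[key] == value
        for key, value in normal.items()
    )


# Trimmed sample data; field names follow the examples in this document.
normal = {
    "entity": {"entity_id": "atlas:1c7ce7c18db59332"},
    "active_claims": [],
    "summary": {"raw_claim_count": 5},
}
debug_block = {
    "raw_claims": [],
    "derived_claims": [],
    "source_payloads": {},
    "turtle": "...",
}

debug_response = build_debug_response(normal, debug_block)
```

Building the debug response by extension, rather than as a separate structure, is one way to keep the two modes from drifting apart.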

3) Layer interpretation

  • Raw layer: source-aligned facts (MIDs, QIDs, external type claims, labels)
  • Canonical layer: Atlas normalized entity fields (canonical label/type/description)
  • Derived layer: computed claims (e.g., canonical type adjudication, enrichment links)

All three layers must align around the same entity_id.
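One way to check this alignment, as a sketch: compare every claim's subject against entity.entity_id. The function name misaligned_claims is hypothetical, and the second sample claim is deliberately misaligned to show a failure.

```python
from typing import Dict, List


def misaligned_claims(entity_id: str, claims: List[Dict]) -> List[str]:
    """Return the claim_id of every claim whose subject differs from entity_id."""
    return [c["claim_id"] for c in claims if c.get("subject") != entity_id]


entity_id = "atlas:1c7ce7c18db59332"
claims = [
    # Aligned: the subject is the entity_id.
    {"claim_id": "clm_raw_1", "layer": "raw", "subject": "atlas:1c7ce7c18db59332"},
    # Deliberately misaligned: the subject is keyed by MID instead.
    {"claim_id": "clm_drv_1", "layer": "derived", "subject": "atlas:mid:/m/0cqt90"},
]

bad = misaligned_claims(entity_id, claims)
```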


4) Field policy

Required in normal mode

  • entity.entity_id
  • entity.canonical_label
  • entity.canonical_type
  • entity.needs_curation
  • entity.identifiers[]
  • active_claims[]

Required in debug mode

  • debug.raw_claims[]
  • debug.derived_claims[]
  • debug.source_payloads
  • debug.turtle
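The two lists above translate directly into a presence check. The sketch below is illustrative, not the Atlas validator: the dotted-path convention and the function names are assumptions. It treats a trailing [] as "must be a list" and accepts false as a present value (relevant for entity.needs_curation).

```python
NORMAL_REQUIRED = [
    "entity.entity_id",
    "entity.canonical_label",
    "entity.canonical_type",
    "entity.needs_curation",
    "entity.identifiers[]",
    "active_claims[]",
]
DEBUG_REQUIRED = [
    "debug.raw_claims[]",
    "debug.derived_claims[]",
    "debug.source_payloads",
    "debug.turtle",
]

_MISSING = object()  # sentinel, so that False/None/"" still count as present


def _lookup(doc, path):
    """Walk a dotted path through nested dicts; return _MISSING if any key is absent."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def missing_fields(response, debug=False):
    """Return the required field specs that are absent or have the wrong shape."""
    required = NORMAL_REQUIRED + (DEBUG_REQUIRED if debug else [])
    missing = []
    for spec in required:
        is_list = spec.endswith("[]")
        path = spec[:-2] if is_list else spec
        value = _lookup(response, path)
        if value is _MISSING or (is_list and not isinstance(value, list)):
            missing.append(spec)
    return missing


sample = {
    "entity": {
        "entity_id": "atlas:1c7ce7c18db59332",
        "canonical_label": "Donald Trump",
        "canonical_type": "atlas:Person",
        "needs_curation": False,
        "identifiers": [],
    },
    "active_claims": [],
}
```

Here missing_fields(sample) returns an empty list, while missing_fields(sample, debug=True) reports all four debug fields because sample has no debug object.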

5) Maintenance model

Atlas maintenance jobs may fetch the full Wikidata entity payload when a Wikidata hit exists. That payload can generate additional identifier claims; the adjudicator may activate or supersede claims based on identifier alignment (for example, MID vs Wikidata QID vs other external IDs).

Recommended maintenance interface:

  • scripts/maintain_entities.py SUBJECT...
  • --dry-run prints planned claim changes without writing
  • --include-wikidata-entity fetches the full Wikidata entity object for richer identifier claims
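A possible argument-parsing skeleton for this interface, using the standard argparse module. Only the script path, the SUBJECT... positional, and the two flag names come from this document; the function name, help strings, and sample subject value are assumptions, and no maintenance logic is shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the recommended maintain_entities.py interface."""
    parser = argparse.ArgumentParser(
        prog="maintain_entities.py",
        description="Run Atlas maintenance for one or more subjects.",
    )
    # One or more subjects, matching the SUBJECT... usage line.
    parser.add_argument("subjects", metavar="SUBJECT", nargs="+")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print planned claim changes without writing",
    )
    parser.add_argument(
        "--include-wikidata-entity",
        action="store_true",
        help="fetch the full Wikidata entity object for richer identifier claims",
    )
    return parser


# Example invocation with a hypothetical subject value.
args = build_parser().parse_args(["--dry-run", "atlas:1c7ce7c18db59332"])
```

Making both flags opt-in keeps the default run free of the extra Wikidata fetch, and --dry-run lets a reviewer inspect planned changes before anything is written.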

6) Backward compatibility

Fields from the current implementation (atlas_id, entity_type, etc.) may remain temporarily, but output should migrate to this schema to avoid ambiguity and drift.
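A migration shim could look like the sketch below. The mapping is an assumption, not a documented correspondence: it guesses that atlas_id maps to entity.entity_id and entity_type maps to entity.canonical_type. Other legacy fields are not named in this document and are left out.

```python
# Assumed legacy-to-v2 correspondence; verify against the real implementation.
LEGACY_TO_V2 = {
    "atlas_id": ("entity", "entity_id"),
    "entity_type": ("entity", "canonical_type"),
}


def migrate_legacy(legacy: dict) -> dict:
    """Copy known legacy fields into their assumed v2 locations."""
    out = {"entity": {}}
    for old_name, (section, new_name) in LEGACY_TO_V2.items():
        if old_name in legacy:
            out[section][new_name] = legacy[old_name]
    return out


migrated = migrate_legacy(
    {"atlas_id": "atlas:1c7ce7c18db59332", "entity_type": "atlas:Person"}
)
```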