# Project: MCP Bridge for Virtuoso (Community Edition)

## Overview

Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand it to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.

## Stage 1 — Minimal MCP Server (Virtuoso only)

- Implement a `sparql_query` tool that POSTs to `http://localhost:8891/sparql` with the Accept header `application/sparql-results+json`.
- Return the parsed JSON directly to the caller; consider request timeouts and result-size limits.
- Provide sanitization / guardrails to prevent runaway queries (SELECT-only + LIMIT enforcement).
- Validate the server works from a simple CLI script before wiring to OpenClaw.
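
The guardrails and the query call above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `enforce_guardrails`, `MAX_LIMIT`, and `TIMEOUT_SECONDS` are hypothetical names and values, the regex checks are a heuristic rather than a real SPARQL parser (a `LIMIT` inside a string literal would also be rewritten), and the standard library's `urllib` is used only to keep the sketch dependency-free; `requests` would work equally well.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint and Accept header come from the plan; the limits are assumed defaults.
SPARQL_ENDPOINT = "http://localhost:8891/sparql"
MAX_LIMIT = 1000       # assumed cap on rows per query
TIMEOUT_SECONDS = 15   # assumed HTTP timeout

# Leading PREFIX declarations, so the query form that follows can be inspected.
_PROLOGUE = re.compile(r"^(?:\s*PREFIX\s+\S*\s*<[^>]*>)*\s*", re.IGNORECASE)
_LIMIT = re.compile(r"\bLIMIT\s+(\d+)", re.IGNORECASE)


def enforce_guardrails(query: str, max_limit: int = MAX_LIMIT) -> str:
    """Reject non-SELECT queries, then clamp or append a LIMIT clause."""
    body = _PROLOGUE.sub("", query, count=1)
    if not re.match(r"SELECT\b", body, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")

    # Clamp every LIMIT that exceeds the cap (heuristic: no real parsing).
    clamped, count = _LIMIT.subn(
        lambda m: f"LIMIT {min(int(m.group(1)), max_limit)}", query
    )
    if count == 0:
        clamped = f"{clamped.rstrip()}\nLIMIT {max_limit}"
    return clamped


def sparql_query(query: str) -> dict:
    """POST a guarded query to the endpoint and return the parsed JSON."""
    payload = urllib.parse.urlencode({"query": enforce_guardrails(query)})
    request = urllib.request.Request(
        SPARQL_ENDPOINT,
        data=payload.encode("utf-8"),  # supplying data makes this a POST
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return json.load(response)
```

Keeping `enforce_guardrails` separate from the HTTP call means it can be unit-tested from the CLI script without a running Virtuoso instance.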

## Stage 2 — Helper Tools

- `get_entities_by_type`: fetches all subjects of `rdf:type <TYPE>`.
- `search_label`: filters `rdfs:label` via case-insensitive substring matching.
- `list_graphs`: enumerates distinct graphs that currently contain triples.
- `get_predicates_for_subject`: lists distinct predicates for a subject URI.
- `get_labels_for_subject`: returns labels for a subject URI.
- `insert_triple`: inserts a single triple (for debugging updates).
- `load_examples`: optionally loads Turtle example files from `examples/` into a graph (guarded by `MCP_ALLOW_EXAMPLE_LOAD=true`).
- Later, add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.
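
To illustrate how helper tools can keep the agent away from raw SPARQL, here is a rough sketch of query builders for a few of the helpers listed above. The function names, the default limits, and the IRI validation pattern are assumptions made for illustration; a production version would likely want stricter IRI checking.

```python
import re

# Conservative check for an absolute IRI: a scheme, a colon, and no characters
# that could close the angle brackets or break out of the query.
_IRI = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:[^\s<>\"{}|\\^`]*$")


def iri(value: str) -> str:
    """Wrap a validated IRI in angle brackets, or raise ValueError."""
    if not _IRI.match(value):
        raise ValueError(f"not a valid absolute IRI: {value!r}")
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes, newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def get_entities_by_type_query(type_iri: str, limit: int = 100) -> str:
    """All subjects with rdf:type <type_iri> ('a' is shorthand for rdf:type)."""
    return f"SELECT DISTINCT ?s WHERE {{ ?s a {iri(type_iri)} }} LIMIT {int(limit)}"


def search_label_query(term: str, limit: int = 50) -> str:
    """Case-insensitive substring match on rdfs:label."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s ?label WHERE { ?s rdfs:label ?label . "
        f"FILTER(CONTAINS(LCASE(STR(?label)), LCASE({literal(term)}))) }} "
        f"LIMIT {int(limit)}"
    )


def list_graphs_query() -> str:
    """Distinct named graphs that currently contain at least one triple."""
    return "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"
```

Because user input only ever passes through `iri` or `literal`, each tool accepts plain parameters and the query text itself stays fixed.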

## Stage 3 — Schema Awareness & Introspection

- Tools for predicate discovery and class hierarchy.
- Graph-level tooling (e.g., `graph_stats`, `graph_prefixes`).
- Cache basic ontology info to reduce repeated introspection.
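
One simple way to cache ontology information is a small time-to-live cache wrapped around the introspection calls. The sketch below assumes a hypothetical `TTLCache` class and an injectable clock (handy for testing); the plan itself does not prescribe a caching strategy.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache loader results per key and reload them after ttl_seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() if stale or absent."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value
```

A tool such as `graph_prefixes` could then call `cache.get_or_load("graph_prefixes", run_introspection_query)`, where `run_introspection_query` stands for whatever function actually hits the endpoint.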

## Stage 4 — Multi-Database Expansion

- PostgreSQL connector (`sql_query`) via `psycopg` or SQLAlchemy; wrap results in MCP tool schema.
- CouchDB connector (`document_lookup`) via its REST API.
- Qdrant/Chroma connector (`vector_search`) for embedding similarity.
- Each connector implements sanitization, pagination, and the ability to annotate results with metadata.
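
A shared result shape is one way to keep the connectors uniform. The sketch below is an assumption about what that contract might look like: `ToolResult`, `Connector`, and `paginate` are hypothetical names, and the in-memory pagination only illustrates the metadata annotation, since a real connector would push `limit` and `offset` down to the database.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, Sequence


@dataclass
class ToolResult:
    """Uniform envelope returned by every connector."""

    source: str
    rows: List[Dict[str, Any]]
    metadata: Dict[str, Any] = field(default_factory=dict)


class Connector(Protocol):
    """Structural contract: each connector sanitizes its input inside run()."""

    name: str

    def run(self, request: str, limit: int, offset: int) -> ToolResult:
        ...


def paginate(
    source: str, rows: Sequence[Dict[str, Any]], limit: int, offset: int = 0
) -> ToolResult:
    """Slice rows into one page and annotate it with paging metadata."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    page = list(rows[offset : offset + limit])
    return ToolResult(
        source=source,
        rows=page,
        metadata={
            "limit": limit,
            "offset": offset,
            "returned": len(page),
            "has_more": offset + limit < len(rows),
        },
    )
```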

## Stage 5 — Cross-Source Reasoning

- MCP server composes SPARQL + SQL + vector results into coherent tool responses.
- Example workflow:
  1. `sparql_query` → IDs + labels.
  2. `sql_query` → metadata for those IDs.
  3. `vector_search` → semantically related docs.
- Provide helper endpoints for the LLM to request multi-source aggregations (e.g., `entity_context`).
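
The three-step workflow above could be composed along these lines. This is a sketch under stated assumptions: the three lookups are passed in as plain callables so the composition logic stays independent of any driver, and the shapes they return (a list of `{"id", "label"}` dictionaries, a mapping from ID to metadata, and a list of document references) are hypothetical.

```python
from typing import Any, Callable, Dict, List


def entity_context(
    label: str,
    sparql_lookup: Callable[[str], List[Dict[str, str]]],
    sql_lookup: Callable[[List[str]], Dict[str, Dict[str, Any]]],
    vector_lookup: Callable[[str], List[str]],
) -> Dict[str, Any]:
    """Combine SPARQL, SQL, and vector results into one tool response."""
    # Step 1: SPARQL yields IDs and labels.
    entities = sparql_lookup(label)

    # Step 2: SQL yields metadata for those IDs, fetched in a single call.
    metadata = sql_lookup([entity["id"] for entity in entities])

    # Step 3: vector search yields semantically related documents per entity.
    combined = []
    for entity in entities:
        combined.append(
            {
                "id": entity["id"],
                "label": entity["label"],
                "metadata": metadata.get(entity["id"], {}),
                "related_docs": vector_lookup(entity["label"]),
            }
        )
    return {"query": label, "entities": combined}
```

Injecting the lookups also makes the aggregation testable with stub functions before any of the Stage 4 connectors exist.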

## Tech Stack

- Python + FastAPI (or lightweight async server).
- `requests` for SPARQL HTTP calls; optional `rdflib` for validation/parsing.
- DB drivers for PostgreSQL/CouchDB; `qdrant-client` or similar for vector search.
- JSON-based MCP schema compatible with OpenClaw tool expectations.

## Constraints & Safeguards

- Virtuoso Community Edition cannot load OPAL/VAL (`val_dav.vad` is unsupported).
- Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
- Log queries and enforce sanitization so that unfiltered input never reaches a backend.
- Evaluate performance (SPARQL can be slow); consider caching frequent patterns.

## Future Extensions

- Ontology-aware prompting and reasoning layer.
- Caching of frequent query results.
- Hybrid symbolic + vector search.
- Possibly expose the MCP server through a `tools.json` descriptor for OpenClaw.

## Domain Plugin Layers

- Introduce a `DOMAIN_LAYERS` environment variable that lists plugin modules (default `garden_layer.plugin`).
- Each plugin module exposes a `register_layer(tools)` hook that registers domain-prefixed tools (e.g., `garden_add_seedling`).
- On startup, the MCP server imports those modules, calls their hooks, and the new endpoints appear in the `/mcp` tool list without modifying the single FastAPI route.
- This keeps the core server generic while letting any specialized layer (garden, almanac, inventory) add helpers via a simple plugin contract.
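
The plugin contract could be wired up roughly as shown below. The sketch makes two assumptions that the plan does not state: `DOMAIN_LAYERS` is treated as a comma-separated list, and `tools` is treated as a plain dictionary mapping tool names to callables. The `load_domain_layers` name is likewise hypothetical.

```python
import importlib
import os
from typing import Any, Callable, Dict, List, Mapping, Optional

DEFAULT_LAYERS = "garden_layer.plugin"  # default module named in the plan


def load_domain_layers(
    tools: Dict[str, Callable[..., Any]],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Import each module listed in DOMAIN_LAYERS and call its register_layer hook."""
    env = os.environ if env is None else env
    raw = env.get("DOMAIN_LAYERS", DEFAULT_LAYERS)

    loaded = []
    for name in (part.strip() for part in raw.split(",")):  # assumed separator
        if not name:
            continue
        module = importlib.import_module(name)
        module.register_layer(tools)  # the hook adds domain-prefixed tools
        loaded.append(name)
    return loaded
```

A plugin module then only needs to define `register_layer(tools)` and add entries such as `tools["garden_add_seedling"] = ...`; the server would call `load_domain_layers(registry)` once at startup, before serving the `/mcp` tool list.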