# Project: MCP Bridge for Virtuoso (Community Edition)

## Overview

Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping tooling tightly structured.

## Stage 1 — Minimal MCP Server (Virtuoso only)

- Implement a `sparql_query` tool that POSTs to `http://localhost:8891/sparql` with the Accept header `application/sparql-results+json`.
- Return parsed JSON straight to the caller; consider timeouts and result limits.
- Provide sanitization / guardrails to prevent runaway queries (SELECT-only + LIMIT enforcement).
- Validate that the server works from a simple CLI script before wiring it to OpenClaw.

## Stage 2 — Helper Tools

- `get_entities_by_type`: fetches all subjects of a given `rdf:type`.
- `search_label`: filters `rdfs:label` via case-insensitive substring matching.
- `list_graphs`: enumerates distinct graphs that currently contain triples.
- `get_predicates_for_subject`: lists distinct predicates for a subject URI.
- `get_labels_for_subject`: returns labels for a subject URI.
- `insert_triple`: inserts a single triple (for debugging updates).
- `load_examples`: optionally loads Turtle example files from `examples/` into a graph (guarded by `MCP_ALLOW_EXAMPLE_LOAD=true`).
- Later, add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.

## Stage 3 — Schema Awareness & Introspection

- Tools for predicate discovery and class hierarchy.
- Graph-level tooling (e.g., `graph_stats`, `graph_prefixes`).
- Cache basic ontology info to reduce repeated introspection.

## Stage 4 — Multi-Database Expansion

- PostgreSQL connector (`sql_query`) via `psycopg` or SQLAlchemy; wrap results in the MCP tool schema.
- CouchDB connector (`document_lookup`) via its REST API.
- Qdrant/Chroma connector (`vector_search`) for embedding similarity.
- Each connector implements sanitization, pagination, and the ability to annotate results with metadata.

## Stage 5 — Cross-Source Reasoning

- The MCP server composes SPARQL + SQL + vector results into coherent tool responses.
- Example workflow:
  1. `sparql_query` → IDs + labels.
  2. `sql_query` → metadata for those IDs.
  3. `vector_search` → semantically related docs.
- Provide helper endpoints for the LLM to request multi-source aggregations (e.g., `entity_context`).

## Tech Stack

- Python + FastAPI (or a lightweight async server).
- `requests` for SPARQL HTTP calls; optional `rdflib` for validation/parsing.
- DB drivers for PostgreSQL/CouchDB; `qdrant-client` or similar for vector search.
- JSON-based MCP schema compatible with OpenClaw tool expectations.

## Constraints & Safeguards

- Virtuoso Community Edition cannot load OPAL/VAL (`val_dav.vad` is unsupported).
- Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
- Log queries and enforce sanitization to avoid exposing unfiltered input.
- Evaluate performance (SPARQL can be slow); consider caching frequent patterns.

## Future Extensions

- Ontology-aware prompting and reasoning layer.
- Caching of frequent query results.
- Hybrid symbolic + vector search mix.
- Expose the MCP server as a possible `tools.json` descriptor for OpenClaw.

## Domain plugin layers

- Introduce a `DOMAIN_LAYERS` environment variable that lists plugin modules (default `garden_layer.plugin`).
- Each plugin module exposes a `register_layer(tools)` hook that registers domain-prefixed tools (e.g., `garden_add_seedling`).
- On startup, the MCP server imports those modules and calls their hooks, so the new endpoints appear in the `/mcp` tool list without modifying the single FastAPI route.
- This keeps the core server generic while letting any specialized layer (garden, almanac, inventory) add helpers via a simple plugin contract.
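
The plugin contract described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual loader. It assumes that `DOMAIN_LAYERS` is a comma-separated list of module paths and that `tools` is a plain dict mapping tool names to callables; neither detail is fixed by this plan. The helper name `load_domain_layers` and the `demo_layer` stand-in module are hypothetical.

```python
import importlib
import os
import sys
import types
from typing import Callable, Dict, List

# Assumption: the tool registry is a dict of tool name -> callable.
ToolRegistry = Dict[str, Callable[..., object]]


def load_domain_layers(
    tools: ToolRegistry,
    env_var: str = "DOMAIN_LAYERS",
    default: str = "garden_layer.plugin",
) -> List[str]:
    """Import each listed plugin module and call its register_layer(tools) hook.

    Assumption: the environment variable holds comma-separated module paths.
    Returns the module names that were loaded, in order.
    """
    raw = os.environ.get(env_var, default)
    loaded: List[str] = []
    for name in (part.strip() for part in raw.split(",")):
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        module = importlib.import_module(name)
        hook = getattr(module, "register_layer", None)
        if not callable(hook):
            raise RuntimeError(f"{name} does not expose register_layer(tools)")
        hook(tools)  # the plugin adds its domain-prefixed tools to the registry
        loaded.append(name)
    return loaded


if __name__ == "__main__":
    # Hypothetical stand-in for a real plugin module such as garden_layer.plugin.
    demo = types.ModuleType("demo_layer")
    demo.register_layer = lambda tools: tools.update(
        {"garden_add_seedling": lambda name: {"added": name}}
    )
    sys.modules["demo_layer"] = demo
    os.environ["DOMAIN_LAYERS"] = "demo_layer"

    registry: ToolRegistry = {}
    print(load_domain_layers(registry))
    print(sorted(registry))
```

Failing fast when a module lacks the hook keeps misconfigured layers visible at startup; a server that prefers to skip broken plugins could log the error and continue instead.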