Project: MCP Bridge for Virtuoso (Community Edition)
Overview
Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.
Stage 1 — Minimal MCP Server (Virtuoso only)
- Implement a sparql_query tool that POSTs to http://localhost:8891/sparql with the Accept header application/sparql-results+json.
- Return parsed JSON straight to the caller; consider timeouts and result limits.
- Provide sanitization / guardrails to prevent runaway queries (SELECT-only + LIMIT enforcement).
- Validate the server works from a simple CLI script before wiring to OpenClaw.
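The guardrails and the POST described above could be sketched as follows. This is a minimal illustration, not the project's implementation: the names guard_query, DEFAULT_LIMIT and MAX_LIMIT, and the limit values, are assumptions, and the checks are regex heuristics (a real parser such as rdflib would be more robust). The sketch uses the standard library's urllib so that it runs without extra dependencies; the plan itself names requests for this step.

```python
import json
import re
import urllib.parse
import urllib.request

DEFAULT_LIMIT = 100   # assumed default, not specified in the plan
MAX_LIMIT = 1000      # assumed ceiling, not specified in the plan

# Leading PREFIX/BASE declarations that may precede the query form.
_PROLOGUE = re.compile(
    r"^\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*",
    re.IGNORECASE,
)
# A LIMIT clause at the very end of the query (optionally followed by OFFSET).
_TAIL_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)


def guard_query(query, default_limit=DEFAULT_LIMIT, max_limit=MAX_LIMIT):
    """Reject non-SELECT queries and enforce an outer LIMIT (heuristic)."""
    body = query[_PROLOGUE.match(query).end():]
    if not re.match(r"SELECT\b", body, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    match = _TAIL_LIMIT.search(query)
    if match is None:
        # No outer LIMIT found: append the default one.
        return f"{query.rstrip()} LIMIT {default_limit}"
    if int(match.group(1)) > max_limit:
        # Clamp an oversized LIMIT to the ceiling.
        start, end = match.span(1)
        return query[:start] + str(max_limit) + query[end:]
    return query


def sparql_query(query, endpoint="http://localhost:8891/sparql", timeout=10.0):
    """POST a guarded query as a URL-encoded form and return the parsed JSON."""
    data = urllib.parse.urlencode({"query": guard_query(query)}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,  # supplying data makes urllib issue a POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling guard_query from a small CLI script before any network access is one way to validate the guardrails in isolation, ahead of wiring the server to OpenClaw.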
Stage 2 — Helper Tools
- get_entities_by_type: fetches all subjects of rdf:type <TYPE>.
- search_label: filters rdfs:label via case-insensitive substring matching.
- list_graphs: enumerates distinct graphs that currently contain triples.
- get_predicates_for_subject: lists distinct predicates for a subject URI.
- get_labels_for_subject: returns labels for a subject URI.
- insert_triple: inserts a single triple (for debugging updates).
- load_examples: optionally loads Turtle example files from examples/ into a graph (guarded by MCP_ALLOW_EXAMPLE_LOAD=true).
- Later add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.
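As an illustration of how the helper tools can build fixed SPARQL templates instead of accepting free-form queries, here is a minimal sketch of three of them. The function names with the _query suffix, the default limits, and the validation rules are assumptions made for this example; only the tool names come from the list above.

```python
import re

# Characters that must not appear inside an IRI written between angle brackets.
_IRI = re.compile(r'[^\s<>"{}|\\^`]+')


def _iri(value):
    """Wrap a validated IRI in angle brackets, rejecting injection attempts."""
    if not _IRI.fullmatch(value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def _literal(text):
    """Render user text as an escaped SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def get_entities_by_type_query(type_iri, limit=100):
    """Query for all subjects whose rdf:type is the given class."""
    return f"SELECT DISTINCT ?s WHERE {{ ?s a {_iri(type_iri)} }} LIMIT {int(limit)}"


def search_label_query(term, limit=50):
    """Query for case-insensitive substring matches on rdfs:label."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT ?s ?label WHERE { ?s rdfs:label ?label . "
        f"FILTER(CONTAINS(LCASE(STR(?label)), LCASE({_literal(term)}))) }} "
        f"LIMIT {int(limit)}"
    )


def list_graphs_query():
    """Query for the distinct named graphs that contain at least one triple."""
    return "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"
```

Because the agent supplies only a type IRI or a search term, the shape of each query stays under the server's control, which is the point of preferring helper tools over arbitrary SPARQL.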
Stage 3 — Schema Awareness & Introspection
- Tools for predicate discovery and class hierarchy.
- Graph-level tooling (e.g., graph_stats, graph_prefixes).
- Cache basic ontology info to reduce repeated introspection.
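One possible shape for the ontology cache is a small time-to-live cache keyed by tool name and arguments. The class name TTLCache, the 300-second default, and the injectable clock are assumptions for this sketch, not part of the plan.

```python
import time


class TTLCache:
    """Cache loader results for a fixed number of seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (stored_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value for key, calling loader() if missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value
```

An introspection tool such as graph_stats could then wrap its SPARQL call in get_or_load, so repeated requests within the TTL window do not reach Virtuoso.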
Stage 4 — Multi-Database Expansion
- PostgreSQL connector (sql_query) via psycopg or SQLAlchemy; wrap results in MCP tool schema.
- CouchDB connector (document_lookup) via its REST API.
- Qdrant/Chroma connector (vector_search) for embedding similarity.
- Each connector implements sanitization, pagination, and metadata annotation of its results.
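A shared result envelope is one way to give every connector the same pagination and metadata shape. The sketch below is hypothetical: ToolResult, paginate, and the limit values are names and numbers invented for illustration, and a real connector would push LIMIT/OFFSET down to the database rather than slicing rows in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    """Uniform envelope returned by every connector (assumed shape)."""
    source: str                  # e.g. "postgresql", "couchdb", "qdrant"
    rows: list
    offset: int
    limit: int
    has_more: bool
    metadata: dict = field(default_factory=dict)


def paginate(source, rows, offset=0, limit=50, max_limit=500, **metadata):
    """Clamp paging parameters and wrap one page of rows with metadata."""
    limit = max(1, min(limit, max_limit))
    offset = max(0, offset)
    page = rows[offset:offset + limit]
    has_more = offset + limit < len(rows)
    return ToolResult(source, page, offset, limit, has_more, dict(metadata))
```

For example, paginate("postgresql", rows, offset=0, limit=50, table="plants") returns the first 50 rows along with has_more and a metadata dictionary the agent can use to request the next page.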
Stage 5 — Cross-Source Reasoning
- MCP server composes SPARQL + SQL + vector results into coherent tool responses.
- Example workflow:
  - sparql_query → IDs + labels.
  - sql_query → metadata for those IDs.
  - vector_search → semantically related docs.
- Provide helper endpoints for the LLM to request multi-source aggregations (e.g., entity_context).
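The example workflow can be sketched as a single composition function. The three lookups are passed in as callables so the sketch stays independent of any particular driver; their signatures and the returned dictionary layout are assumptions made for illustration.

```python
def entity_context(entity_type, sparql_lookup, sql_lookup, vector_lookup):
    """Combine SPARQL, SQL and vector results into one response per entity.

    Assumed callable contracts (hypothetical):
      sparql_lookup(entity_type) -> list of {"id": ..., "label": ...}
      sql_lookup(ids)            -> dict mapping id -> metadata dict
      vector_lookup(text)        -> list of related document identifiers
    """
    entities = sparql_lookup(entity_type)            # step 1: IDs + labels
    metadata = sql_lookup([e["id"] for e in entities])  # step 2: metadata
    context = []
    for entity in entities:
        context.append({
            "id": entity["id"],
            "label": entity["label"],
            "metadata": metadata.get(entity["id"], {}),
            "related_docs": vector_lookup(entity["label"]),  # step 3
        })
    return context
```

Passing the lookups as arguments also makes the aggregation easy to exercise with stubs before the real connectors exist.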
Tech Stack
- Python + FastAPI (or lightweight async server).
- requests for SPARQL HTTP calls; optional rdflib for validation/parsing.
- DB drivers for PostgreSQL/CouchDB; qdrant-client or similar for vector search.
- JSON-based MCP schema compatible with OpenClaw tool expectations.
Constraints & Safeguards
- Virtuoso Community Edition cannot load OPAL/VAL (val_dav.vad is unsupported).
- Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
- Log queries and enforce sanitization to avoid exposing unfiltered input.
- Evaluate performance (SPARQL can be slow); consider caching frequent patterns.
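Query logging could be added with a small decorator around each tool function. This is a sketch under assumptions: the logger name mcp_bridge, the decorator name logged_tool, and the 200-character truncation of logged arguments are all hypothetical choices.

```python
import functools
import logging
import time

logger = logging.getLogger("mcp_bridge")  # hypothetical logger name

MAX_LOGGED_CHARS = 200  # assumed cap so huge inputs do not flood the log


def logged_tool(func):
    """Log each tool call with truncated arguments and elapsed time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on failure, so slow or failing calls are logged too.
            elapsed_ms = (time.perf_counter() - started) * 1000
            summary = repr((args, kwargs))[:MAX_LOGGED_CHARS]
            logger.info(
                "tool=%s input=%s elapsed_ms=%.1f",
                func.__name__, summary, elapsed_ms,
            )
    return wrapper
```

The elapsed times recorded this way also give a first data set for the performance evaluation and for deciding which query patterns are worth caching.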
Future Extensions
- Ontology-aware prompting and reasoning layer.
- Caching of frequent query results.
- Hybrid symbolic + vector search mix.
- Possibly expose the MCP server through a tools.json descriptor for OpenClaw.
Domain plugin layers
- Introduce a DOMAIN_LAYERS environment variable that lists plugin modules (default garden_layer.plugin).
- Each plugin module exposes a register_layer(tools) hook that registers domain-prefixed tools (e.g., garden_add_seedling).
- On startup, the MCP server imports those modules and calls their hooks, so the new endpoints appear in the /mcp tool list without any change to the single FastAPI route.
- This keeps the core server generic while letting any specialized layer (garden, almanac, inventory) add helpers via a simple plugin contract.
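The plugin contract could be loaded along these lines. Two details are assumptions not stated in the plan: that DOMAIN_LAYERS is a comma-separated list, and that the tools registry is a dictionary mapping tool names to callables. The function name load_domain_layers is likewise hypothetical.

```python
import importlib
import os

DEFAULT_LAYERS = "garden_layer.plugin"  # default named in the plan


def load_domain_layers(tools, env=os.environ):
    """Import each module listed in DOMAIN_LAYERS and call its register_layer hook.

    Assumes a comma-separated module list and a dict-like tools registry.
    Returns the names of the modules that were loaded.
    """
    raw = env.get("DOMAIN_LAYERS", DEFAULT_LAYERS)
    names = [name.strip() for name in raw.split(",") if name.strip()]
    loaded = []
    for name in names:
        module = importlib.import_module(name)
        module.register_layer(tools)   # the plugin adds its prefixed tools
        loaded.append(name)
    return loaded
```

A plugin module then only needs to define register_layer(tools) and add entries such as tools["garden_add_seedling"]; the core server never imports domain code by name.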