Project: MCP Bridge for Virtuoso (Community Edition)

Overview

Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand it to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.

Stage 1 — Minimal MCP Server (Virtuoso only)

Implement a sparql_query tool that POSTs to http://localhost:8891/sparql with the Accept header set to application/sparql-results+json.
Return the parsed JSON directly to the caller; apply a request timeout and a cap on result size.
Add sanitization and guardrails to prevent runaway queries (SELECT-only plus LIMIT enforcement).
Validate the server works from a simple CLI script before wiring to OpenClaw.
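
A minimal sketch of the Stage 1 tool follows. The 1000-row cap and 30-second timeout are hypothetical values, not part of the plan. The sketch uses the standard library's urllib in place of requests so it has no third-party dependencies. The guardrail is a coarse text check rather than a SPARQL parser: it rejects queries that start with a comment, and it does not understand LIMIT clauses inside comments or string literals.

```python
import json
import re
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://localhost:8891/sparql"
MAX_LIMIT = 1000        # hypothetical cap, not taken from the plan
TIMEOUT_SECONDS = 30    # hypothetical timeout, not taken from the plan

# Leading PREFIX / BASE declarations that may precede the query form.
_PROLOGUE = re.compile(
    r"\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*",
    re.IGNORECASE,
)
# A LIMIT clause at the very end of the query, optionally followed by OFFSET.
_TRAILING_LIMIT = re.compile(
    r"\bLIMIT\s+(\d+)(?=\s*(?:OFFSET\s+\d+\s*)?$)",
    re.IGNORECASE,
)


def enforce_select_with_limit(query: str, max_limit: int = MAX_LIMIT) -> str:
    """Reject anything that is not a SELECT and cap the outer LIMIT."""
    rest = query[_PROLOGUE.match(query).end():]
    if not re.match(r"SELECT\b", rest, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    match = _TRAILING_LIMIT.search(query)
    if match is None:
        # No outer LIMIT: append one.
        return f"{query.rstrip()} LIMIT {max_limit}"
    if int(match.group(1)) > max_limit:
        # Outer LIMIT too large: clamp it in place.
        return query[:match.start(1)] + str(max_limit) + query[match.end(1):]
    return query


def sparql_query(query: str, timeout: float = TIMEOUT_SECONDS) -> dict:
    """POST a guarded SELECT query and return the parsed JSON results."""
    safe_query = enforce_select_with_limit(query)
    body = urllib.parse.urlencode({"query": safe_query}).encode("utf-8")
    request = urllib.request.Request(
        SPARQL_ENDPOINT,
        data=body,  # supplying data makes this a form-encoded POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A CLI check could call sparql_query("SELECT ?s WHERE { ?s ?p ?o }") against a running Virtuoso instance; the guardrail sends that query with LIMIT 1000 appended.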

Stage 2 — Helper Tools

get_entities_by_type: fetches all subjects of rdf:type <TYPE>.
search_label: filters rdfs:label via case-insensitive substring matching.
list_graphs: enumerates distinct graphs that currently contain triples.
get_predicates_for_subject: lists distinct predicates for a subject URI.
get_labels_for_subject: returns labels for a subject URI.
insert_triple: inserts a single triple (for debugging updates).
load_examples: optionally loads Turtle example files from examples/ into a graph (guarded by MCP_ALLOW_EXAMPLE_LOAD=true).
Later add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.
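
The read-only helpers above can be sketched as query builders that accept only validated parameters, so the agent never supplies raw SPARQL. The function names (build_entities_by_type, build_search_label, build_list_graphs) and the default limit of 100 are hypothetical. The IRI check is a conservative approximation of the characters SPARQL forbids inside <...>.

```python
import re

# Characters that may not appear inside a SPARQL <IRI> (approximation).
_IRI_CHARS = re.compile(r'[^<>"{}|^`\\\s]+')


def _iri(value: str) -> str:
    """Wrap a validated IRI in angle brackets."""
    if not _IRI_CHARS.fullmatch(value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def _literal(text: str) -> str:
    """Escape text for use as a double-quoted SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_entities_by_type(type_iri: str, limit: int = 100) -> str:
    """Query for get_entities_by_type: subjects whose rdf:type is type_iri."""
    return (
        f"SELECT DISTINCT ?s WHERE {{ ?s a {_iri(type_iri)} }} "
        f"LIMIT {int(limit)}"
    )


def build_search_label(text: str, limit: int = 100) -> str:
    """Query for search_label: case-insensitive substring match on rdfs:label."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT DISTINCT ?s ?label WHERE { ?s rdfs:label ?label . "
        f"FILTER(CONTAINS(LCASE(STR(?label)), LCASE({_literal(text)}))) }} "
        f"LIMIT {int(limit)}"
    )


def build_list_graphs(limit: int = 100) -> str:
    """Query for list_graphs: named graphs that contain at least one triple."""
    return (
        "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } "
        f"LIMIT {int(limit)}"
    )
```

Each builder returns a plain query string, so its output can be passed through the same SELECT-only and LIMIT guardrails as any other query before it is sent.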

Stage 3 — Schema Awareness & Introspection

Tools for predicate discovery and class hierarchy.
Graph-level tooling (e.g., graph_stats, graph_prefixes).
Cache basic ontology info to reduce repeated introspection.
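
One way to cache ontology info is a small in-memory cache with a time-to-live, sketched below. The class name TTLCache, the 300-second default, and the injectable clock are assumptions made for illustration; the plan does not prescribe a caching strategy.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep loaded values for ttl_seconds, then reload them on next access."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() if it is stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after an insert changes the graph."""
        self._entries.pop(key, None)
```

An introspection tool could wrap its query in a loader, for example cache.get_or_load("class_hierarchy", load_class_hierarchy), where load_class_hierarchy is a hypothetical function that runs the underlying SPARQL.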

Stage 4 — Multi-Database Expansion

PostgreSQL connector (sql_query) via psycopg or SQLAlchemy; wrap results in MCP tool schema.
CouchDB connector (document_lookup) via its REST API.
Qdrant/Chroma connector (vector_search) for embedding similarity.
Each connector implements sanitization, pagination, and the ability to annotate results with metadata.
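
The shared pagination and metadata contract could look like the sketch below. ToolResult, fetch_page, and the 200-row page cap are hypothetical names and values. The fetch callable stands in for whatever the real connector does, such as running a SQL query with LIMIT and OFFSET.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

MAX_PAGE_SIZE = 200  # hypothetical cap

Row = Dict[str, Any]


@dataclass
class ToolResult:
    """Envelope returned by every connector."""

    source: str                 # e.g. "postgresql", "couchdb", "qdrant"
    rows: List[Row]
    next_offset: Optional[int]  # None when there are no further pages
    metadata: Dict[str, Any] = field(default_factory=dict)


def fetch_page(
    source: str,
    fetch: Callable[[int, int], List[Row]],
    offset: int = 0,
    limit: int = 50,
    **metadata: Any,
) -> ToolResult:
    """Fetch one page via fetch(offset, count) and wrap it in a ToolResult."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    limit = max(1, min(limit, MAX_PAGE_SIZE))
    # Ask for one extra row: its presence means another page exists.
    rows = fetch(offset, limit + 1)
    has_more = len(rows) > limit
    return ToolResult(
        source=source,
        rows=rows[:limit],
        next_offset=offset + limit if has_more else None,
        metadata=dict(metadata),
    )
```

Requesting one row more than the page size avoids a separate count query, at the cost of transferring a single extra row per page.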

Stage 5 — Cross-Source Reasoning

MCP server composes SPARQL + SQL + vector results into coherent tool responses.
Example workflow:
1. sparql_query → IDs + labels.
2. sql_query → metadata for those IDs.
3. vector_search → semantically related docs.
Provide helper endpoints for the LLM to request multi-source aggregations (e.g., entity_context).
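
The three-step workflow can be sketched as a single entity_context helper. The three callables it receives (find_entities, fetch_metadata, find_related) are hypothetical stand-ins for the sparql_query, sql_query, and vector_search tools. The row shapes, with "id" and "label" keys, are likewise assumptions.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


def entity_context(
    type_iri: str,
    find_entities: Callable[[str], List[Entity]],
    fetch_metadata: Callable[[List[str]], Dict[str, Dict[str, Any]]],
    find_related: Callable[[str], List[str]],
    max_entities: int = 10,  # hypothetical cap to bound fan-out
) -> List[Dict[str, Any]]:
    """Combine graph, relational, and vector results per entity."""
    # 1. SPARQL: IDs and labels for entities of the requested type.
    entities = find_entities(type_iri)[:max_entities]
    # 2. SQL: metadata for those IDs, fetched in one batch.
    metadata = fetch_metadata([entity["id"] for entity in entities])
    # 3. Vector search: documents semantically related to each label.
    return [
        {
            "id": entity["id"],
            "label": entity["label"],
            "metadata": metadata.get(entity["id"], {}),
            "related_docs": find_related(entity["label"]),
        }
        for entity in entities
    ]
```

Passing the sources in as callables keeps the aggregation logic testable without any database running.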

Tech Stack

Python + FastAPI (or lightweight async server).
requests for SPARQL HTTP calls; optional rdflib for validation/parsing.
DB drivers for PostgreSQL/CouchDB; qdrant-client or similar for vector search.
JSON-based MCP schema compatible with OpenClaw tool expectations.

Constraints & Safeguards

Virtuoso Community Edition cannot load OPAL/VAL (val_dav.vad is unsupported).
Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
Log queries and enforce sanitization so that unfiltered input never reaches a backend.
Evaluate performance (SPARQL can be slow); consider caching frequent patterns.

Future Extensions

Ontology-aware prompting and reasoning layer.
Caching of frequent query results.
Hybrid symbolic + vector search mix.
Possibly expose the MCP server to OpenClaw through a tools.json descriptor.

Domain plugin layers

Introduce a DOMAIN_LAYERS environment variable that lists plugin modules (default garden_layer.plugin).
Each plugin module exposes a register_layer(tools) hook that registers domain-prefixed tools (e.g., garden_add_seedling).
On startup, the MCP server imports those modules, calls their hooks, and the new endpoints appear in the /mcp tool list without modifying the single FastAPI route.
This keeps the core server generic while letting any specialized layer (garden, almanac, inventory) add helpers via a simple plugin contract.
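
A minimal sketch of the plugin contract follows. It assumes DOMAIN_LAYERS is a comma-separated list of module paths and that the tools registry is a plain dict mapping tool names to callables; the plan fixes neither detail. The function name load_domain_layers is hypothetical.

```python
import importlib
import os
from typing import Any, Callable, Dict, List, Mapping, Optional

DEFAULT_LAYERS = "garden_layer.plugin"

ToolRegistry = Dict[str, Callable[..., Any]]


def load_domain_layers(
    tools: ToolRegistry,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Import each module named in DOMAIN_LAYERS and call its register_layer hook."""
    env = os.environ if environ is None else environ
    # Assumption: module paths are separated by commas.
    spec = env.get("DOMAIN_LAYERS", DEFAULT_LAYERS)
    loaded: List[str] = []
    for name in (part.strip() for part in spec.split(",")):
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        module = importlib.import_module(name)
        module.register_layer(tools)  # the plugin adds its prefixed tools
        loaded.append(name)
    return loaded
```

A plugin module then needs only a register_layer(tools) function that adds entries such as tools["garden_add_seedling"] = add_seedling, where add_seedling is the plugin's own handler.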
