Project: MCP Bridge for Virtuoso (Community Edition)

Overview

Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.

Stage 1 — Minimal MCP Server (Virtuoso only)

  • Implement sparql_query tool that POSTs to http://localhost:8891/sparql with Accept header application/sparql-results+json.
  • Return parsed JSON straight to the caller; consider timeouts and result limits.
  • Provide sanitization / guardrails to prevent runaway queries (SELECT-only + LIMIT enforcement).
  • Validate the server works from a simple CLI script before wiring to OpenClaw.
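
The SELECT-only + LIMIT guardrail could look roughly like the sketch below. The function name enforce_select_limit and the default cap of 100 are assumptions made for illustration, not part of the plan. The regex-based check is deliberately simple: it does not handle trailing comments and is no substitute for a real SPARQL parser.

```python
import re

# Optional PREFIX/BASE declarations that may precede the query form.
_PROLOGUE = re.compile(
    r"^\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*",
    re.IGNORECASE,
)
# Outermost LIMIT: the one at the very end, optionally followed by OFFSET.
_OUTER_LIMIT = re.compile(
    r"\bLIMIT\s+(\d+)(?=\s*(?:OFFSET\s+\d+\s*)?$)", re.IGNORECASE
)


def enforce_select_limit(query: str, max_limit: int = 100) -> str:
    """Reject non-SELECT queries and clamp (or append) the outer LIMIT.

    Hypothetical helper: the name and the default cap are assumptions.
    """
    body = _PROLOGUE.sub("", query, count=1)
    if not re.match(r"SELECT\b", body, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")

    query = query.rstrip()
    match = _OUTER_LIMIT.search(query)
    if match is None:
        # No outer LIMIT present: append the cap.
        return f"{query} LIMIT {max_limit}"
    # An outer LIMIT exists: never let it exceed the cap.
    capped = min(int(match.group(1)), max_limit)
    return query[: match.start(1)] + str(capped) + query[match.end(1):]
```

The returned string is what sparql_query would then POST to the endpoint, so the cap holds even when the agent supplies its own LIMIT.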

Stage 2 — Helper Tools

  • get_entities_by_type: fetches all subjects whose rdf:type is <TYPE>.
  • search_label: filters rdfs:label via case-insensitive substring matching.
  • list_graphs: enumerates distinct graphs that currently contain triples.
  • get_predicates_for_subject: lists distinct predicates for a subject URI.
  • get_labels_for_subject: returns labels for a subject URI.
  • insert_triple: inserts a single triple (for debugging updates).
  • load_examples: optionally load Turtle example files from examples/ into a graph (guarded by MCP_ALLOW_EXAMPLE_LOAD=true).
  • Later, add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.
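
One way to keep the agent away from raw SPARQL is for each helper tool to build its query from validated arguments. The sketch below shows this for get_entities_by_type and search_label. The builder function names, the escaping helpers, and the default limit are illustrative assumptions. The CONTAINS/LCASE filter is one possible way to do case-insensitive substring matching.

```python
import re

# Characters that may not appear inside a SPARQL IRI reference (<...>).
_IRI_FORBIDDEN = re.compile(r'[<>"{}|^`\\\s]')


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, refusing anything that could break out."""
    if not value or _IRI_FORBIDDEN.search(value):
        raise ValueError(f"invalid IRI: {value!r}")
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a Python string as an escaped SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def entities_by_type_query(type_uri: str, limit: int = 100) -> str:
    """Query text for get_entities_by_type (hypothetical builder)."""
    return f"SELECT DISTINCT ?s WHERE {{ ?s a {iri(type_uri)} }} LIMIT {int(limit)}"


def search_label_query(term: str, limit: int = 100) -> str:
    """Query text for search_label: case-insensitive substring match."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT DISTINCT ?s ?label WHERE { ?s rdfs:label ?label . "
        f"FILTER(CONTAINS(LCASE(STR(?label)), LCASE({literal(term)}))) }} "
        f"LIMIT {int(limit)}"
    )
```

Because the agent only supplies a URI or a search term, the query shape stays fixed and the guardrails do not depend on parsing free-form SPARQL.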

Stage 3 — Schema Awareness & Introspection

  • Tools for predicate discovery and class hierarchy.
  • Graph-level tooling (e.g., graph_stats, graph_prefixes).
  • Cache basic ontology info to reduce repeated introspection.
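
Caching ontology info could start as a small in-process cache with a time-to-live, sketched below. The TTLCache class, its method names, and the 300-second default are assumptions; the plan does not prescribe a cache design.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-to-live cache for introspection results (hypothetical)."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() on miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value
```

A predicate-discovery tool could then call cache.get_or_load(("predicates", graph_uri), run_query), where run_query is whatever function actually issues the SPARQL request.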

Stage 4 — Multi-Database Expansion

  • PostgreSQL connector (sql_query) via psycopg or SQLAlchemy; wrap results in MCP tool schema.
  • CouchDB connector (document_lookup) via its REST API.
  • Qdrant/Chroma connector (vector_search) for embedding similarity.
  • Each connector implements sanitization and pagination, and can annotate results with metadata.
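
A shared result envelope is one way to give every connector the same pagination and metadata shape. The sketch below is only an illustration: page_envelope and its field names are assumptions, and it slices an in-memory list for clarity, whereas a real connector would push offset and page size down to the database.

```python
from typing import Any, Dict, List, Optional, Sequence


def page_envelope(
    source: str,
    rows: Sequence[Dict[str, Any]],
    offset: int = 0,
    page_size: int = 50,
) -> Dict[str, Any]:
    """Wrap one page of rows with source and pagination metadata."""
    if offset < 0 or page_size < 1:
        raise ValueError("offset must be >= 0 and page_size >= 1")
    page: List[Dict[str, Any]] = list(rows[offset : offset + page_size])
    end = offset + page_size
    # next_offset is None when this page reaches the end of the result set.
    next_offset: Optional[int] = end if end < len(rows) else None
    return {
        "source": source,
        "rows": page,
        "meta": {
            "offset": offset,
            "returned": len(page),
            "total": len(rows),
            "next_offset": next_offset,
        },
    }
```

With a common envelope, the agent can page through sql_query, document_lookup, and vector_search results the same way.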

Stage 5 — Cross-Source Reasoning

  • MCP server composes SPARQL + SQL + vector results into coherent tool responses.
  • Example workflow:
    1. sparql_query → IDs + labels.
    2. sql_query → metadata for those IDs.
    3. vector_search → semantically related docs.
  • Provide helper endpoints for the LLM to request multi-source aggregations (e.g., entity_context).
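
The three-step workflow above could be composed along the following lines. This is a sketch under stated assumptions: the three backends are passed in as plain callables, and their argument and return shapes (lists of id/label dicts, a metadata dict keyed by ID, a list of related documents) are invented for illustration.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]


def entity_context(
    term: str,
    sparql_query: Callable[[str], List[Entity]],
    sql_query: Callable[[List[str]], Dict[str, Dict[str, Any]]],
    vector_search: Callable[[str], List[Any]],
) -> List[Entity]:
    """Merge SPARQL, SQL, and vector results into one record per entity."""
    # 1. SPARQL: resolve the term to IDs + labels.
    entities = sparql_query(term)
    # 2. SQL: fetch metadata for all IDs in one call.
    metadata = sql_query([entity["id"] for entity in entities])
    # 3. Vector search: attach semantically related docs per label.
    return [
        {
            **entity,
            "metadata": metadata.get(entity["id"], {}),
            "related": vector_search(entity["label"]),
        }
        for entity in entities
    ]
```

Passing the backends in as callables keeps the aggregation logic testable with stubs before any connector exists.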

Tech Stack

  • Python + FastAPI (or lightweight async server).
  • requests for SPARQL HTTP calls; optional rdflib for validation/parsing.
  • DB drivers for PostgreSQL/CouchDB; qdrant-client or similar for vector search.
  • JSON-based MCP schema compatible with OpenClaw tool expectations.

Constraints & Safeguards

  • Virtuoso Community Edition cannot load OPAL/VAL (val_dav.vad is unsupported).
  • Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
  • Log queries and enforce sanitization so that unfiltered input never reaches a backend.
  • Evaluate performance (SPARQL can be slow); consider caching frequent patterns.

Future Extensions

  • Ontology-aware prompting and reasoning layer.
  • Caching of frequent query results.
  • Hybrid symbolic + vector search mix.
  • Possibly expose the MCP server through a tools.json descriptor for OpenClaw.

Domain Plugin Layers

  • Introduce a DOMAIN_LAYERS environment variable that lists plugin modules (default garden_layer.plugin).
  • Each plugin module exposes a register_layer(tools) hook that registers domain-prefixed tools (e.g., garden_add_seedling).
  • On startup, the MCP server imports those modules, calls their hooks, and the new endpoints appear in the /mcp tool list without modifying the single FastAPI route.
  • This keeps the core server generic while letting any specialized layer (garden, almanac, inventory) add helpers via a simple plugin contract.
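
The plugin contract above might be loaded at startup roughly as follows. Two details are assumptions, since the plan does not state them: DOMAIN_LAYERS is treated as a comma-separated list, and the tools argument passed to register_layer is a plain dict mapping tool names to callables.

```python
import importlib
import os
from typing import Any, Callable, Dict, List, Mapping

ToolRegistry = Dict[str, Callable[..., Any]]


def load_domain_layers(
    tools: ToolRegistry,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Import each module named in DOMAIN_LAYERS and call its register_layer hook."""
    # Default taken from the plan; the comma separator is an assumption.
    names = env.get("DOMAIN_LAYERS", "garden_layer.plugin")
    loaded: List[str] = []
    for name in (part.strip() for part in names.split(",")):
        if not name:
            continue
        module = importlib.import_module(name)
        # Each plugin adds its domain-prefixed tools to the shared registry.
        module.register_layer(tools)
        loaded.append(name)
    return loaded
```

A plugin module then only needs to define register_layer(tools) and add entries such as tools["garden_add_seedling"] = add_seedling, where add_seedling is the plugin's own function.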