# Project: MCP Bridge for Virtuoso (Community Edition)

## Overview

Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then extend it to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.

## Stage 1: Minimal MCP Server (Virtuoso only)

- Implement a `sparql_query` tool that POSTs the query (form-encoded `query` parameter, per the SPARQL 1.1 Protocol) to `http://localhost:8891/sparql` with the header `Accept: application/sparql-results+json`.
- Return the parsed JSON directly to the caller, and plan for request timeouts and result limits.
- Add sanitization and guardrails to prevent runaway queries (e.g., reject update operations, cap the number of returned rows).
- Verify that the server works from a simple CLI script before wiring it into OpenClaw.
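
The following minimal sketch shows the Stage 1 tool. It assumes the endpoint URL above. The row cap (`MAX_LIMIT`), the timeout, and the helper `guard_query` are hypothetical choices, not Virtuoso settings. The code uses the standard-library `urllib.request` in place of `requests` so it runs without dependencies. The keyword check is deliberately coarse: it can also reject read-only queries that merely mention `DELETE` or `ADD` in a literal or IRI, and it only inspects the first `LIMIT` it finds.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint from the plan; the limits below are hypothetical defaults.
SPARQL_ENDPOINT = "http://localhost:8891/sparql"
MAX_LIMIT = 1000
TIMEOUT_SECONDS = 30.0

# SPARQL 1.1 Update keywords; any match rejects the query (coarse check).
_UPDATE_KEYWORDS = re.compile(
    r"\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD)\b", re.IGNORECASE
)
_LIMIT_CLAUSE = re.compile(r"\bLIMIT\s+(\d+)", re.IGNORECASE)


def guard_query(query: str, max_limit: int = MAX_LIMIT) -> str:
    """Reject update operations and ensure the query has a LIMIT no larger than max_limit."""
    if _UPDATE_KEYWORDS.search(query):
        raise ValueError("only read-only queries are allowed")
    match = _LIMIT_CLAUSE.search(query)
    if match is None:
        # No LIMIT at all: append one on its own line.
        return f"{query.rstrip()}\nLIMIT {max_limit}"
    if int(match.group(1)) > max_limit:
        # Oversized LIMIT: clamp the first one found down to the cap.
        return _LIMIT_CLAUSE.sub(f"LIMIT {max_limit}", query, count=1)
    return query


def sparql_query(query: str, timeout: float = TIMEOUT_SECONDS) -> dict:
    """POST a guarded query to the endpoint and return the parsed JSON results."""
    body = urllib.parse.urlencode({"query": guard_query(query)}).encode("utf-8")
    request = urllib.request.Request(
        SPARQL_ENDPOINT,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
    # urlopen raises on HTTP errors and on timeout, so failures surface to the caller.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `sparql_query("SELECT ?s WHERE { ?s ?p ?o }")` sends the query with `LIMIT 1000` appended. Client-side checks complement server-side limits but do not replace them. The `[SPARQL]` section of `virtuoso.ini` has settings such as `ResultSetMaxRows` and `MaxQueryExecutionTime` that can serve as a backstop.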

## Stage 2: Helper Tools

- `get_entities_by_type`: returns all subjects `?s` of triples `?s rdf:type <TYPE>`.
- `search_by_label`: matches `rdfs:label` values by case-insensitive substring.
- `list_graphs`: lists the distinct named graphs that currently contain triples.
- Later, add more semantic tools (predicate discovery, ontology hints) instead of letting the agent write arbitrary SPARQL.
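
The helper tools can be thin query templates that never splice raw agent input into SPARQL. The sketch below shows one way to do this under stated assumptions. `iri` and `string_literal` are hypothetical escaping helpers, and the IRI check accepts only a conservative subset of the SPARQL `IRIREF` rule. The default limits are placeholders.

```python
import re

# Conservative subset of SPARQL's IRIREF: a scheme, then no control chars, spaces, or <>"{}|\^`.
_SAFE_IRI = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:[^\x00-\x20<>"{}|\\^`]*')


def iri(value: str) -> str:
    """Wrap a validated absolute IRI in <...>; reject anything that could break out."""
    if _SAFE_IRI.fullmatch(value) is None:
        raise ValueError(f"not an acceptable IRI: {value!r}")
    return f"<{value}>"


def string_literal(value: str) -> str:
    """Quote text as a SPARQL "..." literal, escaping the characters that literal form forbids."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def get_entities_by_type(type_iri: str, limit: int = 100) -> str:
    # `a` is SPARQL shorthand for rdf:type.
    return (
        "SELECT DISTINCT ?s WHERE {\n"
        f"  ?s a {iri(type_iri)} .\n"
        f"}} LIMIT {int(limit)}"
    )


def search_by_label(text: str, limit: int = 50) -> str:
    # Case-insensitive substring match on label values.
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f"  FILTER(CONTAINS(LCASE(STR(?label)), LCASE({string_literal(text)})))\n"
        f"}} LIMIT {int(limit)}"
    )


def list_graphs(limit: int = 100) -> str:
    return f"SELECT DISTINCT ?g WHERE {{ GRAPH ?g {{ ?s ?p ?o }} }} LIMIT {int(limit)}"
```

`CONTAINS(LCASE(...))` scans every label, which can be slow on a large store. Virtuoso's full-text `bif:contains` predicate is an alternative, but it requires a text index on the literals.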

## Stage 3: Schema Awareness & Introspection

- Tools for predicate discovery and class-hierarchy exploration.
- Graph-level tooling (e.g., `graph_stats`, `graph_prefixes`).
- Cache basic ontology information to avoid repeating introspection queries.
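
Introspection queries tend to be expensive (the predicate count below scans the whole store) and their results change rarely. That makes a small time-based cache in front of them a natural fit. The query texts, the 10-minute default TTL, and the `TTLCache` class are illustrative assumptions. `fetch` stands for any function that runs a query, such as the Stage 1 tool.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Predicate discovery: which predicates exist and how often they are used.
PREDICATE_DISCOVERY = """
SELECT ?p (COUNT(*) AS ?uses) WHERE { ?s ?p ?o }
GROUP BY ?p ORDER BY DESC(?uses) LIMIT 100
"""

# Class hierarchy: direct rdfs:subClassOf edges.
CLASS_HIERARCHY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?sub ?super WHERE { ?sub rdfs:subClassOf ?super } LIMIT 1000
"""


class TTLCache:
    """Cache query results for `ttl` seconds, keyed by the exact query text."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the round trip
        result = self._fetch(query)
        self._entries[query] = (now, result)
        return result
```

Injecting `clock` keeps the cache testable, and production code uses the default `time.monotonic`. On a large store, restricting the predicate query to one graph (`GRAPH <g> { ?s ?p ?o }`) keeps it affordable.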

## Stage 4: Multi-Database Expansion

- PostgreSQL connector (`sql_query`) via `psycopg` or SQLAlchemy; wrap results in the MCP tool result schema.
- CouchDB connector (`document_lookup`) via CouchDB's HTTP REST API.
- Qdrant/Chroma connector (`vector_search`) for embedding-similarity search.
- Each connector implements sanitization, pagination, and annotation of results with metadata (e.g., which source produced them).
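
Every connector has to handle pagination and metadata annotation, which suggests a shared layer that the individual connectors plug into. The minimal sketch below assumes each connector exposes a `fetch(limit, offset)` function. `ToolResult`, `paginated_call`, and `MAX_PAGE_SIZE` are hypothetical names. Sanitization stays inside each connector, for example parameterized SQL via `psycopg` placeholders rather than string formatting.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

MAX_PAGE_SIZE = 200  # hypothetical cap shared by all connectors


@dataclass
class ToolResult:
    """Uniform envelope every connector returns to the MCP layer (hypothetical shape)."""

    items: List[Dict[str, Any]]
    next_offset: Optional[int]
    metadata: Dict[str, Any] = field(default_factory=dict)


def paginated_call(
    source: str,
    fetch: Callable[[int, int], List[Dict[str, Any]]],
    offset: int = 0,
    page_size: int = 50,
) -> ToolResult:
    """Clamp paging arguments, fetch one page, and annotate it with metadata.

    One extra row is requested so we can tell whether another page exists.
    """
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    offset = max(0, offset)
    started = time.perf_counter()
    rows = fetch(page_size + 1, offset)
    elapsed_ms = (time.perf_counter() - started) * 1000
    has_more = len(rows) > page_size
    return ToolResult(
        items=rows[:page_size],
        next_offset=offset + page_size if has_more else None,
        metadata={
            "source": source,
            "offset": offset,
            "page_size": page_size,
            "elapsed_ms": round(elapsed_ms, 1),
        },
    )
```

Requesting `page_size + 1` rows is a cheap way to detect a next page without a separate `COUNT` query.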

## Stage 5: Cross-Source Reasoning

- The MCP server composes SPARQL, SQL, and vector results into coherent tool responses.
- Example workflow:
  1. `sparql_query` → entity IDs + labels.
  2. `sql_query` → metadata for those IDs.
  3. `vector_search` → semantically related documents.
- Provide composite helper tools (e.g., `entity_context`) that let the LLM request multi-source aggregations in a single call.
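
The example workflow can be wrapped as a single composite tool. The sketch below is connector-agnostic: the three callables stand in for the `sparql_query`, `sql_query`, and `vector_search` tools. Their signatures, the returned field names, and `top_k` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


def entity_context(
    label: str,
    sparql_lookup: Callable[[str], List[Dict[str, str]]],
    sql_metadata: Callable[[List[str]], Dict[str, Dict[str, Any]]],
    vector_search: Callable[[str, int], List[Dict[str, Any]]],
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Hypothetical composite tool following the three-step workflow."""
    # Step 1: SPARQL -> entity IDs and labels.
    entities = sparql_lookup(label)
    ids = [e["id"] for e in entities]
    # Step 2: SQL -> metadata for those IDs, in one batched call rather than one per ID.
    metadata = sql_metadata(ids) if ids else {}
    # Step 3: vector search -> semantically related documents for each entity.
    return [
        {
            "id": e["id"],
            "label": e["label"],
            "metadata": metadata.get(e["id"], {}),
            "related": vector_search(e["label"], top_k),
        }
        for e in entities
    ]
```

Batching the SQL lookup avoids an N+1 query pattern. The vector search still runs once per entity, so capping the number of entities returned by step 1 keeps latency bounded.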

## Tech Stack

- Python + FastAPI (or another lightweight async server).
- `requests` for SPARQL HTTP calls; optionally `rdflib` for validation and parsing.
- Database drivers for PostgreSQL and CouchDB; `qdrant-client` or similar for vector search.
- MCP tool definitions (JSON Schema for tool inputs) compatible with OpenClaw's tool expectations.
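
MCP advertises each tool to clients with a `name`, a `description`, and an `inputSchema` expressed as JSON Schema. The definition below for `sparql_query` is a sketch. The description text and the `limit` bounds are assumptions, and whether OpenClaw needs additional fields has not been checked here.

```python
import json

# Hypothetical MCP tool definition for the Stage 1 tool.
SPARQL_QUERY_TOOL = {
    "name": "sparql_query",
    "description": (
        "Run a read-only SPARQL query against the local Virtuoso endpoint. "
        "Results are capped and time-limited."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A SPARQL SELECT, ASK, CONSTRUCT or DESCRIBE query.",
            },
            "limit": {"type": "integer", "minimum": 1, "maximum": 1000, "default": 100},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

if __name__ == "__main__":
    # Serialize as it would appear in a tools listing.
    print(json.dumps(SPARQL_QUERY_TOOL, indent=2))
```

The official MCP Python SDK (`mcp` package) can also derive such a schema from a type-annotated function registered with `FastMCP`'s `@mcp.tool()` decorator, which avoids maintaining it by hand.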

## Constraints & Safeguards

- Virtuoso Community Edition cannot load OPAL/VAL (`val_dav.vad` is unsupported).
- Guard against overly complex SPARQL by providing helper tools and imposing query limits and timeouts.
- Log all queries and enforce sanitization so that unfiltered agent input never reaches a backend directly.
- Evaluate performance (SPARQL queries can be slow); consider caching frequent query patterns.
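
Query logging and performance evaluation can share one mechanism: a decorator that records each tool call's arguments and duration and raises the log level for slow calls. The logger name, the `SLOW_MS` threshold, and `logged_tool` are hypothetical. The code uses only the standard `logging` module.

```python
import functools
import logging
import time
from typing import Any, Callable

# Hypothetical logger name and slow-query threshold.
logger = logging.getLogger("mcp_bridge.queries")
SLOW_MS = 2000.0


def logged_tool(tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator: log each call's arguments and duration; slow calls log at WARNING."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Record the failure with its traceback, then let the caller handle it.
                logger.exception("%s failed args=%r kwargs=%r", tool_name, args, kwargs)
                raise
            elapsed_ms = (time.perf_counter() - started) * 1000
            level = logging.WARNING if elapsed_ms >= SLOW_MS else logging.INFO
            logger.log(level, "%s ok in %.1f ms args=%r kwargs=%r",
                       tool_name, elapsed_ms, args, kwargs)
            return result

        return wrapper

    return decorator


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    @logged_tool("sparql_query")
    def fake_query(query: str) -> dict:
        # Stand-in for the real tool so the example runs offline.
        return {"results": {"bindings": []}}

    fake_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
```

Logging arguments verbatim makes every query reproducible from the log. The WARNING level on slow calls gives a simple signal for which query patterns are worth caching.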

## Future Extensions

- Ontology-aware prompting and reasoning layer.
- Caching of frequent query results.
- Hybrid symbolic + vector search.
- Possibly expose the MCP server's tools through a `tools.json` descriptor for OpenClaw.