initial MCP bridge scaffolding

Lukas Goldschmidt 1 month ago
commit
1df0b45f98
2 changed files with 94 additions and 0 deletions
  1. PROJECT.md (62 added, 0 removed)
  2. README.md (32 added, 0 removed)

+ 62 - 0
PROJECT.md

@@ -0,0 +1,62 @@
+# Project: MCP Bridge for Virtuoso (Community Edition)
+
+## Overview
+
+Build a minimal MCP server that proxies the Virtuoso Community Edition SPARQL endpoint for LLM agents, then expand to additional data sources (PostgreSQL, CouchDB, Qdrant) while keeping the tooling tightly structured.
+
+## Stage 1 — Minimal MCP Server (Virtuoso only)
+
+- Implement `sparql_query` tool that POSTs to `http://localhost:8891/sparql` with Accept header `application/sparql-results+json`.
+- Return parsed JSON straight to the caller; consider timeouts and result limits.
+- Provide sanitization / guardrails to prevent runaway queries.
+- Validate the server works from a simple CLI script before wiring to OpenClaw.
+
+## Stage 2 — Helper Tools
+
+- `get_entities_by_type`: fetches all subjects of `rdf:type <TYPE>`.
+- `search_by_label`: filters `rdfs:label` via case-insensitive substring matching.
+- `list_graphs`: enumerates distinct graphs that currently contain triples.
+- Later add more semantic tools (predicate discovery, ontology hints) rather than letting the agent write arbitrary SPARQL.
+
+## Stage 3 — Schema Awareness & Introspection
+
+- Tools for predicate discovery and class hierarchy.
+- Graph-level tooling (e.g., `graph_stats`, `graph_prefixes`).
+- Cache basic ontology info to reduce repeated introspection.
+
+## Stage 4 — Multi-Database Expansion
+
+- PostgreSQL connector (`sql_query`) via `psycopg` or SQLAlchemy; wrap results in MCP tool schema.
+- CouchDB connector (`document_lookup`) via its REST API.
+- Qdrant/Chroma connector (`vector_search`) for embedding similarity.
+- Each connector implements sanitization, pagination, and the ability to annotate results with metadata.
+
+## Stage 5 — Cross-Source Reasoning
+
+- MCP server composes SPARQL + SQL + vector results into coherent tool responses.
+- Example workflow:
+  1. `sparql_query` → IDs + labels.
+  2. `sql_query` → metadata for those IDs.
+  3. `vector_search` → semantically related docs.
+- Provide helper endpoints for the LLM to request multi-source aggregations (e.g., `entity_context`).
+
+## Tech Stack
+
+- Python + FastAPI (or lightweight async server).
+- `requests` for SPARQL HTTP calls; optional `rdflib` for validation/parsing.
+- DB drivers for PostgreSQL/CouchDB; `qdrant-client` or similar for vector search.
+- JSON-based MCP schema compatible with OpenClaw tool expectations.
+
+## Constraints & Safeguards
+
+- Virtuoso Community Edition cannot load OPAL/VAL (`val_dav.vad` is unsupported).
+- Guard against complex SPARQL by providing helper tools and imposing query limits/timeouts.
+- Log queries and enforce sanitization so that unfiltered input is never passed through to a backend.
+- Evaluate performance (SPARQL can be slow); consider caching frequent patterns.
+
+## Future Extensions
+
+- Ontology-aware prompting and reasoning layer.
+- Caching of frequent query results.
+- Hybrid search that mixes symbolic (SPARQL) and vector retrieval.
+- Possibly describe the MCP server's tools in a `tools.json` descriptor for OpenClaw.

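Reviewer sketch, not part of this commit: one way the Stage 1 `sparql_query` tool from PROJECT.md could look. The endpoint URL and Accept header come from PROJECT.md; the names `guard_query`, `MAX_ROWS`, and `TIMEOUT_SECONDS`, their default values, and the choice to allow only SELECT and ASK are assumptions made for illustration. It uses the standard library's `urllib` rather than the planned `requests` so that it runs without dependencies, and the LIMIT check is deliberately naive.

```python
import json
import re
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "http://localhost:8891/sparql"  # endpoint named in Stage 1
MAX_ROWS = 100        # assumed default cap, not specified in PROJECT.md
TIMEOUT_SECONDS = 10  # assumed default timeout, not specified in PROJECT.md

# Matches one leading PREFIX or BASE declaration of the query prologue.
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)


def guard_query(query: str, max_rows: int = MAX_ROWS) -> str:
    """Reject anything but SELECT/ASK and add a LIMIT to unbounded SELECTs."""
    body = query
    while (m := _PROLOGUE.match(body)):  # skip PREFIX/BASE declarations
        body = body[m.end():]
    keyword = re.match(r"\s*([A-Za-z]+)", body)
    form = keyword.group(1).upper() if keyword else ""
    # Only SELECT and ASK produce application/sparql-results+json.
    if form not in {"SELECT", "ASK"}:
        raise ValueError(f"unsupported query form: {form or 'empty'}")
    # Naive check: a LIMIT inside a subquery also counts as "has a limit",
    # and an existing LIMIT larger than max_rows is not lowered.
    if form == "SELECT" and not re.search(r"\bLIMIT\s+\d+", query, re.IGNORECASE):
        query = f"{query.rstrip()}\nLIMIT {max_rows}"
    return query


def sparql_query(query: str, endpoint: str = SPARQL_ENDPOINT,
                 timeout: float = TIMEOUT_SECONDS) -> dict:
    """POST a guarded query as a form-encoded body and return the parsed JSON."""
    data = urllib.parse.urlencode({"query": guard_query(query)}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,  # supplying data makes this a POST request
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a Virtuoso instance listening on the endpoint above, `sparql_query("SELECT * WHERE { ?s ?p ?o }")` would send the query with `LIMIT 100` appended; `guard_query` can be exercised on its own from the simple CLI script that Stage 1 calls for.
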
+ 32 - 0
README.md

@@ -0,0 +1,32 @@
+# MCP Bridge for Virtuoso (Community Edition)
+
+A custom MCP server that lets OpenClaw (or any LLM agent) access Virtuoso Community Edition as a semantic backend without running raw SPARQL from the agent. The MCP layer exposes structured tools that orchestrate queries and later aggregate data across additional stores (PostgreSQL, CouchDB, Qdrant).
+
+## Vision
+
+- LLMs never issue SQL/SPARQL directly—they call MCP tools.
+- The MCP server handles orchestration, sanitization, rate limiting, and multi-source composition.
+- Start with Virtuoso (SPARQL) and progressively add new connectors.
+
+## Architecture
+
+```
+LLM Agent (OpenClaw)
+↓
+MCP Server
+├── Virtuoso (SPARQL)
+├── PostgreSQL
+└── Vector DBs (e.g., Qdrant)
+```
+
+## Design Principles
+
+1. Tool-based abstraction: Provide helpers such as `sparql_query`, `get_entities_by_type`, `list_graphs` instead of exposing raw SPARQL.
+2. Gradual complexity: Ship a minimal working setup, then layer on helper tooling, schema introspection, and connectors.
+3. Separation of concerns: Virtuoso stores the RDF data, the MCP server exposes the tool interfaces, and LLMs focus on reasoning and tool selection.
+
+## Success Criteria
+
+- Phase 1: MCP tool (`sparql_query`) returns valid SPARQL JSON results.
+- Phase 2: LLM relies on helper tools instead of free-form queries.
+- Phase 3: Multiple data sources accessible through a unified MCP interface.