LLM router - proxies to provider skills (claude, openai, ollama)

Author: Azat
Date: 2026-02-03 00:09:06 +01:00
Commit: 1d9de9d770
4 changed files with 551 additions and 0 deletions

SKILL.md (new file, 141 lines)

@@ -0,0 +1,141 @@
---
name: llm
description: LLM router that proxies to provider skills (claude, openai, ollama)
metadata:
version: "1.0.0"
vibestack:
main: false
---
# LLM Skill
A unified LLM router that proxies requests to provider-specific skills, abstracting away which LLM backend is in use.
## Architecture
```
┌─────────────┐      ┌─────────────┐
│   client    │─────▶│     llm     │ (router)
└─────────────┘      └──────┬──────┘
                            │
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
 ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
 │ claude skill  │  │ openai skill  │  │ ollama skill  │
 │ localhost:8888│  │ localhost:8889│  │localhost:11434│
 └───────────────┘  └───────────────┘  └───────────────┘
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PORT` | `8082` | Router port |
| `LLM_PROVIDER` | `claude` | Active provider: `claude`, `openai`, `ollama` |
| `CLAUDE_URL` | `http://localhost:8888` | Claude skill URL |
| `OPENAI_URL` | `http://localhost:8889` | OpenAI skill URL |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
| `MEMORY_URL` | (none) | Memory skill URL for conversation persistence |
## API
### WebSocket Chat
Connect to `ws://localhost:8082/chat` for a unified chat interface.
**Send message:**
```json
{
"type": "message",
"content": "Hello!",
"session_id": "optional-session-id"
}
```
**Receive:**
```json
{"type": "start", "session_id": "abc123"}
{"type": "token", "content": "Hello"}
{"type": "token", "content": "!"}
{"type": "end"}
```
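A minimal Python client for this protocol could look like the sketch below (assuming the `websockets` package; the `ask` helper and hard-coded host/port are illustrative, not part of this skill):
```python
# Sketch of a WebSocket client for the router's chat protocol.
# Assumes the `websockets` package; host/port match the defaults above.
import asyncio
import json

import websockets


async def ask(prompt: str) -> str:
    async with websockets.connect("ws://localhost:8082/chat") as ws:
        await ws.send(json.dumps({"type": "message", "content": prompt}))
        tokens = []
        while True:
            event = json.loads(await ws.recv())
            if event["type"] == "token":
                tokens.append(event["content"])
            elif event["type"] in ("end", "error"):
                break
        return "".join(tokens)


if __name__ == "__main__":
    print(asyncio.run(ask("Hello!")))
```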
### REST API
```bash
# Chat (proxied to provider)
curl http://localhost:8082/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello!"}'
# Execute (one-shot, proxied to provider)
curl http://localhost:8082/execute \
-H "Content-Type: application/json" \
-d '{"prompt": "List all files"}'
# Health check
curl http://localhost:8082/health
# Get current provider
curl http://localhost:8082/provider
```
## Provider Skills
Each provider skill implements its own API; the LLM router translates requests into the format each one expects:
### Claude Skill (port 8888)
- `POST /chat` - `{"message": "...", "session_id": "..."}`
- `POST /execute` - `{"prompt": "..."}`
### OpenAI Skill (port 8889)
- `POST /v1/chat/completions` - OpenAI format
### Ollama (port 11434)
- `POST /api/chat` - Ollama format
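For a single chat message, the router builds provider-specific endpoints and request bodies roughly as follows (a sketch of the translation done in `src/api.py`; the model names are the `OPENAI_MODEL` / `OLLAMA_MODEL` defaults from that file):
```python
# How one chat message maps onto each provider's endpoint and payload
# (illustrative mapping, not a separate module).
message, session_id = "Hello!", "abc123"

translated = {
    "claude": ("/chat", {"message": message, "session_id": session_id}),
    "openai": ("/v1/chat/completions", {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": message}],
    }),
    "ollama": ("/api/chat", {
        "model": "llama3.2",
        "messages": [{"role": "user", "content": message}],
        "stream": False,
    }),
}
```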
## Switching Providers
```bash
# Use Claude (default)
LLM_PROVIDER=claude
# Use OpenAI
LLM_PROVIDER=openai
# Use Ollama
LLM_PROVIDER=ollama
```
Clients always connect to `localhost:8082`; they don't need to know which provider is active.
## Tool Calling (Pass-through)
Tools are passed to the provider skill. When the LLM wants to call a tool:
1. LLM router sends tool definitions to provider
2. Provider returns tool call request
3. Router passes tool call to client via WebSocket
4. Client executes tool, sends result back
5. Router forwards result to provider
6. Provider continues conversation
```json
// Client receives
{"type": "tool_call", "name": "read_file", "arguments": {"path": "/etc/hosts"}}
// Client sends back
{"type": "tool_result", "name": "read_file", "result": "127.0.0.1 localhost..."}
```
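A client-side loop following this protocol might look like the sketch below (illustrative only: the `TOOLS` registry and `chat_with_tools` helper are hypothetical, and the active provider skill must actually support tool calls):
```python
# Sketch of a client that executes tool calls requested over the WebSocket.
# Message fields match the tool_call / tool_result examples above.
import asyncio
import json
from pathlib import Path

import websockets

TOOLS = {"read_file": lambda args: Path(args["path"]).read_text()}


async def chat_with_tools(prompt: str) -> None:
    async with websockets.connect("ws://localhost:8082/chat") as ws:
        await ws.send(json.dumps({"type": "message", "content": prompt}))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "tool_call":
                result = TOOLS[event["name"]](event["arguments"])
                await ws.send(json.dumps(
                    {"type": "tool_result", "name": event["name"], "result": result}
                ))
            elif event["type"] == "token":
                print(event["content"], end="", flush=True)
            elif event["type"] in ("end", "error"):
                break


asyncio.run(chat_with_tools("Summarize /etc/hosts"))
```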
## Conversation Memory
If `MEMORY_URL` is set, conversations are stored:
```bash
MEMORY_URL=http://localhost:8081
```
Each conversation is saved to the memory skill for later retrieval.
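Concretely, after each exchange the router posts a record like the one below to the memory skill (mirroring `store_conversation()` in `src/api.py`; the URL assumes `MEMORY_URL=http://localhost:8081` and the content string is illustrative):
```python
# The record the router writes after each exchange (see store_conversation in src/api.py).
import httpx

httpx.post(
    "http://localhost:8081/memory",
    json={
        "type": "conversation",
        "content": "User: Hello!\nAssistant: Hi there!",
        "metadata": {"session_id": "abc123", "provider": "claude"},
    },
    timeout=5,
)
```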

scripts/autorun.sh (new file, 48 lines)

@@ -0,0 +1,48 @@
#!/bin/bash
set -e
SKILL_DIR="$(dirname "$(dirname "$0")")"
# Install Python if not present
install_python() {
if command -v python3 &>/dev/null; then
echo "Python already installed: $(python3 --version)"
return 0
fi
echo "Installing Python..."
apt-get update
apt-get install -y python3 python3-pip python3-venv
echo "Python installed: $(python3 --version)"
}
# Setup Python virtual environment and dependencies
setup_python_env() {
local venv_dir="$SKILL_DIR/.venv"
if [ -d "$venv_dir" ]; then
echo "Python venv already exists"
return 0
fi
echo "Creating Python virtual environment..."
python3 -m venv "$venv_dir"
echo "Installing Python dependencies..."
"$venv_dir/bin/pip" install --upgrade pip
"$venv_dir/bin/pip" install \
fastapi==0.109.0 \
uvicorn==0.27.0 \
websockets==12.0 \
httpx==0.26.0 \
pydantic==2.5.0 \
python-ulid==2.2.0
echo "Python environment ready"
}
install_python
setup_python_env
echo "LLM router setup complete"

scripts/run.sh (new file, 25 lines)

@@ -0,0 +1,25 @@
#!/bin/bash
set -e
LLM_PORT="${LLM_PORT:-8082}"
SKILL_DIR="$(dirname "$(dirname "$0")")"
VENV_DIR="$SKILL_DIR/.venv"
# Export config for Python
export LLM_PORT
export LLM_PROVIDER="${LLM_PROVIDER:-claude}"
export CLAUDE_URL="${CLAUDE_URL:-http://localhost:8888}"
export OPENAI_URL="${OPENAI_URL:-http://localhost:8889}"
export OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
export MEMORY_URL="${MEMORY_URL:-}"
echo "Starting LLM Router on port $LLM_PORT..."
echo "Provider: $LLM_PROVIDER"
case "$LLM_PROVIDER" in
claude) echo "Backend: $CLAUDE_URL" ;;
openai) echo "Backend: $OPENAI_URL" ;;
ollama) echo "Backend: $OLLAMA_URL" ;;
esac
exec "$VENV_DIR/bin/python" "$SKILL_DIR/src/api.py"

src/api.py (new file, 337 lines)

@@ -0,0 +1,337 @@
#!/usr/bin/env python3
"""
LLM Router - Proxies requests to provider skills (claude, openai, ollama)
"""
import os
import json
import asyncio
from typing import Optional
from contextlib import asynccontextmanager
import httpx
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from ulid import ULID
# Configuration
LLM_PORT = int(os.environ.get("LLM_PORT", "8082"))
LLM_PROVIDER = os.environ.get("LLM_PROVIDER", "claude")
# Provider skill URLs
CLAUDE_URL = os.environ.get("CLAUDE_URL", "http://localhost:8888")
OPENAI_URL = os.environ.get("OPENAI_URL", "http://localhost:8889")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
# Memory integration
MEMORY_URL = os.environ.get("MEMORY_URL", "")
def get_provider_url() -> str:
"""Get URL for current provider."""
providers = {
"claude": CLAUDE_URL,
"openai": OPENAI_URL,
"ollama": OLLAMA_URL,
}
return providers.get(LLM_PROVIDER, CLAUDE_URL)
class ChatRequest(BaseModel):
message: str
session_id: Optional[str] = None
class ExecuteRequest(BaseModel):
prompt: str
# Memory integration
async def store_conversation(session_id: str, message: str, response: str):
"""Store conversation in memory skill."""
if not MEMORY_URL:
return
content = f"User: {message}\nAssistant: {response}"
try:
async with httpx.AsyncClient() as client:
await client.post(
f"{MEMORY_URL}/memory",
json={
"type": "conversation",
"content": content,
"metadata": {"session_id": session_id, "provider": LLM_PROVIDER},
},
timeout=5,
)
except Exception as e:
print(f"Failed to store conversation: {e}")
@asynccontextmanager
async def lifespan(app: FastAPI):
print(f"LLM Router starting on port {LLM_PORT}")
print(f"Provider: {LLM_PROVIDER} -> {get_provider_url()}")
yield
print("Shutting down...")
app = FastAPI(
title="LLM Router",
description="Unified LLM interface routing to provider skills",
version="1.0.0",
lifespan=lifespan,
)
@app.get("/health")
async def health():
"""Health check - also checks provider health."""
provider_url = get_provider_url()
provider_healthy = False
try:
async with httpx.AsyncClient() as client:
resp = await client.get(f"{provider_url}/health", timeout=5)
provider_healthy = resp.status_code == 200
    except Exception:
pass
return {
"status": "healthy" if provider_healthy else "degraded",
"provider": LLM_PROVIDER,
"provider_url": provider_url,
"provider_healthy": provider_healthy,
}
@app.get("/provider")
async def get_provider():
"""Get current provider info."""
return {
"provider": LLM_PROVIDER,
"url": get_provider_url(),
}
@app.post("/chat")
async def chat(request: ChatRequest):
"""Chat endpoint - proxies to provider skill."""
provider_url = get_provider_url()
session_id = request.session_id or str(ULID())
try:
async with httpx.AsyncClient() as client:
if LLM_PROVIDER == "claude":
# Claude skill format
resp = await client.post(
f"{provider_url}/chat",
json={"message": request.message, "session_id": session_id},
timeout=120,
)
data = resp.json()
if data.get("success"):
response_text = data.get("response", "")
await store_conversation(session_id, request.message, response_text)
return {
"success": True,
"response": response_text,
"session_id": session_id,
"provider": LLM_PROVIDER,
}
else:
return JSONResponse(
status_code=500,
content={"success": False, "error": data.get("error", "Unknown error")},
)
elif LLM_PROVIDER == "ollama":
# Ollama format
resp = await client.post(
f"{provider_url}/api/chat",
json={
"model": os.environ.get("OLLAMA_MODEL", "llama3.2"),
"messages": [{"role": "user", "content": request.message}],
"stream": False,
},
timeout=120,
)
data = resp.json()
response_text = data.get("message", {}).get("content", "")
await store_conversation(session_id, request.message, response_text)
return {
"success": True,
"response": response_text,
"session_id": session_id,
"provider": LLM_PROVIDER,
}
elif LLM_PROVIDER == "openai":
# OpenAI skill format
resp = await client.post(
f"{provider_url}/v1/chat/completions",
json={
"model": os.environ.get("OPENAI_MODEL", "gpt-4o"),
"messages": [{"role": "user", "content": request.message}],
},
timeout=120,
)
data = resp.json()
response_text = data.get("choices", [{}])[0].get("message", {}).get("content", "")
await store_conversation(session_id, request.message, response_text)
return {
"success": True,
"response": response_text,
"session_id": session_id,
"provider": LLM_PROVIDER,
}
else:
raise HTTPException(status_code=400, detail=f"Unknown provider: {LLM_PROVIDER}")
except httpx.RequestError as e:
return JSONResponse(
status_code=503,
content={"success": False, "error": f"Provider unavailable: {e}"},
)
@app.post("/execute")
async def execute(request: ExecuteRequest):
"""Execute endpoint - proxies to provider skill."""
provider_url = get_provider_url()
try:
async with httpx.AsyncClient() as client:
if LLM_PROVIDER == "claude":
# Claude skill execute endpoint
resp = await client.post(
f"{provider_url}/execute",
json={"prompt": request.prompt},
timeout=300, # Longer timeout for execution
)
return resp.json()
elif LLM_PROVIDER == "ollama":
# Use chat for ollama
resp = await client.post(
f"{provider_url}/api/chat",
json={
"model": os.environ.get("OLLAMA_MODEL", "llama3.2"),
"messages": [{"role": "user", "content": request.prompt}],
"stream": False,
},
timeout=300,
)
data = resp.json()
return {
"success": True,
"result": data.get("message", {}).get("content", ""),
}
else:
raise HTTPException(status_code=400, detail=f"Execute not supported for: {LLM_PROVIDER}")
except httpx.RequestError as e:
return JSONResponse(
status_code=503,
content={"success": False, "error": f"Provider unavailable: {e}"},
)
@app.websocket("/chat")
async def websocket_chat(websocket: WebSocket):
"""WebSocket chat endpoint with streaming proxy."""
await websocket.accept()
provider_url = get_provider_url()
session_id = str(ULID())
try:
while True:
data = await websocket.receive_json()
if data.get("type") == "ping":
await websocket.send_json({"type": "pong"})
continue
if data.get("type") != "message":
continue
content = data.get("content", "")
session_id = data.get("session_id") or session_id
# Send start
await websocket.send_json({
"type": "start",
"session_id": session_id,
"provider": LLM_PROVIDER,
})
try:
async with httpx.AsyncClient() as client:
if LLM_PROVIDER == "claude":
# Claude skill (non-streaming for now)
resp = await client.post(
f"{provider_url}/chat",
json={"message": content, "session_id": session_id},
timeout=120,
)
result = resp.json()
if result.get("success"):
response_text = result.get("response", "")
# Send as single token (claude skill doesn't stream yet)
await websocket.send_json({"type": "token", "content": response_text})
await store_conversation(session_id, content, response_text)
else:
await websocket.send_json({"type": "error", "message": result.get("error", "Unknown error")})
elif LLM_PROVIDER == "ollama":
# Ollama streaming
async with client.stream(
"POST",
f"{provider_url}/api/chat",
json={
"model": os.environ.get("OLLAMA_MODEL", "llama3.2"),
"messages": [{"role": "user", "content": content}],
"stream": True,
},
timeout=300,
) as resp:
full_response = ""
async for line in resp.aiter_lines():
if line:
chunk = json.loads(line)
if "message" in chunk and chunk["message"].get("content"):
token = chunk["message"]["content"]
full_response += token
await websocket.send_json({"type": "token", "content": token})
await store_conversation(session_id, content, full_response)
else:
await websocket.send_json({"type": "error", "message": f"Unknown provider: {LLM_PROVIDER}"})
except httpx.RequestError as e:
await websocket.send_json({"type": "error", "message": f"Provider unavailable: {e}"})
# Send end
await websocket.send_json({"type": "end"})
except WebSocketDisconnect:
print("WebSocket disconnected")
except Exception as e:
print(f"WebSocket error: {e}")
try:
await websocket.send_json({"type": "error", "message": str(e)})
        except Exception:
pass
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=LLM_PORT)