---
name: llm
description: LLM router that proxies to provider skills (claude, openai, ollama)
metadata:
  version: "1.0.0"
  vibestack:
    main: false
---
# LLM Skill
Unified LLM router that proxies requests to provider-specific skills, abstracting away which LLM backend is in use.
## Architecture
```
┌─────────────┐      ┌─────────────┐
│   client    │─────▶│     llm     │  (router)
└─────────────┘      └──────┬──────┘
                            │
         ┌──────────────────┼──────────────────┐
         ▼                  ▼                  ▼
 ┌───────────────┐  ┌───────────────┐  ┌────────────────┐
 │  claude skill │  │  openai skill │  │  ollama skill  │
 │ localhost:8888│  │ localhost:8889│  │ localhost:11434│
 └───────────────┘  └───────────────┘  └────────────────┘
```
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PORT` | `8082` | Router port |
| `LLM_PROVIDER` | `claude` | Active provider: `claude`, `openai`, `ollama` |
| `CLAUDE_URL` | `http://localhost:8888` | Claude skill URL |
| `OPENAI_URL` | `http://localhost:8889` | OpenAI skill URL |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
| `MEMORY_URL` | (none) | Memory skill URL for conversation persistence |
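For illustration, a minimal sketch of how the router might resolve this configuration at startup. The defaults mirror the table above, but the code itself is assumed, not the skill's actual internals:
```python
# Assumed startup configuration logic; defaults match the table above.
import os

LLM_PORT = int(os.environ.get("LLM_PORT", "8082"))
LLM_PROVIDER = os.environ.get("LLM_PROVIDER", "claude")

PROVIDER_URLS = {
    "claude": os.environ.get("CLAUDE_URL", "http://localhost:8888"),
    "openai": os.environ.get("OPENAI_URL", "http://localhost:8889"),
    "ollama": os.environ.get("OLLAMA_URL", "http://localhost:11434"),
}

MEMORY_URL = os.environ.get("MEMORY_URL")  # None disables persistence
```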
## API
### WebSocket Chat
Connect to `ws://localhost:8082/chat` for unified chat interface.
**Send message:**
```json
{
  "type": "message",
  "content": "Hello!",
  "session_id": "optional-session-id"
}
```
**Receive:**
```json
{"type": "start", "session_id": "abc123"}
{"type": "token", "content": "Hello"}
{"type": "token", "content": "!"}
{"type": "end"}
```
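A minimal Python client for this protocol, assuming the third-party `websockets` package; the message shapes follow the examples above, and `start` events are simply ignored:
```python
# Minimal chat client for ws://localhost:8082/chat (pip install websockets).
import asyncio
import json

import websockets

async def chat(content: str, session_id: str | None = None) -> str:
    reply = []
    async with websockets.connect("ws://localhost:8082/chat") as ws:
        msg = {"type": "message", "content": content}
        if session_id:
            msg["session_id"] = session_id
        await ws.send(json.dumps(msg))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "token":
                reply.append(event["content"])
            elif event["type"] == "end":
                break
    return "".join(reply)

print(asyncio.run(chat("Hello!")))
```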
### REST API
```bash
# Chat (proxied to provider)
curl http://localhost:8082/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'

# Execute (one-shot, proxied to provider)
curl http://localhost:8082/execute \
  -H "Content-Type: application/json" \
  -d '{"prompt": "List all files"}'

# Health check
curl http://localhost:8082/health

# Get current provider
curl http://localhost:8082/provider
```
## Provider Skills
Each provider skill implements its own API; the LLM router translates requests into the appropriate format for each:
### Claude Skill (port 8888)
- `POST /chat` - `{"message": "...", "session_id": "..."}`
- `POST /execute` - `{"prompt": "..."}`
### OpenAI Skill (port 8889)
- `POST /v1/chat/completions` - OpenAI format
### Ollama (port 11434)
- `POST /api/chat` - Ollama format
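As a sketch of that translation, the router might map a unified request onto each provider's endpoint and payload as below. The endpoints follow the lists above; the `model` defaults are assumptions, since this document does not specify them:
```python
# Illustrative request translation; endpoints match the provider lists above.
def build_request(provider: str, message: str, session_id: str | None = None):
    if provider == "claude":
        return "/chat", {"message": message, "session_id": session_id}
    if provider == "openai":
        # OpenAI chat-completions format; model name is an assumed default.
        return "/v1/chat/completions", {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": message}],
        }
    if provider == "ollama":
        # Ollama chat format; model name is an assumed default.
        return "/api/chat", {
            "model": "llama3",
            "messages": [{"role": "user", "content": message}],
        }
    raise ValueError(f"unknown provider: {provider}")
```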
## Switching Providers
```bash
# Use Claude (default)
LLM_PROVIDER=claude
# Use OpenAI
LLM_PROVIDER=openai
# Use Ollama
LLM_PROVIDER=ollama
```
Clients always connect to `localhost:8082`; they never need to know which provider is active.
## Tool Calling (Pass-through)
Tools are passed to the provider skill. When the LLM wants to call a tool:
1. LLM router sends tool definitions to provider
2. Provider returns tool call request
3. Router passes tool call to client via WebSocket
4. Client executes tool, sends result back
5. Router forwards result to provider
6. Provider continues conversation
```json
// Client receives
{"type": "tool_call", "name": "read_file", "arguments": {"path": "/etc/hosts"}}
// Client sends back
{"type": "tool_result", "name": "read_file", "result": "127.0.0.1 localhost..."}
```
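On the client side, steps 3-5 might be handled by a small dispatch function like the sketch below, where `ws` is the open WebSocket from the chat client sketch above and the `read_file` tool is an assumed local implementation:
```python
# Sketch of the client side of the tool-calling loop (steps 3-5 above).
import json

def read_file(path: str) -> str:
    # Example local tool implementation (assumed, for illustration).
    with open(path) as f:
        return f.read()

TOOLS = {"read_file": read_file}

async def handle_event(ws, event: dict) -> None:
    if event["type"] == "tool_call":
        # Step 4: execute the requested tool locally...
        result = TOOLS[event["name"]](**event["arguments"])
        # ...and send the result back for the router to forward (step 5).
        await ws.send(json.dumps({
            "type": "tool_result",
            "name": event["name"],
            "result": result,
        }))
```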
## Conversation Memory
If `MEMORY_URL` is set, conversations are stored:
```bash
MEMORY_URL=http://localhost:8081
```
Each conversation is saved to the memory skill for later retrieval.
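A hypothetical persistence hook is sketched below; the memory skill's API is not documented here, so the `/store` endpoint and payload shape are assumptions (the third-party `httpx` package handles the HTTP call):
```python
# Hypothetical hook for saving a conversation turn to the memory skill.
# The /store endpoint and payload shape are assumed for illustration.
import os

import httpx

MEMORY_URL = os.environ.get("MEMORY_URL")

async def persist_turn(session_id: str, role: str, content: str) -> None:
    if not MEMORY_URL:
        return  # persistence disabled when MEMORY_URL is unset
    async with httpx.AsyncClient() as client:
        await client.post(f"{MEMORY_URL}/store", json={
            "session_id": session_id,
            "role": role,
            "content": content,
        })
```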