---
name: llm
description: LLM router that proxies to provider skills (claude, openai, ollama)
metadata:
  version: "1.0.0"
  vibestack:
    main: false
---

# LLM Skill

Unified LLM router that proxies requests to provider-specific skills, abstracting away which LLM backend is in use.

## Architecture

```
┌─────────────┐     ┌─────────────┐
│   client    │────▶│     llm     │  (router)
└─────────────┘     └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│ claude skill  │  │ openai skill  │  │ ollama skill  │
│ localhost:8888│  │ localhost:8889│  │localhost:11434│
└───────────────┘  └───────────────┘  └───────────────┘
```

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PORT` | `8082` | Router port |
| `LLM_PROVIDER` | `claude` | Active provider: `claude`, `openai`, `ollama` |
| `CLAUDE_URL` | `http://localhost:8888` | Claude skill URL |
| `OPENAI_URL` | `http://localhost:8889` | OpenAI skill URL |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
| `MEMORY_URL` | (none) | Memory skill URL for conversation persistence |

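For reference, a minimal sketch of how the router could resolve this configuration at startup. The variable names and defaults come from the table above; the code itself is illustrative, not the actual implementation:

```python
import os

# Defaults mirror the environment variable table above.
LLM_PORT = int(os.environ.get("LLM_PORT", "8082"))
LLM_PROVIDER = os.environ.get("LLM_PROVIDER", "claude")

# Base URL for each provider skill the router can proxy to.
PROVIDER_URLS = {
    "claude": os.environ.get("CLAUDE_URL", "http://localhost:8888"),
    "openai": os.environ.get("OPENAI_URL", "http://localhost:8889"),
    "ollama": os.environ.get("OLLAMA_URL", "http://localhost:11434"),
}

MEMORY_URL = os.environ.get("MEMORY_URL")  # optional; unset disables persistence
```
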
## API

### WebSocket Chat

Connect to `ws://localhost:8082/chat` for the unified chat interface.

**Send message:**
```json
{
  "type": "message",
  "content": "Hello!",
  "session_id": "optional-session-id"
}
```

**Receive:**
```json
{"type": "start", "session_id": "abc123"}
{"type": "token", "content": "Hello"}
{"type": "token", "content": "!"}
{"type": "end"}
```

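As an example, a minimal Python client for this interface might look like the sketch below. It assumes the third-party `websockets` package; the message shapes are exactly those shown above:

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def chat(prompt: str) -> str:
    """Send one message and collect the streamed reply."""
    reply = []
    async with websockets.connect("ws://localhost:8082/chat") as ws:
        await ws.send(json.dumps({"type": "message", "content": prompt}))
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "token":
                reply.append(event["content"])  # streamed one token at a time
            elif event["type"] == "end":
                break
    return "".join(reply)


print(asyncio.run(chat("Hello!")))
```
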
### REST API

```bash
# Chat (proxied to provider)
curl http://localhost:8082/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'

# Execute (one-shot, proxied to provider)
curl http://localhost:8082/execute \
  -H "Content-Type: application/json" \
  -d '{"prompt": "List all files"}'

# Health check
curl http://localhost:8082/health

# Get current provider
curl http://localhost:8082/provider
```

## Provider Skills

Each provider skill implements its own API; the LLM router translates unified requests into each provider's format:

### Claude Skill (port 8888)
- `POST /chat` - `{"message": "...", "session_id": "..."}`
- `POST /execute` - `{"prompt": "..."}`

### OpenAI Skill (port 8889)
- `POST /v1/chat/completions` - OpenAI format

### Ollama (port 11434)
- `POST /api/chat` - Ollama format

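To illustrate that translation, here is a sketch of how the router could map a unified chat request onto each provider's endpoint and body. The OpenAI and Ollama shapes follow their public chat APIs; the exact mapping inside the router may differ, and the `model` parameter is a placeholder:

```python
def to_provider_request(provider: str, message: str, model: str = "default"):
    """Return (path, body) for the active provider. Illustrative sketch only."""
    if provider == "claude":
        return "/chat", {"message": message}
    if provider == "openai":
        return "/v1/chat/completions", {
            "model": model,
            "messages": [{"role": "user", "content": message}],
        }
    if provider == "ollama":
        return "/api/chat", {
            "model": model,
            "messages": [{"role": "user", "content": message}],
        }
    raise ValueError(f"unknown provider: {provider}")
```
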
## Switching Providers

```bash
# Use Claude (default)
LLM_PROVIDER=claude

# Use OpenAI
LLM_PROVIDER=openai

# Use Ollama
LLM_PROVIDER=ollama
```

Clients connect to `localhost:8082`; they don't need to know which provider is active.

## Tool Calling (Pass-through)

Tools are passed through to the provider skill. When the LLM wants to call a tool, the exchange shown below takes place:

1. LLM router sends tool definitions to the provider
2. Provider returns a tool call request
3. Router passes the tool call to the client via WebSocket
4. Client executes the tool and sends the result back
5. Router forwards the result to the provider
6. Provider continues the conversation

**Client receives:**
```json
{"type": "tool_call", "name": "read_file", "arguments": {"path": "/etc/hosts"}}
```

**Client sends back:**
```json
{"type": "tool_result", "name": "read_file", "result": "127.0.0.1 localhost..."}
```

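Extending the WebSocket client sketch from earlier, the client side of this loop (steps 3-5) could be handled as follows. The `TOOLS` dispatch table is hypothetical:

```python
import json

# Hypothetical tool registry: maps a tool name to a local implementation.
TOOLS = {
    "read_file": lambda args: open(args["path"]).read(),
}


async def handle_event(ws, event: dict) -> None:
    """React to one router event; tool calls are executed locally (step 4)."""
    if event["type"] == "tool_call":
        result = TOOLS[event["name"]](event["arguments"])
        await ws.send(json.dumps({
            "type": "tool_result",
            "name": event["name"],
            "result": result,
        }))
```
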
## Conversation Memory

If `MEMORY_URL` is set, conversations are stored:

```bash
MEMORY_URL=http://localhost:8081
```

Each conversation is saved to the memory skill for later retrieval.
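
The memory skill's API is not documented here; purely as an illustration, persistence could look something like the sketch below. The `/memories` endpoint and payload are hypothetical, not the memory skill's actual interface:

```python
import os

import requests  # third-party HTTP client; any HTTP library works

MEMORY_URL = os.environ.get("MEMORY_URL")


def save_turn(session_id: str, role: str, content: str) -> None:
    """Persist one conversation turn when memory is configured (hypothetical endpoint)."""
    if not MEMORY_URL:
        return  # memory is optional; skip silently when unset
    requests.post(f"{MEMORY_URL}/memories", json={
        "session_id": session_id,
        "role": role,
        "content": content,
    })
```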