---
name: duckdb
description: DuckDB embedded analytical database with HTTP API
metadata:
  version: "1.0.0"
  vibestack:
    main: false
---
# DuckDB Skill
[DuckDB](https://duckdb.org/) is a fast, in-process analytical database, exposed here through a simple HTTP API.
## Features
- Embedded OLAP database (no separate server process)
- Query CSV, Parquet, JSON files directly
- SQL interface via HTTP API
- Persistent storage option
- Auto-registers with Caddy if present
## Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `DUCKDB_PORT` | HTTP API port | `8432` |
| `DUCKDB_DATABASE` | Database file path | `:memory:` |
| `DUCKDB_DATA_DIR` | Directory for data files | `/data/duckdb` |
| `DUCKDB_DOMAIN` | Domain for Caddy auto-config | (none) |
| `DUCKDB_READ_ONLY` | Read-only mode | `false` |
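A sketch of how a client might resolve these variables (names and defaults taken from the table above; reading them this way in Python is an assumption about how the skill consumes them):

```python
import os

# Resolve the skill's settings, falling back to the documented defaults.
port = int(os.environ.get("DUCKDB_PORT", "8432"))
database = os.environ.get("DUCKDB_DATABASE", ":memory:")
data_dir = os.environ.get("DUCKDB_DATA_DIR", "/data/duckdb")
read_only = os.environ.get("DUCKDB_READ_ONLY", "false").lower() == "true"

# Base URL for the HTTP API examples in this document.
base_url = f"http://localhost:{port}"
print(base_url)
```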
## HTTP API
### Execute Query
```bash
# Simple query
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT 1 + 1 AS result"}'

# Query a CSV file (SQL string literals need single quotes; double quotes are identifiers)
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d "{\"sql\": \"SELECT * FROM read_csv_auto('/data/duckdb/sales.csv') LIMIT 10\"}"

# Query a Parquet file
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d "{\"sql\": \"SELECT * FROM read_parquet('/data/duckdb/events.parquet')\"}"
```
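The same calls can be made from Python with only the standard library. A minimal client sketch, assuming the `/query` endpoint and request shape shown above:

```python
import json
import urllib.request

def build_payload(sql: str) -> bytes:
    """Encode a SQL statement as the JSON body the /query endpoint expects."""
    return json.dumps({"sql": sql}).encode("utf-8")

def query(sql: str, base_url: str = "http://localhost:8432") -> dict:
    """POST a query to the skill's HTTP API and return the decoded response."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Single quotes inside the SQL string avoid the JSON-escaping pitfalls of the raw curl form.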
### Response Format
```json
{
  "success": true,
  "columns": ["result"],
  "rows": [[2]],
  "row_count": 1,
  "time_ms": 0.5
}
```
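Since columns and rows arrive separately, pairing them back up is a one-liner. A sketch assuming the response fields shown above:

```python
import json

def rows_to_dicts(payload: str) -> list:
    """Pair each row with the column names from a /query response."""
    resp = json.loads(payload)
    if not resp["success"]:
        raise RuntimeError("query failed")
    return [dict(zip(resp["columns"], row)) for row in resp["rows"]]

sample = '{"success": true, "columns": ["result"], "rows": [[2]], "row_count": 1, "time_ms": 0.5}'
print(rows_to_dicts(sample))  # [{'result': 2}]
```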
## Use Cases
### Analytics on Log Data
```sql
-- Query JSON logs
SELECT
  json_extract_string(line, '$.level') AS level,
  count(*) AS count
FROM read_json_auto('/var/log/supervisor/*.log')
GROUP BY level;
```
### Query Remote Data
```sql
-- Query remote Parquet (S3, HTTP); remote access uses the httpfs extension (see Extensions below)
SELECT * FROM read_parquet('https://example.com/data.parquet');

-- Query remote CSV
SELECT * FROM read_csv_auto('https://example.com/data.csv');
```
### Create Persistent Tables
```sql
-- Create table
CREATE TABLE events AS
SELECT * FROM read_parquet('/data/duckdb/events.parquet');

-- Query table
SELECT date_trunc('hour', timestamp) AS hour, count(*)
FROM events
GROUP BY 1 ORDER BY 1;
```
## CLI Access
```bash
# Interactive shell
duckdb /data/duckdb/analytics.db

# One-off query
duckdb -c "SELECT * FROM 'data.csv' LIMIT 5"
```
## Extensions
DuckDB supports extensions for additional functionality:
```sql
-- Install and load the httpfs extension
INSTALL httpfs;
LOAD httpfs;

-- Now query S3/HTTP directly
SELECT * FROM read_parquet('s3://bucket/data.parquet');
```