---
name: duckdb
description: DuckDB embedded analytical database with HTTP API
metadata:
  version: "1.0.0"
  vibestack:
    main: false
---

# DuckDB Skill

[DuckDB](https://duckdb.org/) is a fast in-process analytical database. This skill exposes it through a simple HTTP API.

## Features

- Embedded OLAP database (no separate server process)
- Query CSV, Parquet, JSON files directly
- SQL interface via HTTP API
- Persistent storage option
- Auto-registers with Caddy if present

## Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `DUCKDB_PORT` | HTTP API port | `8432` |
| `DUCKDB_DATABASE` | Database file path | `:memory:` |
| `DUCKDB_DATA_DIR` | Directory for data files | `/data/duckdb` |
| `DUCKDB_DOMAIN` | Domain for Caddy auto-config | (none) |
| `DUCKDB_READ_ONLY` | Read-only mode | `false` |
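
As a rough illustration of how a client or wrapper script might resolve these settings, the sketch below reads the variables from the table and falls back to the documented defaults. The `DuckDBConfig` class and `load_config` function are hypothetical names, not part of the skill, and the rule that only the string `true` enables read-only mode is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DuckDBConfig:
    """Hypothetical container for the settings listed in the table."""
    port: int
    database: str
    data_dir: str
    domain: Optional[str]
    read_only: bool


def load_config(env: Mapping[str, str] = os.environ) -> DuckDBConfig:
    """Resolve settings from `env`, using the documented defaults."""
    return DuckDBConfig(
        port=int(env.get("DUCKDB_PORT", "8432")),
        database=env.get("DUCKDB_DATABASE", ":memory:"),
        data_dir=env.get("DUCKDB_DATA_DIR", "/data/duckdb"),
        # No default domain: an unset or empty value means no Caddy auto-config.
        domain=env.get("DUCKDB_DOMAIN") or None,
        # Assumption: only "true" (case-insensitive) turns read-only mode on.
        read_only=env.get("DUCKDB_READ_ONLY", "false").strip().lower() == "true",
    )
```

Calling `load_config({})` yields the defaults from the table; passing a dictionary instead of the real environment keeps the function easy to test.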

## HTTP API

### Execute Query

```bash
# Simple query
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT 1 + 1 AS result"}'

# Query a CSV file (SQL string literals use single quotes)
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d "{\"sql\": \"SELECT * FROM read_csv_auto('/data/duckdb/sales.csv') LIMIT 10\"}"

# Query a Parquet file
curl -X POST http://localhost:8432/query \
  -H "Content-Type: application/json" \
  -d "{\"sql\": \"SELECT * FROM read_parquet('/data/duckdb/events.parquet')\"}"
```
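
The same requests can be issued from a script. The sketch below uses only the Python standard library and assumes the endpoint and JSON body shown in the curl examples above (`POST /query` with a `sql` field). The function names `build_query_request` and `run_query` are illustrative, not part of the skill.

```python
import json
import urllib.request


def build_query_request(
    sql: str, base_url: str = "http://localhost:8432"
) -> urllib.request.Request:
    """Build (but do not send) a POST /query request for the given SQL."""
    # json.dumps escapes any quotes inside the SQL, so the statement can
    # use ordinary single-quoted string literals without shell escaping.
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, base_url: str = "http://localhost:8432") -> dict:
    """Send the query and decode the JSON reply (needs the skill running)."""
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the service running, `run_query("SELECT 1 + 1 AS result")` would return the decoded response body as a dictionary.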

### Response Format

```json
{
  "success": true,
  "columns": ["result"],
  "rows": [[2]],
  "row_count": 1,
  "time_ms": 0.5
}
```
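
Because rows arrive as positional arrays, client code often pairs them with the column names. The following is a minimal sketch of that step, based only on the fields shown in the response above. The helper name `rows_as_dicts` is hypothetical, and the shape of a failed response is not documented here, so the error branch simply assumes `success` is false in that case.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a decoded query response into one dictionary per row."""
    # Assumption: a failed query sets "success" to false; any error
    # details are passed through unparsed in the exception message.
    if not response.get("success"):
        raise RuntimeError(f"query failed: {response}")
    columns = response["columns"]
    # Pair each positional value with its column name.
    return [dict(zip(columns, row)) for row in response["rows"]]


sample = {
    "success": True,
    "columns": ["result"],
    "rows": [[2]],
    "row_count": 1,
    "time_ms": 0.5,
}
records = rows_as_dicts(sample)
```

For the sample response, `records` holds a single dictionary mapping `result` to `2`.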

## Use Cases

### Analytics on Log Data

```sql
-- Query newline-delimited JSON logs.
-- read_json_auto exposes each top-level JSON key as a column,
-- so this assumes every log record has a "level" field.
SELECT
  level,
  count(*) AS count
FROM read_json_auto('/var/log/supervisor/*.log')
GROUP BY level;
```

### Query Remote Data

```sql
-- Query remote Parquet (S3, HTTP)
SELECT * FROM read_parquet('https://example.com/data.parquet');

-- Query remote CSV
SELECT * FROM read_csv_auto('https://example.com/data.csv');
```

### Create Persistent Tables

```sql
-- Create table
CREATE TABLE events AS
  SELECT * FROM read_parquet('/data/duckdb/events.parquet');

-- Query table
SELECT date_trunc('hour', timestamp) as hour, count(*)
FROM events
GROUP BY 1 ORDER BY 1;
```

## CLI Access

```bash
# Interactive shell
duckdb /data/duckdb/analytics.db

# One-off query
duckdb -c "SELECT * FROM 'data.csv' LIMIT 5"
```

## Extensions

DuckDB supports extensions for additional functionality:

```sql
-- Install and load extensions
INSTALL httpfs;
LOAD httpfs;

-- Now query S3/HTTP directly
SELECT * FROM read_parquet('s3://bucket/data.parquet');
```