Commit Graph

7 Commits

Author SHA1 Message Date
Mike Reeves 470b3bd4da Comingle Telegraf metrics into shared schema
Per-minion schemas cause table count to explode (N minions * M metrics)
and the per-minion revocation story isn't worth it when retention is
short. Move all minions to a shared 'telegraf' schema while keeping
per-minion login credentials for audit.

- New so_telegraf NOLOGIN group role owns the telegraf schema; each
  per-minion role is a member and inherits insert/select via role
  inheritance
- Telegraf connection string uses options='-c role=so_telegraf' so
  tables auto-created on first write belong to the group role
- so-telegraf-trim walks the flat telegraf.* table set instead of
  per-minion schemas
- so-stats-show filters by host tag; CLI arg is now the hostname as
  tagged by Telegraf rather than a sanitized schema suffix
- Also renames so-show-stats -> so-stats-show
2026-04-16 15:40:54 -04:00
Mike Reeves d24808ff98 Fix so-show-stats tag column resolution
Telegraf's postgresql output stores tag values either as individual
columns on <metric>_tag or as a single JSONB 'tags' column, depending
on plugin version. Introspect information_schema.columns and build the
right accessor per tag instead of assuming one layout.
2026-04-15 19:28:10 -04:00
Mike Reeves cefbe01333 Add telegraf_output selector for InfluxDB/Postgres dual-write
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.

- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
  minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
  postgres.telegraf.retention_days (default 14)
2026-04-15 14:32:10 -04:00
Mike Reeves da1045e052 Fix init-users.sh password escaping for special characters
Use format() with %L for SQL literal escaping instead of raw
string interpolation. Also ALTER ROLE if user already exists
to keep password in sync with pillar.
2026-04-09 21:52:20 -04:00
Mike Reeves 46e38d39bb Enable postgres by default
Safe because postgres states are only applied to manager-type
nodes via top.sls and allowed_states.map.jinja.
2026-04-09 12:23:47 -04:00
Mike Reeves 762e73faf5 Add so-postgres host management scripts
- so-postgres-manage: wraps docker exec for psql operations
  (sql, sqlfile, shell, dblist, userlist)
- so-postgres-start/stop/restart: standard container lifecycle
- Scripts installed to /usr/sbin via file.recurse in config.sls
2026-04-09 09:55:42 -04:00
Mike Reeves 868cd11874 Add so-postgres Salt states and integration wiring
Phase 1 of the PostgreSQL central data platform:
- Salt states: init, enabled, disabled, config, ssl, auth, sostatus
- TLS via SO CA-signed certs with postgresql.conf template
- Two-tier auth: postgres superuser + so_postgres application user
- Firewall restricts port 5432 to manager-only (HA-ready)
- Wired into top.sls, pillar/top.sls, allowed_states, firewall
  containers map, docker defaults, CA signing policies, and setup
  scripts for all manager-type roles
2026-04-08 10:58:52 -04:00