5 Commits

Author SHA1 Message Date
Mike Reeves 8225d41661 Harden postgres secrets, TLS enforcement, and admin tooling
- Deliver postgres super and app passwords via mounted 0600 secret files
  (POSTGRES_PASSWORD_FILE, SO_POSTGRES_PASS_FILE) instead of plaintext env
  vars visible in docker inspect output
- Mount a managed pg_hba.conf that only allows local trust and hostssl
  scram-sha-256 so TCP clients cannot negotiate cleartext sessions
- Restrict postgres.key to 0400 and ensure owner/group 939
- Set umask 0077 on so-postgres-backup output
- Validate host values in so-stats-show against [A-Za-z0-9._-] before SQL
  interpolation so a compromised minion cannot inject SQL via a tag value
- Coerce postgres:telegraf:retention_days to int before rendering into SQL
- Escape single quotes when rendering pillar values into postgresql.conf
- Own postgres tooling in /usr/sbin as root:root so a container escape
  cannot rewrite admin scripts
- Gate ES migration TLS verification on esVerifyCert (default false,
  matching the elastic module's existing pattern)
2026-04-20 12:36:17 -04:00
Mike Reeves d9a9029ce5 Adopt pg_partman + pg_cron for Telegraf metric tables
Every telegraf.* metric table is now a daily time-range partitioned
parent managed by pg_partman. Retention drops old partitions instead
of the row-by-row DELETE that so-telegraf-trim used to run nightly,
and dashboards will benefit from partition pruning at query time.

- Load pg_cron at server start via shared_preload_libraries and point
  cron.database_name at so_telegraf so job metadata lives alongside
  the metrics
- Telegraf create_templates override makes every new metric table a
  PARTITION BY RANGE (time) parent registered with partman.create_parent
  in one transaction (1 day interval, 3 premade)
- postgres_telegraf_group_role now also creates pg_partman and pg_cron
  extensions and schedules hourly partman.run_maintenance_proc
- New retention reconcile state updates partman.part_config.retention
  from postgres.telegraf.retention_days on every apply
- so_telegraf_trim cron is now unconditionally absent; script stays on
  disk as a manual fallback
2026-04-16 17:27:15 -04:00
Mike Reeves cefbe01333 Add telegraf_output selector for InfluxDB/Postgres dual-write
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.

- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
  minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
  postgres.telegraf.retention_days (default 14)
2026-04-15 14:32:10 -04:00
Mike Reeves 46e38d39bb Enable postgres by default
Safe because postgres states are only applied to manager-type
nodes via top.sls and allowed_states.map.jinja.
2026-04-09 12:23:47 -04:00
Mike Reeves 868cd11874 Add so-postgres Salt states and integration wiring
Phase 1 of the PostgreSQL central data platform:
- Salt states: init, enabled, disabled, config, ssl, auth, sostatus
- TLS via SO CA-signed certs with postgresql.conf template
- Two-tier auth: postgres superuser + so_postgres application user
- Firewall restricts port 5432 to manager-only (HA-ready)
- Wired into top.sls, pillar/top.sls, allowed_states, firewall
  containers map, docker defaults, CA signing policies, and setup
  scripts for all manager-type roles
2026-04-08 10:58:52 -04:00