The so-postgres-backup script and its cron job lived under
salt/backup/config_backup.sls, so both were deployed regardless of
whether postgres was enabled or disabled.
- Relocate salt/backup/tools/sbin/so-postgres-backup to
salt/postgres/tools/sbin/so-postgres-backup so the existing
postgres_sbin file.recurse in postgres/config.sls picks it up with
everything else — no separate file.managed needed.
- Remove postgres_backup_script and so_postgres_backup from
salt/backup/config_backup.sls.
- Add cron.present for so_postgres_backup to salt/postgres/enabled.sls
and the matching cron.absent to salt/postgres/disabled.sls so the
cron follows the container's lifecycle.

The Telegraf backend selector lived at global.telegraf_output, but it is
a Telegraf-scoped setting, not a cross-cutting grid global. Move both
the value and the UI annotation under the telegraf pillar so it shows
up alongside the other Telegraf tuning knobs in the Configuration UI.
- salt/telegraf/defaults.yaml: add telegraf.output: BOTH
- salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation
- salt/global/defaults.yaml: remove global.telegraf_output
- salt/global/soc_global.yaml: remove global.telegraf_output annotation
- salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS
- salt/firewall/map.jinja: read via pillar.get('telegraf:output')
- salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output')
- salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output
- salt/postgres/tools/sbin/so-stats-show: update user-facing docs
No behavioral change — default stays BOTH.

- Deliver postgres super and app passwords via mounted 0600 secret files
(POSTGRES_PASSWORD_FILE, SO_POSTGRES_PASS_FILE) instead of plaintext env
vars visible in docker inspect output
- Mount a managed pg_hba.conf that only allows local trust and hostssl
scram-sha-256 so TCP clients cannot negotiate cleartext sessions
- Restrict postgres.key to 0400 and ensure owner/group 939
- Set umask 0077 on so-postgres-backup output
- Validate host values in so-stats-show against [A-Za-z0-9._-] before SQL
interpolation so a compromised minion cannot inject SQL via a tag value
- Coerce postgres:telegraf:retention_days to int before rendering into SQL
- Escape single quotes when rendering pillar values into postgresql.conf
- Own postgres tooling in /usr/sbin as root:root so a container escape
cannot rewrite admin scripts
- Gate ES migration TLS verification on esVerifyCert (default false,
matching the elastic module's existing pattern)
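
The three input-hardening bullets above (host allow-list, integer coercion, single-quote escaping) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the actual checks live in so-stats-show and the Salt templates, whose code is not shown here.

```python
import re

# Hypothetical helper names; only the rules themselves come from the notes above.
HOST_RE = re.compile(r"[A-Za-z0-9._-]+")


def validate_host(host: str) -> str:
    """Return host unchanged if it only uses [A-Za-z0-9._-], else raise."""
    # fullmatch rejects trailing newlines and any character outside the class.
    if not HOST_RE.fullmatch(host):
        raise ValueError(f"refusing unsafe host value: {host!r}")
    return host


def coerce_retention_days(value) -> int:
    """Force the pillar value to an int so no free-form text reaches the SQL."""
    return int(value)


def quote_conf_value(value) -> str:
    """Quote a value for postgresql.conf, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"
```

Rejecting bad input outright, rather than trying to sanitize it, keeps the failure visible when a tag value or pillar value is malformed.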

High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE
ADD COLUMN on every new field name, and with all minions writing into
a shared 'telegraf' schema the metric tables hit Postgres's 1600-column
per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on
the postgresql output keeps metric tables fixed at (time, tag_id,
fields jsonb) and tag tables at (tag_id, tags jsonb).
- so-stats-show rewritten to use JSONB accessors
((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk
sizes to bigint so pg_size_pretty works
- Drop regex/regexFailureMessage from the telegraf_output SOC UI entry to
match the convention upstream used when removing them from
mdengine/pcapengine/pipeline; the options: list drives validation
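
To make the JSONB layout concrete, here is a rough Python sketch that builds the kind of accessor expressions described above. The helper names, the table aliases `m` and `t`, the `<metric>_tag` join on `tag_id`, and the `%(host)s` placeholder style are assumptions for illustration, not the script's actual code.

```python
import re

# Hypothetical helpers; identifiers are checked before being placed in SQL text.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    if not IDENT_RE.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def numeric_field(field: str, alias: str = "m") -> str:
    """Accessor for a numeric metric stored in the fields JSONB column."""
    return f"({alias}.fields->>'{_ident(field)}')::numeric"


def pretty_size_field(field: str, alias: str = "m") -> str:
    """Cast a byte count to bigint so pg_size_pretty accepts it."""
    return f"pg_size_pretty(({alias}.fields->>'{_ident(field)}')::bigint)"


def latest_for_host_sql(metric: str, select_expr: str) -> str:
    """Latest row of one metric for the host bound to the %(host)s parameter."""
    metric = _ident(metric)
    return (
        f"SELECT m.time, {select_expr} "
        f"FROM telegraf.{metric} m "
        f"JOIN telegraf.{metric}_tag t USING (tag_id) "
        "WHERE t.tags->>'host' = %(host)s "
        "ORDER BY m.time DESC LIMIT 1"
    )
```

For example, `latest_for_host_sql("mem", pretty_size_field("used"))` yields a query that reads memory usage without depending on a per-field column existing.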

Per-minion schemas cause the table count to explode (N minions * M metrics),
and the ability to revoke access per minion isn't worth that cost when
retention is short. Move all minions to a shared 'telegraf' schema while
keeping per-minion login credentials for audit.
- New so_telegraf NOLOGIN group role owns the telegraf schema; each
per-minion role is a member and inherits insert/select via role
inheritance
- Telegraf connection string uses options='-c role=so_telegraf' so
tables auto-created on first write belong to the group role
- so-telegraf-trim walks the flat telegraf.* table set instead of
per-minion schemas
- so-stats-show filters by host tag; CLI arg is now the hostname as
tagged by Telegraf rather than a sanitized schema suffix
- Also renames so-show-stats -> so-stats-show
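
As an illustration of the connection-string bullet above, the sketch below assembles a libpq-style keyword/value string that carries options='-c role=so_telegraf'. The function names, the sslmode=require setting, the so_telegraf database name as the default, and the sample role name are assumptions; the real string is rendered by the Telegraf config template.

```python
def quote_conninfo_value(value) -> str:
    """Single-quote a libpq keyword/value entry, escaping backslash and quote."""
    text = str(value)
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(host: str, user: str, password: str,
                   dbname: str = "so_telegraf", port: int = 5432) -> str:
    """Build a connection string that switches to the group role on connect."""
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,            # per-minion login role, kept for audit
        "password": password,
        "sslmode": "require",    # assumption: matches the hostssl-only pg_hba.conf
        "options": "-c role=so_telegraf",  # created tables belong to the group role
    }
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())
```

Because the session role is set at connect time, tables that Telegraf creates on first write are owned by the group role rather than by whichever minion happened to write first.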

Telegraf's postgresql output stores tag values either as individual
columns on <metric>_tag or as a single JSONB 'tags' column, depending
on plugin version. Introspect information_schema.columns and build the
right accessor per tag instead of assuming one layout.
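
The introspection step might look roughly like this in Python. It assumes the column names of the `<metric>_tag` table have already been fetched with a query such as `COLUMNS_SQL`; the function name, the alias `t`, and the placeholder style are hypothetical.

```python
import re

# Assumed lookup query; parameters are the schema and the <metric>_tag table name.
COLUMNS_SQL = (
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s"
)

TAG_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tag_accessor(tag: str, columns: set, alias: str = "t") -> str:
    """Pick the SQL accessor for a tag based on the tag table's actual columns."""
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"unsafe tag name: {tag!r}")
    if tag in columns:
        # Layout 1: the tag is its own column on <metric>_tag.
        return f'{alias}."{tag}"'
    if "tags" in columns:
        # Layout 2: all tags live in a single JSONB column named tags.
        return f"{alias}.tags->>'{tag}'"
    raise LookupError(f"tag {tag!r} not found in either layout")
```

Checking for a dedicated column first means a table created under the older layout keeps working even after the plugin switches to JSONB for new tables.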

Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, limiting the blast radius of any one credential to that
minion's data.
- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
postgres.telegraf.retention_days (default 14)
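
For the retention bullet above, a minimal Python sketch of the kind of statement a trim job could issue per table is shown below. The function name and the quoting approach are assumptions, and so-telegraf-trim itself may be written quite differently; only the `time` column and the retention_days setting come from these notes.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def trim_statement(schema: str, table: str, retention_days) -> str:
    """Build a DELETE that prunes rows older than retention_days from one table."""
    for name in (schema, table):
        if not IDENT_RE.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    days = int(retention_days)  # raises ValueError on non-numeric pillar values
    if days < 1:
        raise ValueError("retention_days must be at least 1")
    return (
        f'DELETE FROM "{schema}"."{table}" '
        f"WHERE \"time\" < now() - interval '{days} days'"
    )
```

A nightly cron job would then run one such statement for every metric table it discovers.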