The active-push feature detected pillar/settings changes via an inotify
beacon on the manager watching /opt/so/saltstack/local/pillar. Replace
that pillar watch with a custom salt beacon (pillar_db) that polls the
SOC so_soc.audit_settings table on a monotonic id watermark, so changes
made through SOC drive immediate pushes from the database instead of the
files. The suricata/strelka rule inotify watches (and pyinotify) are kept
unchanged, since rule-file edits are not recorded in audit_settings.
- salt/_beacons/pillar_db.py: new beacon. Polls audit_settings via
`docker exec so-postgres psql` (unix-socket trust auth), tracks the last
processed id in /opt/so/state/pillar_db_watch.id, seeds to MAX(id) on
first run (no history replay), and emits one event per new row.
- salt/reactor/push_pillar.sls: consume setting_id/node_id from the beacon
event instead of a file path. App = first dotted segment of setting_id,
looked up in pillar_push_map.yaml. Empty node_id -> grid-wide actions as
is; populated node_id -> the app's state(s) retargeted to that one node.
- salt/manager/files/beacons_pushstate.conf.jinja: drop the pillar inotify
block, add the pillar_db beacon (interval = push.drain_interval); keep
the suricata/strelka inotify watches.
- salt/salt/files/reactor_pushstate.conf: map salt/beacon/*/pillar_db/
audit_settings to push_pillar.sls; remove the pillar inotify reactor
lines; keep suricata/strelka.
The intent -> so-push-drainer -> orch.push_batch pipeline is unchanged.
Verified end-to-end on a standalone: a grid-wide telegraf.output change
re-applied telegraf fleetwide (container replaced), and a per-host
ntp.config.servers change applied ntp to only that node.
Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting
The dump pipeline returned gzip's exit status, so a pg_dumpall that
died mid-stream still produced a valid .gz holding a truncated dump,
written straight to the final filename. The idempotency check then
blocked retries for the day and the corrupt file counted toward
retention, evicting a good backup each day until none remained.
- set -o pipefail so a failed pg_dumpall fails the pipeline
- dump to a .tmp file and atomically rename only after success, so
the final filename appears only for a complete backup
- gzip -t integrity check before publishing
- trap-based cleanup of the temp file; sweep stale temps at startup
- run retention only after a successful backup, with a glob
restricted to finished backups
- log timestamped OK/ERROR outcomes to /opt/so/log/postgres/backup.log