Two coupled changes that together let so_pillar.* be the canonical
config store, with config edits driving service reloads automatically:
so-yaml PG-canonical mode
- Adds /opt/so/conf/so-yaml/mode (and SO_YAML_BACKEND env override) with
three values: dual (legacy), postgres (PG-only for managed paths),
disk (emergency rollback). Bootstrap files (secrets.sls, ca/init.sls,
*.nodes.sls, top.sls, ...) stay disk-only regardless via the existing
SkipPath allowlist in so_yaml_postgres.locate.
- loadYaml/writeYaml/purgeFile now route to so_pillar.* in postgres
mode: replace/add/get all read+write the database with no disk file
ever appearing. PG failure is fatal in postgres mode (no silent
fallback); dual mode preserves the prior best-effort mirror.
- so_yaml_postgres gains read_yaml(path), is_pg_managed(path), and
is_enabled() so so-yaml can answer "is this path PG-managed and is
PG up" without reaching into private helpers.
- schema_pillar.sls writes /opt/so/conf/so-yaml/mode = postgres after
the importer succeeds, so flipping postgres:so_pillar:enabled flips
so-yaml's behavior in lockstep with the schema being live.
pg_notify-driven change fan-out
- 008_change_notify.sql adds so_pillar.change_queue + an AFTER trigger
on pillar_entry that enqueues the locator and pg_notifies
'so_pillar_change'. Queue is drained at-least-once so engine restarts
don't lose events; pg_notify is just the wakeup signal.
- New salt-master engine pg_notify_pillar.py LISTENs on the channel,
drains the queue with FOR UPDATE SKIP LOCKED, debounces bursts, and
fires 'so/pillar/changed' events grouped by (scope, role, minion).
- Reactor so_pillar_changed.sls catches the tag and dispatches to
orch.so_pillar_reload, which carries a DISPATCH map of pillar-path
prefix -> (state sls, role grain set) so adding a new service to
the auto-reload list is a one-line edit instead of a new reactor.
- Engine + reactor wiring is gated on the same postgres:so_pillar:enabled
flag as the schema and ext_pillar config so the whole stack flips
on/off together.
Tests: 21 new cases (112 total, all passing) covering mode resolution,
PG-managed detection, and PG-canonical read/write/purge routing with
the PG client stubbed.
Hooks every so-yaml.py write through a new so_yaml_postgres helper that
mirrors disk YAML mutations into so_pillar.pillar_entry via docker exec
psql. Disk remains canonical during the transition; PG mirror failures
are logged only when a real write error occurs (skipped paths and
postgres-unreachable cases stay silent so existing callers don't see
new noise on stderr).
Adds a `purge YAML_FILE` verb on so-yaml that deletes the file from
disk and removes the matching pillar_entry rows. For minion files it
also drops the so_pillar.minion row, which CASCADEs to pillar_entry +
role_member. Designed for so-minion's delete path (replaces rm -f) so
the audit log captures the deletion.
setup/so-functions::generate_passwords + secrets_pillar generate
secrets:pillar_master_pass and /opt/so/conf/postgres/so_pillar.key on
fresh installs, and append the password to existing secrets.sls files
on upgrade.
- salt/manager/tools/sbin/so_yaml_postgres.py: locate(), write_yaml(),
purge_yaml(), and a small CLI for diagnostics. Skips bootstrap and
mine-driven paths via the same allowlist used by so-pillar-import.
- salt/manager/tools/sbin/so-yaml.py: import the helper, hook
writeYaml() to mirror after every disk write, add purgeFile() and
the purge verb.
- salt/manager/tools/sbin/so-yaml_test.py: 16 new tests covering the
purge verb and the path-locator / write contract of so_yaml_postgres
without contacting Postgres. All 91 tests pass.
- setup/so-functions: generate_passwords adds PILLARMASTERPASS and
SO_PILLAR_KEY; secrets_pillar writes pillar_master_pass and the
pgcrypto master key file.
Idempotent importer that schema_pillar.sls runs once at end of postgres
state on first install, and that so-minion can call per-minion on add /
delete. UPSERTs into so_pillar.pillar_entry; the audit trigger handles
versioning so re-runs without SLS edits produce no version bumps.
Connects via docker exec so-postgres psql, so no DSN config is required
at first-install time. Skips bootstrap files (secrets.sls, postgres/
auth.sls, etc.), mine-driven nodes.sls files, and any file containing
Jinja templates — those stay disk-authoritative and ext_pillar_first:
False means they render before the PG overlay.
Auto-syncs to /usr/sbin via the existing manager_sbin file.recurse.
The manager's /etc/salt/minion (written by so-functions:configure_minion)
has no file_roots, so salt-call --local falls back to Salt's default
/srv/salt and fails with "No matching sls found for 'postgres.telegraf_users'
in env 'base'". || true was silently swallowing the error, which meant the
DB roles for the pillar entries just populated by the so-telegraf-cred
backfill loop never actually got created.
Route through salt-master instead; its file_roots already points at the
default/local salt trees.
pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres
unconditionally, but make_some_dirs only runs at install time so managers
upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails
pillar render on the first post-upgrade restart. Similarly, secrets_pillar
is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass
never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and
SOC's PG_ADMIN_PASS would land empty after highstate.
Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0
so the stubs and secret exist before masterlock/salt-master restart. Both
are idempotent and safe to re-run.
Exercises the FileNotFoundError and generic-exception branches added to
loadYaml in the previous commit, restoring 100% coverage required by
the build.
so-telegraf-cred was committed with mode 644, causing
`so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion
to fail with "Permission denied" and log "Failed to provision postgres
telegraf cred for <minion>". Mark it executable.
Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in
so-yaml.py loadYaml surface a clear stderr message with the filename
instead of an unhandled FileNotFoundError traceback.
Swap the ~150-line Python implementation for a 48-line bash script that
delegates YAML mutation to so-yaml.py — the same helper so-minion and
soup already use. Same semantics: seed the creds pillar on first use,
idempotent add, silent remove.
SO minion ids are dot-free by construction (setup/so-functions:1884
strips everything after the first '.'), so using the raw id as the
so-yaml.py key path is safe.
The old flow had two writers for each per-minion Telegraf password
(so-minion wrote the minion pillar; postgres.auth regenerated any
missing aggregate entries). They drifted on first-boot and there was
no trigger to create DB roles when a new minion joined.
Split responsibilities:
- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres
admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user,
pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer:
flock, atomic YAML write, PyYAML safe_dump so passwords never
round-trip through so-yaml.py's type coercion. Idempotent add, quiet
remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred
instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs
roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and
refreshes the new minion's pillar before the new node highstates,
so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt
keys (idempotent) and runs postgres.telegraf_users once to reconcile
the DB.
The reactor path is gone; so-minion now owns add/delete for new
minions. The backfill itself is unchanged — postgres.auth's up_minions
fallback fills the aggregate, postgres.telegraf_users creates the
roles, and the bash loop fans to per-minion pillar files — so the
pre-feature upgrade story still works end-to-end. Just refresh the
comment so it isn't misleading.
Paired with the add path in add_telegraf_to_minion: when a minion is
removed, drop its entry from the aggregate postgres pillar and drop the
matching so_telegraf_<safe> role from the database. Without this, stale
entries and DB roles accumulate over time.
Makes rotate-password and compromise-recovery both a clean delete+add:
so-minion -o=delete -m=<id>
so-minion -o=add -m=<id>
The first call drops the role and clears the aggregate pillar; the
second generates a brand-new password.
The cleanup is best-effort — if so-postgres isn't running or the DROP
ROLE fails (e.g., the role owns unexpected objects), we log a warning
and continue so the minion delete itself never gets blocked by postgres
state. Admins can mop up stray roles manually if that happens.
Simpler, race-free replacement for the reactor + orch + fan-out chain.
- salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to
generate a random 72-char password, reuse any existing password from
the aggregate pillar, write postgres.telegraf.{user,pass} into the
minion's own pillar file, and update the aggregate pillar so
postgres.telegraf_users can CREATE ROLE on the next manager apply.
Every create<ROLE> function already calls this hook, so add / addVM /
setup dispatches are all covered identically and synchronously.
- salt/postgres/auth.sls: strip the fanout_targets loop and the
postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now
redundant. The state still manages the so_postgres admin user and
writes the aggregate pillar for postgres.telegraf_users to consume.
- salt/reactor/telegraf_user_sync.sls: deleted.
- salt/orch/telegraf_postgres_sync.sls: deleted.
- salt/salt/master.sls: drop the reactor_config_telegraf block that
registered the reactor on /etc/salt/master.d/reactor_telegraf.conf.
- salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf
step and the require: it added to the newnode highstate. Back to its
original 3/dev shape.
No more ephemeral postgres_fanout_minion pillar, no more async salt/key
reactor, no more so-minion setupMinionFiles race: the pillar write
happens inline inside setupMinionFiles itself.
replace calls removeKey before addKey, so running `so-yaml.py replace`
on a new dotted key whose parent doesn't exist — e.g., postgres.auth
fanning postgres.telegraf.user into a minion pillar file that has
never carried any postgres.* keys — crashed with
KeyError: 'postgres'
from removeKey recursing into a missing parent dict.
Make removeKey a no-op when an intermediate key is absent so that:
- `remove` has the natural "remove if exists" semantics, and
- `replace` works for brand-new nested keys.
postgres.auth was running an `unless` shell check per up-minion on every
manager highstate, even when nothing had changed — N fork+python starts
of so-yaml.py add up on large grids. The work is only needed when a
specific minion's key is accepted.
- salt/postgres/auth.sls: fan out only when postgres_fanout_minion
pillar is set (targets that single minion). Manager highstates with
no pillar take a zero-N code path.
- salt/reactor/telegraf_user_sync.sls: re-pass the accepted minion id
as postgres_fanout_minion to the orch.
- salt/orch/telegraf_postgres_sync.sls: forward the pillar to the
salt.state invocation so the state render sees it.
- salt/manager/tools/sbin/soup: for the one-time 3.1.0 backfill, drop
the per-minion state.apply and do an in-shell loop over the minion
pillar files using so-yaml.py directly. Skips minions that already
have postgres.telegraf.user set.
state.apply takes a single mods argument; space-separated names are not
a list, so `state.apply postgres.auth postgres.telegraf_users` was only
applying postgres.auth and silently dropping the telegraf_users state.
Use comma-separated mods and add queue=True to match the rest of soup.
feature/postgres had rewritten the 3.1.0 upgrade block, dropping the
elastic upgrade work 3/dev landed for 9.0.8→9.3.3: elasticsearch_backup_index_templates,
the component template state cleanup, and the /usr/sbin/so-kibana-space-defaults
post-upgrade call. It also carried an older ES upgrade mapping
(8.18.8→9.0.8) that was superseded on 3/dev (9.0.8→9.3.3 for
3.0.0-20260331), and a handful of latent shell-quoting regressions in
verify_es_version_compatibility and the intermediate-upgrade helpers.
Adopt the 3/dev soup verbatim and only add the new Telegraf Postgres
provisioning to post_to_3.1.0 on top of so-kibana-space-defaults.
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.
- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
postgres.telegraf.retention_days (default 14)
Simplifies salt states, map files, and modules to only support
Oracle Linux 9, removing all Debian/Ubuntu/CentOS/Rocky/AlmaLinux/RHEL
conditional branches.
Security Onion now exclusively supports Oracle Linux 9. This removes
detection, setup, and update logic for Ubuntu, Debian, CentOS, Rocky,
AlmaLinux, and RHEL.
Consolidate version checks to use regex patterns for 2.4.21X and 3.x
versions. Add migrate_pcap_to_suricata to move pcap.enabled to
suricata.pcap.enabled in minion and pcap pillar files during upgrade.