On a fresh install the surirulesync file.recurse creates .gitkeep before
SOC has generated all-rulesets.rules. That change satisfied the
surirulereload onchanges requisite, so the reload ran with no ruleset
present, failed to stat the file, and reported the state (and install)
as failed.
Add an onlyif guard so the reload only runs when all-rulesets.rules
exists. A .gitkeep-only sync now leaves the state a clean success
(onlyif condition false); once SOC writes the ruleset, the reload fires
normally.
Route the reload/verify output (ours plus so-common's retry/fail lines)
through a synchronous timestamping pipeline so every line in reload.log
is prefixed with a date/time, and preserve the real exit code via
PIPESTATUS.
Treating an in-progress reload as instant success could report success
while Suricata was still running a stale ruleset (the in-flight reload
may have started before the new all-rulesets.rules was written).
Make success conditional on Suricata actually having loaded the current
ruleset: capture the rules-file mtime up front, trigger a blocking
reload-rules, then query ruleset-reload-time and only succeed when
last_reload >= mtime. An in-progress reload now retries (waits for it to
clear so our own fresh reload runs) instead of short-circuiting, and a
ruleset that never catches up within the retry window fails via fail().
Also drop the redundant ruleset-reload-nonblocking call (the verified
blocking reload is authoritative and the async call was what left a
reload running) and log human-readable timestamps.
so-suricata-reload-rules failed the surirulereload state when a rule
reload was already running: suricatasc returns
{"message":"Reload already in progress","return":"NOK"}, which never
matched the expected output, so retry looped all 60 attempts (~3 min)
and called fail.
Wrap the suricatasc calls so an in-progress reload is treated as
success (the in-flight reload picks up the new rules) while genuine
container-not-ready conditions still retry and ultimately fail.
Rename the two custom push-detection beacons for clarity:
- pillar_db -> postgres_pillar_beacon
- rules_db -> rules_beacon
Salt resolves a beacon by its config-key name to a _beacons/ module of the
same filename and tags its events salt/beacon/<minion>/<name>/<tag>, so each
rename touches the module file, the beacon config key in
beacons_pushstate.conf.jinja, and the reactor tag patterns in
reactor_pushstate.conf together. Watermark filenames and log prefixes are
updated to match; reactor run() logic is unchanged.
Salt's stock inotify beacon leaks one kernel inotify instance every time
the minion rebuilds the beacon loader's __context__ (the orphaned
pyinotify.Notifier is never stopped), accumulating against
fs.inotify.max_user_instances=128 until inotify_init() fails with EMFILE
and rule-change push detection silently stops. This is independent of
disable_during_state_run.
Add a custom poll-based beacon (salt/_beacons/rules_db.py) modeled on
pillar_db.py: it fingerprints the suricata/strelka rule dirs each interval
(relpath + mtime_ns + size, temp files excluded) against a per-dir
watermark, emitting an event only on change. It holds zero inotify
instances, so the leak is impossible, and it keeps firing during state
runs. Swap the inotify beacon config and reactor tag mappings accordingly;
the push_suricata/push_strelka reactors are unchanged (they read only
data['path']).
The /nsm/kernelrepo bind mount exposed the files, but without a matching
location block external requests to /kernelrepo/ fell through to the SOC
app and returned HTML, so minions hit 'repomd.xml parser error'. Add a
/kernelrepo/ location mirroring /repo/.
highstate_interval_hours describes the per-minion highstate schedule, not the
active-push pipeline, so relocate it from salt.auto_apply to a new salt.schedule
settings subtree. Repoint so-salt-minion-check at the new pillar path (it had
been left on the stale global:push path) so its restart grace period tracks the
schedule again.
- Add salt.schedule.highstate_interval_hours to defaults.yaml/soc_salt.yaml and a
side-effect-free salt/salt/schedule.map.jinja (SCHEDULEMERGED), matching the
*MERGED map convention. Consumers read SCHEDULEMERGED.highstate_interval_hours.
- Split salt/schedule.sls into salt/salt/highstate_schedule.sls (every minion) and
salt/salt/push_drain_schedule.sls (managers); update top.sls to apply the
highstate schedule via '*' and the drainer schedule via the configured-manager
block. Remove the now-empty schedule.sls aggregator.
- pillar_push_map.yaml and so-push-drainer: comment/doc updates only.
During soup the grid is mid-salt-upgrade. Only assign the UEK8 kernel
repo once the node's grains.saltversion matches salt.minion.version from
minion.defaults.yaml, so the kernel repo and the update it enables don't
activate until the node is fully on the target salt.
Installing kernel-uek-core adds a UEK8 (6.x) boot entry but doesn't make
it the default, because grubby only auto-promotes within the running
kernel's flavor lineage and we cross from a 5.x kernel to the new UEK8
flavor. so-kernel-upgrade finds the newest installed 6.x UEK kernel and
grubby --set-default's it (idempotent, verifies the change, no reboot).
The SOC postgres database was renamed so_soc -> securityonion (see
POSTGRES_DB in salt/postgres/enabled.sls and the SOC postgres config in
salt/soc/defaults.yaml). The pillar_db beacon still hardcoded so_soc, so
every poll failed with 'database "so_soc" does not exist' (rc=2),
silently disabling active-push detection of audit_settings changes.
Update DATABASE to 'securityonion' and refresh the now-stale so_soc
references in the beacon and push_pillar reactor comments.
The active-push tunables (enabled, highstate_interval_hours, debounce_seconds,
drain_interval, batch, batch_wait) described how Salt auto-applies changes, not
general grid config, so relocate them from the global namespace to a new
salt.auto_apply settings module.
- Add salt/salt/{defaults.yaml,auto_apply.map.jinja,soc_salt.yaml,adv_salt.yaml}.
auto_apply.map.jinja is a dedicated, side-effect-free merge map (the existing
salt/salt/map.jinja dereferences pillar.host.mainint at import time).
- Remove the push blocks from salt/global/{defaults,soc_global}.yaml.
- Register salt.soc_salt/salt.adv_salt in pillar/top.sls; seed the local pillar
stubs for fresh installs (make_some_dirs) and upgrades (ensure_salt_local_pillar
in soup, wired into up_to_3.2.0).
- Repoint all consumers: GLOBALMERGED.push.* -> AUTOAPPLY.* (schedule, salt
master, manager beacons, beacons_pushstate, orch.push_batch) and
pillar.get('global:push...') -> 'salt:auto_apply...' (push reactors,
so-push-drainer).
- Add a salt: fleetwide-highstate entry to pillar_push_map.yaml so edits keep
applying immediately, matching the prior global-namespace behavior.
so-repo-sync only populates /nsm/kernelrepo after the highstate, so on a
manager the file:///nsm/kernelrepo repo could be assigned before any
repodata exists, failing every dnf op. Run createrepo on the dir when
repodata/repomd.xml is missing, leaving a synced repo untouched.
When the kernel repo is assigned but /nsm/kernelrepo isn't populated
yet, its missing repomd.xml makes every dnf/pkg operation fail (e.g.
pkg.held for salt during highstate). The kernel repo is supplementary,
so set skip_if_unavailable=1 in both the salt-managed client repo and
the four install-time bootstrap repo files; dnf ignores it until it is
populated instead of aborting. The main repo stays strict.
During soup, so-repo-sync runs before the highstate deploys the new
repodownload.conf. On the first upgrade to a kernel-aware version the
on-disk config lacks the [securityonionkernel] section, so dnf aborts
with "Unknown repo: 'securityonionkernel'" (set -e kills soup). Guard
the kernel reposync on the section being present; the next sync after
the highstate deploys it picks it up.
Mirror the kernel repo to full parity with the main package repo so the
grid can pull the Oracle UEK8 kernel:
- setup/so-functions: securityonion_repo() emits a [securityonionkernel]
section in every branch (mirrorlist on non-airgap, https://$MSRV/kernelrepo
for airgap/minion, file:///nsm/kernelrepo/ for manager); repo_sync_local()
and create_repo() sync and build /nsm/kernelrepo.
- manager/init.sls: create /nsm/kernelrepo and deploy mirror-kernel.txt.
- nginx/enabled.sls: serve /nsm/kernelrepo at https://<repo_host>/kernelrepo.
- repo/client/oracle.sls: add so_kernel_repo, gated by
onlyif test -e /opt/so/state/nic_names_pinned so the kernel repo is only
assigned once NICs are pinned by MAC.
- update_packages(): run so-nic-pin before the dnf update that pulls the
kernel, freezing interface names and dropping the pin marker so the kernel
isn't downgraded then re-upgraded on the first highstate.