Salt's stock inotify beacon leaks one kernel inotify instance every time
the minion rebuilds the beacon loader's __context__ (the orphaned
pyinotify.Notifier is never stopped), accumulating against
fs.inotify.max_user_instances=128 until inotify_init() fails with EMFILE
and rule-change push detection silently stops. This is independent of
disable_during_state_run.
Add a custom poll-based beacon (salt/_beacons/rules_db.py) modeled on
pillar_db.py: it fingerprints the suricata/strelka rule dirs each interval
(relpath + mtime_ns + size, temp files excluded) against a per-dir
watermark, emitting an event only on change. It holds zero inotify
instances, so the leak is impossible, and it keeps firing during state
runs. Swap the inotify beacon config and reactor tag mappings accordingly;
the push_suricata/push_strelka reactors are unchanged (they read only
data['path']).
The /nsm/kernelrepo bind mount exposed the files, but without a matching
location block external requests to /kernelrepo/ fell through to the SOC
app and returned HTML, so minions hit 'repomd.xml parser error'. Add a
/kernelrepo/ location mirroring /repo/.
highstate_interval_hours describes the per-minion highstate schedule, not the
active-push pipeline, so relocate it from salt.auto_apply to a new salt.schedule
settings subtree. Repoint so-salt-minion-check at the new pillar path (it had
been left on the stale global:push path) so its restart grace period tracks the
schedule again.
- Add salt.schedule.highstate_interval_hours to defaults.yaml/soc_salt.yaml and a
side-effect-free salt/salt/schedule.map.jinja (SCHEDULEMERGED), matching the
*MERGED map convention. Consumers read SCHEDULEMERGED.highstate_interval_hours.
- Split salt/schedule.sls into salt/salt/highstate_schedule.sls (every minion) and
salt/salt/push_drain_schedule.sls (managers); update top.sls to apply the
highstate schedule via '*' and the drainer schedule via the configured-manager
block. Remove the now-empty schedule.sls aggregator.
- pillar_push_map.yaml and so-push-drainer: comment/doc updates only.
During soup the grid is mid-salt-upgrade. Only assign the UEK8 kernel
repo once the node's grains.saltversion matches salt.minion.version from
minion.defaults.yaml, so the kernel repo and the update it enables don't
activate until the node is fully on the target salt.
Installing kernel-uek-core adds a UEK8 (6.x) boot entry but doesn't make
it the default, because grubby only auto-promotes within the running
kernel's flavor lineage and we cross from a 5.x kernel to the new UEK8
flavor. so-kernel-upgrade finds the newest installed 6.x UEK kernel and
grubby --set-default's it (idempotent, verifies the change, no reboot).
The SOC postgres database was renamed so_soc -> securityonion (see
POSTGRES_DB in salt/postgres/enabled.sls and the SOC postgres config in
salt/soc/defaults.yaml). The pillar_db beacon still hardcoded so_soc, so
every poll failed with 'database "so_soc" does not exist' (rc=2),
silently disabling active-push detection of audit_settings changes.
Update DATABASE to 'securityonion' and refresh the now-stale so_soc
references in the beacon and push_pillar reactor comments.
The active-push tunables (enabled, highstate_interval_hours, debounce_seconds,
drain_interval, batch, batch_wait) described how Salt auto-applies changes, not
general grid config, so relocate them from the global namespace to a new
salt.auto_apply settings module.
- Add salt/salt/{defaults.yaml,auto_apply.map.jinja,soc_salt.yaml,adv_salt.yaml}.
auto_apply.map.jinja is a dedicated, side-effect-free merge map (the existing
salt/salt/map.jinja dereferences pillar.host.mainint at import time).
- Remove the push blocks from salt/global/{defaults,soc_global}.yaml.
- Register salt.soc_salt/salt.adv_salt in pillar/top.sls; seed the local pillar
stubs for fresh installs (make_some_dirs) and upgrades (ensure_salt_local_pillar
in soup, wired into up_to_3.2.0).
- Repoint all consumers: GLOBALMERGED.push.* -> AUTOAPPLY.* (schedule, salt
master, manager beacons, beacons_pushstate, orch.push_batch) and
pillar.get('global:push...') -> 'salt:auto_apply...' (push reactors,
so-push-drainer).
- Add a salt: fleetwide-highstate entry to pillar_push_map.yaml so edits keep
applying immediately, matching the prior global-namespace behavior.
so-repo-sync only populates /nsm/kernelrepo after the highstate, so on a
manager the file:///nsm/kernelrepo repo could be assigned before any
repodata exists, failing every dnf op. Run createrepo on the dir when
repodata/repomd.xml is missing, leaving a synced repo untouched.
When the kernel repo is assigned but /nsm/kernelrepo isn't populated
yet, its missing repomd.xml makes every dnf/pkg operation fail (e.g.
pkg.held for salt during highstate). The kernel repo is supplementary,
so set skip_if_unavailable=1 in both the salt-managed client repo and
the four install-time bootstrap repo files; dnf ignores it until it is
populated instead of aborting. The main repo stays strict.
During soup, so-repo-sync runs before the highstate deploys the new
repodownload.conf. On the first upgrade to a kernel-aware version the
on-disk config lacks the [securityonionkernel] section, so dnf aborts
with "Unknown repo: 'securityonionkernel'" (set -e kills soup). Guard
the kernel reposync on the section being present; the next sync after
the highstate deploys it picks it up.
The install bootstrap appended the [securityonionkernel] section to the
shared /etc/yum.repos.d/securityonion.repo, but the salt state so_kernel_repo
(name securityonionkernel) manages its own canonical file
/etc/yum.repos.d/securityonionkernel.repo. At highstate both files defined the
same repo id, so dnf failed with "repository securityonionkernel is listed
more than 1 time".
Write the bootstrap kernel repo to /etc/yum.repos.d/securityonionkernel.repo
in all four securityonion_repo() branches so the id lives in exactly one file
and salt edits it in place. Mirrors how the main repo's runtime id matches its
file name.
Mirror the kernel repo to full parity with the main package repo so the
grid can pull the Oracle UEK8 kernel:
- setup/so-functions: securityonion_repo() emits a [securityonionkernel]
section in every branch (mirrorlist on non-airgap, https://$MSRV/kernelrepo
for airgap/minion, file:///nsm/kernelrepo/ for manager); repo_sync_local()
and create_repo() sync and build /nsm/kernelrepo.
- manager/init.sls: create /nsm/kernelrepo and deploy mirror-kernel.txt.
- nginx/enabled.sls: serve /nsm/kernelrepo at https://<repo_host>/kernelrepo.
- repo/client/oracle.sls: add so_kernel_repo, gated by
onlyif test -e /opt/so/state/nic_names_pinned so the kernel repo is only
assigned once NICs are pinned by MAC.
- update_packages(): run so-nic-pin before the dnf update that pulls the
kernel, freezing interface names and dropping the pin marker so the kernel
isn't downgraded then re-upgraded on the first highstate.
elastic_fleet_load_integrations_dir now buffers each concurrent job's
output (header + API response) to a per-job file and prints them in
submission order after wait, restoring the readable serial-style output
while keeping concurrent writes.
Add --retry-all-errors to the integration create/update curl calls so
transient 409 conflicts from concurrent writes to the same agent policy
are retried (curl --retry alone does not retry 409).