Commit Graph

1011 Commits

Author SHA1 Message Date
Josh Patterson 3310e19ee4 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-07-02 10:27:54 -04:00
reyesj2 e7352eb841 duplicate repo name in so-repo-sync 2026-07-01 15:17:55 -05:00
reyesj2 24b75b4a2b typo 2026-07-01 12:50:23 -05:00
reyesj2 e88eb65a44 keep old packages for rollback ability 2026-07-01 10:29:05 -05:00
reyesj2 dc8c80633b update airgap soup to sync uek repo from iso and retain latest packages only 2026-07-01 10:23:04 -05:00
Josh Patterson f441d98e71 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-07-01 10:34:56 -04:00
reyesj2 670d2b2757 casing 2026-06-30 12:57:56 -05:00
reyesj2 3b8459c6ec soup upgrade kafka cluster metadata v4 2026-06-30 12:43:42 -05:00
Josh Patterson a330bea25e Rename push-detection beacons to clearer names
Rename the two custom push-detection beacons for clarity:
- pillar_db -> postgres_pillar_beacon
- rules_db  -> rules_beacon

Salt resolves a beacon by its config-key name to a _beacons/ module of the
same filename and tags its events salt/beacon/<minion>/<name>/<tag>, so each
rename touches the module file, the beacon config key in
beacons_pushstate.conf.jinja, and the reactor tag patterns in
reactor_pushstate.conf together. Watermark filenames and log prefixes are
updated to match; reactor run() logic is unchanged.
2026-06-29 14:29:07 -04:00
Josh Patterson 33c24cd136 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-26 15:42:56 -04:00
Josh Patterson 12f4447875 Replace inotify rule-watch beacon with poll-based rules_db beacon
Salt's stock inotify beacon leaks one kernel inotify instance every time
the minion rebuilds the beacon loader's __context__ (the orphaned
pyinotify.Notifier is never stopped), accumulating against
fs.inotify.max_user_instances=128 until inotify_init() fails with EMFILE
and rule-change push detection silently stops. This is independent of
disable_during_state_run.

Add a custom poll-based beacon (salt/_beacons/rules_db.py) modeled on
pillar_db.py: it fingerprints the suricata/strelka rule dirs each interval
(relpath + mtime_ns + size, temp files excluded) against a per-dir
watermark, emitting an event only on change. It holds zero inotify
instances, so the leak is impossible, and it keeps firing during state
runs. Swap the inotify beacon config and reactor tag mappings accordingly;
the push_suricata/push_strelka reactors are unchanged (they read only
data['path']).
2026-06-26 15:40:32 -04:00
Josh Patterson da94788255 Move highstate_interval_hours to salt.schedule and split schedule.sls
highstate_interval_hours describes the per-minion highstate schedule, not the
active-push pipeline, so relocate it from salt.auto_apply to a new salt.schedule
settings subtree. Repoint so-salt-minion-check at the new pillar path (it had
been left on the stale global:push path) so its restart grace period tracks the
schedule again.

- Add salt.schedule.highstate_interval_hours to defaults.yaml/soc_salt.yaml and a
  side-effect-free salt/salt/schedule.map.jinja (SCHEDULEMERGED), matching the
  *MERGED map convention. Consumers read SCHEDULEMERGED.highstate_interval_hours.
- Split salt/schedule.sls into salt/salt/highstate_schedule.sls (every minion) and
  salt/salt/push_drain_schedule.sls (managers); update top.sls to apply the
  highstate schedule via '*' and the drainer schedule via the configured-manager
  block. Remove the now-empty schedule.sls aggregator.
- pillar_push_map.yaml and so-push-drainer: comment/doc updates only.
2026-06-26 10:51:57 -04:00
Josh Patterson 5bf9751adf do not disable during state run 2026-06-25 11:44:38 -04:00
Josh Patterson 3effdbc91e do not disable during state run 2026-06-25 11:36:52 -04:00
Josh Patterson dfdb1fbaeb Move global.push config to salt.auto_apply
The active-push tunables (enabled, highstate_interval_hours, debounce_seconds,
drain_interval, batch, batch_wait) described how Salt auto-applies changes, not
general grid config, so relocate them from the global namespace to a new
salt.auto_apply settings module.

- Add salt/salt/{defaults.yaml,auto_apply.map.jinja,soc_salt.yaml,adv_salt.yaml}.
  auto_apply.map.jinja is a dedicated, side-effect-free merge map (the existing
  salt/salt/map.jinja dereferences pillar.host.mainint at import time).
- Remove the push blocks from salt/global/{defaults,soc_global}.yaml.
- Register salt.soc_salt/salt.adv_salt in pillar/top.sls; seed the local pillar
  stubs for fresh installs (make_some_dirs) and upgrades (ensure_salt_local_pillar
  in soup, wired into up_to_3.2.0).
- Repoint all consumers: GLOBALMERGED.push.* -> AUTOAPPLY.* (schedule, salt
  master, manager beacons, beacons_pushstate, orch.push_batch) and
  pillar.get('global:push...') -> 'salt:auto_apply...' (push reactors,
  so-push-drainer).
- Add a salt: fleetwide-highstate entry to pillar_push_map.yaml so edits keep
  applying immediately, matching the prior global-namespace behavior.
2026-06-24 15:17:48 -04:00
Mike Reeves b0b022c3ad Seed an empty /nsm/kernelrepo so the manager repo is always valid
so-repo-sync only populates /nsm/kernelrepo after the highstate, so on a
manager the file:///nsm/kernelrepo repo could be assigned before any
repodata exists, failing every dnf op. Run createrepo on the dir when
repodata/repomd.xml is missing, leaving a synced repo untouched.
2026-06-24 13:23:25 -04:00
Mike Reeves f45631af3a Guard kernel reposync on its config section existing
During soup, so-repo-sync runs before the highstate deploys the new
repodownload.conf. On the first upgrade to a kernel-aware version the
on-disk config lacks the [securityonionkernel] section, so dnf aborts
with "Unknown repo: 'securityonionkernel'" (set -e kills soup). Guard
the kernel reposync on the section being present; the next sync after
the highstate deploys it picks it up.
2026-06-24 12:15:10 -04:00
Mike Reeves 698a746d6d Add UEK8 kernel repo support across install and grid
Mirror the kernel repo to full parity with the main package repo so the
grid can pull the Oracle UEK8 kernel:

- setup/so-functions: securityonion_repo() emits a [securityonionkernel]
  section in every branch (mirrorlist on non-airgap, https://$MSRV/kernelrepo
  for airgap/minion, file:///nsm/kernelrepo/ for manager); repo_sync_local()
  and create_repo() sync and build /nsm/kernelrepo.
- manager/init.sls: create /nsm/kernelrepo and deploy mirror-kernel.txt.
- nginx/enabled.sls: serve /nsm/kernelrepo at https://<repo_host>/kernelrepo.
- repo/client/oracle.sls: add so_kernel_repo, gated by
  onlyif test -e /opt/so/state/nic_names_pinned so the kernel repo is only
  assigned once NICs are pinned by MAC.
- update_packages(): run so-nic-pin before the dnf update that pulls the
  kernel, freezing interface names and dropping the pin marker so the kernel
  isn't downgraded then re-upgraded on the first highstate.
2026-06-23 13:19:56 -04:00
Josh Patterson d71e80cf66 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-23 10:32:32 -04:00
Josh Patterson a9f9d8bd0d Merge pull request #15985 from Security-Onion-Solutions/soupmod2
allow manager two full highstates during soup, improve elastic script runtime
2026-06-22 17:02:02 -04:00
reyesj2 22d5c96bd5 don't create stack trace when set -e is disabled 2026-06-18 14:56:29 -05:00
Josh Patterson 62c01a9756 Merge remote-tracking branch 'origin/3/dev' into soupmod2 2026-06-18 09:53:44 -04:00
reyesj2 16149df71f formatting 2026-06-16 18:21:28 -05:00
reyesj2 6a18f35020 add context to soup errors and optional soup debug log with xtrace output 2026-06-16 18:21:28 -05:00
Josh Patterson 8e33d0e1e9 Merge remote-tracking branch 'origin/3/dev' into soupmod2 2026-06-16 12:54:18 -04:00
reyesj2 3daed551df use --fail flag without set -x, since elasticsearch can return a 404 on the template lookup 2026-06-16 11:17:04 -05:00
reyesj2 4456bde1c8 check if template exists without --fail flag 2026-06-16 10:45:53 -05:00
Jorge Reyes 4a6c675223 skip kibana backport if the template doesn't exist 2026-06-16 10:33:11 -05:00
Jorge Reyes b81257bf45 Merge pull request #15973 from Security-Onion-Solutions/reyesj2/dlm-support
Data stream lifecycle management support
2026-06-15 14:47:51 -05:00
reyesj2 1a423a2434 update message 2026-06-15 14:17:34 -05:00
reyesj2 d10f21399c remove comments 2026-06-15 13:31:23 -05:00
Josh Patterson 83aaa76f98 allow full highstate on manager when locked 2026-06-10 16:34:10 -04:00
Josh Patterson 33a116357d Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-10 08:56:17 -04:00
reyesj2 9aa9ea3255 Iniitial DLM support 2026-06-09 23:19:26 -05:00
Josh Patterson f088a27159 so-boot-mine-update: warm master pillar cache before highstate
A complete mine is not enough: elasticsearch:nodes, redis:nodes,
logstash:nodes (tgt_type=pillar) and hypervisor:nodes (tgt_type=compound)
resolve their target against the master's per-minion data cache
(grains+pillar in data.p), which is populated only when a minion's pillar
is recompiled -- separately from the mine. After a reboot a node can be in
the mine (so node_data/glob sees it) yet absent from that cache, so it
fails the elasticsearch:enabled:true pillar match and is dropped from
elasticsearch:nodes -> so-elasticsearch ExtraHosts -> container recreate.

After the mine-completeness wait, run salt '*' saltutil.refresh_pillar
wait=True to synchronously cache every up node's pillar (the same lever
deploy_newnode.sls uses), then verify with salt-run cache.pillar and retry
stragglers, bounded by MINE_UPDATE_MAX_WAIT. Also log elasticsearch:nodes
alongside node_data for inspection.
2026-06-09 13:52:19 -04:00
Josh Patterson 27c7702325 so-boot-mine-update: wait for a complete mine before highstate
Mine-backed pillars (node_data, elasticsearch:nodes, redis:nodes,
logstash:nodes, hypervisor:nodes) include a node only if it returned an
IP from the mine, and the configs they build are rebuilt fresh every
highstate. After a manager reboot with a flushed mine, the first boot
highstate could run before an up node re-reported network.ip_addrs,
dropping it from e.g. so-elasticsearch ExtraHosts and forcing a
container recreate.

After the initial broad mine.update, poll until every currently-up
minion actually has network.ip_addrs in the mine, re-pushing mine.update
to stragglers, before releasing the boot highstate. Shares the existing
MINE_UPDATE_MAX_WAIT backstop so a slow/down node never blocks boot, and
still logs the rendered node_data for inspection.
2026-06-09 10:10:32 -04:00
Josh Patterson 8c306eb37d so-boot-mine-update: log the rendered node_data content
Dump the actual rendered node_data pillar (pretty-printed JSON) to the
journal instead of just a rendered/empty verdict, so the boot-time render
attempt is fully inspectable. Empty renders print false/null and still
emit the WARNING.
2026-06-09 09:49:19 -04:00
Josh Patterson e536ffa363 so-boot-mine-update: render node_data after mine.update before highstate
After the boot-time mine.update, have the manager actually render the
node_data pillar and log whether it came back populated. node_data: False
makes salt/top.sls apply the bootstrap recovery branch instead of the
manager's real config, so surfacing this in the journal makes the
condition visible before so-boot-highstate runs. Best-effort and
non-blocking: always exits 0 so highstate proceeds regardless.
2026-06-09 09:35:24 -04:00
Josh Patterson 9580976ba2 Add manager boot-time grid mine.update oneshot before highstate
so-boot-mine-update.service is a manager-only Type=oneshot unit that runs
once per boot after salt-master/salt-minion start and before
so-boot-highstate.service. It pushes mine.update to all reachable minions
so mine-backed pillars (node IPs, ES/Redis/Logstash discovery) are fresh
before the boot highstate renders them.

The helper waits for the responsive minion set to settle (plateau) rather
than for every accepted key to report up, so an intentionally powered-off
minion doesn't block the update; MAX_WAIT remains as a backstop.
2026-06-08 11:05:13 -04:00
Josh Patterson cb3631da81 Move setup-complete marker from /opt/so/conf to /opt/so/state
The setup-complete marker is a runtime-state file, not config, so move it
to /opt/so/state/setup-complete. Updates both writers (mark_setup_complete
in setup/so-functions and the upgrade-path state in minion/init.sls) and the
three readers (so-boot-highstate.service ConditionPathExists, boot_highstate.sls
enable gate, and the so-user_sync cron gate).
2026-06-04 15:07:27 -04:00
Josh Patterson 34fee25b0c Merge remote-tracking branch 'origin/3/dev' into nostartupstates 2026-06-03 15:44:41 -04:00
Jorge Reyes 534f0e639d Merge pull request #15954 from Security-Onion-Solutions/reyesj2-patch-4
run elastic agent regen installer script in post_to_3.2.0
2026-06-02 15:25:55 -05:00
reyesj2 559465b407 run elastic agent gen installers script in post_to_3.2.0 2026-06-02 15:18:00 -05:00
reyesj2 f9c2579261 remove logstash pipeline rename from hotfix moving to up_to_3.2.0 2026-06-02 15:18:00 -05:00
Jorge Reyes 33699a914b Merge pull request #15952 from Security-Onion-Solutions/reyesj2-patch-3
use so-config-backup script in soup
2026-06-02 15:02:27 -05:00
Jorge Reyes 0c2d8f8973 Merge pull request #15951 from Security-Onion-Solutions/reyesj2-patch-2
check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility
2026-06-02 15:02:10 -05:00
reyesj2 f2996fb888 use so-config-backup script in soup 2026-06-01 11:52:35 -05:00
reyesj2 3c533cccbc and after free space check 2026-06-01 11:28:59 -05:00
reyesj2 79da9f9f2c check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility 2026-06-01 11:26:52 -05:00
Josh Patterson f54939b444 Replace inotify pillar watch with postgres audit_settings beacon
The active-push feature detected pillar/settings changes via an inotify
beacon on the manager watching /opt/so/saltstack/local/pillar. Replace
that pillar watch with a custom salt beacon (pillar_db) that polls the
SOC so_soc.audit_settings table on a monotonic id watermark, so changes
made through SOC drive immediate pushes from the database instead of the
files. The suricata/strelka rule inotify watches (and pyinotify) are kept
unchanged, since rule-file edits are not recorded in audit_settings.

- salt/_beacons/pillar_db.py: new beacon. Polls audit_settings via
  `docker exec so-postgres psql` (unix-socket trust auth), tracks the last
  processed id in /opt/so/state/pillar_db_watch.id, seeds to MAX(id) on
  first run (no history replay), and emits one event per new row.
- salt/reactor/push_pillar.sls: consume setting_id/node_id from the beacon
  event instead of a file path. App = first dotted segment of setting_id,
  looked up in pillar_push_map.yaml. Empty node_id -> grid-wide actions as
  is; populated node_id -> the app's state(s) retargeted to that one node.
- salt/manager/files/beacons_pushstate.conf.jinja: drop the pillar inotify
  block, add the pillar_db beacon (interval = push.drain_interval); keep
  the suricata/strelka inotify watches.
- salt/salt/files/reactor_pushstate.conf: map salt/beacon/*/pillar_db/
  audit_settings to push_pillar.sls; remove the pillar inotify reactor
  lines; keep suricata/strelka.

The intent -> so-push-drainer -> orch.push_batch pipeline is unchanged.
Verified end-to-end on a standalone: a grid-wide telegraf.output change
re-applied telegraf fleetwide (container replaced), and a per-host
ntp.config.servers change applied ntp to only that node.
2026-05-29 14:55:13 -04:00