Commit Graph

793 Commits

Author SHA1 Message Date
Josh Patterson 0a69833669 Merge remote-tracking branch 'origin/3/dev' into soupmod 2026-06-10 16:19:17 -04:00
Josh Patterson f088a27159 so-boot-mine-update: warm master pillar cache before highstate
A complete mine is not enough: elasticsearch:nodes, redis:nodes,
logstash:nodes (tgt_type=pillar) and hypervisor:nodes (tgt_type=compound)
resolve their target against the master's per-minion data cache
(grains+pillar in data.p), which is populated only when a minion's pillar
is recompiled -- separately from the mine. After a reboot a node can be in
the mine (so node_data/glob sees it) yet absent from that cache, so it
fails the elasticsearch:enabled:true pillar match and is dropped from
elasticsearch:nodes -> so-elasticsearch ExtraHosts -> container recreate.

After the mine-completeness wait, run salt '*' saltutil.refresh_pillar
wait=True to synchronously cache every up node's pillar (the same lever
deploy_newnode.sls uses), then verify with salt-run cache.pillar and retry
stragglers, bounded by MINE_UPDATE_MAX_WAIT. Also log elasticsearch:nodes
alongside node_data for inspection.
2026-06-09 13:52:19 -04:00
Josh Patterson 27c7702325 so-boot-mine-update: wait for a complete mine before highstate
Mine-backed pillars (node_data, elasticsearch:nodes, redis:nodes,
logstash:nodes, hypervisor:nodes) include a node only if it returned an
IP from the mine, and the configs they build are rebuilt fresh every
highstate. After a manager reboot with a flushed mine, the first boot
highstate could run before an up node re-reported network.ip_addrs,
dropping it from e.g. so-elasticsearch ExtraHosts and forcing a
container recreate.

After the initial broad mine.update, poll until every currently-up
minion actually has network.ip_addrs in the mine, re-pushing mine.update
to stragglers, before releasing the boot highstate. Shares the existing
MINE_UPDATE_MAX_WAIT backstop so a slow/down node never blocks boot, and
still logs the rendered node_data for inspection.
2026-06-09 10:10:32 -04:00
Josh Patterson 8c306eb37d so-boot-mine-update: log the rendered node_data content
Dump the actual rendered node_data pillar (pretty-printed JSON) to the
journal instead of just a rendered/empty verdict, so the boot-time render
attempt is fully inspectable. Empty renders print false/null and still
emit the WARNING.
2026-06-09 09:49:19 -04:00
Josh Patterson e536ffa363 so-boot-mine-update: render node_data after mine.update before highstate
After the boot-time mine.update, have the manager actually render the
node_data pillar and log whether it came back populated. node_data: False
makes salt/top.sls apply the bootstrap recovery branch instead of the
manager's real config, so surfacing this in the journal makes the
condition visible before so-boot-highstate runs. Best-effort and
non-blocking: always exits 0 so highstate proceeds regardless.
2026-06-09 09:35:24 -04:00
Josh Patterson 9580976ba2 Add manager boot-time grid mine.update oneshot before highstate
so-boot-mine-update.service is a manager-only Type=oneshot unit that runs
once per boot after salt-master/salt-minion start and before
so-boot-highstate.service. It pushes mine.update to all reachable minions
so mine-backed pillars (node IPs, ES/Redis/Logstash discovery) are fresh
before the boot highstate renders them.

The helper waits for the responsive minion set to settle (plateau) rather
than for every accepted key to report up, so an intentionally powered-off
minion doesn't block the update; MAX_WAIT remains as a backstop.
2026-06-08 11:05:13 -04:00
Jorge Reyes 534f0e639d Merge pull request #15954 from Security-Onion-Solutions/reyesj2-patch-4
run elastic agent regen installer script in post_to_3.2.0
2026-06-02 15:25:55 -05:00
reyesj2 559465b407 run elastic agent gen installers script in post_to_3.2.0 2026-06-02 15:18:00 -05:00
reyesj2 f9c2579261 remove logstash pipeline rename from hotfix moving to up_to_3.2.0 2026-06-02 15:18:00 -05:00
Jorge Reyes 33699a914b Merge pull request #15952 from Security-Onion-Solutions/reyesj2-patch-3
use so-config-backup script in soup
2026-06-02 15:02:27 -05:00
Jorge Reyes 0c2d8f8973 Merge pull request #15951 from Security-Onion-Solutions/reyesj2-patch-2
check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility
2026-06-02 15:02:10 -05:00
Josh Patterson 487e433589 allow full highstate on manager while master locked 2026-06-02 13:58:38 -04:00
Josh Patterson 3328ff362d add some logging 2026-06-02 10:44:17 -04:00
reyesj2 f2996fb888 use so-config-backup script in soup 2026-06-01 11:52:35 -05:00
reyesj2 3c533cccbc and after free space check 2026-06-01 11:28:59 -05:00
reyesj2 79da9f9f2c check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility 2026-06-01 11:26:52 -05:00
Josh Patterson d48a22e37e Merge pull request #15944 from Security-Onion-Solutions/jertel/wip
Jertel/wip
2026-05-28 14:01:42 -04:00
Josh Patterson 9a70a06b3b Merge remote-tracking branch 'origin/3/dev' into jertel/wip 2026-05-28 13:55:12 -04:00
Josh Patterson bb8ae91d91 fix so-soc postgres bootstrap 2026-05-27 16:39:52 -04:00
reyesj2 b2a82fec29 fix_logstash_0013_lumberjack_pipeline_name
Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting
2026-05-27 13:24:23 -05:00
Josh Patterson 79987f3659 bootstrap so-soc db in postgres during soup 2026-05-27 13:55:30 -04:00
reyesj2 0b4a4de609 always run logstash pipeline rename 2026-05-27 12:21:22 -05:00
reyesj2 0834998cca usuable for next soup 2026-05-27 09:52:29 -05:00
reyesj2 473f93f0ee check for stale logstash pipeline name in pillars 2026-05-27 09:33:15 -05:00
reyesj2 d72219c586 use multiple or combined input 2026-05-22 20:04:21 -05:00
reyesj2 b485be4602 separate salt-key command from main es version compatiblity loop 2026-05-20 14:12:58 -05:00
reyesj2 7d13007aa9 block soup if all ES nodes are not online and reporting their ES version for compatibility check 2026-05-20 10:03:37 -05:00
reyesj2 d7a1b67095 use pipefail on heavynode versino command to pass through error 2026-05-20 09:16:57 -05:00
reyesj2 6c8997b28a verify all heavynodes and all searchnodes are at compatible ES version before attempting an elasticsearch upgrade 2026-05-19 22:27:31 -05:00
Josh Patterson 84decc1db6 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-13 14:09:15 -04:00
reyesj2 ef79c63858 Merge branch '3/dev' of github.com:Security-Onion-Solutions/securityonion into reyesj2/strelkalnk 2026-05-12 15:20:09 -05:00
reyesj2 01fb1aa156 check pillars for ScanLNK and rename to ScanLnk 2026-05-12 15:19:44 -05:00
Doug Burks f19bdd7aae Merge pull request #15883 from Security-Onion-Solutions/reyesj2/transformhealth
use temp files to prevent jq arg too long
2026-05-12 15:36:12 -04:00
reyesj2 f637dc62d1 use temp files to prevent jq arg too long 2026-05-12 13:29:32 -05:00
Josh Brower 125610ed42 Additional test coverage 2026-05-12 10:11:22 -04:00
Josh Brower 306b0af4d0 Initial commit 2026-05-12 09:55:06 -04:00
Josh Patterson f774334b6c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-06 08:16:41 -04:00
reyesj2 86966d2778 reauthorize unhealthy transform jobs using kibana 9.3.3 auth flow 2026-05-01 12:44:08 -05:00
Josh Patterson 9541024eb7 fix broken things 2026-04-30 15:35:24 -04:00
reyesj2 39d0947102 update default elastic agent logging level to warning 2026-04-29 17:38:40 -05:00
Josh Patterson 034711d148 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 10:47:29 -04:00
Mike Reeves fa8162de02 Merge pull request #15749 from Security-Onion-Solutions/feature/postgres
Add so-postgres Salt states and infrastructure
2026-04-28 10:15:47 -04:00
Mike Reeves 0ecc7ae594 soup: drop --local from postgres.telegraf_users reconcile
The manager's /etc/salt/minion (written by so-functions:configure_minion)
has no file_roots, so salt-call --local falls back to Salt's default
/srv/salt and fails with "No matching sls found for 'postgres.telegraf_users'
in env 'base'". || true was silently swallowing the error, which meant the
DB roles for the pillar entries just populated by the so-telegraf-cred
backfill loop never actually got created.

Route through salt-master instead; its file_roots already points at the
default/local salt trees.
2026-04-23 11:25:44 -04:00
Mike Reeves eadad6c163 soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade
pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres
unconditionally, but make_some_dirs only runs at install time so managers
upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails
pillar render on the first post-upgrade restart. Similarly, secrets_pillar
is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass
never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and
SOC's PG_ADMIN_PASS would land empty after highstate.

Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0
so the stubs and secret exist before masterlock/salt-master restart. Both
are idempotent and safe to re-run.
2026-04-23 10:01:38 -04:00
Mike Reeves d5c0ec4404 so-yaml_test: cover loadYaml error paths
Exercises the FileNotFoundError and generic-exception branches added to
loadYaml in the previous commit, restoring 100% coverage required by
the build.
2026-04-22 14:30:51 -04:00
Mike Reeves e616b4c120 so-telegraf-cred: make executable and harden error handling
so-telegraf-cred was committed with mode 644, causing
`so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion
to fail with "Permission denied" and log "Failed to provision postgres
telegraf cred for <minion>". Mark it executable.

Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in
so-yaml.py loadYaml surface a clear stderr message with the filename
instead of an unhandled FileNotFoundError traceback.
2026-04-22 14:25:19 -04:00
Mike Reeves f240a99e22 so-telegraf-cred: thin bash wrapper around so-yaml.py
Swap the ~150-line Python implementation for a 48-line bash script that
delegates YAML mutation to so-yaml.py — the same helper so-minion and
soup already use. Same semantics: seed the creds pillar on first use,
idempotent add, silent remove.

SO minion ids are dot-free by construction (setup/so-functions:1884
strips everything after the first '.'), so using the raw id as the
so-yaml.py key path is safe.
2026-04-22 11:09:53 -04:00
Mike Reeves 614f32c5e0 Split postgres auth from per-minion telegraf creds
The old flow had two writers for each per-minion Telegraf password
(so-minion wrote the minion pillar; postgres.auth regenerated any
missing aggregate entries). They drifted on first-boot and there was
no trigger to create DB roles when a new minion joined.

Split responsibilities:

- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres
  admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user,
  pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer:
  flock, atomic YAML write, PyYAML safe_dump so passwords never
  round-trip through so-yaml.py's type coercion. Idempotent add, quiet
  remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred
  instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs
  roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and
  refreshes the new minion's pillar before the new node highstates,
  so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt
  keys (idempotent) and runs postgres.telegraf_users once to reconcile
  the DB.
2026-04-22 10:55:15 -04:00
Josh Patterson edd207a9d5 soup update socloud.conf 2026-04-22 09:20:53 -04:00
Mike Reeves 724d76965f soup: update postgres backfill comment to reflect reactor removal
The reactor path is gone; so-minion now owns add/delete for new
minions. The backfill itself is unchanged — postgres.auth's up_minions
fallback fills the aggregate, postgres.telegraf_users creates the
roles, and the bash loop fans to per-minion pillar files — so the
pre-feature upgrade story still works end-to-end. Just refresh the
comment so it isn't misleading.
2026-04-21 15:45:05 -04:00