securityonion

mirror of https://github.com/Security-Onion-Solutions/securityonion.git synced 2026-06-12 21:29:16 +02:00

Author	SHA1	Message	Date
Josh Patterson	f088a27159	so-boot-mine-update: warm master pillar cache before highstate A complete mine is not enough: elasticsearch:nodes, redis:nodes, logstash:nodes (tgt_type=pillar) and hypervisor:nodes (tgt_type=compound) resolve their target against the master's per-minion data cache (grains+pillar in data.p), which is populated only when a minion's pillar is recompiled -- separately from the mine. After a reboot a node can be in the mine (so node_data/glob sees it) yet absent from that cache, so it fails the elasticsearch:enabled:true pillar match and is dropped from elasticsearch:nodes -> so-elasticsearch ExtraHosts -> container recreate. After the mine-completeness wait, run salt '*' saltutil.refresh_pillar wait=True to synchronously cache every up node's pillar (the same lever deploy_newnode.sls uses), then verify with salt-run cache.pillar and retry stragglers, bounded by MINE_UPDATE_MAX_WAIT. Also log elasticsearch:nodes alongside node_data for inspection.	2026-06-09 13:52:19 -04:00
Josh Patterson	27c7702325	so-boot-mine-update: wait for a complete mine before highstate Mine-backed pillars (node_data, elasticsearch:nodes, redis:nodes, logstash:nodes, hypervisor:nodes) include a node only if it returned an IP from the mine, and the configs they build are rebuilt fresh every highstate. After a manager reboot with a flushed mine, the first boot highstate could run before an up node re-reported network.ip_addrs, dropping it from e.g. so-elasticsearch ExtraHosts and forcing a container recreate. After the initial broad mine.update, poll until every currently-up minion actually has network.ip_addrs in the mine, re-pushing mine.update to stragglers, before releasing the boot highstate. Shares the existing MINE_UPDATE_MAX_WAIT backstop so a slow/down node never blocks boot, and still logs the rendered node_data for inspection.	2026-06-09 10:10:32 -04:00
Josh Patterson	8c306eb37d	so-boot-mine-update: log the rendered node_data content Dump the actual rendered node_data pillar (pretty-printed JSON) to the journal instead of just a rendered/empty verdict, so the boot-time render attempt is fully inspectable. Empty renders print false/null and still emit the WARNING.	2026-06-09 09:49:19 -04:00
Josh Patterson	e536ffa363	so-boot-mine-update: render node_data after mine.update before highstate After the boot-time mine.update, have the manager actually render the node_data pillar and log whether it came back populated. node_data: False makes salt/top.sls apply the bootstrap recovery branch instead of the manager's real config, so surfacing this in the journal makes the condition visible before so-boot-highstate runs. Best-effort and non-blocking: always exits 0 so highstate proceeds regardless.	2026-06-09 09:35:24 -04:00
Josh Patterson	9580976ba2	Add manager boot-time grid mine.update oneshot before highstate so-boot-mine-update.service is a manager-only Type=oneshot unit that runs once per boot after salt-master/salt-minion start and before so-boot-highstate.service. It pushes mine.update to all reachable minions so mine-backed pillars (node IPs, ES/Redis/Logstash discovery) are fresh before the boot highstate renders them. The helper waits for the responsive minion set to settle (plateau) rather than for every accepted key to report up, so an intentionally powered-off minion doesn't block the update; MAX_WAIT remains as a backstop.	2026-06-08 11:05:13 -04:00
Jorge Reyes	534f0e639d	Merge pull request #15954 from Security-Onion-Solutions/reyesj2-patch-4 run elastic agent regen installer script in post_to_3.2.0	2026-06-02 15:25:55 -05:00
reyesj2	559465b407	run elastic agent gen installers script in post_to_3.2.0	2026-06-02 15:18:00 -05:00
reyesj2	f9c2579261	remove logstash pipeline rename from hotfix moving to up_to_3.2.0	2026-06-02 15:18:00 -05:00
Jorge Reyes	33699a914b	Merge pull request #15952 from Security-Onion-Solutions/reyesj2-patch-3 use so-config-backup script in soup	2026-06-02 15:02:27 -05:00
Jorge Reyes	0c2d8f8973	Merge pull request #15951 from Security-Onion-Solutions/reyesj2-patch-2 check if there is a version or hotfix to upgrade to before verifying elasticsearch compatibility	2026-06-02 15:02:10 -05:00
reyesj2	f2996fb888	use so-config-backup script in soup	2026-06-01 11:52:35 -05:00
reyesj2	3c533cccbc	and after free space check	2026-06-01 11:28:59 -05:00
reyesj2	79da9f9f2c	check if there is a version or hotfix to upgrade to before verifying elasticsearch compatibility	2026-06-01 11:26:52 -05:00
Josh Patterson	9a70a06b3b	Merge remote-tracking branch 'origin/3/dev' into jertel/wip	2026-05-28 13:55:12 -04:00
Josh Patterson	bb8ae91d91	fix so-soc postgres bootstrap	2026-05-27 16:39:52 -04:00
reyesj2	b2a82fec29	fix_logstash_0013_lumberjack_pipeline_name Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting	2026-05-27 13:24:23 -05:00
Josh Patterson	79987f3659	bootstrap so-soc db in postgres during soup	2026-05-27 13:55:30 -04:00
reyesj2	0b4a4de609	always run logstash pipeline rename	2026-05-27 12:21:22 -05:00
reyesj2	0834998cca	usable for next soup	2026-05-27 09:52:29 -05:00
reyesj2	473f93f0ee	check for stale logstash pipeline name in pillars	2026-05-27 09:33:15 -05:00
reyesj2	d72219c586	use multiple or combined input	2026-05-22 20:04:21 -05:00
reyesj2	b485be4602	separate salt-key command from main es version compatibility loop	2026-05-20 14:12:58 -05:00
reyesj2	7d13007aa9	block soup if all ES nodes are not online and reporting their ES version for compatibility check	2026-05-20 10:03:37 -05:00
reyesj2	d7a1b67095	use pipefail on heavynode version command to pass through error	2026-05-20 09:16:57 -05:00
reyesj2	6c8997b28a	verify all heavynodes and all searchnodes are at compatible ES version before attempting an elasticsearch upgrade	2026-05-19 22:27:31 -05:00
reyesj2	ef79c63858	Merge branch '3/dev' of github.com:Security-Onion-Solutions/securityonion into reyesj2/strelkalnk	2026-05-12 15:20:09 -05:00
reyesj2	01fb1aa156	check pillars for ScanLNK and rename to ScanLnk	2026-05-12 15:19:44 -05:00
Doug Burks	f19bdd7aae	Merge pull request #15883 from Security-Onion-Solutions/reyesj2/transformhealth use temp files to prevent jq arg too long	2026-05-12 15:36:12 -04:00
reyesj2	f637dc62d1	use temp files to prevent jq arg too long	2026-05-12 13:29:32 -05:00
Josh Brower	125610ed42	Additional test coverage	2026-05-12 10:11:22 -04:00
Josh Brower	306b0af4d0	Initial commit	2026-05-12 09:55:06 -04:00
reyesj2	86966d2778	reauthorize unhealthy transform jobs using kibana 9.3.3 auth flow	2026-05-01 12:44:08 -05:00
reyesj2	39d0947102	update default elastic agent logging level to warning	2026-04-29 17:38:40 -05:00
Mike Reeves	fa8162de02	Merge pull request #15749 from Security-Onion-Solutions/feature/postgres Add so-postgres Salt states and infrastructure	2026-04-28 10:15:47 -04:00
Mike Reeves	0ecc7ae594	soup: drop --local from postgres.telegraf_users reconcile The manager's /etc/salt/minion (written by so-functions:configure_minion) has no file_roots, so salt-call --local falls back to Salt's default /srv/salt and fails with "No matching sls found for 'postgres.telegraf_users' in env 'base'". || true was silently swallowing the error, which meant the DB roles for the pillar entries just populated by the so-telegraf-cred backfill loop never actually got created. Route through salt-master instead; its file_roots already points at the default/local salt trees.	2026-04-23 11:25:44 -04:00
Mike Reeves	eadad6c163	soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres unconditionally, but make_some_dirs only runs at install time so managers upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails pillar render on the first post-upgrade restart. Similarly, secrets_pillar is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would land empty after highstate. Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0 so the stubs and secret exist before masterlock/salt-master restart. Both are idempotent and safe to re-run.	2026-04-23 10:01:38 -04:00
Mike Reeves	d5c0ec4404	so-yaml_test: cover loadYaml error paths Exercises the FileNotFoundError and generic-exception branches added to loadYaml in the previous commit, restoring 100% coverage required by the build.	2026-04-22 14:30:51 -04:00
Mike Reeves	e616b4c120	so-telegraf-cred: make executable and harden error handling so-telegraf-cred was committed with mode 644, causing `so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion to fail with "Permission denied" and log "Failed to provision postgres telegraf cred for <minion>". Mark it executable. Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in so-yaml.py loadYaml surface a clear stderr message with the filename instead of an unhandled FileNotFoundError traceback.	2026-04-22 14:25:19 -04:00
Mike Reeves	f240a99e22	so-telegraf-cred: thin bash wrapper around so-yaml.py Swap the ~150-line Python implementation for a 48-line bash script that delegates YAML mutation to so-yaml.py — the same helper so-minion and soup already use. Same semantics: seed the creds pillar on first use, idempotent add, silent remove. SO minion ids are dot-free by construction (setup/so-functions:1884 strips everything after the first '.'), so using the raw id as the so-yaml.py key path is safe.	2026-04-22 11:09:53 -04:00
Mike Reeves	614f32c5e0	Split postgres auth from per-minion telegraf creds The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities: - pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred. - pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy. - salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove. - so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly. - postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id. - orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect. - soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.	2026-04-22 10:55:15 -04:00
Josh Patterson	edd207a9d5	soup update socloud.conf	2026-04-22 09:20:53 -04:00
Mike Reeves	724d76965f	soup: update postgres backfill comment to reflect reactor removal The reactor path is gone; so-minion now owns add/delete for new minions. The backfill itself is unchanged — postgres.auth's up_minions fallback fills the aggregate, postgres.telegraf_users creates the roles, and the bash loop fans to per-minion pillar files — so the pre-feature upgrade story still works end-to-end. Just refresh the comment so it isn't misleading.	2026-04-21 15:45:05 -04:00
Mike Reeves	dbf4fb66a4	Clean up postgres telegraf cred on so-minion delete Paired with the add path in add_telegraf_to_minion: when a minion is removed, drop its entry from the aggregate postgres pillar and drop the matching so_telegraf_<safe> role from the database. Without this, stale entries and DB roles accumulate over time. Makes rotate-password and compromise-recovery both a clean delete+add: so-minion -o=delete -m=<id> so-minion -o=add -m=<id> The first call drops the role and clears the aggregate pillar; the second generates a brand-new password. The cleanup is best-effort — if so-postgres isn't running or the DROP ROLE fails (e.g., the role owns unexpected objects), we log a warning and continue so the minion delete itself never gets blocked by postgres state. Admins can mop up stray roles manually if that happens.	2026-04-21 15:43:01 -04:00
Mike Reeves	5f28e9b191	Move per-minion telegraf cred provisioning into so-minion Simpler, race-free replacement for the reactor + orch + fan-out chain. - salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to generate a random 72-char password, reuse any existing password from the aggregate pillar, write postgres.telegraf.{user,pass} into the minion's own pillar file, and update the aggregate pillar so postgres.telegraf_users can CREATE ROLE on the next manager apply. Every create<ROLE> function already calls this hook, so add / addVM / setup dispatches are all covered identically and synchronously. - salt/postgres/auth.sls: strip the fanout_targets loop and the postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now redundant. The state still manages the so_postgres admin user and writes the aggregate pillar for postgres.telegraf_users to consume. - salt/reactor/telegraf_user_sync.sls: deleted. - salt/orch/telegraf_postgres_sync.sls: deleted. - salt/salt/master.sls: drop the reactor_config_telegraf block that registered the reactor on /etc/salt/master.d/reactor_telegraf.conf. - salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf step and the require: it added to the newnode highstate. Back to its original 3/dev shape. No more ephemeral postgres_fanout_minion pillar, no more async salt/key reactor, no more so-minion setupMinionFiles race: the pillar write happens inline inside setupMinionFiles itself.	2026-04-21 15:34:15 -04:00
Mike Reeves	81c0f2b464	so-yaml.py: tolerate missing ancestors in removeKey replace calls removeKey before addKey, so running `so-yaml.py replace` on a new dotted key whose parent doesn't exist — e.g., postgres.auth fanning postgres.telegraf.user into a minion pillar file that has never carried any postgres.* keys — crashed with KeyError: 'postgres' from removeKey recursing into a missing parent dict. Make removeKey a no-op when an intermediate key is absent so that: - `remove` has the natural "remove if exists" semantics, and - `replace` works for brand-new nested keys.	2026-04-21 14:43:10 -04:00
Mike Reeves	05f6503d61	Gate postgres telegraf fan-out on reactor-provided minion id postgres.auth was running an `unless` shell check per up-minion on every manager highstate, even when nothing had changed — N fork+python starts of so-yaml.py add up on large grids. The work is only needed when a specific minion's key is accepted. - salt/postgres/auth.sls: fan out only when postgres_fanout_minion pillar is set (targets that single minion). Manager highstates with no pillar take a zero-N code path. - salt/reactor/telegraf_user_sync.sls: re-pass the accepted minion id as postgres_fanout_minion to the orch. - salt/orch/telegraf_postgres_sync.sls: forward the pillar to the salt.state invocation so the state render sees it. - salt/manager/tools/sbin/soup: for the one-time 3.1.0 backfill, drop the per-minion state.apply and do an in-shell loop over the minion pillar files using so-yaml.py directly. Skips minions that already have postgres.telegraf.user set.	2026-04-21 10:05:08 -04:00
Mike Reeves	b6a3d1889c	Fix soup state.apply args for postgres provisioning state.apply takes a single mods argument; space-separated names are not a list, so `state.apply postgres.auth postgres.telegraf_users` was only applying postgres.auth and silently dropping the telegraf_users state. Use comma-separated mods and add queue=True to match the rest of soup.	2026-04-20 14:40:32 -04:00
Mike Reeves	1cb34b089c	Restore 3/dev soup and add postgres users to post_to_3.1.0 feature/postgres had rewritten the 3.1.0 upgrade block, dropping the elastic upgrade work 3/dev landed for 9.0.8→9.3.3: elasticsearch_backup_index_templates, the component template state cleanup, and the /usr/sbin/so-kibana-space-defaults post-upgrade call. It also carried an older ES upgrade mapping (8.18.8→9.0.8) that was superseded on 3/dev (9.0.8→9.3.3 for 3.0.0-20260331), and a handful of latent shell-quoting regressions in verify_es_version_compatibility and the intermediate-upgrade helpers. Adopt the 3/dev soup verbatim and only add the new Telegraf Postgres provisioning to post_to_3.1.0 on top of so-kibana-space-defaults.	2026-04-20 14:38:55 -04:00
Mike Reeves	f7b80f5931	Merge branch '3/dev' into feature/postgres	2026-04-16 16:37:02 -04:00
Mike Reeves	f11d315fea	Fix soup	2026-04-16 16:35:24 -04:00

838 Commits