Commit Graph

11736 Commits

Author SHA1 Message Date
Josh Patterson 0a8f2e01a0 install pyinotify 2026-04-29 16:41:56 -04:00
Josh Patterson 4546d7bc52 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-29 14:28:19 -04:00
Jorge Reyes 2f01ce3b23 so-elastic-fleet-outputs-update now checks for cert drift. Remove running --cert arg on cert change to prevent highstate from running outputs-update 2x 2026-04-29 12:33:28 -05:00
Mike Reeves 71b19c1b5f Merge pull request #15840 from Security-Onion-Solutions/fix/import-postgres-firewall
Open postgres in DOCKER-USER firewall everywhere influxdb is open
2026-04-29 09:20:03 -04:00
Mike Reeves 82e55ae87f Open postgres on every hostgroup that opens influxdb
The static defaults only listed postgres on each role's self-hostgroup,
leaving sensor/searchnode/heavynode/receiver/fleet/idh/desktop/hypervisor
hostgroups unable to reach the manager's so-postgres in distributed
grids. A dynamic block in firewall/map.jinja added postgres to those
hostgroups only when telegraf.output was switched to POSTGRES/BOTH,
which left postgres unreachable by default.

Mirror influxdb statically across manager/managerhype/managersearch/
standalone for every hostgroup that already lists influxdb, and drop
the now-redundant telegraf-gated dynamic block from firewall/map.jinja.
2026-04-29 09:09:50 -04:00
Mike Reeves 3e02001544 Open postgres port for import role in DOCKER-USER firewall
When so-postgres was wired in (868cd1187), the import role's firewall
defaults were missed while every other manager-class role (manager,
managerhype, managersearch, standalone, eval) had postgres added to
their DOCKER-USER manager-hostgroup portgroups. As a result, on a
fresh import install the so-postgres container starts but tcp/5432 is
dropped at DOCKER-USER, so soc/kratos/telegraf can't reach it.

Add postgres alongside the existing influxdb entry so import nodes
match the other roles.
2026-04-29 08:48:45 -04:00
Josh Patterson 17849d8758 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:49:22 -04:00
Mike Reeves 2dcded6cca drop postgres module from soc defaults injection
The soc binary on 3/dev does not register a postgres module, so injecting
postgres into soc.config.server.modules makes soc abort at launch with
'Module does not exist: postgres'. The soc-side module is staged on
feature/postgres but is not landing this release. Drop the injection
until the module ships; salt/postgres state and pillars are unchanged.
2026-04-28 15:46:56 -04:00
Josh Patterson d3d30a587c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:30:31 -04:00
Mike Reeves 8ca59e6f0c Merge pull request #15838 from Security-Onion-Solutions/fix/docker-refresh-multiarch-pull
Fix/docker refresh multiarch pull
2026-04-28 15:14:27 -04:00
Mike Reeves 82dac82d15 drop platform/digest pull resolution
The digest-pull logic was added to make `docker push` work for multi-arch
upstream tags. Now that the push step is `docker buildx imagetools create`
pinned to the gpg-verified RepoDigest, the registry-to-registry copy
handles single- and multi-arch sources without help. Reverts the pull
back to the original line and removes the unused PLATFORM_OS/_ARCH
detection.
2026-04-28 14:54:25 -04:00
Mike Reeves 288a823edf push images via buildx imagetools create
Replaces `docker push` with a registry-to-registry copy. On Docker 29.x
with the containerd image store, `docker push` of a freshly-pulled image
hits a path that wraps single-platform manifests in a synthetic index
and then can't push the layers it claims to reference, producing
`NotFound: content digest ...` even when the image is fully present.

Keep the local `docker tag` so so-image-pull's `docker images | grep :5000`
existence check continues to work.
2026-04-28 14:49:02 -04:00
reyesj2 9cec79b299 check current fleet policy cert against cert on disk
Co-authored-by: Copilot <copilot@github.com>
2026-04-28 13:34:39 -05:00
Mike Reeves c86399327b fix so-docker-refresh push for multi-arch source images
docker pull of a multi-arch tag on Docker 29.x leaves the local tag
pointing at the image index rather than the platform-specific manifest.
The subsequent docker push then tries to push every sub-manifest the
index references and fails on layers we never fetched.

Resolve the local-platform manifest digest from the upstream index via
docker buildx imagetools inspect, pull by that digest, and re-tag locally
to the canonical tag. The signing flow and the existing tag/push to the
embedded registry are unchanged.
2026-04-28 14:27:59 -04:00
Josh Patterson 034711d148 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 10:47:29 -04:00
Mike Reeves fa8162de02 Merge pull request #15749 from Security-Onion-Solutions/feature/postgres
Add so-postgres Salt states and infrastructure
2026-04-28 10:15:47 -04:00
Josh Patterson 33abc429d1 Merge pull request #15835 from Security-Onion-Solutions/fix/reactor/sominon_setup
fix sominion_setup reactor
2026-04-28 08:55:58 -04:00
Jorge Reyes b22585ca90 Merge pull request #15833 from Security-Onion-Solutions/reyesj2-es933
exclude more transform job errors
2026-04-27 15:05:11 -05:00
reyesj2 9f2ca7012f exclude more transform job errors 2026-04-27 15:02:13 -05:00
Josh Patterson 21aeb68188 fix sominion_setup reactor 2026-04-27 14:30:41 -04:00
Jorge Reyes a45e59239f Merge pull request #15826 from Security-Onion-Solutions/reyesj2-es933
heavynode should run es cluster state
2026-04-24 13:07:48 -05:00
Josh Patterson 2ad0bcab7c Merge pull request #15828 from Security-Onion-Solutions/fix/annotations
readonly soc and kratos enabled
2026-04-24 14:00:02 -04:00
Josh Patterson 070d150420 readonly soc and kratos enabled 2026-04-24 13:56:35 -04:00
reyesj2 90ecbe90d8 allow heavynodes to run elasticsearch/cluster state 2026-04-24 12:56:27 -05:00
Jorge Reyes 88b30adf7f Merge pull request #15823 from Security-Onion-Solutions/reyesj2-es933
typo
2026-04-24 09:27:08 -05:00
reyesj2 b6acf3b522 typo 2026-04-24 09:24:58 -05:00
Jorge Reyes 810a582717 Merge pull request #15813 from Security-Onion-Solutions/reyesj2-es933
split up Elastic Fleet state
2026-04-23 14:51:32 -05:00
Mike Reeves a6948e8dcb Remove helpLink for influxdb in soc_global.yaml
Removed helpLink for influxdb from endgamehost configuration.
2026-04-23 13:56:41 -04:00
Mike Reeves 0ecc7ae594 soup: drop --local from postgres.telegraf_users reconcile
The manager's /etc/salt/minion (written by so-functions:configure_minion)
has no file_roots, so salt-call --local falls back to Salt's default
/srv/salt and fails with "No matching sls found for 'postgres.telegraf_users'
in env 'base'". || true was silently swallowing the error, which meant the
DB roles for the pillar entries just populated by the so-telegraf-cred
backfill loop never actually got created.

Route through salt-master instead; its file_roots already points at the
default/local salt trees.
2026-04-23 11:25:44 -04:00
reyesj2 fdfca469cc prevent non-manager nodes from running elasticsearch.cluster state manually 2026-04-23 09:53:07 -05:00
reyesj2 5f2ec76ba8 prevent fleetnode from being able to run elasticfleet.manager state manually 2026-04-23 09:50:45 -05:00
reyesj2 b015c8ff14 remove docker import 2026-04-23 09:31:30 -05:00
reyesj2 7e70870a9e remove globals import 2026-04-23 09:25:36 -05:00
Mike Reeves eadad6c163 soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade
pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres
unconditionally, but make_some_dirs only runs at install time so managers
upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails
pillar render on the first post-upgrade restart. Similarly, secrets_pillar
is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass
never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and
SOC's PG_ADMIN_PASS would land empty after highstate.

Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0
so the stubs and secret exist before masterlock/salt-master restart. Both
are idempotent and safe to re-run.
2026-04-23 10:01:38 -04:00
reyesj2 22b32a16dd include elasticfleet.config 2026-04-23 08:30:47 -05:00
reyesj2 22f869734e add check for files before attempting to use file pattern to load templates 2026-04-22 23:11:31 -05:00
reyesj2 398bc9e4ed update kibana discardCorruptObjects version 2026-04-22 20:38:13 -05:00
reyesj2 72dbb69a1c fix searchnodes running elasticsearch/cluster state 2026-04-22 20:37:48 -05:00
reyesj2 339959d1c0 split up elasticfleet/enabled state 2026-04-22 20:30:40 -05:00
Mike Reeves d5c0ec4404 so-yaml_test: cover loadYaml error paths
Exercises the FileNotFoundError and generic-exception branches added to
loadYaml in the previous commit, restoring 100% coverage required by
the build.
2026-04-22 14:30:51 -04:00
Mike Reeves e616b4c120 so-telegraf-cred: make executable and harden error handling
so-telegraf-cred was committed with mode 644, causing
`so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion
to fail with "Permission denied" and log "Failed to provision postgres
telegraf cred for <minion>". Mark it executable.

Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in
so-yaml.py loadYaml surface a clear stderr message with the filename
instead of an unhandled FileNotFoundError traceback.
2026-04-22 14:25:19 -04:00
Mike Reeves f240a99e22 so-telegraf-cred: thin bash wrapper around so-yaml.py
Swap the ~150-line Python implementation for a 48-line bash script that
delegates YAML mutation to so-yaml.py — the same helper so-minion and
soup already use. Same semantics: seed the creds pillar on first use,
idempotent add, silent remove.

SO minion ids are dot-free by construction (setup/so-functions:1884
strips everything after the first '.'), so using the raw id as the
so-yaml.py key path is safe.
2026-04-22 11:09:53 -04:00
Mike Reeves 614f32c5e0 Split postgres auth from per-minion telegraf creds
The old flow had two writers for each per-minion Telegraf password
(so-minion wrote the minion pillar; postgres.auth regenerated any
missing aggregate entries). They drifted on first-boot and there was
no trigger to create DB roles when a new minion joined.

Split responsibilities:

- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres
  admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user,
  pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer:
  flock, atomic YAML write, PyYAML safe_dump so passwords never
  round-trip through so-yaml.py's type coercion. Idempotent add, quiet
  remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred
  instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs
  roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and
  refreshes the new minion's pillar before the new node highstates,
  so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt
  keys (idempotent) and runs postgres.telegraf_users once to reconcile
  the DB.
2026-04-22 10:55:15 -04:00
Josh Patterson cd6707a566 Merge pull request #15800 from Security-Onion-Solutions/feature/vm-raid-status
monitor raid for vms
2026-04-22 09:42:44 -04:00
Josh Patterson edd207a9d5 soup update socloud.conf 2026-04-22 09:20:53 -04:00
Mike Reeves 724d76965f soup: update postgres backfill comment to reflect reactor removal
The reactor path is gone; so-minion now owns add/delete for new
minions. The backfill itself is unchanged — postgres.auth's up_minions
fallback fills the aggregate, postgres.telegraf_users creates the
roles, and the bash loop fans to per-minion pillar files — so the
pre-feature upgrade story still works end-to-end. Just refresh the
comment so it isn't misleading.
2026-04-21 15:45:05 -04:00
Mike Reeves dbf4fb66a4 Clean up postgres telegraf cred on so-minion delete
Paired with the add path in add_telegraf_to_minion: when a minion is
removed, drop its entry from the aggregate postgres pillar and drop the
matching so_telegraf_<safe> role from the database. Without this, stale
entries and DB roles accumulate over time.

Makes rotate-password and compromise-recovery both a clean delete+add:

  so-minion -o=delete -m=<id>
  so-minion -o=add    -m=<id>

The first call drops the role and clears the aggregate pillar; the
second generates a brand-new password.

The cleanup is best-effort — if so-postgres isn't running or the DROP
ROLE fails (e.g., the role owns unexpected objects), we log a warning
and continue so the minion delete itself never gets blocked by postgres
state. Admins can mop up stray roles manually if that happens.
2026-04-21 15:43:01 -04:00
Mike Reeves 5f28e9b191 Move per-minion telegraf cred provisioning into so-minion
Simpler, race-free replacement for the reactor + orch + fan-out chain.

- salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to
  generate a random 72-char password, reuse any existing password from
  the aggregate pillar, write postgres.telegraf.{user,pass} into the
  minion's own pillar file, and update the aggregate pillar so
  postgres.telegraf_users can CREATE ROLE on the next manager apply.
  Every create<ROLE> function already calls this hook, so add / addVM /
  setup dispatches are all covered identically and synchronously.
- salt/postgres/auth.sls: strip the fanout_targets loop and the
  postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now
  redundant. The state still manages the so_postgres admin user and
  writes the aggregate pillar for postgres.telegraf_users to consume.
- salt/reactor/telegraf_user_sync.sls: deleted.
- salt/orch/telegraf_postgres_sync.sls: deleted.
- salt/salt/master.sls: drop the reactor_config_telegraf block that
  registered the reactor on /etc/salt/master.d/reactor_telegraf.conf.
- salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf
  step and the require: it added to the newnode highstate. Back to its
  original 3/dev shape.

No more ephemeral postgres_fanout_minion pillar, no more async salt/key
reactor, no more so-minion setupMinionFiles race: the pillar write
happens inline inside setupMinionFiles itself.
2026-04-21 15:34:15 -04:00
Mike Reeves 1abfd77351 Hide telegraf password from console and close so-minion race
Two fixes on the postgres telegraf fan-out path:

1. postgres.auth cmd.run leaked the password to the console because
   Salt always prints the Name: field and `show_changes: False` does
   not apply to cmd.run. Move the user and password into the `env:`
   attribute so the shell body still sees them via $PG_USER / $PG_PASS
   but Salt's state reporter never renders them.

2. so-minion's addMinion -> setupMinionFiles sequence removes the
   minion pillar file and rewrites it from scratch, which wipes the
   postgres.telegraf.* entries the reactor may have already written on
   salt-key accept. Add a postgres.auth fan-out step to
   orch.deploy_newnode (the orch so-minion kicks off after
   setupMinionFiles) and require it from the new minion's highstate.
   Idempotent via the existing unless: guard in postgres.auth.
2026-04-21 15:10:57 -04:00
reyesj2 06a555fafb urlencode elasticsearch version 2026-04-21 14:01:31 -05:00