Compare commits

..

60 Commits

Author SHA1 Message Date
Mike Reeves 2dcded6cca drop postgres module from soc defaults injection
The soc binary on 3/dev does not register a postgres module, so injecting
postgres into soc.config.server.modules makes soc abort at launch with
'Module does not exist: postgres'. The soc-side module is staged on
feature/postgres but is not landing this release. Drop the injection
until the module ships; salt/postgres state and pillars are unchanged.
2026-04-28 15:46:56 -04:00
Mike Reeves 8ca59e6f0c Merge pull request #15838 from Security-Onion-Solutions/fix/docker-refresh-multiarch-pull
Fix/docker refresh multiarch pull
2026-04-28 15:14:27 -04:00
Mike Reeves 82dac82d15 drop platform/digest pull resolution
The digest-pull logic was added to make `docker push` work for multi-arch
upstream tags. Now that the push step is `docker buildx imagetools create`
pinned to the gpg-verified RepoDigest, the registry-to-registry copy
handles single- and multi-arch sources without help. Reverts the pull
back to the original line and removes the unused PLATFORM_OS/_ARCH
detection.
2026-04-28 14:54:25 -04:00
Mike Reeves 288a823edf push images via buildx imagetools create
Replaces `docker push` with a registry-to-registry copy. On Docker 29.x
with the containerd image store, `docker push` of a freshly-pulled image
hits a path that wraps single-platform manifests in a synthetic index
and then can't push the layers it claims to reference, producing
`NotFound: content digest ...` even when the image is fully present.

Keep the local `docker tag` so so-image-pull's `docker images | grep :5000`
existence check continues to work.
2026-04-28 14:49:02 -04:00
Jorge Reyes f9e3d30a71 Merge pull request #15837 from Security-Onion-Solutions/reyesj2/elastic-fleet-cert-check
check current fleet policy cert against cert on disk
2026-04-28 13:47:55 -05:00
reyesj2 9cec79b299 check current fleet policy cert against cert on disk
Co-authored-by: Copilot <copilot@github.com>
2026-04-28 13:34:39 -05:00
Mike Reeves c86399327b fix so-docker-refresh push for multi-arch source images
docker pull of a multi-arch tag on Docker 29.x leaves the local tag
pointing at the image index rather than the platform-specific manifest.
The subsequent docker push then tries to push every sub-manifest the
index references and fails on layers we never fetched.

Resolve the local-platform manifest digest from the upstream index via
docker buildx imagetools inspect, pull by that digest, and re-tag locally
to the canonical tag. The signing flow and the existing tag/push to the
embedded registry are unchanged.
2026-04-28 14:27:59 -04:00
Mike Reeves fa8162de02 Merge pull request #15749 from Security-Onion-Solutions/feature/postgres
Add so-postgres Salt states and infrastructure
2026-04-28 10:15:47 -04:00
Josh Patterson 33abc429d1 Merge pull request #15835 from Security-Onion-Solutions/fix/reactor/sominon_setup
fix sominion_setup reactor
2026-04-28 08:55:58 -04:00
Jorge Reyes b22585ca90 Merge pull request #15833 from Security-Onion-Solutions/reyesj2-es933
exclude more transform job errors
2026-04-27 15:05:11 -05:00
reyesj2 9f2ca7012f exclude more transform job errors 2026-04-27 15:02:13 -05:00
Josh Patterson 21aeb68188 fix sominion_setup reactor 2026-04-27 14:30:41 -04:00
Josh Patterson 81e60ec5bf Merge pull request #15829 from Security-Onion-Solutions/fix/reinstall2
fix reinstall
2026-04-24 16:20:53 -04:00
Josh Patterson 199c2746f1 stop salt-minion and salt-master regardless of install type. display reinstall on console and save to logfile 2026-04-24 15:24:11 -04:00
Josh Patterson 8eca465ef6 uninstall elastic-agent before stopping dockers on reinstall 2026-04-24 14:35:11 -04:00
Jorge Reyes a45e59239f Merge pull request #15826 from Security-Onion-Solutions/reyesj2-es933
heavynode should run es cluster state
2026-04-24 13:07:48 -05:00
Josh Patterson 2ad0bcab7c Merge pull request #15828 from Security-Onion-Solutions/fix/annotations
readonly soc and kratos enabled
2026-04-24 14:00:02 -04:00
Josh Patterson 070d150420 readonly soc and kratos enabled 2026-04-24 13:56:35 -04:00
reyesj2 90ecbe90d8 allow heavynodes to run elasticsearch/cluster state 2026-04-24 12:56:27 -05:00
Josh Patterson 813fa03dc3 Merge pull request #15824 from Security-Onion-Solutions/fix/reinstall2
fix reinstall issue with salt
2026-04-24 12:22:54 -04:00
Josh Patterson 02381fbbe9 stop salt-cloud , belt-and-suspenders against a broken/incomplete salt RPM 2026-04-24 11:33:21 -04:00
Josh Patterson 0722b681b1 redo service stop on reinstall 2026-04-24 11:04:46 -04:00
Josh Patterson 564815e836 redo how services are stopped during reinstall 2026-04-24 10:46:29 -04:00
Jorge Reyes 88b30adf7f Merge pull request #15823 from Security-Onion-Solutions/reyesj2-es933
typo
2026-04-24 09:27:08 -05:00
reyesj2 b6acf3b522 typo 2026-04-24 09:24:58 -05:00
Jason Ertel ba55468da8 Merge pull request #15822 from Security-Onion-Solutions/jertel/wip
numeric test description
2026-04-24 08:26:55 -04:00
Jason Ertel cdd217283d numeric test description 2026-04-24 08:13:36 -04:00
Jorge Reyes 810a582717 Merge pull request #15813 from Security-Onion-Solutions/reyesj2-es933
split up Elastic Fleet state
2026-04-23 14:51:32 -05:00
Mike Reeves a6948e8dcb Remove helpLink for influxdb in soc_global.yaml
Removed helpLink for influxdb from endgamehost configuration.
2026-04-23 13:56:41 -04:00
Mike Reeves 5f35554fdc Merge pull request #15712 from Security-Onion-Solutions/soupfix
Fix soup
2026-04-23 12:39:50 -04:00
Mike Reeves 0ecc7ae594 soup: drop --local from postgres.telegraf_users reconcile
The manager's /etc/salt/minion (written by so-functions:configure_minion)
has no file_roots, so salt-call --local falls back to Salt's default
/srv/salt and fails with "No matching sls found for 'postgres.telegraf_users'
in env 'base'". || true was silently swallowing the error, which meant the
DB roles for the pillar entries just populated by the so-telegraf-cred
backfill loop never actually got created.

Route through salt-master instead; its file_roots already points at the
default/local salt trees.
2026-04-23 11:25:44 -04:00
reyesj2 fdfca469cc prevent non-manager nodes from running elasticsearch.cluster state manually 2026-04-23 09:53:07 -05:00
reyesj2 5f2ec76ba8 prevent fleetnode from being able to run elasticfleet.manager state manually 2026-04-23 09:50:45 -05:00
reyesj2 b015c8ff14 remove docker import 2026-04-23 09:31:30 -05:00
reyesj2 7e70870a9e remove globals import 2026-04-23 09:25:36 -05:00
Mike Reeves eadad6c163 soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade
pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres
unconditionally, but make_some_dirs only runs at install time so managers
upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails
pillar render on the first post-upgrade restart. Similarly, secrets_pillar
is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass
never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and
SOC's PG_ADMIN_PASS would land empty after highstate.

Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0
so the stubs and secret exist before masterlock/salt-master restart. Both
are idempotent and safe to re-run.
2026-04-23 10:01:38 -04:00
reyesj2 22b32a16dd include elasticfleet.config 2026-04-23 08:30:47 -05:00
reyesj2 22f869734e add check for files before attempting to use file pattern to load templates 2026-04-22 23:11:31 -05:00
reyesj2 398bc9e4ed update kibana discardCorruptObjects version 2026-04-22 20:38:13 -05:00
reyesj2 72dbb69a1c fix searchnodes running elasticsearch/cluster state 2026-04-22 20:37:48 -05:00
reyesj2 339959d1c0 split up elasticfleet/enabled state 2026-04-22 20:30:40 -05:00
Mike Reeves d5c0ec4404 so-yaml_test: cover loadYaml error paths
Exercises the FileNotFoundError and generic-exception branches added to
loadYaml in the previous commit, restoring 100% coverage required by
the build.
2026-04-22 14:30:51 -04:00
Mike Reeves e616b4c120 so-telegraf-cred: make executable and harden error handling
so-telegraf-cred was committed with mode 644, causing
`so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion
to fail with "Permission denied" and log "Failed to provision postgres
telegraf cred for <minion>". Mark it executable.

Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in
so-yaml.py loadYaml surface a clear stderr message with the filename
instead of an unhandled FileNotFoundError traceback.
2026-04-22 14:25:19 -04:00
Mike Reeves f240a99e22 so-telegraf-cred: thin bash wrapper around so-yaml.py
Swap the ~150-line Python implementation for a 48-line bash script that
delegates YAML mutation to so-yaml.py — the same helper so-minion and
soup already use. Same semantics: seed the creds pillar on first use,
idempotent add, silent remove.

SO minion ids are dot-free by construction (setup/so-functions:1884
strips everything after the first '.'), so using the raw id as the
so-yaml.py key path is safe.
2026-04-22 11:09:53 -04:00
Mike Reeves 614f32c5e0 Split postgres auth from per-minion telegraf creds
The old flow had two writers for each per-minion Telegraf password
(so-minion wrote the minion pillar; postgres.auth regenerated any
missing aggregate entries). They drifted on first-boot and there was
no trigger to create DB roles when a new minion joined.

Split responsibilities:

- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres
  admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user,
  pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer:
  flock, atomic YAML write, PyYAML safe_dump so passwords never
  round-trip through so-yaml.py's type coercion. Idempotent add, quiet
  remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred
  instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs
  roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and
  refreshes the new minion's pillar before the new node highstates,
  so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt
  keys (idempotent) and runs postgres.telegraf_users once to reconcile
  the DB.
2026-04-22 10:55:15 -04:00
Josh Patterson cd6707a566 Merge pull request #15800 from Security-Onion-Solutions/feature/vm-raid-status
monitor raid for vms
2026-04-22 09:42:44 -04:00
Josh Patterson edd207a9d5 soup update socloud.conf 2026-04-22 09:20:53 -04:00
Mike Reeves 724d76965f soup: update postgres backfill comment to reflect reactor removal
The reactor path is gone; so-minion now owns add/delete for new
minions. The backfill itself is unchanged — postgres.auth's up_minions
fallback fills the aggregate, postgres.telegraf_users creates the
roles, and the bash loop fans to per-minion pillar files — so the
pre-feature upgrade story still works end-to-end. Just refresh the
comment so it isn't misleading.
2026-04-21 15:45:05 -04:00
Mike Reeves dbf4fb66a4 Clean up postgres telegraf cred on so-minion delete
Paired with the add path in add_telegraf_to_minion: when a minion is
removed, drop its entry from the aggregate postgres pillar and drop the
matching so_telegraf_<safe> role from the database. Without this, stale
entries and DB roles accumulate over time.

Makes rotate-password and compromise-recovery both a clean delete+add:

  so-minion -o=delete -m=<id>
  so-minion -o=add    -m=<id>

The first call drops the role and clears the aggregate pillar; the
second generates a brand-new password.

The cleanup is best-effort — if so-postgres isn't running or the DROP
ROLE fails (e.g., the role owns unexpected objects), we log a warning
and continue so the minion delete itself never gets blocked by postgres
state. Admins can mop up stray roles manually if that happens.
2026-04-21 15:43:01 -04:00
Mike Reeves 5f28e9b191 Move per-minion telegraf cred provisioning into so-minion
Simpler, race-free replacement for the reactor + orch + fan-out chain.

- salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to
  generate a random 72-char password, reuse any existing password from
  the aggregate pillar, write postgres.telegraf.{user,pass} into the
  minion's own pillar file, and update the aggregate pillar so
  postgres.telegraf_users can CREATE ROLE on the next manager apply.
  Every create<ROLE> function already calls this hook, so add / addVM /
  setup dispatches are all covered identically and synchronously.
- salt/postgres/auth.sls: strip the fanout_targets loop and the
  postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now
  redundant. The state still manages the so_postgres admin user and
  writes the aggregate pillar for postgres.telegraf_users to consume.
- salt/reactor/telegraf_user_sync.sls: deleted.
- salt/orch/telegraf_postgres_sync.sls: deleted.
- salt/salt/master.sls: drop the reactor_config_telegraf block that
  registered the reactor on /etc/salt/master.d/reactor_telegraf.conf.
- salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf
  step and the require: it added to the newnode highstate. Back to its
  original 3/dev shape.

No more ephemeral postgres_fanout_minion pillar, no more async salt/key
reactor, no more so-minion setupMinionFiles race: the pillar write
happens inline inside setupMinionFiles itself.
2026-04-21 15:34:15 -04:00
Jorge Reyes 01bd3b6e06 Merge pull request #15807 from Security-Onion-Solutions/reyesj2-es933
urlencode elasticsearch version
2026-04-21 14:11:04 -05:00
Mike Reeves 1abfd77351 Hide telegraf password from console and close so-minion race
Two fixes on the postgres telegraf fan-out path:

1. postgres.auth cmd.run leaked the password to the console because
   Salt always prints the Name: field and `show_changes: False` does
   not apply to cmd.run. Move the user and password into the `env:`
   attribute so the shell body still sees them via $PG_USER / $PG_PASS
   but Salt's state reporter never renders them.

2. so-minion's addMinion -> setupMinionFiles sequence removes the
   minion pillar file and rewrites it from scratch, which wipes the
   postgres.telegraf.* entries the reactor may have already written on
   salt-key accept. Add a postgres.auth fan-out step to
   orch.deploy_newnode (the orch so-minion kicks off after
   setupMinionFiles) and require it from the new minion's highstate.
   Idempotent via the existing unless: guard in postgres.auth.
2026-04-21 15:10:57 -04:00
reyesj2 06a555fafb urlencode elasticsearch version 2026-04-21 14:01:31 -05:00
Jason Ertel 7411031e11 Merge pull request #15803 from Security-Onion-Solutions/jertel/wip
more error handling during image updates
2026-04-21 10:21:56 -04:00
Jason Ertel 247091766c more error handling during image updates 2026-04-21 10:18:05 -04:00
Josh Patterson 7f93110d68 Merge remote-tracking branch 'origin/3/dev' into feature/vm-raid-status 2026-04-21 10:10:38 -04:00
Jason Ertel 33ef138866 Merge pull request #15797 from Security-Onion-Solutions/jertel/wip
fix template annotation
2026-04-20 17:14:53 -04:00
Jason Ertel 71da27dc8e fix template annotation 2026-04-20 17:02:25 -04:00
Josh Patterson ee437265fc monitor raid for vms 2026-04-20 12:00:02 -04:00
Mike Reeves 664f3fd18a Fix soup 2026-04-01 14:47:05 -04:00
34 changed files with 560 additions and 375 deletions
+12
View File
@@ -0,0 +1,12 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Per-minion Telegraf Postgres credentials. so-telegraf-cred on the manager is
# the single writer; it mutates /opt/so/saltstack/local/pillar/telegraf/creds.sls
# under flock. Pillar_roots order (local before default) means the populated
# copy shadows this default on any real grid; this file exists so the pillar
# key is always defined on fresh installs and when no minions have creds yet.
telegraf:
postgres_creds: {}
+1
View File
@@ -17,6 +17,7 @@ base:
- sensoroni.adv_sensoroni
- telegraf.soc_telegraf
- telegraf.adv_telegraf
- telegraf.creds
- versionlock.soc_versionlock
- versionlock.adv_versionlock
- soc.license
+3 -1
View File
@@ -35,6 +35,8 @@
'kratos',
'hydra',
'elasticfleet',
'elasticfleet.manager',
'elasticsearch.cluster',
'elastic-fleet-package-registry',
'utility'
] %}
@@ -79,7 +81,7 @@
),
'so-heavynode': (
sensor_states +
['elasticagent', 'elasticsearch', 'logstash', 'redis', 'nginx']
['elasticagent', 'elasticsearch', 'elasticsearch.cluster', 'logstash', 'redis', 'nginx']
),
'so-idh': (
['idh']
+23 -4
View File
@@ -164,8 +164,8 @@ update_docker_containers() {
# Pull down the trusted docker image
run_check_net_err \
"docker pull $CONTAINER_REGISTRY/$IMAGEREPO/$image" \
"Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1
"Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1
# Get signature
run_check_net_err \
"curl --retry 5 --retry-delay 60 -A '$CURLTYPE/$CURRENTVERSION/$OS/$(uname -r)' $sig_url --output $SIGNPATH/$image.sig" \
@@ -188,8 +188,27 @@ update_docker_containers() {
if [ -z "$HOSTNAME" ]; then
HOSTNAME=$(hostname)
fi
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker push $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 || {
echo "Unable to tag $image" >> "$LOG_FILE" 2>&1
exit 1
}
# Push to the embedded registry via a registry-to-registry copy. Avoids
# `docker push`, which on Docker 29.x with the containerd image store
# represents freshly-pulled images as an index whose layer content
# isn't reachable through the push path. The local `docker tag` above
# is preserved so so-image-pull's `:5000` existence check still works.
# Pin to the digest already gpg-verified above so we copy exactly the
# bytes we approved.
local VERIFIED_REF
VERIFIED_REF=$(echo "$DOCKERINSPECT" | jq -r ".[0].RepoDigests[] | select(. | contains(\"$CONTAINER_REGISTRY\"))" | head -n 1)
if [ -z "$VERIFIED_REF" ] || [ "$VERIFIED_REF" = "null" ]; then
echo "Unable to determine verified digest for $image" >> "$LOG_FILE" 2>&1
exit 1
fi
docker buildx imagetools create --tag $HOSTNAME:5000/$IMAGEREPO/$image "$VERIFIED_REF" >> "$LOG_FILE" 2>&1 || {
echo "Unable to copy $image to embedded registry" >> "$LOG_FILE" 2>&1
exit 1
}
fi
else
echo "There is a problem downloading the $image image. Details: " >> "$LOG_FILE" 2>&1
+1 -1
View File
@@ -227,7 +227,7 @@ if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|from NIC checksum offloading" # zeek reporter.log
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|marked for removal" # docker container getting recycled
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|tcp 127.0.0.1:6791: bind: address already in use" # so-elastic-fleet agent restarting. Seen starting w/ 8.18.8 https://github.com/elastic/kibana/issues/201459
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint).*user so_kibana lacks the required permissions \[logs-\1" # Known issue with 3 integrations using kibana_system role vs creating unique api creds with proper permissions.
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint|armis|o365_metrics|microsoft_sentinel|snyk).*user so_kibana lacks the required permissions \[(logs|metrics)-\1" # Known issue with integrations starting transform jobs that are explicitly not allowed to start as a system user. (installed as so_elastic / so_kibana)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|manifest unknown" # appears in so-dockerregistry log for so-tcpreplay following docker upgrade to 29.2.1-1
fi
+9 -2
View File
@@ -9,7 +9,7 @@
. /usr/sbin/so-common
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02")
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02" "HVGUEST")
hardware_raid=("SOS1000" "SOS1000F" "SOSSN7200" "SOS5000" "SOS4000")
{%- if salt['grains.get']('sosmodel', '') %}
@@ -87,6 +87,11 @@ check_boss_raid() {
}
check_software_raid() {
if [[ ! -f /proc/mdstat ]]; then
SWRAID=0
return
fi
SWRC=$(grep "_" /proc/mdstat)
if [[ -n $SWRC ]]; then
# RAID is failed in some way
@@ -107,7 +112,9 @@ if [[ "$is_hwraid" == "true" ]]; then
fi
if [[ "$is_softwareraid" == "true" ]]; then
check_software_raid
check_boss_raid
if [ "$model" != "HVGUEST" ]; then
check_boss_raid
fi
fi
sum=$(($SWRAID + $BOSSRAID + $HWRAID))
+4 -103
View File
@@ -17,65 +17,17 @@ include:
- logstash.ssl
- elasticfleet.config
- elasticfleet.sostatus
{%- if GLOBALS.role != "so-fleet" %}
- elasticfleet.manager
{%- endif %}
{% if grains.role not in ['so-fleet'] %}
{% if GLOBALS.role != "so-fleet" %}
# Wait for Elasticsearch to be ready - no reason to try running Elastic Fleet server if ES is not ready
wait_for_elasticsearch_elasticfleet:
cmd.run:
- name: so-elasticsearch-wait
{% endif %}
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval', 'so-fleet'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
so-elastic-fleet-auto-configure-logstash-outputs-force:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update --certs
- retry:
attempts: 4
interval: 30
- onchanges:
- x509: etc_elasticfleet_logstash_crt
- x509: elasticfleet_kafka_crt
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
{% endif %}
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
{% if grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
{% endif %}
# Sync Elastic Agent artifacts to Fleet Node
{% if grains.role in ['so-fleet'] %}
elasticagent_syncartifacts:
file.recurse:
- name: /nsm/elastic-fleet/artifacts/beats
@@ -149,57 +101,6 @@ so-elastic-fleet:
- x509: etc_elasticfleet_crt
{% endif %}
{% if GLOBALS.role != "so-fleet" %}
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% endif %}
delete_so-elastic-fleet_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
+112
View File
@@ -0,0 +1,112 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls in allowed_states %}
{% from 'elasticfleet/map.jinja' import ELASTICFLEETMERGED %}
include:
- elasticfleet.config
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
so-elastic-fleet-auto-configure-logstash-outputs-force:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update --certs
- retry:
attempts: 4
interval: 30
- onchanges:
- x509: etc_elasticfleet_logstash_crt
- x509: elasticfleet_kafka_crt
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
@@ -5,11 +5,12 @@
# this file except in compliance with the Elastic License 2.0.
. /usr/sbin/so-common
. /usr/sbin/so-elastic-fleet-common
{%- import_yaml 'elasticsearch/defaults.yaml' as ELASTICSEARCHDEFAULTS %}
{%- import_yaml 'elasticfleet/defaults.yaml' as ELASTICFLEETDEFAULTS %}
{# Optionally override Elasticsearch version for Elastic Agent patch releases #}
{%- if ELASTICFLEETDEFAULTS.elasticfleet.patch_version is defined %}
{%- do ELASTICSEARCHDEFAULTS.update({'elasticsearch': {'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}}) %}
{%- do ELASTICSEARCHDEFAULTS.elasticsearch.update({'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}) %}
{%- endif %}
# Only run on Managers
@@ -19,13 +20,10 @@ if ! is_manager_node; then
fi
# Get current list of Grid Node Agents that need to be upgraded
RAW_JSON=$(curl -K /opt/so/conf/elasticsearch/curl.config -L "http://localhost:5601/api/fleet/agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" --retry 3 --retry-delay 30 --fail 2>/dev/null)
if ! RAW_JSON=$(fleet_api "agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version | urlencode }}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" -H 'kbn-xsrf: true' -H 'Content-Type: application/json'); then
# Check to make sure that the server responded with good data - else, bail from script
CHECKSUM=$(jq -r '.page' <<< "$RAW_JSON")
if [ "$CHECKSUM" -ne 1 ]; then
printf "Failed to query for current Grid Agents...\n"
exit 1
printf "Failed to query for current Grid Agents...\n"
exit 1
fi
# Generate list of Node Agents that need updates
@@ -36,10 +34,12 @@ if [ "$OUTDATED_LIST" != '[]' ]; then
printf "Initiating upgrades for $AGENTNUMBERS Agents to Elastic {{ELASTICSEARCHDEFAULTS.elasticsearch.version}}...\n\n"
# Generate updated JSON payload
JSON_STRING=$(jq -n --arg ELASTICVERSION {{ELASTICSEARCHDEFAULTS.elasticsearch.version}} --arg UPDATELIST $OUTDATED_LIST '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
JSON_STRING=$(jq -n --arg ELASTICVERSION "{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}" --argjson UPDATELIST "$OUTDATED_LIST" '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
# Update Node Agents
curl -K /opt/so/conf/elasticsearch/curl.config -L -X POST "http://localhost:5601/api/fleet/agents/bulk_upgrade" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"
if ! fleet_api "agents/bulk_upgrade" -XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"; then
printf "Failed to initiate Agent upgrades...\n"
fi
else
printf "No Agents need updates... Exiting\n\n"
exit 0
@@ -235,6 +235,16 @@ function update_kafka_outputs() {
{% endif %}
# Compare the current Elastic Fleet certificate against what is on disk
POLICY_CERT_SHA=$(jq -r '.item.ssl.certificate' <<< $RAW_JSON | openssl x509 -noout -sha256 -fingerprint)
DISK_CERT_SHA=$(openssl x509 -in /etc/pki/elasticfleet-logstash.crt -noout -sha256 -fingerprint)
if [[ "$POLICY_CERT_SHA" != "$DISK_CERT_SHA" ]]; then
printf "Certificate on disk doesn't match certificate in policy - forcing update\n"
UPDATE_CERTS=true
FORCE_UPDATE=true
fi
# Sort & hash the new list of Logstash Outputs
NEW_LIST_JSON=$(jq --compact-output --null-input '$ARGS.positional' --args -- "${NEW_LIST[@]}")
NEW_HASH=$(sha256sum <<< "$NEW_LIST_JSON" | awk '{print $1}')
+1 -1
View File
@@ -4,7 +4,7 @@
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% if sls in allowed_states %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
{% from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS, SO_MANAGED_INDICES %}
+6 -7
View File
@@ -17,7 +17,7 @@ include:
- elasticsearch.ssl
- elasticsearch.config
- elasticsearch.sostatus
{%- if GLOBALS.role != 'so-searchode' %}
{%- if GLOBALS.role != "so-searchnode" %}
- elasticsearch.cluster
{%- endif%}
@@ -102,11 +102,6 @@ so-elasticsearch:
- cmd: auth_users_roles_inode
- cmd: auth_users_inode
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
wait_for_so-elasticsearch:
http.wait_for_successful_query:
- name: "https://localhost:9200/"
@@ -117,10 +112,14 @@ wait_for_so-elasticsearch:
- status: 200
- wait_for: 300
- request_interval: 15
- backend: requests
- require:
- docker_container: so-elasticsearch
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
{% else %}
{{sls}}_state_not_allowed:
@@ -103,11 +103,13 @@ load_component_templates() {
local pattern="${ELASTICSEARCH_TEMPLATES_DIR}/component/$2"
local append_mappings="${3:-"false"}"
# current state of nullglob shell option
shopt -q nullglob && nullglob_set=1 || nullglob_set=0
shopt -s nullglob
echo -e "\nLoading $printed_name component templates...\n"
if ! compgen -G "${pattern}/*.json" > /dev/null; then
echo "No $printed_name component templates found in ${pattern}, skipping."
return
fi
for component in "$pattern"/*.json; do
tmpl_name=$(basename "${component%.json}")
@@ -121,11 +123,6 @@ load_component_templates() {
SO_LOAD_FAILURES_NAMES+=("$component")
fi
done
# restore nullglob shell option if needed
if [[ $nullglob_set -eq 1 ]]; then
shopt -u nullglob
fi
}
check_elasticsearch_responsive() {
@@ -136,7 +133,32 @@ check_elasticsearch_responsive() {
fail "Elasticsearch is not responding. Please review Elasticsearch logs /opt/so/log/elasticsearch/securityonion.log for more details. Additionally, consider running so-elasticsearch-troubleshoot."
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
index_templates_exist() {
local templates_dir="$1"
if [[ ! -d "$templates_dir" ]]; then
return 1
fi
compgen -G "${templates_dir}/*.json" > /dev/null
}
should_load_addon_templates() {
if [[ "$IS_HEAVYNODE" == "true" ]]; then
return 1
fi
# Skip statefile checks when forcing template load
if [[ "$FORCE" != "true" ]]; then
if [[ ! -f "$SO_STATEFILE_SUCCESS" || -f "$ADDON_STATEFILE_SUCCESS" ]]; then
return 1
fi
fi
index_templates_exist "$ADDON_TEMPLATES_DIR"
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_exist "$SO_TEMPLATES_DIR"; then
check_elasticsearch_responsive
if [[ "$IS_HEAVYNODE" == "false" ]]; then
@@ -201,13 +223,14 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
fail "Failed to load all Security Onion core templates successfully."
fi
fi
else
elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then
echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping."
elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then
echo "Security Onion core templates already loaded"
fi
# Start loading addon templates
if [[ (-d "$ADDON_TEMPLATES_DIR" && -f "$SO_STATEFILE_SUCCESS" && "$IS_HEAVYNODE" == "false" && ! -f "$ADDON_STATEFILE_SUCCESS") || (-d "$ADDON_TEMPLATES_DIR" && "$IS_HEAVYNODE" == "false" && "$FORCE" == "true") ]]; then
if should_load_addon_templates; then
check_elasticsearch_responsive
-1
View File
@@ -59,5 +59,4 @@ global:
description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
global: True
advanced: True
helpLink: influxdb
+1 -1
View File
@@ -22,7 +22,7 @@ kibana:
- default
- file
migrations:
discardCorruptObjects: "8.18.8"
discardCorruptObjects: "9.3.3"
telemetry:
enabled: False
xpack:
+1 -1
View File
@@ -3,8 +3,8 @@ kratos:
description: Enables or disables the Kratos authentication system. WARNING - Disabling this process will cause the grid to malfunction. Re-enabling this setting will require manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
helpLink: kratos
oidc:
enabled:
description: Set to True to enable OIDC / Single Sign-On (SSO) to SOC. Requires a valid Security Onion license key.
+46 -1
View File
@@ -273,7 +273,7 @@ function deleteMinionFiles () {
log "ERROR" "Failed to delete $PILLARFILE"
return 1
fi
rm -f $ADVPILLARFILE
if [ $? -ne 0 ]; then
log "ERROR" "Failed to delete $ADVPILLARFILE"
@@ -281,6 +281,39 @@ function deleteMinionFiles () {
fi
}
# Remove this minion's postgres Telegraf credential from the shared creds
# pillar and drop the matching role in Postgres. Always returns 0 so a dead
# or unreachable so-postgres doesn't block minion deletion — in that case we
# log a warning and leave the role behind for manual cleanup.
function remove_postgres_telegraf_from_minion() {
local MINION_SAFE
MINION_SAFE=$(echo "$MINION_ID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
local PG_USER="so_telegraf_${MINION_SAFE}"
log "INFO" "Removing postgres telegraf cred for $MINION_ID"
so-telegraf-cred remove "$MINION_ID" >/dev/null 2>&1 || true
if docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^so-postgres$'; then
if ! docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf >/dev/null 2>&1 <<EOSQL
DO \$\$
BEGIN
IF EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '$PG_USER') THEN
EXECUTE format('REASSIGN OWNED BY %I TO so_telegraf', '$PG_USER');
EXECUTE format('DROP OWNED BY %I', '$PG_USER');
EXECUTE format('DROP ROLE %I', '$PG_USER');
END IF;
END
\$\$;
EOSQL
then
log "WARN" "Failed to drop postgres role $PG_USER; pillar entry was removed — drop manually if the role persists"
fi
else
log "WARN" "so-postgres container is not running; skipping DB role cleanup for $PG_USER"
fi
}
# Create the minion file
function ensure_socore_ownership() {
log "INFO" "Setting socore ownership on minion files"
@@ -542,6 +575,17 @@ function add_telegraf_to_minion() {
log "ERROR" "Failed to add telegraf configuration to $PILLARFILE"
return 1
fi
# Provision the per-minion postgres Telegraf credential in the shared
# telegraf/creds.sls pillar. so-telegraf-cred is the only writer; it
# generates a password on first add and is a no-op on re-add so the cred
# is stable across repeated so-minion runs. postgres.telegraf_users on the
# manager creates/updates the DB role from the same pillar.
so-telegraf-cred add "$MINION_ID"
if [ $? -ne 0 ]; then
log "ERROR" "Failed to provision postgres telegraf cred for $MINION_ID"
return 1
fi
}
function add_influxdb_to_minion() {
@@ -1069,6 +1113,7 @@ case "$OPERATION" in
"delete")
log "INFO" "Removing minion $MINION_ID"
remove_postgres_telegraf_from_minion
deleteMinionFiles || {
log "ERROR" "Failed to delete minion files for $MINION_ID"
exit 1
+54
View File
@@ -0,0 +1,54 @@
#!/bin/bash
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Single writer for the Telegraf Postgres credentials pillar. Thin wrapper
# around so-yaml.py that generates a password on first add and no-ops on
# re-add so the cred is stable across repeated so-minion runs.
#
# Note: so-yaml.py splits keys on '.' with no escape. SO minion ids are
# dot-free by construction (setup/so-functions:1884 takes the short_name
# before the first '.'), so using the raw minion id as the key is safe.
CREDS=/opt/so/saltstack/local/pillar/telegraf/creds.sls
usage() {
echo "Usage: $0 <add|remove> <minion_id>" >&2
exit 2
}
seed_creds_file() {
mkdir -p "$(dirname "$CREDS")" || return 1
if [[ ! -f "$CREDS" ]]; then
(umask 027 && printf 'telegraf:\n postgres_creds: {}\n' > "$CREDS") || return 1
chown socore:socore "$CREDS" 2>/dev/null || true
chmod 640 "$CREDS" || return 1
fi
}
OP=$1
MID=$2
[[ -z "$OP" || -z "$MID" ]] && usage
case "$OP" in
add)
SAFE=$(echo "$MID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
seed_creds_file || exit 1
if so-yaml.py get -r "$CREDS" "telegraf.postgres_creds.${MID}.user" >/dev/null 2>&1; then
exit 0
fi
PASS=$(tr -dc 'A-Za-z0-9~!@#^&*()_=+[]|;:,.<>?-' < /dev/urandom | head -c 72)
so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.user" "so_telegraf_${SAFE}" >/dev/null
so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.pass" "$PASS" >/dev/null
;;
remove)
[[ -f "$CREDS" ]] || exit 0
so-yaml.py remove "$CREDS" "telegraf.postgres_creds.${MID}" >/dev/null 2>&1 || true
;;
*)
usage
;;
esac
+10 -3
View File
@@ -39,9 +39,16 @@ def showUsage(args):
def loadYaml(filename):
file = open(filename, "r")
content = file.read()
return yaml.safe_load(content)
try:
with open(filename, "r") as file:
content = file.read()
return yaml.safe_load(content)
except FileNotFoundError:
print(f"File not found: {filename}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error reading file {filename}: {e}", file=sys.stderr)
sys.exit(1)
def writeYaml(filename, content):
+18
View File
@@ -973,3 +973,21 @@ class TestReplaceListObject(unittest.TestCase):
expected = "key1:\n- id: '1'\n status: updated\n- id: '2'\n status: inactive\n"
self.assertEqual(actual, expected)
class TestLoadYaml(unittest.TestCase):
def test_load_yaml_missing_file(self):
with patch('sys.exit', new=MagicMock()) as sysmock:
with patch('sys.stderr', new=StringIO()) as mock_stderr:
soyaml.loadYaml("/tmp/so-yaml_test-does-not-exist.yaml")
sysmock.assert_called_with(1)
self.assertIn("File not found:", mock_stderr.getvalue())
def test_load_yaml_read_error(self):
with patch('sys.exit', new=MagicMock()) as sysmock:
with patch('sys.stderr', new=StringIO()) as mock_stderr:
with patch('builtins.open', side_effect=PermissionError("denied")):
soyaml.loadYaml("/tmp/so-yaml_test-unreadable.yaml")
sysmock.assert_called_with(1)
self.assertIn("Error reading file", mock_stderr.getvalue())
+62 -34
View File
@@ -24,6 +24,14 @@ BACKUPTOPFILE=/opt/so/saltstack/default/salt/top.sls.backup
SALTUPGRADED=false
SALT_CLOUD_INSTALLED=false
SALT_CLOUD_CONFIGURED=false
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
# used to display messages to the user at the end of soup
declare -a FINAL_MESSAGE_QUEUE=()
@@ -477,7 +485,44 @@ elasticsearch_backup_index_templates() {
tar -czf /nsm/backup/3.0.0_elasticsearch_index_templates.tar.gz -C /opt/so/conf/elasticsearch/templates/index/ .
}
ensure_postgres_local_pillar() {
# Postgres was added as a service after 3.0.0, so the new pillar/top.sls
# references postgres.soc_postgres / postgres.adv_postgres unconditionally.
# Managers upgrading from 3.0.0 have no /opt/so/saltstack/local/pillar/postgres/
# (make_some_dirs only runs at install time), so the stubs must be created
# here before salt-master restarts against the new top.sls.
echo "Ensuring postgres local pillar stubs exist."
local dir=/opt/so/saltstack/local/pillar/postgres
mkdir -p "$dir"
[[ -f "$dir/soc_postgres.sls" ]] || touch "$dir/soc_postgres.sls"
[[ -f "$dir/adv_postgres.sls" ]] || touch "$dir/adv_postgres.sls"
chown -R socore:socore "$dir"
}
ensure_postgres_secret() {
# On a fresh install, generate_passwords + secrets_pillar seed
# secrets:postgres_pass in /opt/so/saltstack/local/pillar/secrets.sls. That
# code path is skipped on upgrade (secrets.sls already exists from 3.0.0
# with import_pass/influx_pass but no postgres_pass), so the postgres
# container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would be empty
# after highstate. Generate one now if missing.
local secrets_file=/opt/so/saltstack/local/pillar/secrets.sls
if [[ ! -f "$secrets_file" ]]; then
echo "WARNING: $secrets_file missing; skipping postgres_pass backfill."
return 0
fi
if so-yaml.py get -r "$secrets_file" secrets.postgres_pass >/dev/null 2>&1; then
echo "secrets.postgres_pass already set; leaving as-is."
return 0
fi
echo "Seeding secrets.postgres_pass in $secrets_file."
so-yaml.py add "$secrets_file" secrets.postgres_pass "$(get_random_value)"
chown socore:socore "$secrets_file"
}
up_to_3.1.0() {
ensure_postgres_local_pillar
ensure_postgres_secret
determine_elastic_agent_upgrade
elasticsearch_backup_index_templates
# Clear existing component template state file.
@@ -489,33 +534,25 @@ up_to_3.1.0() {
post_to_3.1.0() {
/usr/sbin/so-kibana-space-defaults
# One-time backfill for minions that existed before the postgres Telegraf
# feature shipped. Generate the aggregate pillar on the manager and create
# the per-minion DB roles, then fan each minion's cred into its own pillar
# file. Going forward the reactor handles each new salt-key accept with a
# targeted fan-out, so a manager highstate no longer needs to iterate.
echo "Provisioning Telegraf Postgres users for existing minions."
salt-call --local state.apply postgres.auth,postgres.telegraf_users queue=True || true
AGGREGATE_PILLAR=/opt/so/saltstack/local/pillar/postgres/auth.sls
MINIONS_DIR=/opt/so/saltstack/local/pillar/minions
if [[ -f "$AGGREGATE_PILLAR" && -d "$MINIONS_DIR" ]]; then
for pillar_file in "$MINIONS_DIR"/*.sls; do
[[ -f "$pillar_file" ]] || continue
mid=$(basename "$pillar_file" .sls)
[[ "$mid" == adv_* ]] && continue
safe=$(echo "$mid" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
existing_user=$(so-yaml.py get -r "$pillar_file" postgres.telegraf.user 2>/dev/null || true)
[[ "$existing_user" == "so_telegraf_${safe}" ]] && continue
user=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.user" 2>/dev/null || true)
pass=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.pass" 2>/dev/null || true)
[[ -z "$user" || -z "$pass" ]] && continue
so-yaml.py replace "$pillar_file" postgres.telegraf.user "$user" >/dev/null
so-yaml.py replace "$pillar_file" postgres.telegraf.pass "$pass" >/dev/null
done
# ensure manager has new version of socloud.conf
if [[ $SALT_CLOUD_CONFIGURED == true ]]; then
salt-call state.apply salt.cloud.config concurrent=True
fi
# Backfill the Telegraf creds pillar for every accepted minion. so-telegraf-cred
# add is idempotent — it no-ops when an entry already exists — so this is safe
# to run on every soup. The subsequent state.apply creates/updates the matching
# Postgres roles from the reconciled pillar.
echo "Reconciling Telegraf Postgres creds for accepted minions."
for mid in $(salt-key --out=json --list=accepted 2>/dev/null | jq -r '.minions[]?' 2>/dev/null); do
[[ -n "$mid" ]] || continue
/usr/sbin/so-telegraf-cred add "$mid" || echo " warning: so-telegraf-cred add $mid failed" >&2
done
# Run through the master (not --local) so state compilation uses the
# master's configured file_roots; the manager's /etc/salt/minion has no
# file_roots of its own and --local would fail with "No matching sls found".
salt-call state.apply postgres.telegraf_users queue=True || true
POSTVERSION=3.1.0
}
@@ -689,15 +726,6 @@ upgrade_check_salt() {
upgrade_salt() {
echo "Performing upgrade of Salt from $INSTALLEDSALTVERSION to $NEWSALTVERSION."
echo ""
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
echo "Removing yum versionlock for Salt."
echo ""
yum versionlock delete "salt"
+25
View File
@@ -25,8 +25,33 @@ manager_run_es_soc:
- salt: {{NEWNODE}}_update_mine
{% endif %}
# so-minion has already added the new minion's entry to telegraf/creds.sls
# via so-telegraf-cred before this orch fires. Reconcile the Postgres role
# on the manager so the new minion can authenticate on its first highstate,
# then refresh the minion's pillar so its telegraf.conf renders with the
# freshly-written cred.
manager_create_postgres_telegraf_role:
salt.state:
- tgt: {{ MANAGER }}
- sls:
- postgres.telegraf_users
- queue: True
- require:
- salt: {{NEWNODE}}_update_mine
{{NEWNODE}}_refresh_pillar:
salt.function:
- name: saltutil.refresh_pillar
- tgt: {{ NEWNODE }}
- kwarg:
wait: True
- require:
- salt: manager_create_postgres_telegraf_role
{{NEWNODE}}_run_highstate:
salt.state:
- tgt: {{ NEWNODE }}
- highstate: True
- queue: True
- require:
- salt: {{NEWNODE}}_refresh_pillar
-28
View File
@@ -1,28 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Fired by salt/reactor/telegraf_user_sync.sls when salt-key accepts a new
# minion. Only provisions the per-minion pillar entry and DB role on the
# manager; the minion itself will pick up its telegraf config on its first
# highstate during onboarding, so there's no need to push the telegraf state
# from here.
#
# Target the manager via role grains — same pattern as orch/delete_hypervisor.sls.
# The reactor doesn't know the manager's minion id, and grains.master on the
# runner is a hostname, not a targetable id.
{% set FANOUT_MINION = salt['pillar.get']('postgres_fanout_minion', '') %}
manager_sync_telegraf_pg_users:
salt.state:
- tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone or G@role:so-eval'
- tgt_type: compound
- sls:
- postgres.auth
- postgres.telegraf_users
- queue: True
{% if FANOUT_MINION %}
- pillar:
postgres_fanout_minion: {{ FANOUT_MINION }}
{% endif %}
+2 -68
View File
@@ -13,24 +13,8 @@
{% set CHARS = DIGITS~LOWERCASE~UPPERCASE~SYMBOLS %}
{% set so_postgres_user_pass = salt['pillar.get']('postgres:auth:users:so_postgres_user:pass', salt['random.get_str'](72, chars=CHARS)) %}
{# Per-minion Telegraf Postgres credentials. Merge currently-up minions with any #}
{# previously-known entries in pillar so existing passwords persist across runs. #}
{% set existing = salt['pillar.get']('postgres:auth:users', {}) %}
{% set up_minions = salt['saltutil.runner']('manage.up') or [] %}
{% set telegraf_users = {} %}
{% for key, entry in existing.items() %}
{%- if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
{%- do telegraf_users.update({key: entry}) %}
{%- endif %}
{% endfor %}
{% for mid in up_minions %}
{%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
{%- set key = 'telegraf_' ~ safe %}
{%- if key not in telegraf_users %}
{%- do telegraf_users.update({key: {'user': 'so_telegraf_' ~ safe, 'pass': salt['random.get_str'](72, chars=CHARS)}}) %}
{%- endif %}
{% endfor %}
# Admin cred only. Per-minion Telegraf creds live in telegraf/creds.sls,
# managed by /usr/sbin/so-telegraf-cred (called from so-minion).
postgres_auth_pillar:
file.managed:
- name: /opt/so/saltstack/local/pillar/postgres/auth.sls
@@ -43,57 +27,7 @@ postgres_auth_pillar:
so_postgres_user:
user: so_postgres
pass: "{{ so_postgres_user_pass }}"
{% for key, entry in telegraf_users.items() %}
{{ key }}:
user: {{ entry.user }}
pass: "{{ entry.pass }}"
{% endfor %}
- show_changes: False
{# Fan a specific minion's telegraf cred out to its own pillar file.
Two triggers populate the target list:
- grains.id (always) so the manager's own pillar is populated on every
postgres.auth run — otherwise the manager's telegraf has no cred on
a fresh install and can't write to its own postgres.
- pillar postgres_fanout_minion (when the reactor fires on a new
minion's salt-key accept).
The `unless` guard keeps re-runs idempotent, so this is one so-yaml.py
check per target, not per minion in the grid. Bulk backfill for
already-accepted minions lives in soup. #}
{% set fanout_targets = [] %}
{% if grains.id %}
{%- do fanout_targets.append(grains.id) %}
{% endif %}
{% set fanout_mid = salt['pillar.get']('postgres_fanout_minion') %}
{% if fanout_mid and fanout_mid not in fanout_targets %}
{%- do fanout_targets.append(fanout_mid) %}
{% endif %}
{% for mid in fanout_targets %}
{%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
{%- set key = 'telegraf_' ~ safe %}
{%- set entry = telegraf_users.get(key) %}
{%- if entry %}
postgres_telegraf_minion_pillar_{{ safe }}:
cmd.run:
- name: |
set -e
PILLAR_FILE=/opt/so/saltstack/local/pillar/minions/{{ mid }}.sls
if [ ! -f "$PILLAR_FILE" ]; then
echo '{}' > "$PILLAR_FILE"
chown socore:socore "$PILLAR_FILE" 2>/dev/null || true
chmod 640 "$PILLAR_FILE"
fi
/usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.user '{{ entry.user }}'
/usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.pass '{{ entry.pass }}'
- unless: |
[ "$(/usr/sbin/so-yaml.py get -r /opt/so/saltstack/local/pillar/minions/{{ mid }}.sls postgres.telegraf.user 2>/dev/null)" = '{{ entry.user }}' ]
- require:
- file: postgres_auth_pillar
{%- endif %}
{% endfor %}
{% else %}
{{sls}}_state_not_allowed:
+4 -4
View File
@@ -10,7 +10,7 @@
{# postgres_wait_ready below requires `docker_container: so-postgres`, which is
declared in postgres.enabled. Include it here so state.apply postgres.telegraf_users
on its own (from the reactor orch or from soup) still has that ID in scope. Salt
on its own (e.g. from orch.deploy_newnode) still has that ID in scope. Salt
de-duplicates the circular include. #}
include:
- postgres.enabled
@@ -96,9 +96,9 @@ postgres_telegraf_group_role:
- require:
- cmd: postgres_create_telegraf_db
{% set users = salt['pillar.get']('postgres:auth:users', {}) %}
{% for key, entry in users.items() %}
{% if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
{% set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
{% for mid, entry in creds.items() %}
{% if entry.get('user') and entry.get('pass') %}
{% set u = entry.user %}
{% set p = entry.pass | replace("'", "''") %}
+53 -18
View File
@@ -6,39 +6,74 @@
# Elastic License 2.0.
import logging
from subprocess import call
import yaml
import os
import re
import shlex
import subprocess
log = logging.getLogger(__name__)
SO_MINION = '/usr/sbin/so-minion'
_NODETYPE_RE = re.compile(r'^[A-Z][A-Z0-9_]{0,31}$')
_MINIONID_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_HOSTPART_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_IPV4_RE = re.compile(
r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}'
r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$'
)
_HEAP_RE = re.compile(r'^\d{1,6}[kKmMgG]?$')
def _check(name, value, pattern):
s = str(value)
if not pattern.match(s):
raise ValueError("sominion_setup_reactor: refusing unsafe %s=%r" % (name, value))
return s
def run():
log.info('sominion_setup_reactor: Running')
minionid = data['id']
DATA = data['data']
hv_name = DATA['HYPERVISOR_HOST']
log.info('sominion_setup_reactor: DATA: %s' % DATA)
# Build the base command
cmd = "NODETYPE=" + DATA['NODETYPE'] + " /usr/sbin/so-minion -o=addVM -m=" + minionid + " -n=" + DATA['MNIC'] + " -i=" + DATA['MAINIP'] + " -c=" + str(DATA['CPUCORES']) + " -d='" + DATA['NODE_DESCRIPTION'] + "'"
# Add optional arguments only if they exist in DATA
nodetype = _check('NODETYPE', DATA['NODETYPE'], _NODETYPE_RE)
argv = [
SO_MINION,
'-o=addVM',
'-m=' + _check('minionid', minionid, _MINIONID_RE),
'-n=' + _check('MNIC', DATA['MNIC'], _HOSTPART_RE),
'-i=' + _check('MAINIP', DATA['MAINIP'], _IPV4_RE),
'-c=' + str(int(DATA['CPUCORES'])),
'-d=' + str(DATA['NODE_DESCRIPTION']),
]
if 'CORECOUNT' in DATA:
cmd += " -C=" + str(DATA['CORECOUNT'])
argv.append('-C=' + str(int(DATA['CORECOUNT'])))
if 'INTERFACE' in DATA:
cmd += " -a=" + DATA['INTERFACE']
argv.append('-a=' + _check('INTERFACE', DATA['INTERFACE'], _HOSTPART_RE))
if 'ES_HEAP_SIZE' in DATA:
cmd += " -e=" + DATA['ES_HEAP_SIZE']
argv.append('-e=' + _check('ES_HEAP_SIZE', DATA['ES_HEAP_SIZE'], _HEAP_RE))
if 'LS_HEAP_SIZE' in DATA:
cmd += " -l=" + DATA['LS_HEAP_SIZE']
argv.append('-l=' + _check('LS_HEAP_SIZE', DATA['LS_HEAP_SIZE'], _HEAP_RE))
if 'LSHOSTNAME' in DATA:
cmd += " -L=" + DATA['LSHOSTNAME']
log.info('sominion_setup_reactor: Command: %s' % cmd)
rc = call(cmd, shell=True)
argv.append('-L=' + _check('LSHOSTNAME', DATA['LSHOSTNAME'], _HOSTPART_RE))
env = os.environ.copy()
env['NODETYPE'] = nodetype
log.info(
'sominion_setup_reactor: argv: %s (NODETYPE=%s)',
' '.join(shlex.quote(a) for a in argv),
shlex.quote(nodetype),
)
rc = subprocess.call(argv, shell=False, env=env)
log.info('sominion_setup_reactor: rc: %s' % rc)
-18
View File
@@ -1,18 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{# Fires on salt/key. Only act on successful key acceptance — not reauth. #}
{% if data.get('act') == 'accept' and data.get('result') == True and data.get('id') %}
{{ data['id'] }}_telegraf_pg_sync:
runner.state.orchestrate:
- args:
- mods: orch.telegraf_postgres_sync
- pillar:
postgres_fanout_minion: {{ data['id'] }}
{% do salt.log.info('telegraf_user_sync reactor: syncing telegraf PG user for minion %s' % data['id']) %}
{% endif %}
@@ -27,6 +27,7 @@ sool9_{{host}}:
log_file: /opt/so/log/salt/minion
grains:
hypervisor_host: {{host ~ "_" ~ role}}
sosmodel: HVGUEST
preflight_cmds:
- |
{%- set hostnames = [MANAGERHOSTNAME] %}
-13
View File
@@ -62,19 +62,6 @@ engines_config:
- name: /etc/salt/master.d/engines.conf
- source: salt://salt/files/engines.conf
reactor_config_telegraf:
file.managed:
- name: /etc/salt/master.d/reactor_telegraf.conf
- contents: |
reactor:
- 'salt/key':
- /opt/so/saltstack/default/salt/reactor/telegraf_user_sync.sls
- user: root
- group: root
- mode: 644
- watch_in:
- service: salt_master_service
# update the bootstrap script when used for salt-cloud
salt_bootstrap_cloud:
file.managed:
-5
View File
@@ -24,11 +24,6 @@
{% do SOCDEFAULTS.soc.config.server.modules.elastic.update({'username': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'password': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass}) %}
{% if GLOBALS.postgres is defined and GLOBALS.postgres.auth is defined %}
{% set PG_ADMIN_PASS = salt['pillar.get']('secrets:postgres_pass', '') %}
{% do SOCDEFAULTS.soc.config.server.modules.update({'postgres': {'hostUrl': GLOBALS.manager_ip, 'port': 5432, 'username': GLOBALS.postgres.auth.users.so_postgres_user.user, 'password': GLOBALS.postgres.auth.users.so_postgres_user.pass, 'adminUser': 'postgres', 'adminPassword': PG_ADMIN_PASS, 'dbname': 'securityonion', 'sslMode': 'require', 'assistantEnabled': true, 'esHostUrl': 'https://' ~ GLOBALS.manager_ip ~ ':9200', 'esUsername': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'esPassword': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass, 'esVerifyCert': false}}) %}
{% endif %}
{% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'hostUrl': 'https://' ~ GLOBALS.influxdb_host ~ ':8086'}) %}
{% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'token': INFLUXDB_TOKEN}) %}
{% for tool in SOCDEFAULTS.soc.config.server.client.tools %}
+5
View File
@@ -3,6 +3,7 @@ soc:
description: Enables or disables SOC. WARNING - Disabling this setting is unsupported and will cause the grid to malfunction. Re-enabling this setting is a manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
telemetryEnabled:
title: SOC Telemetry
description: When this setting is enabled and the grid is not in airgap mode, SOC will provide feature usage data to the Security Onion development team via Google Analytics. This data helps Security Onion developers determine which product features are being used and can also provide insight into improving the user interface. When changing this setting, wait for the grid to fully synchronize and then perform a hard browser refresh on SOC, to force the browser cache to update and reflect the new setting.
@@ -890,12 +891,16 @@ soc:
suricata:
description: The template used when creating a new Suricata detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
strelka:
description: The template used when creating a new Strelka detection.
multiline: True
forcedType: string
elastalert:
description: The template used when creating a new ElastAlert detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
grid:
maxUploadSize:
description: The maximum number of bytes for an uploaded PCAP import file.
+6 -6
View File
@@ -10,12 +10,12 @@
{%- set LOGSTASH_ENABLED = LOGSTASH_MERGED.enabled %}
{%- set TG_OUT = TELEGRAFMERGED.output | upper %}
{%- set PG_HOST = GLOBALS.manager_ip %}
{#- Per-minion telegraf creds are written into the minion's own pillar file
(/opt/so/saltstack/local/pillar/minions/<id>.sls) by postgres.auth on the
manager. Each minion only sees its own password — the aggregate map in
postgres:auth:users is manager-scoped. #}
{%- set PG_USER = salt['pillar.get']('postgres:telegraf:user', '') %}
{%- set PG_PASS = salt['pillar.get']('postgres:telegraf:pass', '') %}
{#- Per-minion telegraf creds live in the grid-wide telegraf/creds.sls pillar,
written by /usr/sbin/so-telegraf-cred on the manager. Each minion looks up
its own entry by grains.id. #}
{%- set PG_ENTRY = salt['pillar.get']('telegraf:postgres_creds:' ~ grains.id, {}) %}
{%- set PG_USER = PG_ENTRY.get('user', '') %}
{%- set PG_PASS = PG_ENTRY.get('pass', '') %}
# Global tags can be specified here in key="value" format.
[global_tags]
role = "{{ GLOBALS.role.split('-') | last }}"
+44 -32
View File
@@ -202,10 +202,10 @@ check_service_status() {
systemctl status $service_name > /dev/null 2>&1
local status=$?
if [ $status -gt 0 ]; then
info " $service_name is not running"
info "$service_name is not running"
return 1;
else
info " $service_name is running"
info "$service_name is running"
return 0;
fi
@@ -1549,13 +1549,8 @@ clear_previous_setup_results() {
reinstall_init() {
info "Putting system in state to run setup again"
if [[ $install_type =~ ^(MANAGER|EVAL|MANAGERSEARCH|MANAGERHYPE|STANDALONE|FLEET|IMPORT)$ ]]; then
local salt_services=( "salt-master" "salt-minion" )
else
local salt_services=( "salt-minion" )
fi
local service_retry_count=20
# Always include both services. check_service_status skips units that aren't present.
local salt_services=( "salt-master" "salt-minion" )
{
# remove all of root's cronjobs
@@ -1571,31 +1566,51 @@ reinstall_init() {
salt-call state.apply ca.remove -linfo --local --file-root=../salt
# Kill any salt processes (safely)
# Stop salt services and force-kill any lingering salt processes (including orphans
# from an earlier reinstall attempt where the unit file is gone but processes survive)
# so dnf remove salt can run cleanly
for service in "${salt_services[@]}"; do
# Stop the service in the background so we can exit after a certain amount of time
if check_service_status "$service"; then
systemctl stop "$service" &
info "Stopping $service via systemctl"
systemctl stop "$service"
fi
local pid=$!
local count=0
while check_service_status "$service"; do
if [[ $count -gt $service_retry_count ]]; then
echo "Could not stop $service after 1 minute, exiting setup."
# Stop the systemctl process trying to kill the service, show user a message, then exit setup
kill -9 $pid
fail_setup
fi
sleep 5
((count++))
done
done
# Unconditionally force-kill any remaining salt binaries — these may be orphaned
# from a prior aborted reinstall (no unit file, so systemctl can't see them).
for salt_bin in salt-master salt-minion salt-call salt-cloud; do
if pgrep -f "/usr/bin/${salt_bin}" > /dev/null 2>&1; then
info "Force-killing lingering $salt_bin processes"
pkill -9 -ef "/usr/bin/${salt_bin}" 2>/dev/null
fi
done
# Catch stray `salt` CLI children from saltutil.kill_all_jobs / state.apply invocations
pkill -9 -ef "/usr/bin/python3 /bin/salt" 2>/dev/null
# Give the kernel a moment to reap the killed processes before dnf removes the binaries
local kill_wait=0
while pgrep -f "/usr/bin/salt-" > /dev/null 2>&1; do
if [[ $kill_wait -gt 10 ]]; then
info "Salt processes still present after SIGKILL + 10s wait; proceeding anyway"
pgrep -af "/usr/bin/salt-" | while read -r line; do info " lingering: $line"; done
break
fi
sleep 1
((kill_wait++))
done
# Clear the 'failed' state SIGKILL left on the units before removing the package
systemctl reset-failed salt-master.service salt-minion.service 2>/dev/null || true
# Remove all salt configs
rm -rf /etc/salt/engines/* /etc/salt/grains /etc/salt/master /etc/salt/master.d/* /etc/salt/minion /etc/salt/minion.d/* /etc/salt/pki/* /etc/salt/proxy /etc/salt/proxy.d/* /var/cache/salt/
dnf -y remove salt
rm -rf /etc/salt/ /var/cache/salt/
# Drop systemd's in-memory references to the now-removed units
systemctl daemon-reload
# Uninstall local Elastic Agent, if installed
elastic-agent uninstall -f
if command -v docker &> /dev/null; then
# Stop and remove all so-* containers so files can be changed with more safety
@@ -1619,10 +1634,7 @@ reinstall_init() {
backup_dir /nsm/hydra "$date_string"
backup_dir /nsm/influxdb "$date_string"
# Uninstall local Elastic Agent, if installed
elastic-agent uninstall -f
} >> "$setup_log" 2>&1
} 2>&1 | tee -a "$setup_log"
info "System reinstall init has been completed."
}
+1 -1
View File
@@ -219,7 +219,7 @@ if [ -n "$test_profile" ]; then
WEBUSER=onionuser@somewhere.invalid
WEBPASSWD1=0n10nus3r
WEBPASSWD2=0n10nus3r
NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MAINIP}"
NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}"
update_sudoers_for_testing
fi