drop postgres module from soc defaults injection

The soc binary on 3/dev does not register a postgres module, so injecting postgres into soc.config.server.modules makes soc abort at launch with 'Module does not exist: postgres'. The soc-side module is staged on feature/postgres but is not landing this release. Drop the injection until the module ships; salt/postgres state and pillars are unchanged.
Merge pull request #15838 from Security-Onion-Solutions/fix/docker-refresh-multiarch-pull
2026-05-06 19:38:51 +02:00 · 2026-04-28 15:46:56 -04:00 · 2026-04-28 15:14:27 -04:00 · 2026-04-28 14:54:25 -04:00 · 2026-04-28 14:49:02 -04:00 · 2026-04-28 13:47:55 -05:00
34 changed files with 560 additions and 375 deletions
@@ -0,0 +1,12 @@
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Per-minion Telegraf Postgres credentials. so-telegraf-cred on the manager is
+# the single writer; it mutates /opt/so/saltstack/local/pillar/telegraf/creds.sls
+# under flock. Pillar_roots order (local before default) means the populated
+# copy shadows this default on any real grid; this file exists so the pillar
+# key is always defined on fresh installs and when no minions have creds yet.
+telegraf:
+  postgres_creds: {}
@@ -17,6 +17,7 @@ base:
    - sensoroni.adv_sensoroni
    - telegraf.soc_telegraf
    - telegraf.adv_telegraf
+    - telegraf.creds
    - versionlock.soc_versionlock
    - versionlock.adv_versionlock
    - soc.license
@@ -35,6 +35,8 @@
    'kratos',
    'hydra',
    'elasticfleet',
+    'elasticfleet.manager',
+    'elasticsearch.cluster',
    'elastic-fleet-package-registry',
    'utility'
 ] %}
@@ -79,7 +81,7 @@
        ),
        'so-heavynode': (
            sensor_states +
-            ['elasticagent', 'elasticsearch', 'logstash', 'redis', 'nginx']
+            ['elasticagent', 'elasticsearch', 'elasticsearch.cluster', 'logstash', 'redis', 'nginx']
        ),
        'so-idh': (
            ['idh']
@@ -164,8 +164,8 @@ update_docker_containers() {
    # Pull down the trusted docker image
    run_check_net_err \
    "docker pull $CONTAINER_REGISTRY/$IMAGEREPO/$image" \
-    "Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1 
-    
+    "Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1
+
    # Get signature
    run_check_net_err \
    "curl --retry 5 --retry-delay 60 -A '$CURLTYPE/$CURRENTVERSION/$OS/$(uname -r)' $sig_url --output $SIGNPATH/$image.sig" \
@@ -188,8 +188,27 @@ update_docker_containers() {
        if [ -z "$HOSTNAME" ]; then
          HOSTNAME=$(hostname)
        fi
-        docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 
-        docker push $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 
+        docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 || {
+          echo "Unable to tag $image" >> "$LOG_FILE" 2>&1
+          exit 1
+        }
+        # Push to the embedded registry via a registry-to-registry copy. Avoids
+        # `docker push`, which on Docker 29.x with the containerd image store
+        # represents freshly-pulled images as an index whose layer content
+        # isn't reachable through the push path. The local `docker tag` above
+        # is preserved so so-image-pull's `:5000` existence check still works.
+        # Pin to the digest already gpg-verified above so we copy exactly the
+        # bytes we approved.
+        local VERIFIED_REF
+        VERIFIED_REF=$(echo "$DOCKERINSPECT" | jq -r ".[0].RepoDigests[] | select(. | contains(\"$CONTAINER_REGISTRY\"))" | head -n 1)
+        if [ -z "$VERIFIED_REF" ] || [ "$VERIFIED_REF" = "null" ]; then
+          echo "Unable to determine verified digest for $image" >> "$LOG_FILE" 2>&1
+          exit 1
+        fi
+        docker buildx imagetools create --tag $HOSTNAME:5000/$IMAGEREPO/$image "$VERIFIED_REF" >> "$LOG_FILE" 2>&1 || {
+          echo "Unable to copy $image to embedded registry" >> "$LOG_FILE" 2>&1
+          exit 1
+        }
      fi
    else
      echo "There is a problem downloading the $image image. Details: " >> "$LOG_FILE" 2>&1 
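
The digest selection in the hunk above (the `jq` filter over `RepoDigests`) can be illustrated outside the shell. This is a minimal sketch, not the project's code: the function name `verified_ref`, the registry host, and the sample inspect output are all hypothetical, and only the one field the hunk reads is modeled.

```python
import json
from typing import Optional


def verified_ref(inspect_json: str, registry: str) -> Optional[str]:
    """Return the first RepoDigests entry that mentions `registry`, or None."""
    images = json.loads(inspect_json)
    if not images:
        return None
    # RepoDigests may be missing or null when the image has no registry digest.
    digests = images[0].get("RepoDigests") or []
    for ref in digests:
        if registry in ref:
            return ref
    return None


# Hypothetical `docker inspect` output, trimmed to the field used here.
sample = json.dumps([{
    "RepoDigests": [
        "localhost:5000/example/so-demo@sha256:" + "a" * 64,
        "registry.example.com/example/so-demo@sha256:" + "b" * 64,
    ]
}])

print(verified_ref(sample, "registry.example.com"))
```

Returning `None` when nothing matches corresponds to the empty-or-`null` check the script performs before it attempts the registry-to-registry copy.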
@@ -227,7 +227,7 @@ if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|from NIC checksum offloading" # zeek reporter.log
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|marked for removal"           # docker container getting recycled
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|tcp 127.0.0.1:6791: bind: address already in use" # so-elastic-fleet agent restarting. Seen starting w/ 8.18.8 https://github.com/elastic/kibana/issues/201459
-    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint).*user so_kibana lacks the required permissions \[logs-\1" # Known issue with 3 integrations using kibana_system role vs creating unique api creds with proper permissions.
+    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint|armis|o365_metrics|microsoft_sentinel|snyk).*user so_kibana lacks the required permissions \[(logs|metrics)-\1" # Known issue with integrations starting transform jobs that are explicitly not allowed to start as a system user. (installed as so_elastic / so_kibana)
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|manifest unknown"             # appears in so-dockerregistry log for so-tcpreplay following docker upgrade to 29.2.1-1
 fi

@@ -9,7 +9,7 @@

 . /usr/sbin/so-common

-software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02")
+software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02" "HVGUEST")
 hardware_raid=("SOS1000" "SOS1000F" "SOSSN7200" "SOS5000" "SOS4000")

 {%- if salt['grains.get']('sosmodel', '') %}
@@ -87,6 +87,11 @@ check_boss_raid() {
 }

 check_software_raid() {
+  if [[ ! -f /proc/mdstat ]]; then
+    SWRAID=0
+    return
+  fi
+
  SWRC=$(grep "_" /proc/mdstat)
  if [[ -n $SWRC ]]; then
      # RAID is failed in some way
@@ -107,7 +112,9 @@ if [[ "$is_hwraid" == "true" ]]; then
 fi
 if [[ "$is_softwareraid" == "true" ]]; then
 	check_software_raid
-  check_boss_raid
+  if [ "$model" != "HVGUEST" ]; then
+    check_boss_raid
+  fi
 fi

 sum=$(($SWRAID + $BOSSRAID + $HWRAID))
@@ -17,65 +17,17 @@ include:
  - logstash.ssl
  - elasticfleet.config
  - elasticfleet.sostatus
+{%- if GLOBALS.role != "so-fleet" %}
+  - elasticfleet.manager
+{%- endif %}

-{% if grains.role not in ['so-fleet'] %}
+{% if GLOBALS.role != "so-fleet" %}
 # Wait for Elasticsearch to be ready - no reason to try running Elastic Fleet server if ES is not ready
 wait_for_elasticsearch_elasticfleet:
  cmd.run:
    - name: so-elasticsearch-wait
-{% endif %}
-
-# If enabled, automatically update Fleet Logstash Outputs
-{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval', 'so-fleet'] %}
-so-elastic-fleet-auto-configure-logstash-outputs:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-outputs-update
-    - retry:
-        attempts: 4
-        interval: 30
-
-{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
-so-elastic-fleet-auto-configure-logstash-outputs-force:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-outputs-update --certs
-    - retry:
-        attempts: 4
-        interval: 30
-    - onchanges:
-        - x509: etc_elasticfleet_logstash_crt
-        - x509: elasticfleet_kafka_crt
-{% endif %}
-
-# If enabled, automatically update Fleet Server URLs & ES Connection
-{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-fleet'] %}
-so-elastic-fleet-auto-configure-server-urls:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-urls-update
-    - retry:
-        attempts: 4
-        interval: 30
-{% endif %}
-
-# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
-{% if grains.role not in ['so-fleet'] %}
-so-elastic-fleet-auto-configure-elasticsearch-urls:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-es-url-update
-    - retry:
-        attempts: 4
-        interval: 30
-
-so-elastic-fleet-auto-configure-artifact-urls:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-artifacts-url-update
-    - retry:
-        attempts: 4
-        interval: 30
-
-{% endif %}

 # Sync Elastic Agent artifacts to Fleet Node
-{% if grains.role in ['so-fleet'] %}
 elasticagent_syncartifacts:
  file.recurse:
    - name: /nsm/elastic-fleet/artifacts/beats
@@ -149,57 +101,6 @@ so-elastic-fleet:
      - x509: etc_elasticfleet_crt
 {%   endif %}

-{%  if GLOBALS.role != "so-fleet" %}
-so-elastic-fleet-package-statefile:
-  file.managed:
-    - name: /opt/so/state/elastic_fleet_packages.txt
-    - contents: {{ELASTICFLEETMERGED.packages}}
-
-so-elastic-fleet-package-upgrade:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-package-upgrade
-    - retry:
-        attempts: 3
-        interval: 10
-    - onchanges:
-      - file: /opt/so/state/elastic_fleet_packages.txt
-
-so-elastic-fleet-integrations:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-integration-policy-load
-    - retry:
-        attempts: 3
-        interval: 10
-
-so-elastic-agent-grid-upgrade:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-agent-grid-upgrade
-    - retry:
-        attempts: 12
-        interval: 5
-
-so-elastic-fleet-integration-upgrade:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-integration-upgrade
-    - retry:
-        attempts: 3
-        interval: 10
-
-{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
-so-elastic-fleet-addon-integrations:
-  cmd.run:
-    - name: /usr/sbin/so-elastic-fleet-optional-integrations-load
-
-{%   if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
-so-elastic-defend-manage-filters-file-watch:
-  cmd.run:
-    - name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
-    - onchanges:
-      - file: elasticdefendcustom
-      - file: elasticdefenddisabled
-{%    endif %}
-{%  endif %}
-
 delete_so-elastic-fleet_so-status.disabled:
  file.uncomment:
    - name: /opt/so/conf/so-status/so-status.conf
@@ -0,0 +1,112 @@
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at 
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+{% from 'allowed_states.map.jinja' import allowed_states %}
+{% if sls in allowed_states %}
+{%   from 'elasticfleet/map.jinja' import ELASTICFLEETMERGED %}
+
+include:
+  - elasticfleet.config
+
+# If enabled, automatically update Fleet Logstash Outputs
+{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval'] %}
+so-elastic-fleet-auto-configure-logstash-outputs:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-outputs-update
+    - retry:
+        attempts: 4
+        interval: 30
+
+{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
+so-elastic-fleet-auto-configure-logstash-outputs-force:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-outputs-update --certs
+    - retry:
+        attempts: 4
+        interval: 30
+    - onchanges:
+        - x509: etc_elasticfleet_logstash_crt
+        - x509: elasticfleet_kafka_crt
+{% endif %}
+
+# If enabled, automatically update Fleet Server URLs & ES Connection
+so-elastic-fleet-auto-configure-server-urls:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-urls-update
+    - retry:
+        attempts: 4
+        interval: 30
+
+# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
+so-elastic-fleet-auto-configure-elasticsearch-urls:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-es-url-update
+    - retry:
+        attempts: 4
+        interval: 30
+
+so-elastic-fleet-auto-configure-artifact-urls:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-artifacts-url-update
+    - retry:
+        attempts: 4
+        interval: 30
+
+so-elastic-fleet-package-statefile:
+  file.managed:
+    - name: /opt/so/state/elastic_fleet_packages.txt
+    - contents: {{ELASTICFLEETMERGED.packages}}
+
+so-elastic-fleet-package-upgrade:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-package-upgrade
+    - retry:
+        attempts: 3
+        interval: 10
+    - onchanges:
+      - file: /opt/so/state/elastic_fleet_packages.txt
+
+so-elastic-fleet-integrations:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-integration-policy-load
+    - retry:
+        attempts: 3
+        interval: 10
+
+so-elastic-agent-grid-upgrade:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-agent-grid-upgrade
+    - retry:
+        attempts: 12
+        interval: 5
+
+so-elastic-fleet-integration-upgrade:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-integration-upgrade
+    - retry:
+        attempts: 3
+        interval: 10
+
+{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
+so-elastic-fleet-addon-integrations:
+  cmd.run:
+    - name: /usr/sbin/so-elastic-fleet-optional-integrations-load
+
+{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
+so-elastic-defend-manage-filters-file-watch:
+  cmd.run:
+    - name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
+    - onchanges:
+      - file: elasticdefendcustom
+      - file: elasticdefenddisabled
+{% endif %}
+
+{% else %}
+
+{{sls}}_state_not_allowed:
+  test.fail_without_changes:
+    - name: {{sls}}_state_not_allowed
+
+{% endif %}
@@ -5,11 +5,12 @@
 # this file except in compliance with the Elastic License 2.0.

 . /usr/sbin/so-common
+. /usr/sbin/so-elastic-fleet-common
 {%- import_yaml 'elasticsearch/defaults.yaml' as ELASTICSEARCHDEFAULTS %}
 {%- import_yaml 'elasticfleet/defaults.yaml' as ELASTICFLEETDEFAULTS %}
 {# Optionally override Elasticsearch version for Elastic Agent patch releases #}
 {%- if ELASTICFLEETDEFAULTS.elasticfleet.patch_version is defined %}
-{%-   do ELASTICSEARCHDEFAULTS.update({'elasticsearch': {'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}}) %}
+{%-   do ELASTICSEARCHDEFAULTS.elasticsearch.update({'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}) %}
 {%- endif %}

 # Only run on Managers
@@ -19,13 +20,10 @@ if ! is_manager_node; then
 fi

 # Get current list of Grid Node Agents that need to be upgraded
-RAW_JSON=$(curl -K /opt/so/conf/elasticsearch/curl.config -L "http://localhost:5601/api/fleet/agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" --retry 3 --retry-delay 30 --fail 2>/dev/null)
+if ! RAW_JSON=$(fleet_api "agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version | urlencode }}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" -H 'kbn-xsrf: true' -H 'Content-Type: application/json'); then

-# Check to make sure that the server responded with good data - else, bail from script
-CHECKSUM=$(jq -r '.page' <<< "$RAW_JSON")
-if [ "$CHECKSUM" -ne 1 ]; then
- printf "Failed to query for current Grid Agents...\n"
- exit 1
+    printf "Failed to query for current Grid Agents...\n"
+    exit 1
 fi

 # Generate list of Node Agents that need updates
@@ -36,10 +34,12 @@ if [ "$OUTDATED_LIST" != '[]' ]; then
   printf "Initiating upgrades for $AGENTNUMBERS Agents to Elastic {{ELASTICSEARCHDEFAULTS.elasticsearch.version}}...\n\n"

   # Generate updated JSON payload
-   JSON_STRING=$(jq -n --arg ELASTICVERSION {{ELASTICSEARCHDEFAULTS.elasticsearch.version}} --arg UPDATELIST $OUTDATED_LIST '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
+   JSON_STRING=$(jq -n --arg ELASTICVERSION "{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}" --argjson UPDATELIST "$OUTDATED_LIST" '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')

   # Update Node Agents
-   curl -K /opt/so/conf/elasticsearch/curl.config -L -X POST "http://localhost:5601/api/fleet/agents/bulk_upgrade" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"
+   if ! fleet_api "agents/bulk_upgrade" -XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"; then
+       printf "Failed to initiate Agent upgrades...\n"
+   fi
 else
    printf "No Agents need updates... Exiting\n\n"
    exit 0
@@ -235,6 +235,16 @@ function update_kafka_outputs() {

 {% endif %}

+# Compare the current Elastic Fleet certificate against what is on disk
+POLICY_CERT_SHA=$(jq -r '.item.ssl.certificate' <<< $RAW_JSON | openssl x509 -noout -sha256 -fingerprint)
+DISK_CERT_SHA=$(openssl x509 -in /etc/pki/elasticfleet-logstash.crt -noout -sha256 -fingerprint)
+
+if [[ "$POLICY_CERT_SHA" != "$DISK_CERT_SHA" ]]; then
+    printf "Certificate on disk doesn't match certificate in policy - forcing update\n"
+    UPDATE_CERTS=true
+    FORCE_UPDATE=true
+fi
+
 # Sort & hash the new list of Logstash Outputs
 NEW_LIST_JSON=$(jq --compact-output --null-input '$ARGS.positional' --args -- "${NEW_LIST[@]}")
 NEW_HASH=$(sha256sum <<< "$NEW_LIST_JSON" | awk '{print $1}')
@@ -4,7 +4,7 @@
 # Elastic License 2.0.

 {% from 'allowed_states.map.jinja' import allowed_states %}
-{% if sls.split('.')[0] in allowed_states %}
+{% if sls in allowed_states %}
 {%   from 'vars/globals.map.jinja' import GLOBALS %}
 {%   from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
 {%   from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS, SO_MANAGED_INDICES %}
@@ -17,7 +17,7 @@ include:
  - elasticsearch.ssl
  - elasticsearch.config
  - elasticsearch.sostatus
-{%- if GLOBALS.role != 'so-searchode' %}
+{%- if GLOBALS.role != "so-searchnode" %}
  - elasticsearch.cluster
 {%- endif%}

@@ -102,11 +102,6 @@ so-elasticsearch:
      - cmd: auth_users_roles_inode
      - cmd: auth_users_inode

-delete_so-elasticsearch_so-status.disabled:
-  file.uncomment:
-    - name: /opt/so/conf/so-status/so-status.conf
-    - regex: ^so-elasticsearch$
-
 wait_for_so-elasticsearch:
  http.wait_for_successful_query:
    - name: "https://localhost:9200/"
@@ -117,10 +112,14 @@ wait_for_so-elasticsearch:
    - status: 200
    - wait_for: 300
    - request_interval: 15
-    - backend: requests
    - require:
      - docker_container: so-elasticsearch

+delete_so-elasticsearch_so-status.disabled:
+  file.uncomment:
+    - name: /opt/so/conf/so-status/so-status.conf
+    - regex: ^so-elasticsearch$
+
 {% else %}

 {{sls}}_state_not_allowed:
@@ -103,11 +103,13 @@ load_component_templates() {
    local pattern="${ELASTICSEARCH_TEMPLATES_DIR}/component/$2"
    local append_mappings="${3:-"false"}"

-    # current state of nullglob shell option
-    shopt -q nullglob && nullglob_set=1 || nullglob_set=0
-
-    shopt -s nullglob
    echo -e "\nLoading $printed_name component templates...\n"
+
+    if ! compgen -G "${pattern}/*.json" > /dev/null; then
+        echo "No $printed_name component templates found in ${pattern}, skipping."
+        return
+    fi
+
    for component in "$pattern"/*.json; do
        tmpl_name=$(basename "${component%.json}")

@@ -121,11 +123,6 @@ load_component_templates() {
            SO_LOAD_FAILURES_NAMES+=("$component")
        fi
    done
-
-    # restore nullglob shell option if needed
-    if [[ $nullglob_set -eq 1 ]]; then
-        shopt -u nullglob
-    fi
 }

 check_elasticsearch_responsive() {
@@ -136,7 +133,32 @@ check_elasticsearch_responsive() {
        fail "Elasticsearch is not responding. Please review Elasticsearch logs /opt/so/log/elasticsearch/securityonion.log for more details. Additionally, consider running so-elasticsearch-troubleshoot."
 }

-if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
+index_templates_exist() {
+    local templates_dir="$1"
+
+    if [[ ! -d "$templates_dir" ]]; then
+        return 1
+    fi
+
+    compgen -G "${templates_dir}/*.json" > /dev/null
+}
+
+should_load_addon_templates() {
+    if [[ "$IS_HEAVYNODE" == "true" ]]; then
+        return 1
+    fi
+
+    # Skip statefile checks when forcing template load
+    if [[ "$FORCE" != "true" ]]; then
+        if [[ ! -f "$SO_STATEFILE_SUCCESS" || -f "$ADDON_STATEFILE_SUCCESS" ]]; then
+            return 1
+        fi
+    fi
+
+    index_templates_exist "$ADDON_TEMPLATES_DIR"
+}
+
+if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_exist "$SO_TEMPLATES_DIR"; then
    check_elasticsearch_responsive

    if [[ "$IS_HEAVYNODE" == "false" ]]; then
@@ -201,13 +223,14 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
            fail "Failed to load all Security Onion core templates successfully."
        fi
    fi
-else
-
+elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then
+    echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping."
+elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then
    echo "Security Onion core templates already loaded"
 fi

 # Start loading addon templates
-if [[ (-d "$ADDON_TEMPLATES_DIR" && -f "$SO_STATEFILE_SUCCESS" && "$IS_HEAVYNODE" == "false" && ! -f "$ADDON_STATEFILE_SUCCESS") || (-d "$ADDON_TEMPLATES_DIR" && "$IS_HEAVYNODE" == "false" && "$FORCE" == "true") ]]; then
+if should_load_addon_templates; then

    check_elasticsearch_responsive
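
The decision logic that `should_load_addon_templates` and `index_templates_exist` introduce above can be restated as a truth function. The sketch below is an approximation in Python, assuming the shell variables are passed in as booleans; the parameter names are invented for illustration, and `Path.glob` is not an exact match for `compgen -G` (for example, the two treat dotfiles differently).

```python
import tempfile
from pathlib import Path


def index_templates_exist(templates_dir: str) -> bool:
    """True when the directory exists and holds at least one *.json entry."""
    path = Path(templates_dir)
    return path.is_dir() and any(path.glob("*.json"))


def should_load_addon_templates(is_heavynode: bool, force: bool,
                                core_loaded: bool, addon_loaded: bool,
                                addon_dir: str) -> bool:
    # Heavy nodes never load addon templates.
    if is_heavynode:
        return False
    # Without force, core templates must be loaded and addons not yet loaded.
    if not force and (not core_loaded or addon_loaded):
        return False
    return index_templates_exist(addon_dir)


with tempfile.TemporaryDirectory() as demo_dir:
    # Empty directory: nothing to load, even with force.
    print(should_load_addon_templates(False, True, True, False, demo_dir))
    (Path(demo_dir) / "example-template.json").write_text("{}")
    # A template is present and the core templates are already loaded.
    print(should_load_addon_templates(False, False, True, False, demo_dir))
```

Written this way, `force` only bypasses the two statefile checks. The heavy node check and the "templates actually exist" check apply in every case.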

@@ -59,5 +59,4 @@ global:
    description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
    global: True
    advanced: True
-    helpLink: influxdb

@@ -22,7 +22,7 @@ kibana:
          - default
          - file
    migrations:
-      discardCorruptObjects: "8.18.8"
+      discardCorruptObjects: "9.3.3"
    telemetry:
      enabled: False
    xpack:
@@ -3,8 +3,8 @@ kratos:
    description: Enables or disables the Kratos authentication system. WARNING - Disabling this process will cause the grid to malfunction. Re-enabling this setting will require manual effort via SSH.
    forcedType: bool
    advanced: True
+    readonly: True
    helpLink: kratos
-
  oidc:
    enabled:
      description: Set to True to enable OIDC / Single Sign-On (SSO) to SOC. Requires a valid Security Onion license key.
@@ -273,7 +273,7 @@ function deleteMinionFiles () {
 		log "ERROR" "Failed to delete $PILLARFILE"
 		return 1
 	fi
-	
+
 	rm -f $ADVPILLARFILE
 	if [ $? -ne 0 ]; then
 		log "ERROR" "Failed to delete $ADVPILLARFILE"
@@ -281,6 +281,39 @@ function deleteMinionFiles () {
 	fi
 }

+# Remove this minion's postgres Telegraf credential from the shared creds
+# pillar and drop the matching role in Postgres. Always returns 0 so a dead
+# or unreachable so-postgres doesn't block minion deletion — in that case we
+# log a warning and leave the role behind for manual cleanup.
+function remove_postgres_telegraf_from_minion() {
+	local MINION_SAFE
+	MINION_SAFE=$(echo "$MINION_ID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
+	local PG_USER="so_telegraf_${MINION_SAFE}"
+
+	log "INFO" "Removing postgres telegraf cred for $MINION_ID"
+
+	so-telegraf-cred remove "$MINION_ID" >/dev/null 2>&1 || true
+
+	if docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^so-postgres$'; then
+		if ! docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf >/dev/null 2>&1 <<EOSQL
+DO \$\$
+BEGIN
+    IF EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '$PG_USER') THEN
+        EXECUTE format('REASSIGN OWNED BY %I TO so_telegraf', '$PG_USER');
+        EXECUTE format('DROP OWNED BY %I', '$PG_USER');
+        EXECUTE format('DROP ROLE %I', '$PG_USER');
+    END IF;
+END
+\$\$;
+EOSQL
+		then
+			log "WARN" "Failed to drop postgres role $PG_USER; pillar entry was removed — drop manually if the role persists"
+		fi
+	else
+		log "WARN" "so-postgres container is not running; skipping DB role cleanup for $PG_USER"
+	fi
+}
+
 # Create the minion file
 function ensure_socore_ownership() {
 	log "INFO" "Setting socore ownership on minion files"
@@ -542,6 +575,17 @@ function add_telegraf_to_minion() {
        log "ERROR" "Failed to add telegraf configuration to $PILLARFILE"
        return 1
    fi
+
+    # Provision the per-minion postgres Telegraf credential in the shared
+    # telegraf/creds.sls pillar. so-telegraf-cred is the only writer; it
+    # generates a password on first add and is a no-op on re-add so the cred
+    # is stable across repeated so-minion runs. postgres.telegraf_users on the
+    # manager creates/updates the DB role from the same pillar.
+    so-telegraf-cred add "$MINION_ID"
+    if [ $? -ne 0 ]; then
+        log "ERROR" "Failed to provision postgres telegraf cred for $MINION_ID"
+        return 1
+    fi
 }

 function add_influxdb_to_minion() {
@@ -1069,6 +1113,7 @@ case "$OPERATION" in

 	"delete")
 		log "INFO" "Removing minion $MINION_ID"
+		remove_postgres_telegraf_from_minion
 		deleteMinionFiles || {
 			log "ERROR" "Failed to delete minion files for $MINION_ID"
 			exit 1
@@ -0,0 +1,54 @@
+#!/bin/bash
+
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Single writer for the Telegraf Postgres credentials pillar. Thin wrapper
+# around so-yaml.py that generates a password on first add and no-ops on
+# re-add so the cred is stable across repeated so-minion runs.
+#
+# Note: so-yaml.py splits keys on '.' with no escape. SO minion ids are
+# dot-free by construction (setup/so-functions:1884 takes the short_name
+# before the first '.'), so using the raw minion id as the key is safe.
+
+CREDS=/opt/so/saltstack/local/pillar/telegraf/creds.sls
+
+usage() {
+    echo "Usage: $0 <add|remove> <minion_id>" >&2
+    exit 2
+}
+
+seed_creds_file() {
+    mkdir -p "$(dirname "$CREDS")" || return 1
+    if [[ ! -f "$CREDS" ]]; then
+        (umask 027 && printf 'telegraf:\n  postgres_creds: {}\n' > "$CREDS") || return 1
+        chown socore:socore "$CREDS" 2>/dev/null || true
+        chmod 640 "$CREDS" || return 1
+    fi
+}
+
+OP=$1
+MID=$2
+[[ -z "$OP" || -z "$MID" ]] && usage
+
+case "$OP" in
+    add)
+        SAFE=$(echo "$MID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
+        seed_creds_file || exit 1
+        if so-yaml.py get -r "$CREDS" "telegraf.postgres_creds.${MID}.user" >/dev/null 2>&1; then
+            exit 0
+        fi
+        PASS=$(tr -dc 'A-Za-z0-9~!@#^&*()_=+[]|;:,.<>?-' < /dev/urandom | head -c 72)
+        so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.user" "so_telegraf_${SAFE}" >/dev/null
+        so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.pass" "$PASS" >/dev/null
+        ;;
+    remove)
+        [[ -f "$CREDS" ]] || exit 0
+        so-yaml.py remove "$CREDS" "telegraf.postgres_creds.${MID}" >/dev/null 2>&1 || true
+        ;;
+    *)
+        usage
+        ;;
+esac
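
The add/remove behaviour of `so-telegraf-cred` above (derive a role name from the minion id, generate a password once, no-op on re-add) can be modeled on a plain dictionary. This is a hedged sketch rather than the script's implementation: it skips the pillar file and `so-yaml.py` entirely, and `pg_role_for`, `add_cred`, and `remove_cred` are hypothetical names.

```python
import secrets
import string

# Same character set the script passes to `tr -dc`.
PASSWORD_ALPHABET = string.ascii_letters + string.digits + "~!@#^&*()_=+[]|;:,.<>?-"


def pg_role_for(minion_id: str) -> str:
    """Mirror `tr '.-' '__' | tr '[:upper:]' '[:lower:]'` for ASCII ids."""
    safe = minion_id.replace(".", "_").replace("-", "_").lower()
    return f"so_telegraf_{safe}"


def add_cred(creds: dict, minion_id: str) -> dict:
    """Create a credential on first add; return the existing one on re-add."""
    existing = creds.get(minion_id, {})
    if "user" in existing:
        return existing
    password = "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(72))
    creds[minion_id] = {"user": pg_role_for(minion_id), "pass": password}
    return creds[minion_id]


def remove_cred(creds: dict, minion_id: str) -> None:
    """Removing an unknown minion is not an error."""
    creds.pop(minion_id, None)


store = {}
first = add_cred(store, "Sensor-01")
second = add_cred(store, "Sensor-01")
print(first["user"], first["pass"] == second["pass"])
remove_cred(store, "Sensor-01")
print(store)
```

The re-add check keys on the presence of `user`, as the script does, so a repeated run returns the stored password instead of rotating it.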
@@ -39,9 +39,16 @@ def showUsage(args):


 def loadYaml(filename):
-    file = open(filename, "r")
-    content = file.read()
-    return yaml.safe_load(content)
+    try:
+        with open(filename, "r") as file:
+            content = file.read()
+            return yaml.safe_load(content)
+    except FileNotFoundError:
+        print(f"File not found: {filename}", file=sys.stderr)
+        sys.exit(1)
+    except Exception as e:
+        print(f"Error reading file {filename}: {e}", file=sys.stderr)
+        sys.exit(1)


 def writeYaml(filename, content):
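
The error-handling shape that the `loadYaml` hunk adopts (context-managed open, a dedicated missing-file message, exit status 1 for any other read or parse failure) can be tried standalone. The sketch below is an illustration only: `load_document` is a hypothetical name, and `json.loads` stands in for `yaml.safe_load` so the example needs nothing beyond the standard library.

```python
import json
import sys


def load_document(filename, parse=json.loads):
    """Read and parse a file, exiting with status 1 on any failure."""
    try:
        # The context manager closes the handle even if parsing raises.
        with open(filename, "r") as handle:
            return parse(handle.read())
    except FileNotFoundError:
        print(f"File not found: {filename}", file=sys.stderr)
        sys.exit(1)
    except Exception as exc:
        # SystemExit is not an Exception subclass, so the exit above
        # is never swallowed by this handler.
        print(f"Error reading file {filename}: {exc}", file=sys.stderr)
        sys.exit(1)


try:
    load_document("/nonexistent/so-yaml-sketch.json")
except SystemExit as exit_info:
    print("exit status:", exit_info.code)
```

The more specific `FileNotFoundError` handler has to come before the generic one. With the order reversed, the generic handler would catch the missing-file case too and report it with the less precise message.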
@@ -973,3 +973,21 @@ class TestReplaceListObject(unittest.TestCase):

        expected = "key1:\n- id: '1'\n  status: updated\n- id: '2'\n  status: inactive\n"
        self.assertEqual(actual, expected)
+
+
+class TestLoadYaml(unittest.TestCase):
+
+    def test_load_yaml_missing_file(self):
+        with patch('sys.exit', new=MagicMock()) as sysmock:
+            with patch('sys.stderr', new=StringIO()) as mock_stderr:
+                soyaml.loadYaml("/tmp/so-yaml_test-does-not-exist.yaml")
+                sysmock.assert_called_with(1)
+                self.assertIn("File not found:", mock_stderr.getvalue())
+
+    def test_load_yaml_read_error(self):
+        with patch('sys.exit', new=MagicMock()) as sysmock:
+            with patch('sys.stderr', new=StringIO()) as mock_stderr:
+                with patch('builtins.open', side_effect=PermissionError("denied")):
+                    soyaml.loadYaml("/tmp/so-yaml_test-unreadable.yaml")
+                    sysmock.assert_called_with(1)
+                    self.assertIn("Error reading file", mock_stderr.getvalue())
@@ -24,6 +24,14 @@ BACKUPTOPFILE=/opt/so/saltstack/default/salt/top.sls.backup
 SALTUPGRADED=false
 SALT_CLOUD_INSTALLED=false
 SALT_CLOUD_CONFIGURED=false
+# Check if salt-cloud is installed
+if rpm -q salt-cloud &>/dev/null; then
+  SALT_CLOUD_INSTALLED=true
+fi
+# Check if salt-cloud is configured
+if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
+  SALT_CLOUD_CONFIGURED=true
+fi
 # used to display messages to the user at the end of soup
 declare -a FINAL_MESSAGE_QUEUE=()

@@ -477,7 +485,44 @@ elasticsearch_backup_index_templates() {
  tar -czf /nsm/backup/3.0.0_elasticsearch_index_templates.tar.gz -C /opt/so/conf/elasticsearch/templates/index/ .
 }

+ensure_postgres_local_pillar() {
+  # Postgres was added as a service after 3.0.0, so the new pillar/top.sls
+  # references postgres.soc_postgres / postgres.adv_postgres unconditionally.
+  # Managers upgrading from 3.0.0 have no /opt/so/saltstack/local/pillar/postgres/
+  # (make_some_dirs only runs at install time), so the stubs must be created
+  # here before salt-master restarts against the new top.sls.
+  echo "Ensuring postgres local pillar stubs exist."
+  local dir=/opt/so/saltstack/local/pillar/postgres
+  mkdir -p "$dir"
+  [[ -f "$dir/soc_postgres.sls" ]] || touch "$dir/soc_postgres.sls"
+  [[ -f "$dir/adv_postgres.sls" ]] || touch "$dir/adv_postgres.sls"
+  chown -R socore:socore "$dir"
+}
+
+ensure_postgres_secret() {
+  # On a fresh install, generate_passwords + secrets_pillar seed
+  # secrets:postgres_pass in /opt/so/saltstack/local/pillar/secrets.sls. That
+  # code path is skipped on upgrade (secrets.sls already exists from 3.0.0
+  # with import_pass/influx_pass but no postgres_pass), so the postgres
+  # container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would be empty
+  # after highstate. Generate one now if missing.
+  local secrets_file=/opt/so/saltstack/local/pillar/secrets.sls
+  if [[ ! -f "$secrets_file" ]]; then
+    echo "WARNING: $secrets_file missing; skipping postgres_pass backfill."
+    return 0
+  fi
+  if so-yaml.py get -r "$secrets_file" secrets.postgres_pass >/dev/null 2>&1; then
+    echo "secrets.postgres_pass already set; leaving as-is."
+    return 0
+  fi
+  echo "Seeding secrets.postgres_pass in $secrets_file."
+  so-yaml.py add "$secrets_file" secrets.postgres_pass "$(get_random_value)"
+  chown socore:socore "$secrets_file"
+}
+
 up_to_3.1.0() {
+  ensure_postgres_local_pillar
+  ensure_postgres_secret
  determine_elastic_agent_upgrade
  elasticsearch_backup_index_templates
  # Clear existing component template state file.
@@ -489,33 +534,25 @@ up_to_3.1.0() {

 post_to_3.1.0() {
  /usr/sbin/so-kibana-space-defaults
-
-  # One-time backfill for minions that existed before the postgres Telegraf
-  # feature shipped. Generate the aggregate pillar on the manager and create
-  # the per-minion DB roles, then fan each minion's cred into its own pillar
-  # file. Going forward the reactor handles each new salt-key accept with a
-  # targeted fan-out, so a manager highstate no longer needs to iterate.
-  echo "Provisioning Telegraf Postgres users for existing minions."
-  salt-call --local state.apply postgres.auth,postgres.telegraf_users queue=True || true
-
-  AGGREGATE_PILLAR=/opt/so/saltstack/local/pillar/postgres/auth.sls
-  MINIONS_DIR=/opt/so/saltstack/local/pillar/minions
-  if [[ -f "$AGGREGATE_PILLAR" && -d "$MINIONS_DIR" ]]; then
-    for pillar_file in "$MINIONS_DIR"/*.sls; do
-      [[ -f "$pillar_file" ]] || continue
-      mid=$(basename "$pillar_file" .sls)
-      [[ "$mid" == adv_* ]] && continue
-      safe=$(echo "$mid" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
-      existing_user=$(so-yaml.py get -r "$pillar_file" postgres.telegraf.user 2>/dev/null || true)
-      [[ "$existing_user" == "so_telegraf_${safe}" ]] && continue
-      user=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.user" 2>/dev/null || true)
-      pass=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.pass" 2>/dev/null || true)
-      [[ -z "$user" || -z "$pass" ]] && continue
-      so-yaml.py replace "$pillar_file" postgres.telegraf.user "$user" >/dev/null
-      so-yaml.py replace "$pillar_file" postgres.telegraf.pass "$pass" >/dev/null
-    done
+  # ensure manager has new version of socloud.conf
+  if [[ $SALT_CLOUD_CONFIGURED == true ]]; then
+    salt-call state.apply salt.cloud.config concurrent=True
  fi

+  # Backfill the Telegraf creds pillar for every accepted minion. so-telegraf-cred
+  # add is idempotent — it no-ops when an entry already exists — so this is safe
+  # to run on every soup. The subsequent state.apply creates/updates the matching
+  # Postgres roles from the reconciled pillar.
+  echo "Reconciling Telegraf Postgres creds for accepted minions."
+  for mid in $(salt-key --out=json --list=accepted 2>/dev/null | jq -r '.minions[]?' 2>/dev/null); do
+    [[ -n "$mid" ]] || continue
+    /usr/sbin/so-telegraf-cred add "$mid" || echo "  warning: so-telegraf-cred add $mid failed" >&2
+  done
+  # Run through the master (not --local) so state compilation uses the
+  # master's configured file_roots; the manager's /etc/salt/minion has no
+  # file_roots of its own and --local would fail with "No matching sls found".
+  salt-call state.apply postgres.telegraf_users queue=True || true
+
  POSTVERSION=3.1.0
 }

@@ -689,15 +726,6 @@ upgrade_check_salt() {
 upgrade_salt() {
  echo "Performing upgrade of Salt from $INSTALLEDSALTVERSION to $NEWSALTVERSION."
  echo ""
-  # Check if salt-cloud is installed
-  if rpm -q salt-cloud &>/dev/null; then
-    SALT_CLOUD_INSTALLED=true
-  fi
-  # Check if salt-cloud is configured
-  if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
-    SALT_CLOUD_CONFIGURED=true
-  fi
-
  echo "Removing yum versionlock for Salt."
  echo ""
  yum versionlock delete "salt"
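
The backfill loop in post_to_3.1.0 is only safe to run on every soup because `so-telegraf-cred add` is described as idempotent. A minimal Python sketch of that property follows. It assumes an in-memory dict in place of telegraf/creds.sls. The function name `add_cred`, the `so_telegraf_` user prefix (borrowed from the removed Jinja in postgres.auth), and the 72-character password length are illustrative assumptions, not the script's actual implementation.

```python
import secrets
import string

# Assumed password alphabet for this sketch only.
ALPHABET = string.ascii_letters + string.digits


def add_cred(creds, minion_id, length=72):
    """Add a cred entry for minion_id unless one already exists.

    Returns True if a new entry was written, False if the call was a no-op.
    """
    if minion_id in creds:
        # Existing entry: leave the password untouched so re-runs are safe.
        return False
    safe = minion_id.replace('.', '_').replace('-', '_').lower()
    creds[minion_id] = {
        'user': 'so_telegraf_' + safe,
        'pass': ''.join(secrets.choice(ALPHABET) for _ in range(length)),
    }
    return True


# Simulate the soup loop seeing the same accepted minion twice.
creds = {}
for mid in ['manager01_manager', 'sensor-01_sensor', 'manager01_manager']:
    add_cred(creds, mid)
```

Because the second call for an existing id changes nothing, the subsequent `postgres.telegraf_users` apply sees the same user/pass pairs on every run and has nothing to rotate.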
@@ -25,8 +25,33 @@ manager_run_es_soc:
        - salt: {{NEWNODE}}_update_mine
 {% endif %}

+# so-minion has already added the new minion's entry to telegraf/creds.sls
+# via so-telegraf-cred before this orch fires. Reconcile the Postgres role
+# on the manager so the new minion can authenticate on its first highstate,
+# then refresh the minion's pillar so its telegraf.conf renders with the
+# freshly-written cred.
+manager_create_postgres_telegraf_role:
+  salt.state:
+    - tgt: {{ MANAGER }}
+    - sls:
+      - postgres.telegraf_users
+    - queue: True
+    - require:
+      - salt: {{NEWNODE}}_update_mine
+
+{{NEWNODE}}_refresh_pillar:
+  salt.function:
+    - name: saltutil.refresh_pillar
+    - tgt: {{ NEWNODE }}
+    - kwarg:
+        wait: True
+    - require:
+      - salt: manager_create_postgres_telegraf_role
+
 {{NEWNODE}}_run_highstate:
  salt.state:
    - tgt: {{ NEWNODE }}
    - highstate: True
    - queue: True
+    - require:
+      - salt: {{NEWNODE}}_refresh_pillar
@@ -1,28 +0,0 @@
-# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
-# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
-# https://securityonion.net/license; you may not use this file except in compliance with the
-# Elastic License 2.0.
-
-# Fired by salt/reactor/telegraf_user_sync.sls when salt-key accepts a new
-# minion. Only provisions the per-minion pillar entry and DB role on the
-# manager; the minion itself will pick up its telegraf config on its first
-# highstate during onboarding, so there's no need to push the telegraf state
-# from here.
-#
-# Target the manager via role grains — same pattern as orch/delete_hypervisor.sls.
-# The reactor doesn't know the manager's minion id, and grains.master on the
-# runner is a hostname, not a targetable id.
-{% set FANOUT_MINION = salt['pillar.get']('postgres_fanout_minion', '') %}
-
-manager_sync_telegraf_pg_users:
-  salt.state:
-    - tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone or G@role:so-eval'
-    - tgt_type: compound
-    - sls:
-      - postgres.auth
-      - postgres.telegraf_users
-    - queue: True
-    {% if FANOUT_MINION %}
-    - pillar:
-        postgres_fanout_minion: {{ FANOUT_MINION }}
-    {% endif %}
@@ -13,24 +13,8 @@
  {% set CHARS = DIGITS~LOWERCASE~UPPERCASE~SYMBOLS %}
  {% set so_postgres_user_pass = salt['pillar.get']('postgres:auth:users:so_postgres_user:pass', salt['random.get_str'](72, chars=CHARS)) %}

-  {# Per-minion Telegraf Postgres credentials. Merge currently-up minions with any #}
-  {# previously-known entries in pillar so existing passwords persist across runs. #}
-  {% set existing = salt['pillar.get']('postgres:auth:users', {}) %}
-  {% set up_minions = salt['saltutil.runner']('manage.up') or [] %}
-  {% set telegraf_users = {} %}
-  {% for key, entry in existing.items() %}
-    {%- if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
-      {%- do telegraf_users.update({key: entry}) %}
-    {%- endif %}
-  {% endfor %}
-  {% for mid in up_minions %}
-    {%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
-    {%- set key = 'telegraf_' ~ safe %}
-    {%- if key not in telegraf_users %}
-      {%- do telegraf_users.update({key: {'user': 'so_telegraf_' ~ safe, 'pass': salt['random.get_str'](72, chars=CHARS)}}) %}
-    {%- endif %}
-  {% endfor %}
-
+# Admin cred only. Per-minion Telegraf creds live in telegraf/creds.sls,
+# managed by /usr/sbin/so-telegraf-cred (called from so-minion).
 postgres_auth_pillar:
  file.managed:
    - name: /opt/so/saltstack/local/pillar/postgres/auth.sls
@@ -43,57 +27,7 @@ postgres_auth_pillar:
              so_postgres_user:
                user: so_postgres
                pass: "{{ so_postgres_user_pass }}"
-              {% for key, entry in telegraf_users.items() %}
-              {{ key }}:
-                user: {{ entry.user }}
-                pass: "{{ entry.pass }}"
-              {% endfor %}
    - show_changes: False
-
-  {# Fan a specific minion's telegraf cred out to its own pillar file.
-     Two triggers populate the target list:
-       - grains.id (always) so the manager's own pillar is populated on every
-         postgres.auth run — otherwise the manager's telegraf has no cred on
-         a fresh install and can't write to its own postgres.
-       - pillar postgres_fanout_minion (when the reactor fires on a new
-         minion's salt-key accept).
-     The `unless` guard keeps re-runs idempotent, so this is one so-yaml.py
-     check per target, not per minion in the grid. Bulk backfill for
-     already-accepted minions lives in soup. #}
-  {% set fanout_targets = [] %}
-  {% if grains.id %}
-  {%-   do fanout_targets.append(grains.id) %}
-  {% endif %}
-  {% set fanout_mid = salt['pillar.get']('postgres_fanout_minion') %}
-  {% if fanout_mid and fanout_mid not in fanout_targets %}
-  {%-   do fanout_targets.append(fanout_mid) %}
-  {% endif %}
-
-  {% for mid in fanout_targets %}
-    {%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
-    {%- set key = 'telegraf_' ~ safe %}
-    {%- set entry = telegraf_users.get(key) %}
-    {%- if entry %}
-
-postgres_telegraf_minion_pillar_{{ safe }}:
-  cmd.run:
-    - name: |
-        set -e
-        PILLAR_FILE=/opt/so/saltstack/local/pillar/minions/{{ mid }}.sls
-        if [ ! -f "$PILLAR_FILE" ]; then
-          echo '{}' > "$PILLAR_FILE"
-          chown socore:socore "$PILLAR_FILE" 2>/dev/null || true
-          chmod 640 "$PILLAR_FILE"
-        fi
-        /usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.user '{{ entry.user }}'
-        /usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.pass '{{ entry.pass }}'
-    - unless: |
-        [ "$(/usr/sbin/so-yaml.py get -r /opt/so/saltstack/local/pillar/minions/{{ mid }}.sls postgres.telegraf.user 2>/dev/null)" = '{{ entry.user }}' ]
-    - require:
-      - file: postgres_auth_pillar
-
-    {%- endif %}
-  {% endfor %}
 {% else %}

 {{sls}}_state_not_allowed:
@@ -10,7 +10,7 @@

 {# postgres_wait_ready below requires `docker_container: so-postgres`, which is
   declared in postgres.enabled. Include it here so state.apply postgres.telegraf_users
-   on its own (from the reactor orch or from soup) still has that ID in scope. Salt
+   on its own (e.g. from orch.deploy_newnode) still has that ID in scope. Salt
   de-duplicates the circular include. #}
 include:
  - postgres.enabled
@@ -96,9 +96,9 @@ postgres_telegraf_group_role:
    - require:
      - cmd: postgres_create_telegraf_db

-{%   set users = salt['pillar.get']('postgres:auth:users', {}) %}
-{%   for key, entry in users.items() %}
-{%     if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
+{%   set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
+{%   for mid, entry in creds.items() %}
+{%     if entry.get('user') and entry.get('pass') %}
 {%       set u = entry.user %}
 {%       set p = entry.pass | replace("'", "''") %}

@@ -6,39 +6,74 @@
 # Elastic License 2.0.

 import logging
-from subprocess import call
-import yaml
+import os
+import re
+import shlex
+import subprocess

 log = logging.getLogger(__name__)

+SO_MINION = '/usr/sbin/so-minion'
+
+_NODETYPE_RE = re.compile(r'^[A-Z][A-Z0-9_]{0,31}$')
+_MINIONID_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
+_HOSTPART_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
+_IPV4_RE = re.compile(
+    r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}'
+    r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$'
+)
+_HEAP_RE = re.compile(r'^\d{1,6}[kKmMgG]?$')
+
+
+def _check(name, value, pattern):
+  s = str(value)
+  if not pattern.match(s):
+    raise ValueError("sominion_setup_reactor: refusing unsafe %s=%r" % (name, value))
+  return s
+
+
 def run():
  log.info('sominion_setup_reactor: Running')
  minionid = data['id']
  DATA = data['data']
-  hv_name = DATA['HYPERVISOR_HOST']
  log.info('sominion_setup_reactor: DATA: %s' % DATA)

-  # Build the base command
-  cmd = "NODETYPE=" + DATA['NODETYPE'] + " /usr/sbin/so-minion -o=addVM -m=" + minionid + " -n=" + DATA['MNIC'] + " -i=" + DATA['MAINIP'] + " -c=" + str(DATA['CPUCORES']) + " -d='" + DATA['NODE_DESCRIPTION'] + "'"
-  
-  # Add optional arguments only if they exist in DATA
+  nodetype = _check('NODETYPE', DATA['NODETYPE'], _NODETYPE_RE)
+
+  argv = [
+    SO_MINION,
+    '-o=addVM',
+    '-m=' + _check('minionid', minionid,        _MINIONID_RE),
+    '-n=' + _check('MNIC',     DATA['MNIC'],    _HOSTPART_RE),
+    '-i=' + _check('MAINIP',   DATA['MAINIP'],  _IPV4_RE),
+    '-c=' + str(int(DATA['CPUCORES'])),
+    '-d=' + str(DATA['NODE_DESCRIPTION']),
+  ]
+
  if 'CORECOUNT' in DATA:
-    cmd += " -C=" + str(DATA['CORECOUNT'])
-    
+    argv.append('-C=' + str(int(DATA['CORECOUNT'])))
+
  if 'INTERFACE' in DATA:
-    cmd += " -a=" + DATA['INTERFACE']
-  
+    argv.append('-a=' + _check('INTERFACE', DATA['INTERFACE'], _HOSTPART_RE))
+
  if 'ES_HEAP_SIZE' in DATA:
-    cmd += " -e=" + DATA['ES_HEAP_SIZE']
-  
+    argv.append('-e=' + _check('ES_HEAP_SIZE', DATA['ES_HEAP_SIZE'], _HEAP_RE))
+
  if 'LS_HEAP_SIZE' in DATA:
-    cmd += " -l=" + DATA['LS_HEAP_SIZE']
+    argv.append('-l=' + _check('LS_HEAP_SIZE', DATA['LS_HEAP_SIZE'], _HEAP_RE))

  if 'LSHOSTNAME' in DATA:
-    cmd += " -L=" + DATA['LSHOSTNAME']
-  
-  log.info('sominion_setup_reactor: Command: %s' % cmd)
-  rc = call(cmd, shell=True)
+    argv.append('-L=' + _check('LSHOSTNAME', DATA['LSHOSTNAME'], _HOSTPART_RE))
+
+  env = os.environ.copy()
+  env['NODETYPE'] = nodetype
+
+  log.info(
+    'sominion_setup_reactor: argv: %s (NODETYPE=%s)',
+    ' '.join(shlex.quote(a) for a in argv),
+    shlex.quote(nodetype),
+  )
+  rc = subprocess.call(argv, shell=False, env=env)

  log.info('sominion_setup_reactor: rc: %s' % rc)
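
The reactor change above replaces a concatenated shell string with a validated argv list. The sketch below illustrates why that matters: with a list and no shell, a free-form field such as NODE_DESCRIPTION stays a single argument regardless of quotes or semicolons. The names `check` and `build_argv` are hypothetical, the command is only built and never executed, and the use of `fullmatch` is a choice made for this sketch (the diff uses `^...$` with `match`, which would also accept a value ending in a newline).

```python
import re
import shlex

# Illustrative pattern with the same character class as _MINIONID_RE in the diff.
MINIONID_RE = re.compile(r'[A-Za-z0-9._-]{1,253}')


def check(name, value, pattern):
    """Return value as a string, or raise if it does not fully match pattern."""
    s = str(value)
    # fullmatch rejects a trailing newline that '^...$' with match() would let through.
    if not pattern.fullmatch(s):
        raise ValueError("refusing unsafe %s=%r" % (name, value))
    return s


def build_argv(minionid, description):
    """Build an argv list; each element reaches the program as one argument."""
    return [
        '/usr/sbin/so-minion',
        '-o=addVM',
        '-m=' + check('minionid', minionid, MINIONID_RE),
        '-d=' + str(description),
    ]


# A hostile description stays a single argv element.
argv = build_argv('vm01_sensor', "lab sensor'; touch /tmp/pwned; '")
# Quoting is only needed to render a copy-pasteable log line.
printable = ' '.join(shlex.quote(a) for a in argv)
```

The `printable` string is only for logging; the list itself is what would be handed to `subprocess.call(argv, shell=False)`.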

@@ -1,18 +0,0 @@
-# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
-# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
-# https://securityonion.net/license; you may not use this file except in compliance with the
-# Elastic License 2.0.
-
-{# Fires on salt/key. Only act on successful key acceptance — not reauth. #}
-{% if data.get('act') == 'accept' and data.get('result') == True and data.get('id') %}
-
-{{ data['id'] }}_telegraf_pg_sync:
-  runner.state.orchestrate:
-    - args:
-      - mods: orch.telegraf_postgres_sync
-      - pillar:
-          postgres_fanout_minion: {{ data['id'] }}
-
-{% do salt.log.info('telegraf_user_sync reactor: syncing telegraf PG user for minion %s' % data['id']) %}
-
-{% endif %}
@@ -27,6 +27,7 @@ sool9_{{host}}:
    log_file: /opt/so/log/salt/minion
  grains:
    hypervisor_host: {{host ~ "_" ~ role}}
+    sosmodel: HVGUEST
  preflight_cmds:
    - |
      {%- set hostnames = [MANAGERHOSTNAME] %}
@@ -62,19 +62,6 @@ engines_config:
    - name: /etc/salt/master.d/engines.conf
    - source: salt://salt/files/engines.conf

-reactor_config_telegraf:
-  file.managed:
-    - name: /etc/salt/master.d/reactor_telegraf.conf
-    - contents: |
-        reactor:
-          - 'salt/key':
-            - /opt/so/saltstack/default/salt/reactor/telegraf_user_sync.sls
-    - user: root
-    - group: root
-    - mode: 644
-    - watch_in:
-      - service: salt_master_service
-
 # update the bootstrap script when used for salt-cloud
 salt_bootstrap_cloud:
  file.managed:
@@ -24,11 +24,6 @@

 {% do SOCDEFAULTS.soc.config.server.modules.elastic.update({'username': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'password': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass}) %}

-{% if GLOBALS.postgres is defined and GLOBALS.postgres.auth is defined %}
-{%   set PG_ADMIN_PASS = salt['pillar.get']('secrets:postgres_pass', '') %}
-{% do SOCDEFAULTS.soc.config.server.modules.update({'postgres': {'hostUrl': GLOBALS.manager_ip, 'port': 5432, 'username': GLOBALS.postgres.auth.users.so_postgres_user.user, 'password': GLOBALS.postgres.auth.users.so_postgres_user.pass, 'adminUser': 'postgres', 'adminPassword': PG_ADMIN_PASS, 'dbname': 'securityonion', 'sslMode': 'require', 'assistantEnabled': true, 'esHostUrl': 'https://' ~ GLOBALS.manager_ip ~ ':9200', 'esUsername': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'esPassword': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass, 'esVerifyCert': false}}) %}
-{% endif %}
-
 {% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'hostUrl': 'https://' ~ GLOBALS.influxdb_host ~ ':8086'}) %}
 {% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'token': INFLUXDB_TOKEN}) %}
 {% for tool in SOCDEFAULTS.soc.config.server.client.tools %}
@@ -3,6 +3,7 @@ soc:
    description: Enables or disables SOC. WARNING - Disabling this setting is unsupported and will cause the grid to malfunction. Re-enabling this setting is a manual effort via SSH.
    forcedType: bool
    advanced: True
+    readonly: True
  telemetryEnabled:
    title: SOC Telemetry
    description: When this setting is enabled and the grid is not in airgap mode, SOC will provide feature usage data to the Security Onion development team via Google Analytics. This data helps Security Onion developers determine which product features are being used and can also provide insight into improving the user interface. When changing this setting, wait for the grid to fully synchronize and then perform a hard browser refresh on SOC, to force the browser cache to update and reflect the new setting.
@@ -890,12 +891,16 @@ soc:
            suricata:
              description: The template used when creating a new Suricata detection. [publicId] will be replaced with an unused Public Id.
              multiline: True
+              forcedType: string
            strelka:
              description: The template used when creating a new Strelka detection.
              multiline: True
+              forcedType: string
            elastalert:
              description: The template used when creating a new ElastAlert detection. [publicId] will be replaced with an unused Public Id.
              multiline: True
+              forcedType: string
+
        grid:
          maxUploadSize:
            description: The maximum number of bytes for an uploaded PCAP import file.
@@ -10,12 +10,12 @@
 {%- set LOGSTASH_ENABLED = LOGSTASH_MERGED.enabled %}
 {%- set TG_OUT = TELEGRAFMERGED.output | upper %}
 {%- set PG_HOST = GLOBALS.manager_ip %}
-{#- Per-minion telegraf creds are written into the minion's own pillar file
-    (/opt/so/saltstack/local/pillar/minions/<id>.sls) by postgres.auth on the
-    manager. Each minion only sees its own password — the aggregate map in
-    postgres:auth:users is manager-scoped. #}
-{%- set PG_USER = salt['pillar.get']('postgres:telegraf:user', '') %}
-{%- set PG_PASS = salt['pillar.get']('postgres:telegraf:pass', '') %}
+{#- Per-minion telegraf creds live in the grid-wide telegraf/creds.sls pillar,
+    written by /usr/sbin/so-telegraf-cred on the manager. Each minion looks up
+    its own entry by grains.id. #}
+{%- set PG_ENTRY = salt['pillar.get']('telegraf:postgres_creds:' ~ grains.id, {}) %}
+{%- set PG_USER = PG_ENTRY.get('user', '') %}
+{%- set PG_PASS = PG_ENTRY.get('pass', '') %}
 # Global tags can be specified here in key="value" format.
 [global_tags]
  role = "{{ GLOBALS.role.split('-') | last }}"
@@ -202,10 +202,10 @@ check_service_status() {
 	systemctl status $service_name > /dev/null 2>&1
 	local status=$?
 	if [ $status -gt 0 ]; then
-		info "  $service_name is not running" 
+		info "$service_name is not running" 
 		return 1;
 	else
-		info "  $service_name is running"
+		info "$service_name is running"
 		return 0;
 	fi

@@ -1549,13 +1549,8 @@ clear_previous_setup_results() {
 reinstall_init() {
 	info "Putting system in state to run setup again"

-	if [[ $install_type =~ ^(MANAGER|EVAL|MANAGERSEARCH|MANAGERHYPE|STANDALONE|FLEET|IMPORT)$ ]]; then
-		local salt_services=( "salt-master" "salt-minion" )
-	else
-		local salt_services=( "salt-minion" )
-	fi
-
-	local service_retry_count=20
+	# Always include both services. check_service_status skips units that aren't present.
+	local salt_services=( "salt-master" "salt-minion" )

 	{
 		# remove all of root's cronjobs
@@ -1571,31 +1566,51 @@ reinstall_init() {

 		salt-call state.apply ca.remove -linfo --local --file-root=../salt

-		# Kill any salt processes (safely)
+		# Stop salt services and force-kill any lingering salt processes (including orphans
+		# from an earlier reinstall attempt where the unit file is gone but processes survive)
+		# so dnf remove salt can run cleanly
 		for service in "${salt_services[@]}"; do
-			# Stop the service in the background so we can exit after a certain amount of time
 			if check_service_status "$service"; then
-				systemctl stop "$service" &
+				info "Stopping $service via systemctl"
+				systemctl stop "$service"
 			fi
-			local pid=$!
-
-			local count=0
-			while check_service_status "$service"; do
-				if [[ $count -gt $service_retry_count ]]; then
-					echo "Could not stop $service after 1 minute, exiting setup."
-
-					# Stop the systemctl process trying to kill the service, show user a message, then exit setup
-					kill -9 $pid
-					fail_setup
-				fi
-				
-				sleep 5
-				((count++))
-			done
 		done

+		# Unconditionally force-kill any remaining salt binaries — these may be orphaned
+		# from a prior aborted reinstall (no unit file, so systemctl can't see them).
+		for salt_bin in salt-master salt-minion salt-call salt-cloud; do
+			if pgrep -f "/usr/bin/${salt_bin}" > /dev/null 2>&1; then
+				info "Force-killing lingering $salt_bin processes"
+				pkill -9 -ef "/usr/bin/${salt_bin}" 2>/dev/null
+			fi
+		done
+		# Catch stray `salt` CLI children from saltutil.kill_all_jobs / state.apply invocations
+		pkill -9 -ef "/usr/bin/python3 /bin/salt" 2>/dev/null
+
+		# Give the kernel a moment to reap the killed processes before dnf removes the binaries
+		local kill_wait=0
+		while pgrep -f "/usr/bin/salt-" > /dev/null 2>&1; do
+			if [[ $kill_wait -gt 10 ]]; then
+				info "Salt processes still present after SIGKILL + 10s wait; proceeding anyway"
+				pgrep -af "/usr/bin/salt-" | while read -r line; do info "  lingering: $line"; done
+				break
+			fi
+			sleep 1
+			((kill_wait++))
+		done
+
+		# Clear the 'failed' state SIGKILL left on the units before removing the package
+		systemctl reset-failed salt-master.service salt-minion.service 2>/dev/null || true
+
 		# Remove all salt configs
-		rm -rf /etc/salt/engines/* /etc/salt/grains /etc/salt/master /etc/salt/master.d/* /etc/salt/minion /etc/salt/minion.d/* /etc/salt/pki/* /etc/salt/proxy /etc/salt/proxy.d/* /var/cache/salt/
+		dnf -y remove salt
+		rm -rf /etc/salt/ /var/cache/salt/
+
+		# Drop systemd's in-memory references to the now-removed units
+		systemctl daemon-reload
+
+		# Uninstall local Elastic Agent, if installed
+		elastic-agent uninstall -f

 		if command -v docker &> /dev/null; then
 			# Stop and remove all so-* containers so files can be changed with more safety
@@ -1619,10 +1634,7 @@ reinstall_init() {
 		backup_dir /nsm/hydra "$date_string"
 		backup_dir /nsm/influxdb "$date_string"

-		# Uninstall local Elastic Agent, if installed
-		elastic-agent uninstall -f
-
-	} >> "$setup_log" 2>&1
+	} 2>&1 | tee -a "$setup_log"

 	info "System reinstall init has been completed."
 }
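
The reinstall_init rewrite swaps an unbounded background stop for a force-kill followed by a capped wait that proceeds even if processes linger. A rough Python sketch of that bounded-poll pattern is below. The function name `wait_until_gone`, its parameters, and the cap of ten sleeps are assumptions made for illustration, and the loop bounds are not meant to match the bash counter exactly.

```python
import time


def wait_until_gone(still_present, max_tries=10, delay=1.0, sleep=time.sleep):
    """Poll still_present() until it returns False or max_tries sleeps have elapsed.

    Returns True if the condition cleared, False if the caller should
    log what is left and proceed anyway.
    """
    tries = 0
    while still_present():
        if tries >= max_tries:
            return False
        sleep(delay)
        tries += 1
    return True


# Simulated check: processes are seen twice, then gone.
state = iter([True, True, False])
cleared = wait_until_gone(lambda: next(state), sleep=lambda _delay: None)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; in the script the equivalent check is `pgrep -f "/usr/bin/salt-"`.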
@@ -219,7 +219,7 @@ if [ -n "$test_profile" ]; then
 	WEBUSER=onionuser@somewhere.invalid
 	WEBPASSWD1=0n10nus3r
 	WEBPASSWD2=0n10nus3r
-	NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MAINIP}"
+	NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}"

 	update_sudoers_for_testing
 fi
Author	SHA1	Message	Date
Mike Reeves	2dcded6cca	drop postgres module from soc defaults injection The soc binary on 3/dev does not register a postgres module, so injecting postgres into soc.config.server.modules makes soc abort at launch with 'Module does not exist: postgres'. The soc-side module is staged on feature/postgres but is not landing this release. Drop the injection until the module ships; salt/postgres state and pillars are unchanged.	2026-04-28 15:46:56 -04:00
Mike Reeves	8ca59e6f0c	Merge pull request #15838 from Security-Onion-Solutions/fix/docker-refresh-multiarch-pull Fix/docker refresh multiarch pull	2026-04-28 15:14:27 -04:00
Mike Reeves	82dac82d15	drop platform/digest pull resolution The digest-pull logic was added to make `docker push` work for multi-arch upstream tags. Now that the push step is `docker buildx imagetools create` pinned to the gpg-verified RepoDigest, the registry-to-registry copy handles single- and multi-arch sources without help. Reverts the pull back to the original line and removes the unused PLATFORM_OS/_ARCH detection.	2026-04-28 14:54:25 -04:00
Mike Reeves	288a823edf	push images via buildx imagetools create Replaces `docker push` with a registry-to-registry copy. On Docker 29.x with the containerd image store, `docker push` of a freshly-pulled image hits a path that wraps single-platform manifests in a synthetic index and then can't push the layers it claims to reference, producing `NotFound: content digest ...` even when the image is fully present. Keep the local `docker tag` so so-image-pull's `docker images | grep :5000` existence check continues to work.	2026-04-28 14:49:02 -04:00
Jorge Reyes	f9e3d30a71	Merge pull request #15837 from Security-Onion-Solutions/reyesj2/elastic-fleet-cert-check check current fleet policy cert against cert on disk	2026-04-28 13:47:55 -05:00
reyesj2	9cec79b299	check current fleet policy cert against cert on disk Co-authored-by: Copilot <copilot@github.com>	2026-04-28 13:34:39 -05:00
Mike Reeves	c86399327b	fix so-docker-refresh push for multi-arch source images docker pull of a multi-arch tag on Docker 29.x leaves the local tag pointing at the image index rather than the platform-specific manifest. The subsequent docker push then tries to push every sub-manifest the index references and fails on layers we never fetched. Resolve the local-platform manifest digest from the upstream index via docker buildx imagetools inspect, pull by that digest, and re-tag locally to the canonical tag. The signing flow and the existing tag/push to the embedded registry are unchanged.	2026-04-28 14:27:59 -04:00
Mike Reeves	fa8162de02	Merge pull request #15749 from Security-Onion-Solutions/feature/postgres Add so-postgres Salt states and infrastructure	2026-04-28 10:15:47 -04:00
Josh Patterson	33abc429d1	Merge pull request #15835 from Security-Onion-Solutions/fix/reactor/sominon_setup fix sominion_setup reactor	2026-04-28 08:55:58 -04:00
Jorge Reyes	b22585ca90	Merge pull request #15833 from Security-Onion-Solutions/reyesj2-es933 exclude more transform job errors	2026-04-27 15:05:11 -05:00
reyesj2	9f2ca7012f	exclude more transform job errors	2026-04-27 15:02:13 -05:00
Josh Patterson	21aeb68188	fix sominion_setup reactor	2026-04-27 14:30:41 -04:00
Josh Patterson	81e60ec5bf	Merge pull request #15829 from Security-Onion-Solutions/fix/reinstall2 fix reinstall	2026-04-24 16:20:53 -04:00
Josh Patterson	199c2746f1	stop salt-minion and salt-master regardless of install type. display reinstall on console and save to logfile	2026-04-24 15:24:11 -04:00
Josh Patterson	8eca465ef6	uninstall elastic-agent before stopping dockers on reinstall	2026-04-24 14:35:11 -04:00
Jorge Reyes	a45e59239f	Merge pull request #15826 from Security-Onion-Solutions/reyesj2-es933 heavynode should run es cluster state	2026-04-24 13:07:48 -05:00
Josh Patterson	2ad0bcab7c	Merge pull request #15828 from Security-Onion-Solutions/fix/annotations readonly soc and kratos enabled	2026-04-24 14:00:02 -04:00
Josh Patterson	070d150420	readonly soc and kratos enabled	2026-04-24 13:56:35 -04:00
reyesj2	90ecbe90d8	allow heavynodes to run elasticsearch/cluster state	2026-04-24 12:56:27 -05:00
Josh Patterson	813fa03dc3	Merge pull request #15824 from Security-Onion-Solutions/fix/reinstall2 fix reinstall issue with salt	2026-04-24 12:22:54 -04:00
Josh Patterson	02381fbbe9	stop salt-cloud , belt-and-suspenders against a broken/incomplete salt RPM	2026-04-24 11:33:21 -04:00
Josh Patterson	0722b681b1	redo service stop on reinstall	2026-04-24 11:04:46 -04:00
Josh Patterson	564815e836	redo how services are stopped during reinstall	2026-04-24 10:46:29 -04:00
Jorge Reyes	88b30adf7f	Merge pull request #15823 from Security-Onion-Solutions/reyesj2-es933 typo	2026-04-24 09:27:08 -05:00
reyesj2	b6acf3b522	typo	2026-04-24 09:24:58 -05:00
Jason Ertel	ba55468da8	Merge pull request #15822 from Security-Onion-Solutions/jertel/wip numeric test description	2026-04-24 08:26:55 -04:00
Jason Ertel	cdd217283d	numeric test description	2026-04-24 08:13:36 -04:00
Jorge Reyes	810a582717	Merge pull request #15813 from Security-Onion-Solutions/reyesj2-es933 split up Elastic Fleet state	2026-04-23 14:51:32 -05:00
Mike Reeves	a6948e8dcb	Remove helpLink for influxdb in soc_global.yaml Removed helpLink for influxdb from endgamehost configuration.	2026-04-23 13:56:41 -04:00
Mike Reeves	5f35554fdc	Merge pull request #15712 from Security-Onion-Solutions/soupfix Fix soup	2026-04-23 12:39:50 -04:00
Mike Reeves	0ecc7ae594	soup: drop --local from postgres.telegraf_users reconcile The manager's /etc/salt/minion (written by so-functions:configure_minion) has no file_roots, so salt-call --local falls back to Salt's default /srv/salt and fails with "No matching sls found for 'postgres.telegraf_users' in env 'base'". || true was silently swallowing the error, which meant the DB roles for the pillar entries just populated by the so-telegraf-cred backfill loop never actually got created. Route through salt-master instead; its file_roots already points at the default/local salt trees.	2026-04-23 11:25:44 -04:00
reyesj2	fdfca469cc	prevent non-manager nodes from running elasticsearch.cluster state manually	2026-04-23 09:53:07 -05:00
reyesj2	5f2ec76ba8	prevent fleetnode from being able to run elasticfleet.manager state manually	2026-04-23 09:50:45 -05:00
reyesj2	b015c8ff14	remove docker import	2026-04-23 09:31:30 -05:00
reyesj2	7e70870a9e	remove globals import	2026-04-23 09:25:36 -05:00
Mike Reeves	eadad6c163	soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres unconditionally, but make_some_dirs only runs at install time so managers upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails pillar render on the first post-upgrade restart. Similarly, secrets_pillar is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would land empty after highstate. Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0 so the stubs and secret exist before masterlock/salt-master restart. Both are idempotent and safe to re-run.	2026-04-23 10:01:38 -04:00
reyesj2	22b32a16dd	include elasticfleet.config	2026-04-23 08:30:47 -05:00
reyesj2	22f869734e	add check for files before attempting to use file pattern to load templates	2026-04-22 23:11:31 -05:00
reyesj2	398bc9e4ed	update kibana discardCorruptObjects version	2026-04-22 20:38:13 -05:00
reyesj2	72dbb69a1c	fix searchnodes running elasticsearch/cluster state	2026-04-22 20:37:48 -05:00
reyesj2	339959d1c0	split up elasticfleet/enabled state	2026-04-22 20:30:40 -05:00
Mike Reeves	d5c0ec4404	so-yaml_test: cover loadYaml error paths Exercises the FileNotFoundError and generic-exception branches added to loadYaml in the previous commit, restoring 100% coverage required by the build.	2026-04-22 14:30:51 -04:00
Mike Reeves	e616b4c120	so-telegraf-cred: make executable and harden error handling so-telegraf-cred was committed with mode 644, causing `so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion to fail with "Permission denied" and log "Failed to provision postgres telegraf cred for <minion>". Mark it executable. Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in so-yaml.py loadYaml surface a clear stderr message with the filename instead of an unhandled FileNotFoundError traceback.	2026-04-22 14:25:19 -04:00
Mike Reeves	f240a99e22	so-telegraf-cred: thin bash wrapper around so-yaml.py Swap the ~150-line Python implementation for a 48-line bash script that delegates YAML mutation to so-yaml.py — the same helper so-minion and soup already use. Same semantics: seed the creds pillar on first use, idempotent add, silent remove. SO minion ids are dot-free by construction (setup/so-functions:1884 strips everything after the first '.'), so using the raw id as the so-yaml.py key path is safe.	2026-04-22 11:09:53 -04:00
Mike Reeves	614f32c5e0	Split postgres auth from per-minion telegraf creds The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities: - pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred. - pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy. - salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove. - so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly. - postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id. - orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect. - soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.	2026-04-22 10:55:15 -04:00
Josh Patterson	cd6707a566	Merge pull request #15800 from Security-Onion-Solutions/feature/vm-raid-status monitor raid for vms	2026-04-22 09:42:44 -04:00
Josh Patterson	edd207a9d5	soup update socloud.conf	2026-04-22 09:20:53 -04:00
Mike Reeves	724d76965f	soup: update postgres backfill comment to reflect reactor removal The reactor path is gone; so-minion now owns add/delete for new minions. The backfill itself is unchanged — postgres.auth's up_minions fallback fills the aggregate, postgres.telegraf_users creates the roles, and the bash loop fans to per-minion pillar files — so the pre-feature upgrade story still works end-to-end. Just refresh the comment so it isn't misleading.	2026-04-21 15:45:05 -04:00
Mike Reeves	dbf4fb66a4	Clean up postgres telegraf cred on so-minion delete. Paired with the add path in add_telegraf_to_minion: when a minion is removed, drop its entry from the aggregate postgres pillar and drop the matching so_telegraf_<safe> role from the database. Without this, stale entries and DB roles accumulate over time. Makes rotate-password and compromise-recovery both a clean delete+add: `so-minion -o=delete -m=<id>` followed by `so-minion -o=add -m=<id>`. The first call drops the role and clears the aggregate pillar; the second generates a brand-new password. The cleanup is best-effort: if so-postgres isn't running or the DROP ROLE fails (e.g., the role owns unexpected objects), we log a warning and continue so the minion delete itself never gets blocked by postgres state. Admins can mop up stray roles manually if that happens.	2026-04-21 15:43:01 -04:00
Mike Reeves	5f28e9b191	Move per-minion telegraf cred provisioning into so-minion Simpler, race-free replacement for the reactor + orch + fan-out chain. - salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to generate a random 72-char password, reuse any existing password from the aggregate pillar, write postgres.telegraf.{user,pass} into the minion's own pillar file, and update the aggregate pillar so postgres.telegraf_users can CREATE ROLE on the next manager apply. Every create<ROLE> function already calls this hook, so add / addVM / setup dispatches are all covered identically and synchronously. - salt/postgres/auth.sls: strip the fanout_targets loop and the postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now redundant. The state still manages the so_postgres admin user and writes the aggregate pillar for postgres.telegraf_users to consume. - salt/reactor/telegraf_user_sync.sls: deleted. - salt/orch/telegraf_postgres_sync.sls: deleted. - salt/salt/master.sls: drop the reactor_config_telegraf block that registered the reactor on /etc/salt/master.d/reactor_telegraf.conf. - salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf step and the require: it added to the newnode highstate. Back to its original 3/dev shape. No more ephemeral postgres_fanout_minion pillar, no more async salt/key reactor, no more so-minion setupMinionFiles race: the pillar write happens inline inside setupMinionFiles itself.	2026-04-21 15:34:15 -04:00
Jorge Reyes	01bd3b6e06	Merge pull request #15807 from Security-Onion-Solutions/reyesj2-es933 urlencode elasticsearch version	2026-04-21 14:11:04 -05:00
Mike Reeves	1abfd77351	Hide telegraf password from console and close so-minion race Two fixes on the postgres telegraf fan-out path: 1. postgres.auth cmd.run leaked the password to the console because Salt always prints the Name: field and `show_changes: False` does not apply to cmd.run. Move the user and password into the `env:` attribute so the shell body still sees them via $PG_USER / $PG_PASS but Salt's state reporter never renders them. 2. so-minion's addMinion -> setupMinionFiles sequence removes the minion pillar file and rewrites it from scratch, which wipes the postgres.telegraf.* entries the reactor may have already written on salt-key accept. Add a postgres.auth fan-out step to orch.deploy_newnode (the orch so-minion kicks off after setupMinionFiles) and require it from the new minion's highstate. Idempotent via the existing unless: guard in postgres.auth.	2026-04-21 15:10:57 -04:00
reyesj2	06a555fafb	urlencode elasticsearch version	2026-04-21 14:01:31 -05:00
Jason Ertel	7411031e11	Merge pull request #15803 from Security-Onion-Solutions/jertel/wip more error handling during image updates	2026-04-21 10:21:56 -04:00
Jason Ertel	247091766c	more error handling during image updates	2026-04-21 10:18:05 -04:00
Josh Patterson	7f93110d68	Merge remote-tracking branch 'origin/3/dev' into feature/vm-raid-status	2026-04-21 10:10:38 -04:00
Jason Ertel	33ef138866	Merge pull request #15797 from Security-Onion-Solutions/jertel/wip fix template annotation	2026-04-20 17:14:53 -04:00
Jason Ertel	71da27dc8e	fix template annotation	2026-04-20 17:02:25 -04:00
Josh Patterson	ee437265fc	monitor raid for vms	2026-04-20 12:00:02 -04:00
Mike Reeves	664f3fd18a	Fix soup	2026-04-01 14:47:05 -04:00
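
Several commits above (5f28e9b191, 614f32c5e0, dbf4fb66a4) describe the same credential semantics: an idempotent add that reuses an existing password, a 72-character random password for new minions, and a quiet remove. The following is a minimal Python sketch of those semantics only, not the project's implementation. It works on an in-memory dict rather than the pillar file, and the function names, the password alphabet, and the role-name format are assumptions made for illustration.

```python
import secrets
import string

# Assumption: the commits state only the password length (72), not the alphabet.
ALPHABET = string.ascii_letters + string.digits
PASSWORD_LENGTH = 72


def add_cred(creds, minion_id):
    """Idempotent add: reuse an existing entry, otherwise generate a new one."""
    if minion_id in creds:
        return creds[minion_id]
    password = "".join(secrets.choice(ALPHABET) for _ in range(PASSWORD_LENGTH))
    # Assumption: placeholder role name; the real so_telegraf_<safe> derivation
    # is not shown in the log.
    creds[minion_id] = {"user": f"so_telegraf_{minion_id}", "pass": password}
    return creds[minion_id]


def remove_cred(creds, minion_id):
    """Quiet remove: deleting a missing entry is not an error."""
    creds.pop(minion_id, None)
```

Under these assumptions, calling `remove_cred` and then `add_cred` for the same minion id yields a fresh password, which matches the delete+add rotation flow described in dbf4fb66a4.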