Pin NIC names by MAC via udev (run-once) from the common state

Add so-nic-pin, which writes by-MAC persistent-net udev rules that pin each physical NIC to its current name, so a kernel upgrade can't renumber the interfaces Security Onion binds by name (host:mainint, sensor:mainint, bond0). The script is gated by the drop file /opt/so/state/nic_names_pinned: it runs once on highstate, and an admin can pre-create the marker to opt out. It is wired into common/init.sls as pin_nic_names, guarded by a matching unless.
Merge pull request #15966 from Security-Onion-Solutions/reyesj2-patch-8
2026-06-12 13:19:22 +02:00 · 2026-06-11 18:40:43 -04:00 · 2026-06-11 14:36:03 -05:00 · 2026-06-11 08:22:14 -04:00 · 2026-06-11 08:18:38 -04:00 · 2026-06-11 08:17:39 -04:00
47 changed files with 1118 additions and 174 deletions
@@ -11,6 +11,7 @@ body:
        -
        - 3.0.0
        - 3.1.0
+        - 3.2.0
        - Other (please provide detail below)
    validations:
      required: true
@@ -1,17 +1,17 @@
-### 3.0.0-20260331 ISO image released on 2026/03/31
+### 3.1.0-20260528 ISO image released on 2026/05/28


 ### Download and Verify

-3.0.0-20260331 ISO image:  
-https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
+3.1.0-20260528 ISO image:  
+https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
 
-MD5: ECD318A1662A6FDE0EF213F5A9BD4B07  
-SHA1: E55BE314440CCF3392DC0B06BC5E270B43176D9C  
-SHA256: 7FC47405E335CBE5C2B6C51FE7AC60248F35CBE504907B8B5A33822B23F8F4D5  
+MD5: 9D6FF58DEEE24089D722C73169765B3E  
+SHA1: 2B8B816B6CEC3B7F96B3C5E040EBF502DD2C412F  
+SHA256: 62FAB57E247C843D6A04F0796D8162C732B65D82FC3E4A59D087135B9FD32912  

 Signature for ISO image:  
-https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
+https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig

 Signing key:  
 https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/main/KEYS  
@@ -25,22 +25,22 @@ wget https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/

 Download the signature file for the ISO:  
 ```
-wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
+wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig
 ```

 Download the ISO image:  
 ```
-wget https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
+wget https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
 ```

 Verify the downloaded ISO image using the signature file:  
 ```
-gpg --verify securityonion-3.0.0-20260331.iso.sig securityonion-3.0.0-20260331.iso
+gpg --verify securityonion-3.1.0-20260528.iso.sig securityonion-3.1.0-20260528.iso
 ```

 The output should show "Good signature" and the Primary key fingerprint should match what's shown below:
 ```
-gpg: Signature made Mon 30 Mar 2026 06:22:14 PM EDT using RSA key ID FE507013
+gpg: Signature made Wed 27 May 2026 03:03:59 PM EDT using RSA key ID FE507013
 gpg: Good signature from "Security Onion Solutions, LLC <info@securityonionsolutions.com>"
 gpg: WARNING: This key is not certified with a trusted signature!
 gpg:          There is no indication that the signature belongs to the owner.
@@ -0,0 +1 @@
+
@@ -1 +1 @@
-3.1.0
+3.2.0
@@ -25,9 +25,11 @@ if [ ! -f $BACKUPFILE ]; then
  # Create empty backup file
  tar -cf $BACKUPFILE -T /dev/null

-  # Loop through all paths defined in global.sls, and append them to backup file
+  # Loop through all paths defined in global.sls, and append them to backup file if they exist
  {%- for LOCATION in BACKUPLOCATIONS %}
-  tar -rf $BACKUPFILE "${EXCLUSIONS[@]}" {{ LOCATION }}
+  if [[ -d {{ LOCATION }} || -f {{ LOCATION }} ]]; then
+    tar -rf $BACKUPFILE "${EXCLUSIONS[@]}" {{ LOCATION }}
+  fi
  {%- endfor %}

 fi
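The new guard skips any BACKUPLOCATIONS entry that does not exist, because `tar -r` exits non-zero on a path it cannot stat. Below is a minimal standalone sketch of the same pattern. It assumes GNU tar. `backup_existing` is a hypothetical helper name, and it uses `-e` as a slightly broader existence test than the template's `-d || -f`:

```bash
#!/bin/bash
# backup_existing ARCHIVE PATH...: hypothetical helper mirroring the templated loop.
# Creates an empty tar archive, then appends each PATH only if it exists, so a
# missing location is logged and skipped instead of making tar fail.
backup_existing() {
    local archive="$1" loc
    shift
    tar -cf "$archive" -T /dev/null          # empty archive, as in the original script
    for loc in "$@"; do
        if [[ -e "$loc" ]]; then
            tar -rf "$archive" "$loc"
        else
            echo "skipping missing backup location: $loc" >&2
        fi
    done
}
```

For example, `backup_existing /tmp/so-backup.tar /etc/salt /opt/so/conf/not-there` would archive `/etc/salt` and log the second path as skipped.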
@@ -130,6 +130,17 @@ common_sbin:
      - so-pcap-import
 {% endif %}

+# Pin physical NIC names by MAC (run-once) so a kernel upgrade can't renumber the
+# interfaces SO binds by name. The marker keeps it a one-time setup; an admin can
+# pre-create the marker to opt out.
+pin_nic_names:
+  cmd.run:
+    - name: /usr/sbin/so-nic-pin
+    - unless: 'test -e /opt/so/state/nic_names_pinned'
+    - require:
+      - file: common_sbin
+      - file: statedir
+
 common_sbin_jinja:
  file.recurse:
    - name: /usr/sbin
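The `unless: 'test -e /opt/so/state/nic_names_pinned'` guard on `pin_nic_names`, combined with the marker that so-nic-pin writes after a successful run, works as a run-once gate. The sketch below expresses those semantics in shell. It assumes the marker is written only after the command succeeds, which matches what so-nic-pin does. `run_once` is a hypothetical helper and is not part of Security Onion:

```bash
#!/bin/bash
# run_once MARKER CMD...: hypothetical helper. Skips CMD when MARKER exists
# (the state's `unless`), otherwise runs CMD and creates MARKER only on success,
# so a failed run is retried on the next invocation (the next highstate).
run_once() {
    local marker="$1"
    shift
    if [ -e "$marker" ]; then
        echo "skip: $marker present"
        return 0
    fi
    "$@" || return 1
    mkdir -p "$(dirname "$marker")" && touch "$marker"
}
```

Pre-creating the marker (`touch /opt/so/state/nic_names_pinned`) short-circuits the first branch. That is the opt-out the commit message describes.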
@@ -165,6 +165,8 @@ if [[ $EXCLUDE_FALSE_POSITIVE_ERRORS == 'Y' ]]; then
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading component template"  # false positive (elasticsearch index or template names contain 'error')
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading composable template" # false positive (elasticsearch composable template names contain 'error')
    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|Error while parsing document for index \[.ds-logs-kratos-so-.*object mapping for \[file\]" # false positive (mapping error occuring BEFORE kratos index has rolled over in 2.4.210)
+    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|No such container"            # false positive (telegraf trying to run stats on an old container)
+    EXCLUDED_ERRORS="$EXCLUDED_ERRORS|passwords do not match"       # false positive (automated hydra test)
 fi

 if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
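Each `EXCLUDED_ERRORS` entry is appended as another `|`-separated alternative of a single extended regex. The sketch below shows how such a pattern can drop known false positives from log lines. It assumes the list is applied with `grep -E -v`, which may differ from how so-log-check actually consumes it. `filter_known` is a hypothetical name:

```bash
#!/bin/bash
# Build the exclusion pattern the same way: one alternative per known false positive.
EXCLUDED_ERRORS="upgrading component template"
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|No such container"        # telegraf stats on an old container
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|passwords do not match"   # automated hydra test

# filter_known: read log lines on stdin and print only those matching no exclusion.
filter_known() {
    grep -Ev "$EXCLUDED_ERRORS"
}
```

Because the entries are joined into one regex, any literal regex metacharacters in a new entry (such as `[` or `.`) need escaping. This is why the kratos entry above escapes its brackets.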
@@ -0,0 +1,76 @@
+#!/bin/bash
+#
+# so-nic-pin — pin physical NIC names by permanent MAC via classic by-MAC udev
+#              rules, so a kernel upgrade can't renumber them.
+#
+# Security Onion binds its management and monitor interfaces BY NAME in pillar
+# (host:mainint, sensor:mainint, and bond0 is built on a specific physical NIC).
+# A kernel upgrade can change the kernel/systemd-udevd predictable-naming output
+# and renumber those NICs (e.g. enp1s0 -> enp2s0), which breaks the grid: the
+# pillar references a name that no longer exists and bond/bridge bring-up fails.
+#
+# This writes /etc/udev/rules.d/70-persistent-net.rules pinning each PHYSICAL NIC
+# to its CURRENT name by its PERMANENT MAC, freezing the names across future kernel
+# changes. It only writes the rules file; it does NOT live-trigger a rename (the
+# rules apply on the next boot/kernel, and a live rename would be disruptive).
+#
+# Run-once: gated by the drop file /opt/so/state/nic_names_pinned. If the marker is
+# present the script does nothing, so an admin can pre-create it to opt out. Invoked
+# from the common state on every highstate; the marker keeps it a one-time setup.
+
+NET_RULES_FILE="/etc/udev/rules.d/70-persistent-net.rules"
+MARKER="/opt/so/state/nic_names_pinned"
+
+log() { echo -e "[so-nic-pin] $*"; }
+
+# Echo "<name> <permanent-mac>" for every PHYSICAL NIC. A physical NIC is backed by a
+# real device (has device/driver), which excludes bond0/sobridge/docker0/veth*/lo whose
+# MACs are dynamic and must never be pinned. The PERMANENT MAC is used (ethtool -P, with
+# fallbacks), not the current one: an enslaved bond member's current MAC is rewritten to
+# the bond's, so matching on it would be wrong/ambiguous.
+physical_nics() {
+    local path n mac
+    for path in /sys/class/net/*; do
+        n="${path##*/}"
+        [ "$n" = "lo" ] && continue
+        [ -e "${path}/device/driver" ] || continue          # real device only
+        mac="$(ethtool -P "$n" 2>/dev/null | awk '/Permanent address/{print $NF}')"
+        case "$mac" in ""|00:00:00:00:00:00) mac="$(cat "${path}/bonding_slave/perm_hwaddr" 2>/dev/null)" ;; esac
+        case "$mac" in ""|00:00:00:00:00:00) mac="$(cat "${path}/address" 2>/dev/null)" ;; esac
+        case "$mac" in ""|00:00:00:00:00:00) continue ;; esac
+        echo "$n $mac"
+    done
+}
+
+# Turn "<name> <mac>" lines on stdin into classic by-MAC persistent-net udev rules.
+render_net_rules() {
+    echo "# Generated by so-nic-pin: pin NIC names by MAC so kernel upgrades can't renumber them."
+    echo "# Security Onion binds its management/monitor interfaces by name; do not hand-edit."
+    local n mac
+    while read -r n mac; do
+        [ -n "$n" ] || continue
+        printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", NAME="%s"\n' \
+            "$mac" "$n"
+    done
+}
+
+[ "$(id -u)" -eq 0 ] || exit 0                   # salt runs us as root; bail quietly otherwise
+[ -e "${MARKER}" ] && exit 0                      # run-once guard (mirrors the state's unless)
+
+nics="$(physical_nics)"
+if [ -z "${nics}" ]; then
+    log "no physical NICs detected — nothing to pin (will retry on next highstate)"
+    exit 0                                         # do NOT drop the marker; let it retry later
+fi
+
+log "pinning physical NICs by permanent MAC:"
+echo "${nics}" | sed 's/^/    /'
+
+[ -f "${NET_RULES_FILE}" ] && cp -f "${NET_RULES_FILE}" "${NET_RULES_FILE}.bak"
+echo "${nics}" | render_net_rules > "${NET_RULES_FILE}" || {
+    log "ERROR: failed to write ${NET_RULES_FILE}"
+    exit 1
+}
+
+mkdir -p "$(dirname "${MARKER}")" && touch "${MARKER}"
+log "wrote ${NET_RULES_FILE} ($(grep -c '^SUBSYSTEM' "${NET_RULES_FILE}") NIC(s) pinned); dropped ${MARKER}"
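`physical_nics` looks up the MAC by trying `ethtool -P` first, then `bonding_slave/perm_hwaddr`, then the current `address`. It treats an empty or all-zero value as missing at each step. The sketch below pulls the same first-valid-candidate selection into a hypothetical standalone function so it can be exercised without real NICs:

```bash
#!/bin/bash
# first_valid_mac CANDIDATE...: print the first candidate that is neither empty nor
# the all-zero MAC (treated as "no usable address"). Returns 1 if none qualifies,
# in which case the NIC would be skipped, as in physical_nics.
first_valid_mac() {
    local c
    for c in "$@"; do
        case "$c" in ""|00:00:00:00:00:00) continue ;; esac
        echo "$c"
        return 0
    done
    return 1
}

# Hypothetical use inside the per-NIC loop (n = interface name):
#   mac="$(first_valid_mac \
#       "$(ethtool -P "$n" 2>/dev/null | awk '/Permanent address/{print $NF}')" \
#       "$(cat "/sys/class/net/$n/bonding_slave/perm_hwaddr" 2>/dev/null)" \
#       "$(cat "/sys/class/net/$n/address" 2>/dev/null)")" || continue
```

Unlike the original `case` chain, this variant reads all three sources up front instead of only when an earlier one is missing. The result is the same, at the cost of a few extra reads.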
@@ -26,7 +26,9 @@ include:
 wait_for_elasticsearch_elasticfleet:
  cmd.run:
    - name: so-elasticsearch-wait
+{% endif %}

+{% if GLOBALS.role == "so-fleet" %}
 # Sync Elastic Agent artifacts to Fleet Node
 elasticagent_syncartifacts:
  file.recurse:
@@ -99,6 +101,17 @@ so-elastic-fleet:
      - file: trusttheca
      - x509: etc_elasticfleet_key
      - x509: etc_elasticfleet_crt
+
+wait_for_so-elastic-fleet:
+  http.wait_for_successful_query:
+    - name: "https://localhost:8220/api/status"
+    - ssl: True
+    - verify_ssl: False
+    - status: 200
+    - wait_for: 300
+    - request_interval: 15
+    - require:
+      - docker_container: so-elastic-fleet
 {%   endif %}

 delete_so-elastic-fleet_so-status.disabled:
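The new `wait_for_so-elastic-fleet` state polls the Fleet status endpoint every `request_interval` seconds, for up to `wait_for` seconds, before dependent states run. A rough shell equivalent is sketched below. It assumes `curl -f` is an acceptable stand-in for the success check, although `-f` fails on HTTP 4xx/5xx rather than requiring exactly 200. `wait_for_success` is a hypothetical helper:

```bash
#!/bin/bash
# wait_for_success WAIT_FOR INTERVAL CMD...: re-run CMD every INTERVAL seconds until
# it succeeds, giving up once WAIT_FOR seconds have been spent sleeping.
wait_for_success() {
    local wait_for="$1" interval="$2" elapsed=0
    shift 2
    until "$@"; do
        if (( elapsed >= wait_for )); then
            echo "gave up after ${elapsed}s: $*" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
}

# Hypothetical use mirroring the state's parameters:
#   wait_for_success 300 15 curl -fsk -o /dev/null https://localhost:8220/api/status
```

This sketch counts only sleep time, as the so-boot-mine-update loops do, so slow requests can stretch the real wall-clock wait beyond `WAIT_FOR`.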
@@ -9,16 +9,20 @@

 include:
  - elasticfleet.config
+  - kibana.enabled

 # If enabled, automatically update Fleet Logstash Outputs
-{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval'] %}
+{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration %}
+{%   if grains.role not in ['so-import', 'so-eval'] %}
 so-elastic-fleet-auto-configure-logstash-outputs:
  cmd.run:
    - name: /usr/sbin/so-elastic-fleet-outputs-update
    - retry:
        attempts: 4
        interval: 30
-{% endif %}
+    - require:
+      - http: wait_for_so-kibana
+{%   endif %}

 # If enabled, automatically update Fleet Server URLs & ES Connection
 so-elastic-fleet-auto-configure-server-urls:
@@ -27,6 +31,9 @@ so-elastic-fleet-auto-configure-server-urls:
    - retry:
        attempts: 4
        interval: 30
+    - require:
+      - http: wait_for_so-kibana
+{% endif %}

 # Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
 so-elastic-fleet-auto-configure-elasticsearch-urls:
@@ -35,6 +42,8 @@ so-elastic-fleet-auto-configure-elasticsearch-urls:
    - retry:
        attempts: 4
        interval: 30
+    - require:
+      - http: wait_for_so-kibana

 so-elastic-fleet-auto-configure-artifact-urls:
  cmd.run:
@@ -42,6 +51,8 @@ so-elastic-fleet-auto-configure-artifact-urls:
    - retry:
        attempts: 4
        interval: 30
+    - require:
+      - http: wait_for_so-kibana

 so-elastic-fleet-package-statefile:
  file.managed:
@@ -53,7 +64,9 @@ so-elastic-fleet-package-upgrade:
    - name: /usr/sbin/so-elastic-fleet-package-upgrade
    - retry:
        attempts: 3
-        interval: 10
+        interval: 30
+    - require:
+      - http: wait_for_so-kibana
    - onchanges:
      - file: /opt/so/state/elastic_fleet_packages.txt

@@ -63,6 +76,8 @@ so-elastic-fleet-integrations:
    - retry:
        attempts: 3
        interval: 10
+    - require:
+      - http: wait_for_so-kibana

 so-elastic-agent-grid-upgrade:
  cmd.run:
@@ -70,6 +85,8 @@ so-elastic-agent-grid-upgrade:
    - retry:
        attempts: 12
        interval: 5
+    - require:
+      - http: wait_for_so-kibana

 so-elastic-fleet-integration-upgrade:
  cmd.run:
@@ -77,16 +94,22 @@ so-elastic-fleet-integration-upgrade:
    - retry:
        attempts: 3
        interval: 10
+    - require:
+      - http: wait_for_so-kibana

 {# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
 so-elastic-fleet-addon-integrations:
  cmd.run:
    - name: /usr/sbin/so-elastic-fleet-optional-integrations-load
+    - require:
+      - http: wait_for_so-kibana

 {% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
 so-elastic-defend-manage-filters-file-watch:
  cmd.run:
    - name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
+    - require:
+      - http: wait_for_so-kibana
    - onchanges:
      - file: elasticdefendcustom
      - file: elasticdefenddisabled
@@ -108,9 +108,12 @@ if [ ! -f /opt/so/state/eaintegrations.txt ]; then
  done

  # Only create the state file if all policies were created/updated successfully
-  if [[ "$RETURN_CODE" != "1" ]]; then
+  if [[ $RETURN_CODE -eq 0 ]]; then
    touch /opt/so/state/eaintegrations.txt
+  else
+    exit 1
  fi
 else
-  exit $RETURN_CODE
+  echo "Fleet integration policies already loaded."
+  exit 0
 fi
@@ -8,18 +8,33 @@

 . /usr/sbin/so-elastic-fleet-common

+PKG_LOAD_FAILURES=0
+PKG_LOAD_FAILURES_NAMES=()
+
 {%- for PACKAGE in SUPPORTED_PACKAGES %}
 echo "Upgrading {{ PACKAGE }} package..."
 if VERSION=$(elastic_fleet_package_latest_version_check "{{ PACKAGE }}"); then
    if ! elastic_fleet_package_install "{{ PACKAGE }}" "$VERSION"; then
-        # exit 1 on failure to upgrade a default package, allow salt to handle retries
-        echo -e "\nERROR: Failed to upgrade $PACKAGE to version: $VERSION"
-        exit 1
+        PKG_LOAD_FAILURES=$((PKG_LOAD_FAILURES + 1))
+        PKG_LOAD_FAILURES_NAMES+=("{{ PACKAGE }}")
    fi
 else
-    echo -e "\nERROR: Failed to get version information for integration $PACKAGE"
+    PKG_LOAD_FAILURES=$((PKG_LOAD_FAILURES + 1))
+    PKG_LOAD_FAILURES_NAMES+=("{{ PACKAGE }}")
 fi
 echo
 {%- endfor %}
+
+if [ $PKG_LOAD_FAILURES -gt 0 ]; then
+    echo "ERROR: Failed to upgrade $PKG_LOAD_FAILURES package(s):"
+    for PKG in "${PKG_LOAD_FAILURES_NAMES[@]}"; do
+        echo " - $PKG"
+    done
+    # exit 1 on failure to upgrade a default package, allow salt to handle retries
+    exit 1
+else
+    echo "Successfully upgraded all packages."
+fi
+
 echo
 /usr/sbin/so-elasticsearch-templates-load
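The reworked upgrade loop no longer stops at the first failed package. It records each failure, keeps going, and exits non-zero only at the end. One bad package therefore no longer hides the status of the rest, and Salt's `retry` still sees a failure. The same accumulate-then-report pattern in plain bash is sketched below. `upgrade_all` and the injected installer command are hypothetical:

```bash
#!/bin/bash
# upgrade_all INSTALLER PKG...: call "INSTALLER PKG" for each package, keep going on
# failure, then report every failed package and return 1 if any failed.
upgrade_all() {
    local installer="$1" pkg
    shift
    local failures=()
    for pkg in "$@"; do
        echo "Upgrading $pkg package..."
        "$installer" "$pkg" || failures+=("$pkg")
    done
    if [ "${#failures[@]}" -gt 0 ]; then
        echo "ERROR: Failed to upgrade ${#failures[@]} package(s):"
        printf ' - %s\n' "${failures[@]}"
        return 1
    fi
    echo "Successfully upgraded all packages."
}
```

The array length already gives the failure count, so a separate counter such as `PKG_LOAD_FAILURES` is optional in this form.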
@@ -9,9 +9,12 @@
 {%   from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
 {%   from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS, SO_MANAGED_INDICES %}
 {%   if GLOBALS.role != 'so-heavynode' %}
-{%     from 'elasticsearch/template.map.jinja' import ALL_ADDON_SETTINGS %}
+{%     from 'elasticsearch/template.map.jinja' import ALL_ADDON_SETTINGS, ADDON_INDICES %}
 {%   endif %}

+include:
+  - elasticsearch.enabled
+
 escomponenttemplates:
  file.recurse:
    - name: /opt/so/conf/elasticsearch/templates/component
@@ -35,6 +38,20 @@ so_index_template_dir:
      {%- endfor %}
    {%- endif %}

+{%  if GLOBALS.role != "so-heavynode" %}
+# Clean up legacy and non-SO managed templates from the elasticsearch/templates/addon-index/ directory
+addon_index_template_dir:
+  file.directory:
+    - name: /opt/so/conf/elasticsearch/templates/addon-index
+    - clean: True
+    {%- if ADDON_INDICES %}
+    - require:
+      {%- for index in ADDON_INDICES %}
+      - file: addon_index_template_{{index}}
+      {%- endfor %}
+    {%- endif %}
+{%  endif %}
+
 # Auto-generate index templates for SO managed indices (directly defined in elasticsearch/defaults.yaml)
 #   These index templates are for the core SO datasets and are always required
 {%  for index, settings in ES_INDEX_SETTINGS.items() %}
@@ -3958,10 +3958,13 @@ elasticsearch:
        - vulnerability-mappings
        - common-settings
        - common-dynamic-mappings
+        - logs-redis.log@package
+        - logs-redis.log@custom
        data_stream:
          allow_custom_routing: false
          hidden: false
-        ignore_missing_component_templates: []
+        ignore_missing_component_templates:
+        - logs-redis.log@custom
        index_patterns:
        - logs-redis.log*
        priority: 501
@@ -0,0 +1,71 @@
+{
+    "description": "zeek.ja4d",
+    "processors": [
+        {
+            "set": {
+                "field": "event.dataset",
+                "value": "ja4d"
+            }
+        },
+        {
+            "remove": {
+                "field": [
+                    "host"
+                ],
+                "ignore_failure": true
+            }
+        },
+        {
+            "json": {
+                "field": "message",
+                "target_field": "message2",
+                "ignore_failure": true
+            }
+        },
+        {
+            "rename": {
+                "field": "message2.ja4d",
+                "target_field": "hash.ja4d",
+                "ignore_missing": true,
+                "if": "ctx?.message2?.ja4d != null && ctx.message2.ja4d.length() > 0"
+            }
+        },
+        {
+            "rename": {
+                "field": "message2.client_mac",
+                "target_field": "host.mac",
+                "ignore_missing": true,
+                "if": "ctx?.message2?.client_mac != null && ctx.message2.client_mac.length() > 0"
+            }
+        },
+        {
+            "rename": {
+                "field": "message2.hostname",
+                "target_field": "host.hostname",
+                "ignore_missing": true,
+                "if": "ctx?.message2?.hostname != null && ctx.message2.hostname.length() > 0"
+            }
+        },
+        {
+            "rename": {
+                "field": "message2.requested_ip",
+                "target_field": "dhcp.requested_address",
+                "ignore_missing": true,
+                "if": "ctx?.message2?.requested_ip != null && ctx.message2.requested_ip.length() > 0"
+            }
+        },
+        {
+            "rename": {
+                "field": "message2.vendor_class_id",
+                "target_field": "zeek.ja4d.vendor_class_id",
+                "ignore_missing": true,
+                "if": "ctx?.message2?.vendor_class_id != null && ctx.message2.vendor_class_id.length() > 0"
+            }
+        },
+        {
+            "pipeline": {
+                "name": "zeek.common"
+            }
+        }
+    ]
+}
@@ -61,15 +61,25 @@
 {% if ALL_ADDON_SETTINGS_ORIG.keys() | length > 0 %}
 {%   for index in ALL_ADDON_SETTINGS_ORIG.keys() %}
 {%     do ALL_ADDON_SETTINGS_GLOBAL_OVERRIDES.update({index: salt['defaults.merge'](ALL_ADDON_SETTINGS_ORIG[index], PILLAR_GLOBAL_OVERRIDES, in_place=False)}) %}
+{#     Explicitly exclude addon indices from ES_INDEX_SETTINGS_ORIG.
+         When manager.soc_managed_annotations runs, it adds entries to salt/elasticsearch/defaults.yaml to support 'revert to default' functionality.
+         Subsequent map renders would then incorrectly include those addon integrations in ES_INDEX_SETTINGS_ORIG, because they now appear in defaults.yaml. #}
+{%     if index in ES_INDEX_SETTINGS_ORIG.keys() %}
+{%       do ES_INDEX_SETTINGS_ORIG.pop(index) %}
+{%     endif %}
 {%   endfor %}
 {% endif %}

 {% set ES_INDEX_SETTINGS = {} %}
-{% macro create_final_index_template(DEFINED_SETTINGS, GLOBAL_OVERRIDES, FINAL_INDEX_SETTINGS) %}
+{% macro create_final_index_template(DEFINED_SETTINGS, GLOBAL_OVERRIDES, FINAL_INDEX_SETTINGS, EXCLUDE_INDICES=[]) %}

 {% do GLOBAL_OVERRIDES.update(salt['defaults.merge'](GLOBAL_OVERRIDES, ES_INDEX_PILLAR, in_place=False)) %}
 {% for index, settings in GLOBAL_OVERRIDES.items() %}

+{%   if index in EXCLUDE_INDICES %}
+{%     continue %}
+{%   endif %}
+
 {#   prevent this action from being performed on custom defined indices. #}
 {#   the custom defined index is not present in either of the dictionaries and fails to reder. #}
 {%   if index in DEFINED_SETTINGS and index in GLOBAL_OVERRIDES %}
@@ -150,10 +160,19 @@
 {% endfor %}
 {% endmacro %}

-{{ create_final_index_template(ES_INDEX_SETTINGS_ORIG, ES_INDEX_SETTINGS_GLOBAL_OVERRIDES, ES_INDEX_SETTINGS) }}
-{{ create_final_index_template(ALL_ADDON_SETTINGS_ORIG, ALL_ADDON_SETTINGS_GLOBAL_OVERRIDES, ALL_ADDON_SETTINGS) }}
+{# Exclude addon integrations from final ES_INDEX_SETTINGS #}
+{{ create_final_index_template(ES_INDEX_SETTINGS_ORIG, ES_INDEX_SETTINGS_GLOBAL_OVERRIDES, ES_INDEX_SETTINGS, ALL_ADDON_SETTINGS_ORIG.keys() | list ) }}
+
+{# Exclude SO managed indices; otherwise ALL_ADDON_SETTINGS would include pillar values
+  for core integrations without their defaults merged in, generating an overlapping but incorrect index template. #}
+{{ create_final_index_template(ALL_ADDON_SETTINGS_ORIG, ALL_ADDON_SETTINGS_GLOBAL_OVERRIDES, ALL_ADDON_SETTINGS, ES_INDEX_SETTINGS_ORIG.keys() | list ) }}

 {% set SO_MANAGED_INDICES = [] %}
 {% for index, settings in ES_INDEX_SETTINGS.items() %}
 {%   do SO_MANAGED_INDICES.append(index) %}
-{% endfor %}
+{% endfor %}
+
+{% set ADDON_INDICES = [] %}
+{% for index, settings in ALL_ADDON_SETTINGS.items() %}
+{%   do ADDON_INDICES.append(index) %}
+{% endfor %}
@@ -6,6 +6,7 @@
 {% from 'allowed_states.map.jinja' import allowed_states %}
 {% if sls.split('.')[0] in allowed_states %}
 {%   from 'docker/docker.map.jinja' import DOCKERMERGED %}
+{%   from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
 {%   from 'vars/globals.map.jinja' import GLOBALS %}

 include:
@@ -60,6 +61,19 @@ so-kibana:
    - watch:
      - file: kibanaconfig

+wait_for_so-kibana:
+  http.wait_for_successful_query:
+    - name: "http://localhost:5601/api/status"
+    - username: 'so_elastic'
+    - password: '{{ ELASTICSEARCHMERGED.auth.users.so_elastic_user.pass }}'
+    - ssl: True
+    - verify_ssl: False
+    - status: 200
+    - wait_for: 300
+    - request_interval: 15
+    - require:
+      - docker_container: so-kibana
+
 delete_so-kibana_so-status.disabled:
  file.uncomment:
    - name: /opt/so/conf/so-status/so-status.conf
@@ -103,7 +103,7 @@ kratos:
  config:
    session:
      lifespan: 
-        description: Defines the length of a login session.
+        description: Defines how long a login session lasts before it times out and requires a new login.
        global: True
        helpLink: kratos
      whoami:
@@ -31,11 +31,13 @@ sync_es_users:
      - http: wait_for_kratos
      - file: so-user.lock # require so-user.lock file to be missing

-# we dont want this added too early in setup, so we add the onlyif to verify 'startup_states: highstate'
-# is in the minion config. That line is added before the final highstate during setup
+# we don't want this added too early in setup, so the onlyif gates on the
+# /opt/so/state/setup-complete marker. The marker is written by
+# mark_setup_complete in setup/so-functions just before the final setup
+# highstate (and by an upgrade-path state for systems set up under the old gate).
 so-user_sync:
  cron.present:
    - user: root
    - name: 'PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin /usr/sbin/so-user sync &>> /opt/so/log/soc/sync.log'
    - identifier: so-user_sync
-    - onlyif: "grep -x 'startup_states: highstate' /etc/salt/minion"
+    - onlyif: "test -e /opt/so/state/setup-complete"
@@ -0,0 +1,117 @@
+#!/bin/bash
+#
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Runs once per boot on managers (via so-boot-mine-update.service), before
+# so-boot-highstate.service. Waits for the responsive minion set to settle, pushes
+# mine.update, waits until every up minion has actually reported to the mine, then
+# warms the master's per-minion pillar cache so the mine-backed node pillars (node
+# IPs, ES/Redis/Logstash/hypervisor discovery -- some glob- and some pillar/grain-
+# targeted) are complete before the boot highstate renders them. Otherwise a node
+# that is up but not yet fully reported gets dropped from those pillars and torn
+# out of the configs they build (e.g. so-elasticsearch ExtraHosts -> container recreate).
+
+MAX_WAIT=${MINE_UPDATE_MAX_WAIT:-180}   # hard backstop only
+INTERVAL=10
+STABLE_CHECKS=3                          # up-count must hold steady this many polls
+elapsed=0
+prev=-1
+stable=0
+up=0
+
+# Wait for the *reachable* minion set to settle rather than for every accepted
+# key to report up: an operator may accept a minion's key and then intentionally
+# power off that host, so requiring up >= accepted would never be satisfied and
+# we'd always burn the full MAX_WAIT. Once the responsive count stops growing we
+# stop waiting and run mine.update against whoever is up.
+while [ "$elapsed" -lt "$MAX_WAIT" ]; do
+  up=$(/usr/bin/salt-run manage.up --out=json 2>/dev/null \
+    | python3 -c 'import sys,json; print(len(json.load(sys.stdin)))' 2>/dev/null)
+  up=${up:-0}
+  if [ "$up" -gt 0 ] && [ "$up" -eq "$prev" ]; then
+    stable=$((stable + 1))
+    [ "$stable" -ge "$STABLE_CHECKS" ] && break
+  else
+    stable=0
+  fi
+  prev=$up
+  sleep "$INTERVAL"
+  elapsed=$((elapsed + INTERVAL))
+done
+
+echo "so-boot-mine-update: ${up} minions up (settled after ${elapsed}s); running mine.update"
+/usr/bin/salt '*' mine.update --out=txt
+
+# A node that is up but has not yet re-reported network.ip_addrs to the mine is
+# silently dropped from mine-backed pillars (elasticsearch:nodes, node_data, ...)
+# when highstate recompiles them -- which e.g. removes it from so-elasticsearch
+# ExtraHosts and forces a container recreate. After the broad mine.update above,
+# wait until every up minion actually has network.ip_addrs in the mine, re-pushing
+# mine.update to stragglers, before releasing the boot highstate. Bounded by the
+# same MAX_WAIT backstop so a slow/down node never blocks boot indefinitely.
+missing=""
+while [ "$elapsed" -lt "$MAX_WAIT" ]; do
+  up_json=$(/usr/bin/salt-run manage.up --out=json 2>/dev/null)
+  mine_json=$(/usr/bin/salt-run mine.get '*' network.ip_addrs tgt_type=glob --out=json 2>/dev/null)
+  missing=$(printf '%s' "$up_json" | python3 -c '
+import sys, json
+up = set(json.load(sys.stdin) or [])
+mine = {k for k, v in (json.loads(sys.argv[1]) or {}).items() if v}
+print("\n".join(sorted(up - mine)))
+' "$mine_json" 2>/dev/null)
+  if [ -z "$missing" ]; then
+    echo "so-boot-mine-update: mine complete for all up minions after ${elapsed}s"
+    break
+  fi
+  echo "so-boot-mine-update: mine missing up minion(s): $(echo $missing); re-running mine.update"
+  for m in $missing; do /usr/bin/salt "$m" mine.update --out=txt; done
+  sleep "$INTERVAL"
+  elapsed=$((elapsed + INTERVAL))
+done
+[ -n "$missing" ] && echo "so-boot-mine-update: WARNING ${MAX_WAIT}s backstop hit; up minion(s) still absent from mine: $(echo $missing); highstate may drop them from configs"
+
+# The pillar/compound-targeted node pillars (elasticsearch:nodes, redis:nodes,
+# logstash:nodes, hypervisor:nodes) resolve their target against the master's
+# per-minion data cache (grains+pillar in .../minions/<id>/data.p), populated only
+# when a minion's pillar is (re)compiled -- separately from the mine. A freshly
+# booted node can be in the mine (glob/node_data sees it) yet absent from that
+# cache, so it is dropped from those pillars and from the configs they build (e.g.
+# so-elasticsearch ExtraHosts). Force a synchronous pillar refresh so the master
+# caches every up node's pillar; refresh_pillar wait=True returns only once the
+# pillar is recompiled (and thus cached for matching). Retry stragglers <= MAX_WAIT.
+echo "so-boot-mine-update: warming master pillar cache for pillar/grain-targeted node pillars"
+/usr/bin/salt '*' saltutil.refresh_pillar wait=True --out=txt
+missing=""
+while [ "$elapsed" -lt "$MAX_WAIT" ]; do
+  up_json=$(/usr/bin/salt-run manage.up --out=json 2>/dev/null)
+  cached_json=$(/usr/bin/salt-run cache.pillar tgt='*' --out=json 2>/dev/null)
+  missing=$(printf '%s' "$up_json" | python3 -c '
+import sys, json
+up = set(json.load(sys.stdin) or [])
+cached = {k for k, v in (json.loads(sys.argv[1]) or {}).items() if v}
+print("\n".join(sorted(up - cached)))
+' "$cached_json" 2>/dev/null)
+  if [ -z "$missing" ]; then
+    echo "so-boot-mine-update: pillar cache warm for all up minions after ${elapsed}s"
+    break
+  fi
+  echo "so-boot-mine-update: pillar not yet cached for: $(echo $missing); refreshing"
+  for m in $missing; do /usr/bin/salt "$m" saltutil.refresh_pillar wait=True --out=txt; done
+  sleep "$INTERVAL"
+  elapsed=$((elapsed + INTERVAL))
+done
+[ -n "$missing" ] && echo "so-boot-mine-update: WARNING ${MAX_WAIT}s backstop hit; pillar not cached for: $(echo $missing); pillar-targeted pillars may drop them"
+
+# Log what the mine-backed pillars render so the boot-time state is inspectable.
+/usr/bin/salt-call saltutil.refresh_pillar >/dev/null 2>&1
+sleep 2
+for key in node_data elasticsearch:nodes; do
+  rendered=$(/usr/bin/salt-call --out=json pillar.get "$key" 2>/dev/null \
+    | python3 -c 'import sys,json; print(json.dumps(json.load(sys.stdin).get("local"), indent=2, sort_keys=True))' 2>/dev/null)
+  echo "so-boot-mine-update: ${key} rendered as:"
+  echo "${rendered:-null}"
+done
+exit 0
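The first loop in so-boot-mine-update stops waiting once the up-minion count is non-zero and has held steady for `STABLE_CHECKS` consecutive polls. The later loops compute which up minions are still missing from the mine or the pillar cache. Both pieces can be sketched without Salt. `settled_after` and `missing_from` are hypothetical helpers that take poll results as arguments instead of calling `salt-run manage.up`:

```bash
#!/bin/bash
# settled_after STABLE_CHECKS COUNT...: replay successive up-minion counts and print
# the 1-based poll at which a non-zero count had repeated STABLE_CHECKS times in a
# row; return 1 if it never settled (the MAX_WAIT backstop case).
settled_after() {
    local need="$1" prev=-1 stable=0 poll=0 up
    shift
    for up in "$@"; do
        poll=$((poll + 1))
        if [ "$up" -gt 0 ] && [ "$up" -eq "$prev" ]; then
            stable=$((stable + 1))
            if [ "$stable" -ge "$need" ]; then
                echo "$poll"
                return 0
            fi
        else
            stable=0
        fi
        prev=$up
    done
    return 1
}

# missing_from UP_LIST REPORTED_LIST: newline-separated minion IDs; print the up
# minions absent from the reported set (the script's `up - mine` set difference).
missing_from() {
    comm -23 <(printf '%s\n' "$1" | sort -u) <(printf '%s\n' "$2" | sort -u)
}
```

With `STABLE_CHECKS=3`, the count must be observed four times in a row, because the first matching poll only establishes `prev`. For example, the counts `0 2 4 4 4 4` settle at poll 6.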
@@ -188,13 +188,6 @@ airgap_update_dockers() {
  fi
 }

-backup_old_states_pillars() {
-
-	tar czf /nsm/backup/$(echo $INSTALLEDVERSION)_$(date +%Y%m%d-%H%M%S)_soup_default_states_pillars.tar.gz /opt/so/saltstack/default/
-	tar czf /nsm/backup/$(echo $INSTALLEDVERSION)_$(date +%Y%m%d-%H%M%S)_soup_local_states_pillars.tar.gz /opt/so/saltstack/local/
-
-}
-
 update_registry() {
  docker stop so-dockerregistry
  docker rm so-dockerregistry
@@ -370,8 +363,9 @@ preupgrade_changes() {
    # This function is to add any new pillar items if needed.
    echo "Checking to see if changes are needed."

-    [[ "$INSTALLEDVERSION" =~ ^2\.4\.21[0-9]+$ ]] && up_to_3.0.0   
+    [[ "$INSTALLEDVERSION" =~ ^2\.4\.21[0-9]+$ ]] && up_to_3.0.0
    [[ "$INSTALLEDVERSION" == "3.0.0" ]] && up_to_3.1.0
+    [[ "$INSTALLEDVERSION" == "3.1.0" ]] && up_to_3.2.0
    true
 }
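Each `up_to_X` function ends by setting `INSTALLEDVERSION` to the version it upgraded to. As a result, the checks in `preupgrade_changes` fall through in order, and a 3.0.0 host runs both the 3.1.0 and 3.2.0 steps in one pass. The sketch below stubs out that chaining, with step bodies that only record their names. It assumes `up_to_3.0.0`, whose body is not shown here, likewise sets `INSTALLEDVERSION=3.0.0`:

```bash
#!/bin/bash
# Stub upgrade steps: each records itself and advances INSTALLEDVERSION,
# which is what lets the next check in the chain match.
STEPS=()
up_to_3.0.0() { STEPS+=("3.0.0"); INSTALLEDVERSION=3.0.0; }   # assumed behavior
up_to_3.1.0() { STEPS+=("3.1.0"); INSTALLEDVERSION=3.1.0; }
up_to_3.2.0() { STEPS+=("3.2.0"); INSTALLEDVERSION=3.2.0; }

preupgrade_changes() {
    [[ "$INSTALLEDVERSION" =~ ^2\.4\.21[0-9]+$ ]] && up_to_3.0.0
    [[ "$INSTALLEDVERSION" == "3.0.0" ]] && up_to_3.1.0
    [[ "$INSTALLEDVERSION" == "3.1.0" ]] && up_to_3.2.0
    true   # the last && list returns 1 when its test fails; keep the function's status 0
}
```

The order of the checks matters. If the 3.1.0 line came before the 3.0.0 line, a 3.0.0 host would stop after a single step.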

@@ -381,6 +375,7 @@ postupgrade_changes() {

    [[ "$POSTVERSION" =~ ^2\.4\.21[0-9]+$ ]] && post_to_3.0.0
    [[ "$POSTVERSION" == "3.0.0" ]] && post_to_3.1.0
+    [[ "$POSTVERSION" == "3.1.0" ]] && post_to_3.2.0
    true
 }

@@ -533,6 +528,23 @@ elasticfleet_set_agent_logging_level_warn() {
    done <<< "$policies_to_update"
 }

+update_logstash_pipeline_name() {
+    local original_pipeline_name="$1"
+    local new_pipeline_name="$2"
+
+    echo "Checking for conflicting logstash defined_pipelines pillar value."
+    local LOGSTASH_FILE=/opt/so/saltstack/local/pillar/logstash/soc_logstash.sls
+    local MINIONDIR=/opt/so/saltstack/local/pillar/minions
+    for pillar_file in "$LOGSTASH_FILE" "$MINIONDIR"/*.sls; do
+        [[ -f "$pillar_file" ]] || continue
+        if grep -q "$original_pipeline_name$" "$pillar_file"; then
+            echo "Found conflicting defined_pipelines pillar value in $pillar_file. Updating to use the new logstash pipeline name."
+            sed -i "s#$original_pipeline_name\$#$new_pipeline_name#g" "$pillar_file"
+            chown socore:socore "$pillar_file"
+        fi
+    done
+}
+
 check_transform_health_and_reauthorize() {
    . /usr/sbin/so-elastic-fleet-common

@@ -676,6 +688,10 @@ rename_strelka_scan_lnk() {
  rm -f "$TMP_VALUE_FILE"
 }

+fix_logstash_0013_lumberjack_pipeline_name() {
+    update_logstash_pipeline_name "so/0013_input_lumberjack_fleet.conf" "so/0013_input_lumberjack_fleet.conf.jinja"
+}
+
 up_to_3.1.0() {
  ensure_postgres_local_pillar
  ensure_postgres_secret
@@ -684,6 +700,7 @@ up_to_3.1.0() {
  # Clear existing component template state file.
  rm -f /opt/so/state/esfleet_component_templates.json
  rename_strelka_scan_lnk
+  fix_logstash_0013_lumberjack_pipeline_name

  INSTALLEDVERSION=3.1.0
 }
@@ -720,6 +737,48 @@ post_to_3.1.0() {

 ### 3.1.0 End ###

+### 3.2.0 Scripts ###
+
+bootstrap_so_soc_database() {
+  # init-db.sh is mounted into so-postgres at /docker-entrypoint-initdb.d/init-db.sh
+  # and runs automatically only on a fresh data directory. Hosts upgrading from
+  # 3.1.0 already have /nsm/postgres populated, so the so_soc bootstrap block
+  # added in 3.2 never fires. Re-run the script explicitly; it's idempotent.
+  echo "Bootstrapping so_soc database via init-db.sh."
+  # The postgres image has no USER directive, so `docker exec` defaults to
+  # root, and the container env intentionally omits POSTGRES_USER (the upstream
+  # entrypoint defaults it transiently during first-init only). Recreate both
+  # so psql inside init-db.sh resolves the connect user correctly.
+  local exec_cmd="docker exec -u postgres -e POSTGRES_USER=postgres so-postgres bash /docker-entrypoint-initdb.d/init-db.sh"
+  if ! /usr/sbin/so-postgres-wait; then
+    FINAL_MESSAGE_QUEUE+=("WARNING: so-postgres was not ready during the 3.2.0 upgrade; the so_soc database may not have been bootstrapped. Re-run manually: $exec_cmd")
+    return 0
+  fi
+  if ! $exec_cmd; then
+    FINAL_MESSAGE_QUEUE+=("WARNING: init-db.sh failed inside so-postgres during the 3.2.0 upgrade; the so_soc database may not have been bootstrapped. Re-run manually: $exec_cmd")
+    return 0
+  fi
+  echo "so_soc bootstrap complete."
+}
+
+up_to_3.2.0() {
+  fix_logstash_0013_lumberjack_pipeline_name
+
+  INSTALLEDVERSION=3.2.0
+}
+
+post_to_3.2.0() {
+  bootstrap_so_soc_database
+
+  # Regenerate the Elastic Agent installers here; this step was missed in post_to_3.1.0.
+  echo "Regenerating Elastic Agent Installers"
+  /sbin/so-elastic-agent-gen-installers
+
+  POSTVERSION=3.2.0
+}
+
+### 3.2.0 End ###
+

 repo_sync() {
  echo "Sync the local repo."
@@ -971,6 +1030,9 @@ verify_es_version_compatibility() {
    local is_active_intermediate_upgrade=1
    # supported upgrade paths for SO-ES versions
    declare -A es_upgrade_map=(
+        ["8.18.4"]="8.18.6 8.18.8 9.0.8"
+        ["8.18.6"]="8.18.8 9.0.8"
+        ["8.18.8"]="9.0.8"
        ["9.0.8"]="9.3.3"
    )

@@ -994,6 +1056,171 @@ verify_es_version_compatibility() {
        exit 160
    fi

+    compatible_es_versions="$target_es_version"
+    for current_version in "${!es_upgrade_map[@]}"; do
+        # shellcheck disable=SC2076
+        if [[ " ${es_upgrade_map[$current_version]} " =~ " $target_es_version " ]]; then
+            compatible_es_versions+=" $current_version"
+        fi
+    done
+
+    # Return 0 if the given ES version can upgrade directly to the target ES version. Used to catch lagging nodes during the upgrade process.
+    es_version_can_upgrade_to_target() {
+        local current_version="$1"
+        # shellcheck disable=SC2076
+        if [[ -n "$current_version" && " $compatible_es_versions " =~ " $current_version " ]]; then
+            return 0
+        fi
+
+        return 1
+    }
+
+    # Gather Elasticsearch cluster version info and verify that each node in the cluster is running a version compatible with the target ES version.
+    verify_searchnodes_es_target_compatibility() {
+        local retries=20
+        local retry_count=0
+        local delay=180
+        local expected_es_nodes searchnode_minions attempt
+        local searchnode_discovery_success=false
+        SEARCHNODE_ES_VERSIONS=""
+
+        for attempt in {1..3}; do
+            if searchnode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("searchnode"))'); then
+                searchnode_discovery_success=true
+                break
+            fi
+
+            echo "Failed to retrieve grid searchnodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
+            sleep 30
+        done
+
+        if [[ "$searchnode_discovery_success" != "true" ]]; then
+            echo "Failed to retrieve grid searchnodes via salt-key."
+            return 1
+        fi
+
+        # Always add node running soup to expected es nodes
+        expected_es_nodes="${MINIONID%_*}"
+        while IFS= read -r searchnode_minion; do
+            [[ -z "$searchnode_minion" ]] && continue
+            expected_es_nodes+=$'\n'"${searchnode_minion%_searchnode}"
+        done <<< "$searchnode_minions"
+
+        while [[ $retry_count -lt $retries ]]; do
+            SEARCHNODE_ES_VERSIONS=$(so-elasticsearch-query _nodes/_all/version --retry 5 --retry-delay 10 --fail 2> /dev/null)
+            local exit_status=$?
+
+            if [[ $exit_status -ne 0 ]]; then
+                echo "Failed to retrieve Elasticsearch versions from searchnodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
+                ((retry_count++))
+                sleep $delay
+                continue
+            fi
+
+            local all_searchnodes_compatible=true
+            while IFS=$'\t' read -r node current_version; do
+                [[ -z "$node" ]] && continue
+                if ! es_version_can_upgrade_to_target "$current_version"; then
+                    echo "Searchnode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
+                    all_searchnodes_compatible=false
+                fi
+            done < <(echo "$SEARCHNODE_ES_VERSIONS" | jq -r '.nodes | to_entries[] | [.value.name, .value.version] | @tsv')
+
+            while IFS= read -r expected_es_node; do
+                [[ -z "$expected_es_node" ]] && continue
+                if ! echo "$SEARCHNODE_ES_VERSIONS" | jq -e --arg node "$expected_es_node" '.nodes | to_entries | any(.value.name == $node)' > /dev/null; then
+                    echo "Searchnode $expected_es_node did not report an Elasticsearch version. It may be offline or still upgrading."
+                    all_searchnodes_compatible=false
+                fi
+            done <<< "$expected_es_nodes"
+
+            if [[ "$all_searchnodes_compatible" == true ]]; then
+                echo "All Searchnodes are upgradable to Elasticsearch $target_es_version."
+                return 0
+            fi
+
+            echo "One or more Searchnodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
+            ((retry_count++))
+            sleep $delay
+        done
+
+        return 1
+    }
+
+    # Gather heavynode version info and verify that each node is running a version compatible with the target ES version.
+    verify_heavynodes_es_target_compatibility() {
+        local heavynode_minions attempt
+        local retries=20
+        local retry_count=0
+        local delay=180
+        local heavynode_discovery_success=false
+        HEAVYNODE_ES_VERSIONS=""
+
+        for attempt in {1..3}; do
+            if heavynode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("heavynode"))'); then
+                heavynode_discovery_success=true
+                break
+            fi
+
+            echo "Failed to retrieve grid heavynodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
+            sleep 30
+        done
+
+        if [[ "$heavynode_discovery_success" != "true" ]]; then
+            echo "Failed to retrieve grid heavynodes via salt-key."
+            return 1
+        fi
+
+        if [[ -z "$heavynode_minions" ]]; then
+            echo "No heavynodes detected. Skipping heavynode Elasticsearch version compatibility check."
+            return 0
+        fi
+
+        while [[ $retry_count -lt $retries ]]; do
+            HEAVYNODE_ES_VERSIONS=$(salt -C 'G@role:so-heavynode' cmd.run 'set -o pipefail; so-elasticsearch-query / --retry 5 --retry-delay 10 | jq -er ".version.number"' shell=/bin/bash --out=json 2> /dev/null)
+            local exit_status=$?
+
+            if [[ $exit_status -ne 0 ]]; then
+                echo "Failed to retrieve Elasticsearch version from one or more heavynodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
+                ((retry_count++))
+                sleep $delay
+                continue
+            fi
+
+            local all_heavynodes_compatible=true
+            while IFS=$'\t' read -r node current_version; do
+                [[ -z "$node" ]] && continue
+                if ! es_version_can_upgrade_to_target "$current_version"; then
+                    echo "Heavynode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
+                    all_heavynodes_compatible=false
+                fi
+            done < <(echo "$HEAVYNODE_ES_VERSIONS" | jq -r 'to_entries[] | [.key, .value] | @tsv')
+
+            while IFS= read -r heavynode_minion; do
+                [[ -z "$heavynode_minion" ]] && continue
+                if ! echo "$HEAVYNODE_ES_VERSIONS" | jq -se --arg minion "$heavynode_minion" 'add | has($minion)' > /dev/null; then
+                    echo "Heavynode $heavynode_minion did not report an Elasticsearch version. It may be offline or still upgrading."
+                    all_heavynodes_compatible=false
+                fi
+            done <<< "$heavynode_minions"
+
+            if [[ "$all_heavynodes_compatible" == true ]]; then
+                echo -e "\nAll heavynodes can upgrade to Elasticsearch $target_es_version."
+                return 0
+            fi
+
+            echo "One or more heavynodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
+            ((retry_count++))
+            sleep $delay
+        done
+
+        return 1
+    }
+
+    if [[ ! -f "$es_verification_script" ]]; then
+        create_intermediate_upgrade_verification_script "$es_verification_script"
+    fi
+
    for statefile in "${es_required_version_statefile_base}"-*; do
        [[ -f $statefile ]] || continue

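The compatibility check reduces to a reverse lookup in `es_upgrade_map`: a node may move straight to the target if it already runs the target, or if the target appears in its map entry. Below is a minimal sketch of that lookup using the map entries from this diff; `compatible_versions_for` and `can_upgrade_to` are hypothetical names, and a glob match stands in for the original's `=~` with a quoted right-hand side (both do a literal substring test).

```shell
#!/bin/bash
# Sketch of the "which versions can reach the target directly" lookup.
declare -A es_upgrade_map=(
    ["8.18.4"]="8.18.6 8.18.8 9.0.8"
    ["8.18.6"]="8.18.8 9.0.8"
    ["8.18.8"]="9.0.8"
    ["9.0.8"]="9.3.3"
)

# Print every version that can upgrade directly to $1, including $1 itself.
compatible_versions_for() {
    local target="$1" current out="$1"
    for current in "${!es_upgrade_map[@]}"; do
        # Space padding keeps "9.0.8" from matching inside e.g. "19.0.8".
        if [[ " ${es_upgrade_map[$current]} " == *" $target "* ]]; then
            out+=" $current"
        fi
    done
    echo "$out"
}

# can_upgrade_to <node_version> <target_version>
can_upgrade_to() {
    local version="$1" target="$2"
    [[ -n "$version" && " $(compatible_versions_for "$target") " == *" $version "* ]]
}
```

With these entries, a node still on 8.18.8 is rejected for a 9.3.3 target because it must first stop at 9.0.8, which is exactly the lagging-node case the new searchnode/heavynode checks are meant to catch.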
@@ -1012,10 +1239,6 @@ verify_es_version_compatibility() {
            continue
        fi

-        if [[ ! -f "$es_verification_script" ]]; then
-            create_intermediate_upgrade_verification_script "$es_verification_script"
-        fi
-
        echo -e "\n##############################################################################################################################\n"
        echo "A previously required intermediate Elasticsearch upgrade was detected. Verifying that all Searchnodes/Heavynodes have successfully upgraded Elasticsearch to $es_required_version_statefile_value before proceeding with soup to avoid potential data loss! This command can take up to an hour to complete."
        if ! timeout --foreground 4000 bash "$es_verification_script" "$es_required_version_statefile_value" "$statefile"; then
@@ -1037,6 +1260,26 @@ verify_es_version_compatibility() {

    # shellcheck disable=SC2076 # Do not want a regex here eg usage " 8.18.8 9.0.8 " =~ " 9.0.8 "
    if [[ " ${es_upgrade_map[$es_version]} " =~ " $target_es_version " || "$es_version" == "$target_es_version" ]]; then
+        if ! verify_searchnodes_es_target_compatibility || ! verify_heavynodes_es_target_compatibility; then
+            echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
+
+            echo "One or more Searchnode(s)/Heavynode(s) cannot upgrade directly to Elasticsearch $target_es_version. This can happen when soups that include Elasticsearch upgrades are run in quick succession, and it typically resolves itself as the grid synchronizes. Please allow time for all Searchnodes/Heavynodes to upgrade Elasticsearch to a version compatible with $target_es_version before running soup again to avoid potential data loss!"
+
+            if [[ -n "$HEAVYNODE_ES_VERSIONS" ]]; then
+                echo "Current heavynode Elasticsearch versions:"
+                echo "$HEAVYNODE_ES_VERSIONS" | jq '.'
+            fi
+
+            if [[ -n "$SEARCHNODE_ES_VERSIONS" ]]; then
+                echo "Current searchnode Elasticsearch versions:"
+                echo "$SEARCHNODE_ES_VERSIONS" | jq '.nodes | to_entries | map({(.value.name): .value.version}) | sort | add'
+            fi
+
+            echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
+
+            exit 161
+        fi
+
        # supported upgrade
        return 0
    else
@@ -1322,7 +1565,7 @@ EOF

 # Keeping this block in case we need to do a hotfix that requires salt update
 apply_hotfix() {
-   echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
+    echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
 }

 failed_soup_restore_items() {
@@ -1394,13 +1637,13 @@ main() {
  echo "Verifying we have the latest soup script."
  verify_latest_update_script

-  echo "Verifying Elasticsearch version compatibility before upgrading."
-  verify_es_version_compatibility
-
  echo "Let's see if we need to update Security Onion."
  upgrade_check
  upgrade_space

+  echo "Verifying Elasticsearch version compatibility across the grid before upgrading."
+  verify_es_version_compatibility
+
  echo "Checking for Salt Master and Minion updates."
  upgrade_check_salt
  set -e
@@ -1420,7 +1663,8 @@ main() {
    echo "Applying $HOTFIXVERSION hotfix"
    # since we don't run the backup.config_backup state on import we wont snapshot previous version states and pillars
    if [[ ! "$MINION_ROLE" == "import" ]]; then
-      backup_old_states_pillars
+      echo "Running so-config-backup script."
+      /sbin/so-config-backup
    fi
    copy_new_files
    create_local_directories "/opt/so/saltstack/default"
@@ -1476,8 +1720,8 @@ main() {
    # since we don't run the backup.config_backup state on import we wont snapshot previous version states and pillars
    if [[ ! "$MINION_ROLE" == "import" ]]; then
      echo ""
-      echo "Creating snapshots of default and local Salt states and pillars and saving to /nsm/backup/"
-      backup_old_states_pillars
+      echo "Running so-config-backup script."
+      /sbin/so-config-backup
    fi

    echo ""
@@ -225,6 +225,7 @@ http {
 			limit_req             zone=auth_throttle burst={{ NGINXMERGED.config.throttle_login_burst }} nodelay;
 			limit_req_status      429;
 			proxy_pass            http://{{ GLOBALS.manager }}:4433;
+			proxy_set_header      Connection "Close";
 			proxy_read_timeout    90;
 			proxy_connect_timeout 90;
 			proxy_set_header      Host $host;
@@ -237,6 +238,7 @@ http {
 		location ~ ^/auth/.*?(whoami|logout|settings|errors|webauthn.js) {
 			rewrite               /auth/(.*) /$1 break;
 			proxy_pass            http://{{ GLOBALS.manager }}:4433;
+			proxy_set_header      Connection "Close";
 			proxy_read_timeout    90;
 			proxy_connect_timeout 90;
 			proxy_set_header      Host $host;
@@ -46,10 +46,10 @@ postgresinitdir:
    - require:
      - file: postgresconfdir

-postgresinitusers:
+postgresinitdb:
  file.managed:
-    - name: /opt/so/conf/postgres/init/init-users.sh
-    - source: salt://postgres/files/init-users.sh
+    - name: /opt/so/conf/postgres/init/init-db.sh
+    - source: salt://postgres/files/init-db.sh
    - user: 939
    - group: 939
    - mode: 755
@@ -31,7 +31,7 @@ so-postgres:
      - POSTGRES_DB=securityonion
      # Passwords are delivered via mounted 0600 secret files, not plaintext env vars.
      # The upstream postgres image resolves POSTGRES_PASSWORD_FILE; entrypoint.sh and
-      # init-users.sh resolve SO_POSTGRES_PASS_FILE the same way.
+      # init-db.sh resolve SO_POSTGRES_PASS_FILE the same way.
      - POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password
      - SO_POSTGRES_USER={{ SO_POSTGRES_USER }}
      - SO_POSTGRES_PASS_FILE=/run/secrets/so_postgres_pass
@@ -46,7 +46,7 @@ so-postgres:
      - /opt/so/conf/postgres/postgresql.conf:/conf/postgresql.conf:ro
      - /opt/so/conf/postgres/pg_hba.conf:/conf/pg_hba.conf:ro
      - /opt/so/conf/postgres/secrets:/run/secrets:ro
-      - /opt/so/conf/postgres/init/init-users.sh:/docker-entrypoint-initdb.d/init-users.sh:ro
+      - /opt/so/conf/postgres/init/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh:ro
      - /etc/pki/postgres.crt:/conf/postgres.crt:ro
      - /etc/pki/postgres.key:/conf/postgres.key:ro
      - /etc/pki/tls/certs/intca.crt:/conf/ca.crt:ro
@@ -70,7 +70,7 @@ so-postgres:
    - watch:
      - file: postgresconf
      - file: postgreshba
-      - file: postgresinitusers
+      - file: postgresinitdb
      - file: postgres_super_secret
      - file: postgres_app_secret
      - x509: postgres_crt
@@ -78,7 +78,7 @@ so-postgres:
    - require:
      - file: postgresconf
      - file: postgreshba
-      - file: postgresinitusers
+      - file: postgresinitdb
      - file: postgres_super_secret
      - file: postgres_app_secret
      - x509: postgres_crt
@@ -17,6 +17,7 @@ psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-E
        END IF;
    END
    \$\$;
+    GRANT ALL ON SCHEMA public TO "$SO_POSTGRES_USER";
    GRANT ALL PRIVILEGES ON DATABASE "$POSTGRES_DB" TO "$SO_POSTGRES_USER";
    -- Lock the SOC database down at the connect layer; PUBLIC gets CONNECT
    -- by default, which would let per-minion telegraf roles open sessions
@@ -31,4 +32,4 @@ EOSQL
 # only ensures the shared database exists on first initialization.
 if ! psql -U "$POSTGRES_USER" -tAc "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
    psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" -c "CREATE DATABASE so_telegraf"
-fi
+fi
@@ -18,38 +18,22 @@ include:
 {% set TG_OUT = TELEGRAFMERGED.output | upper %}
 {% if TG_OUT in ['POSTGRES', 'BOTH'] %}

-# docker_container.running returns as soon as the container starts, but on
-# first-init docker-entrypoint.sh starts a temporary postgres with
-# `listen_addresses=''` to run /docker-entrypoint-initdb.d scripts, then
-# shuts it down before exec'ing the real CMD. A default pg_isready check
-# (Unix socket) passes during that ephemeral phase and races the shutdown
-# with "the database system is shutting down". Checking TCP readiness on
-# 127.0.0.1 only succeeds after the final postgres binds the port.
 postgres_wait_ready:
  cmd.run:
-    - name: |
-        for i in $(seq 1 60); do
-          if docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q 2>/dev/null; then
-            exit 0
-          fi
-          sleep 2
-        done
-        echo "so-postgres did not accept TCP connections within 120s" >&2
-        exit 1
+    - name: /usr/sbin/so-postgres-wait
    - require:
      - docker_container: so-postgres
+      - file: postgres_sbin

-# Ensure the shared Telegraf database exists. init-users.sh only runs on a
+# Ensure the shared Telegraf database exists. init-db.sh only runs on a
 # fresh data dir, so hosts upgraded onto an existing /nsm/postgres volume
 # would otherwise never get so_telegraf.
 postgres_create_telegraf_db:
  cmd.run:
-    - name: |
-        if ! docker exec so-postgres psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
-          docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres -c "CREATE DATABASE so_telegraf"
-        fi
+    - name: /usr/sbin/so-telegraf-postgres create_db
    - require:
      - cmd: postgres_wait_ready
+      - file: postgres_sbin

 # Provision the shared group role and schema once. Every per-minion role is a
 # member of so_telegraf, and each Telegraf connection does SET ROLE so_telegraf
@@ -57,68 +41,26 @@ postgres_create_telegraf_db:
 # on first write are owned by the group role and every member can INSERT/SELECT.
 postgres_telegraf_group_role:
  cmd.run:
-    - name: |
-        docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
-        DO $$
-        BEGIN
-            IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
-                CREATE ROLE so_telegraf NOLOGIN;
-            END IF;
-        END
-        $$;
-        GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
-        CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
-        GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
-        CREATE SCHEMA IF NOT EXISTS partman;
-        CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
-        CREATE EXTENSION IF NOT EXISTS pg_cron;
-        -- Telegraf (running as so_telegraf) calls partman.create_parent()
-        -- on first write of each metric, which needs USAGE on the partman
-        -- schema, EXECUTE on its functions/procedures, and write access to
-        -- partman.part_config so it can register new partitioned parents.
-        GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
-        GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
-        GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
-        GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
-        -- partman creates per-parent template tables (partman.template_*) at
-        -- runtime; default privileges extend DML/sequence access to them.
-        ALTER DEFAULT PRIVILEGES IN SCHEMA partman
-            GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
-        ALTER DEFAULT PRIVILEGES IN SCHEMA partman
-            GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
-        -- Hourly partman maintenance. cron.schedule is idempotent by jobname.
-        SELECT cron.schedule(
-          'telegraf-partman-maintenance',
-          '17 * * * *',
-          'CALL partman.run_maintenance_proc()'
-        );
-        EOSQL
+    - name: /usr/sbin/so-telegraf-postgres group_role
    - require:
      - cmd: postgres_create_telegraf_db
+      - file: postgres_sbin

 {%   set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
 {%   for mid, entry in creds.items() %}
 {%     if entry.get('user') and entry.get('pass') %}
 {%       set u = entry.user %}
-{%       set p = entry.pass | replace("'", "''") %}
+{%       set p = entry.pass %}

 postgres_telegraf_role_{{ u }}:
  cmd.run:
-    - name: |
-        docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
-        DO $$
-        BEGIN
-            IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '{{ u }}') THEN
-                EXECUTE format('CREATE ROLE %I WITH LOGIN PASSWORD %L', '{{ u }}', '{{ p }}');
-            ELSE
-                EXECUTE format('ALTER ROLE %I WITH PASSWORD %L', '{{ u }}', '{{ p }}');
-            END IF;
-        END
-        $$;
-        GRANT CONNECT ON DATABASE so_telegraf TO "{{ u }}";
-        GRANT so_telegraf TO "{{ u }}";
-        EOSQL
+    - name: /usr/sbin/so-telegraf-postgres user
+    - env:
+      - ROLE_USER: {{ u | tojson }}
+      - ROLE_PASS: {{ p | tojson }}
+    - hide_output: True
    - require:
+      - file: postgres_sbin
      - cmd: postgres_telegraf_group_role

 {%     endif %}
@@ -130,21 +72,12 @@ postgres_telegraf_role_{{ u }}:
 {%   set retention = salt['pillar.get']('postgres:telegraf:retention_days', 14) | int %}
 postgres_telegraf_retention_reconcile:
  cmd.run:
-    - name: |
-        docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
-        DO $$
-        BEGIN
-            IF EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman') THEN
-                UPDATE partman.part_config
-                SET retention = '{{ retention }} days',
-                    retention_keep_table = false
-                WHERE parent_table LIKE 'telegraf.%';
-            END IF;
-        END
-        $$;
-        EOSQL
+    - name: /usr/sbin/so-telegraf-postgres retention
+    - env:
+      - RETENTION_DAYS: {{ retention }}
    - require:
      - cmd: postgres_telegraf_group_role
+      - file: postgres_sbin

 {% endif %}

@@ -7,15 +7,29 @@

 . /usr/sbin/so-common

+# Without pipefail, a pipeline's exit status is gzip's. A failed pg_dumpall would
+# otherwise be masked by a successful gzip, silently producing a valid .gz that
+# holds a truncated dump.
+set -o pipefail
+
 # Backups contain role password hashes and full chat data; keep them 0600.
 umask 0077

 TODAY=$(date '+%Y_%m_%d')
 BACKUPDIR=/nsm/backup
 BACKUPFILE="$BACKUPDIR/so-postgres-backup-$TODAY.sql.gz"
+TMPFILE="$BACKUPFILE.tmp"
 MAXBACKUPS=7
+LOGFILE=/opt/so/log/postgres/backup.log

-mkdir -p $BACKUPDIR
+log() {
+  echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
+}
+
+mkdir -p "$BACKUPDIR"
+
+# Remove any temp files left behind by a previously crashed run
+rm -f "$BACKUPDIR"/so-postgres-backup-*.sql.gz.tmp

 # Skip if already backed up today
 if [ -f "$BACKUPFILE" ]; then
@@ -27,13 +41,33 @@ if ! docker ps --format '{{.Names}}' | grep -q '^so-postgres$'; then
  exit 0
 fi

-# Dump all databases and roles, compress
-docker exec so-postgres pg_dumpall -U postgres | gzip > "$BACKUPFILE"
+# Always clean up the temp file on exit; the success path clears this trap
+# after the atomic rename so the finished backup is not deleted.
+trap 'rm -f "$TMPFILE"' EXIT

-# Retention cleanup
-NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
+# Dump all databases and roles, compress. Write to a temp file so the final
+# filename only ever appears for a complete, verified backup.
+if ! docker exec so-postgres pg_dumpall -U postgres | gzip > "$TMPFILE"; then
+  log "ERROR: pg_dumpall/gzip failed; backup aborted"
+  exit 1
+fi
+
+# Verify the compressed stream is intact before publishing it
+if ! gzip -t "$TMPFILE"; then
+  log "ERROR: backup failed gzip integrity check; backup aborted"
+  exit 1
+fi
+
+# Atomically publish the verified backup
+mv "$TMPFILE" "$BACKUPFILE"
+trap - EXIT
+log "OK: wrote $BACKUPFILE"
+
+# Retention cleanup (only reached after a successful backup). The glob is
+# restricted to finished backups so an in-progress .tmp can never be counted.
+NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
 while [ "$NUMBACKUPS" -gt "$MAXBACKUPS" ]; do
-  OLDEST=$(find $BACKUPDIR -type f -name "so-postgres-backup*" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
+  OLDEST=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
  rm -f "$OLDEST"
-  NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
+  NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
 done
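The backup script now combines two safeguards: `pipefail`, so a failing `pg_dumpall` is not masked by a successful `gzip`, and a write-to-`.tmp`, verify, then `mv` sequence so the final filename only appears for a complete archive. The sketch below isolates both. `write_backup_atomically` and the two status helpers are hypothetical names, and the producer command is a stand-in for `docker exec so-postgres pg_dumpall -U postgres`.

```shell
#!/bin/bash
# Exit status of `false | gzip` with and without pipefail.
status_without_pipefail() { ( set +o pipefail; false | gzip > /dev/null; echo $? ); }
status_with_pipefail()    { ( set -o pipefail;  false | gzip > /dev/null; echo $? ); }

# write_backup_atomically <dest.gz> <producer command...>
# Compresses the producer's stdout into <dest.gz>.tmp, checks it with
# `gzip -t`, then renames it into place. On any failure the .tmp is removed.
write_backup_atomically() {
    local dest="$1"; shift
    local tmp="$dest.tmp"
    (
        set -o pipefail
        trap 'rm -f "$tmp"' EXIT      # clean up on every exit path...
        "$@" | gzip > "$tmp" || exit 1
        gzip -t "$tmp" || exit 1
        mv "$tmp" "$dest"
        trap - EXIT                   # ...except after a successful rename
    )
}
```

The subshell keeps `pipefail` and the `EXIT` trap scoped to one backup attempt; the real script sets both at top level because the whole script is a single attempt.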
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Wait for the so-postgres container to accept TCP connections.
+#
+# docker_container.running returns as soon as the container starts, but on
+# first-init docker-entrypoint.sh starts a temporary postgres with
+# `listen_addresses=''` to run /docker-entrypoint-initdb.d scripts, then
+# shuts it down before exec'ing the real CMD. A default pg_isready check
+# (Unix socket) passes during that ephemeral phase and races the shutdown
+# with "the database system is shutting down". Checking TCP readiness on
+# 127.0.0.1 only succeeds after the final postgres binds the port.
+#
+# Usage: so-postgres-wait [iterations] [sleep_seconds]
+# Default: 60 iterations, 2s sleep (~120s total).
+
+ITERATIONS=${1:-60}
+SLEEP_SECONDS=${2:-2}
+
+for i in $(seq 1 "$ITERATIONS"); do
+  if docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q 2>/dev/null; then
+    exit 0
+  fi
+  sleep "$SLEEP_SECONDS"
+done
+
+echo "so-postgres did not accept TCP connections within $((ITERATIONS * SLEEP_SECONDS))s" >&2
+exit 1
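`so-postgres-wait` is a bounded poll: run a probe up to N times with a fixed sleep, so the worst case is N × sleep seconds (60 × 2s ≈ 120s by default). The sketch below generalizes that loop to any probe command so it can be exercised without Docker. `wait_until` is a hypothetical name, and like the original it assumes an integer sleep for the timeout message.

```shell
#!/bin/bash
# wait_until <iterations> <sleep_seconds> <probe command...>
# Returns 0 as soon as the probe succeeds, 1 after the budget is exhausted.
wait_until() {
    local iterations="$1" sleep_seconds="$2" i
    shift 2
    for ((i = 1; i <= iterations; i++)); do
        if "$@"; then
            return 0
        fi
        sleep "$sleep_seconds"
    done
    echo "probe did not succeed within $((iterations * sleep_seconds))s" >&2
    return 1
}
```

For example, `wait_until 60 2 docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q` mirrors the script's default behaviour, minus the `2>/dev/null` on the probe.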
@@ -0,0 +1,110 @@
+#!/bin/bash
+set -e
+
+# Provision Telegraf state inside the so-postgres container.
+# Usage: so-telegraf-postgres <subcommand>
+#   create_db    Ensure the so_telegraf database exists.
+#   group_role   Provision the so_telegraf group role, telegraf/partman schemas,
+#                pg_partman, pg_cron, and the hourly partman maintenance job.
+#   user         Create or update a per-minion login role granted to so_telegraf.
+#                Env: ROLE_USER, ROLE_PASS.
+#   retention    Reconcile partman retention on telegraf parents.
+#                Env: RETENTION_DAYS.
+
+cmd="${1:?subcommand required}"
+
+case "$cmd" in
+  create_db)
+    if ! docker exec so-postgres psql -U postgres -tAc \
+        "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
+      docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres \
+        -c "CREATE DATABASE so_telegraf"
+    fi
+    ;;
+
+  group_role)
+    docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
+DO $$
+BEGIN
+    IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
+        CREATE ROLE so_telegraf NOLOGIN;
+    END IF;
+END
+$$;
+GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
+CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
+GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
+CREATE SCHEMA IF NOT EXISTS partman;
+CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
+CREATE EXTENSION IF NOT EXISTS pg_cron;
+-- Telegraf (running as so_telegraf) calls partman.create_parent()
+-- on first write of each metric, which needs USAGE on the partman
+-- schema, EXECUTE on its functions/procedures, and write access to
+-- partman.part_config so it can register new partitioned parents.
+GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
+GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
+GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
+GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
+-- partman creates per-parent template tables (partman.template_*) at
+-- runtime; default privileges extend DML/sequence access to them.
+ALTER DEFAULT PRIVILEGES IN SCHEMA partman
+    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
+ALTER DEFAULT PRIVILEGES IN SCHEMA partman
+    GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
+-- Hourly partman maintenance. cron.schedule is idempotent by jobname.
+SELECT cron.schedule(
+  'telegraf-partman-maintenance',
+  '17 * * * *',
+  'CALL partman.run_maintenance_proc()'
+);
+EOSQL
+    ;;
+
+  user)
+    : "${ROLE_USER:?ROLE_USER is required}"
+    : "${ROLE_PASS:?ROLE_PASS is required}"
+    # psql does not substitute :vars inside dollar-quoted strings, so the
+    # conditional CREATE/ALTER is built outside any DO block and dispatched
+    # with \gexec. format() handles identifier/literal quoting.
+    docker exec -i so-postgres psql \
+      -v ON_ERROR_STOP=1 \
+      -v role_user="$ROLE_USER" \
+      -v role_pass="$ROLE_PASS" \
+      -U postgres -d so_telegraf <<'EOSQL'
+SELECT format(
+  CASE WHEN EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = :'role_user')
+       THEN 'ALTER ROLE %I WITH LOGIN PASSWORD %L'
+       ELSE 'CREATE ROLE %I WITH LOGIN PASSWORD %L'
+  END,
+  :'role_user',
+  :'role_pass'
+) \gexec
+GRANT CONNECT ON DATABASE so_telegraf TO :"role_user";
+GRANT so_telegraf TO :"role_user";
+EOSQL
+    ;;
+
+  retention)
+    : "${RETENTION_DAYS:?RETENTION_DAYS is required}"
+    # \gset + \if guards against a missing pg_partman without using a DO
+    # block (psql :var substitution doesn't reach into dollar-quoted code).
+    docker exec -i so-postgres psql \
+      -v ON_ERROR_STOP=1 \
+      -v retention_days="$RETENTION_DAYS" \
+      -U postgres -d so_telegraf <<'EOSQL'
+SELECT CASE WHEN EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman')
+            THEN 'true' ELSE 'false' END AS has_partman \gset
+\if :has_partman
+UPDATE partman.part_config
+SET retention = :'retention_days' || ' days',
+    retention_keep_table = false
+WHERE parent_table LIKE 'telegraf.%';
+\endif
+EOSQL
+    ;;
+
+  *)
+    echo "Unknown subcommand: $cmd" >&2
+    exit 1
+    ;;
+esac
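The `: "${RETENTION_DAYS:?...}"` guard in the `retention` branch only rejects an unset or empty value. Anything else, for example `30d`, is concatenated by `:'retention_days' || ' days'` and written into `partman.part_config` as-is, where pg_partman may not be able to interpret it later. Below is a minimal pre-check. It assumes retention should always be a positive whole number of days. `validate_retention_days` is a hypothetical helper and is not part of the script above.

```bash
# Hypothetical guard for the retention subcommand (not in the original script).
# Accepts only a positive whole number of days: digits only, no leading zero,
# so "", "0", "30d" and "007" are all rejected.
validate_retention_days() {
  local days="$1"
  if [[ ! "$days" =~ ^[1-9][0-9]*$ ]]; then
    echo "RETENTION_DAYS must be a positive integer, got: '$days'" >&2
    return 1
  fi
  return 0
}
```

It would sit next to the existing `:?` check, e.g. `validate_retention_days "$RETENTION_DAYS" || exit 1`, before `docker exec` runs psql.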
@@ -14,6 +14,7 @@

 include:
  - salt.minion
+  - salt.master.boot_mine_update
 {%   if 'vrt' in salt['pillar.get']('features', []) %}
  - salt.cloud
  - salt.cloud.reactor_config_hypervisor
@@ -0,0 +1,29 @@
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Manages /etc/systemd/system/so-boot-mine-update.service, a manager-only
+# Type=oneshot unit that pushes `salt '*' mine.update` once per boot, ordered
+# before so-boot-highstate.service so mine-backed pillars (node IPs, ES/Redis/
+# Logstash discovery) are fresh before the boot highstate renders them.
+
+include:
+  - systemd.reload
+
+so_boot_mine_update_unit_file:
+  file.managed:
+    - name: /etc/systemd/system/so-boot-mine-update.service
+    - source: salt://salt/service/so-boot-mine-update.service
+    - onchanges_in:
+      - module: systemd_reload
+
+# Only enable once setup is complete. Until then the gate file is missing and
+# the unit's own ConditionPathExists would no-op it anyway.
+so_boot_mine_update_service:
+  service.enabled:
+    - name: so-boot-mine-update.service
+    - onlyif: test -e /opt/so/state/setup-complete
+    - require:
+      - file: so_boot_mine_update_unit_file
+      - module: systemd_reload
@@ -0,0 +1,31 @@
+# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
+# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
+# https://securityonion.net/license; you may not use this file except in compliance with the
+# Elastic License 2.0.
+
+# Manages /etc/systemd/system/so-boot-highstate.service, a Type=oneshot
+# RemainAfterExit=yes unit that runs `salt-call state.highstate` exactly once
+# per system boot. Replaces the legacy `startup_states: highstate` minion
+# config, which fired on every salt-minion service restart (causing a redundant
+# highstate whenever a highstate itself restarted salt-minion).
+
+include:
+  - systemd.reload
+
+so_boot_highstate_unit_file:
+  file.managed:
+    - name: /etc/systemd/system/so-boot-highstate.service
+    - source: salt://salt/service/so-boot-highstate.service
+    - onchanges_in:
+      - module: systemd_reload
+
+# Only enable once setup is complete. Until then the gate file is missing and
+# the unit's own ConditionPathExists would no-op it anyway -- this just keeps
+# `systemctl is-enabled` honest for the sync_es_users gate.
+so_boot_highstate_service:
+  service.enabled:
+    - name: so-boot-highstate.service
+    - onlyif: test -e /opt/so/state/setup-complete
+    - require:
+      - file: so_boot_highstate_unit_file
+      - module: systemd_reload
@@ -17,6 +17,7 @@ include:
  - repo.client
  - salt.mine_functions
  - salt.minion.service_file
+  - salt.minion.boot_highstate
 {% if GLOBALS.is_manager %}
  - ca.signing_policy
 {% endif %}
@@ -80,11 +81,33 @@ set_log_levels:
      - "log_level: info"
      - "log_level_logfile: info"

-enable_startup_states:
-  file.uncomment:
+# startup_states: highstate caused a full highstate to run on every
+# salt-minion service start, including the restart triggered when a highstate
+# itself modified the minion config (beacons, mine, unit file). Replaced by
+# so-boot-highstate.service (managed in salt.minion.boot_highstate), which
+# runs once per system boot only. Strip the line from /etc/salt/minion on
+# upgrade; both the commented and uncommented forms historically existed.
+remove_startup_states:
+  file.line:
    - name: /etc/salt/minion
-    - regex: '^startup_states: highstate$'
-    - unless: pgrep so-setup
+    - match: 'startup_states: highstate'
+    - mode: delete
+
+# Upgrade-path bridge: systems that already passed setup under the old gate
+# (`grep -x 'startup_states: highstate' /etc/salt/minion`) get a /opt/so/state/setup-complete
+# marker so so-boot-highstate.service can be enabled and the so-user_sync cron
+# in sync_es_users.sls is still installed. Setup-in-progress systems instead get
+# the marker from `mark_setup_complete` in setup/so-functions at the right
+# moment. `replace: false` means we never overwrite a marker once written.
+mark_setup_complete_for_upgrades:
+  file.managed:
+    - name: /opt/so/state/setup-complete
+    - replace: false
+    - makedirs: True
+    - onlyif: "grep -qx 'startup_states: highstate' /etc/salt/minion"
+    - require_in:
+      - file: remove_startup_states
+      - service: so_boot_highstate_service

 {% endif %}
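The `require_in` ordering matters because the two states match the line differently. The bridge's `grep -qx` only accepts the exact, uncommented `startup_states: highstate` line. `remove_startup_states`, as the comment above describes, strips both the commented and uncommented forms. If removal ran first, the bridge could never fire on an upgraded system.

The shell sketch below mimics that sequence on a scratch file. It uses GNU `sed -i` rather than Salt's `file.line`, and `migrate_minion_config` and its arguments are hypothetical.

```bash
# Hypothetical sketch: mimic mark_setup_complete_for_upgrades followed by
# remove_startup_states against a scratch copy of a minion config.
# Assumes GNU sed (for in-place editing with a bare -i).
migrate_minion_config() {
  local config="$1" marker="$2"
  # Bridge first: only the exact, uncommented line means setup already finished.
  if grep -qx 'startup_states: highstate' "$config"; then
    mkdir -p "$(dirname "$marker")"
    # Like replace: false, never rewrite a marker that already exists.
    [ -e "$marker" ] || : > "$marker"
  fi
  # Then drop every line containing the fragment, commented or not.
  sed -i '/startup_states: highstate/d' "$config"
}
```

A config that only has the commented form (setup still in progress) gets no marker here; per the comment above, that system receives it later from `mark_setup_complete`.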

@@ -0,0 +1,14 @@
+[Unit]
+Description=Security Onion boot-time highstate (runs once per boot)
+After=salt-minion.service network-online.target docker.service
+Wants=network-online.target docker.service
+Requires=salt-minion.service
+ConditionPathExists=/opt/so/state/setup-complete
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/usr/bin/salt-call state.highstate -l info queue=True
+
+[Install]
+WantedBy=multi-user.target
@@ -0,0 +1,15 @@
+[Unit]
+Description=Security Onion boot-time grid mine.update (managers, runs once per boot before highstate)
+After=salt-master.service salt-minion.service network-online.target
+Wants=network-online.target
+Requires=salt-master.service salt-minion.service
+Before=so-boot-highstate.service
+ConditionPathExists=/opt/so/state/setup-complete
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/usr/sbin/so-boot-mine-update
+
+[Install]
+WantedBy=multi-user.target
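According to the commit message that introduced this unit, the `/usr/sbin/so-boot-mine-update` helper waits for the set of responsive minions to settle (plateau) rather than for every accepted key to report. This way a powered-off minion cannot block boot, and a maximum wait remains as a backstop.

The helper itself is not shown in this diff, so the loop below is only a sketch of that idea:
- the command passed in `"$@"` stands in for however the real helper counts responding minions;
- `max_wait`, `interval` and `stable_rounds` are invented parameters.

```bash
# Hypothetical plateau wait (not the actual so-boot-mine-update code).
# "$@" is a command that prints the current responsive-minion count.
# Returns 0 once the count is unchanged for $stable_rounds consecutive polls;
# returns 1 after $max_wait seconds, and the caller should proceed anyway.
wait_for_plateau() {
  local max_wait="$1" interval="$2" stable_rounds="$3"
  shift 3
  local start=$SECONDS last=-1 current stable=0
  while (( SECONDS - start < max_wait )); do
    current="$("$@")"
    if [[ "$current" == "$last" ]]; then
      stable=$((stable + 1))
      (( stable >= stable_rounds )) && { echo "$current"; return 0; }
    else
      # Count changed: restart the stability streak.
      stable=0
      last="$current"
    fi
    sleep "$interval"
  done
  # Backstop reached: report the last observed count without blocking boot.
  echo "$last"
  return 1
}
```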
@@ -8,11 +8,6 @@ set_role_grain:
    - name: role
    - value: so-{{ grains.id.split("_") | last }}

-set_highstate:
-  file.append:
-    - name: /etc/salt/minion
-    - text: 'startup_states: highstate'
-
 enable_salt_minion:
  service.enabled:
    - name: salt-minion
@@ -1519,6 +1519,16 @@ soc:
              serviceAccountJSON: ""
              serviceAccountLocation: ""
              healthTimeoutSeconds: 5
+        onionconfig:
+          saltstackDir: /opt/so/saltstack
+          bypassEnabled: false
+        postgres:
+          host: ""
+          port: 5432
+          sslMode: "allow"
+          database: securityonion
+          user: ""
+          password: ""
        salt:
          queueDir: /opt/sensoroni/queue
          timeoutMs: 45000
@@ -16,6 +16,14 @@
 {% do SOCMERGED.config.server.update({'additionalCA': MANAGERMERGED.additionalCA}) %}
 {% do SOCMERGED.config.server.update({'insecureSkipVerify': MANAGERMERGED.insecureSkipVerify}) %}

+{% if not SOCMERGED.config.server.modules.postgres.host %}
+{%   do SOCMERGED.config.server.modules.postgres.update({'host': GLOBALS.manager}) %}
+{% endif %}
+{% if not SOCMERGED.config.server.modules.postgres.password %}
+{%   do SOCMERGED.config.server.modules.postgres.update({'password': salt['pillar.get']('postgres:auth:users:so_postgres_user:pass', '')}) %}
+{%   do SOCMERGED.config.server.modules.postgres.update({'user': salt['pillar.get']('postgres:auth:users:so_postgres_user:user', 'so_postgres')}) %}
+{% endif %}
+
 {# if SOCMERGED.config.server.modules.cases == httpcase details come from the soc pillar #}
 {% if SOCMERGED.config.server.modules.cases != 'soc' %}
 {%   do SOCMERGED.config.server.modules.elastic.update({'casesEnabled': false}) %}
@@ -453,6 +453,42 @@ soc:
            description: Duration (in milliseconds) that must elapse after a grid node fails to check-in before the node will be marked offline (fault).
            global: True
            advanced: True
+        onionconfig:
+          saltstackDir:
+            description: Root directory of the SaltStack tree that SOC reads configuration from and writes configuration to. Should not be changed under normal circumstances.
+            global: True
+            advanced: True
+          bypassEnabled:
+            description: When enabled, errors encountered while reading the SaltStack pillar tree (missing files, unreadable directories, etc.) are logged but do not prevent SOC from starting or serving settings. Intended for advanced troubleshooting and recovery scenarios when the pillar tree is partially unreadable.
+            global: True
+            advanced: True
+            forcedType: bool
+        postgres:
+          host:
+            description: Hostname or IP address of the PostgreSQL server used by SOC. Defaults to the manager hostname.
+            global: True
+            advanced: True
+          port:
+            description: Port of the PostgreSQL server used by SOC.
+            global: True
+            advanced: True
+          sslMode:
+            description: "SSL mode for connections to the PostgreSQL server. Must be one of: disable, allow, prefer, require, verify-ca, verify-full. Defaults to allow."
+            global: True
+            advanced: True
+          database:
+            description: Name of the PostgreSQL database used by SOC.
+            global: True
+            advanced: True
+          user:
+            description: Username used by SOC to authenticate to the PostgreSQL server.
+            global: True
+            advanced: True
+          password:
+            description: Password used by SOC to authenticate to the PostgreSQL server.
+            global: True
+            sensitive: True
+            advanced: True
        salt:
          longRelayTimeoutMs:
            description: Duration (in milliseconds) to wait for a response from the Salt API when executing tasks known for being long running before giving up and showing an error on the SOC UI.
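Nothing in the defaults above constrains `sslMode` beyond its description. The six values listed there are the standard libpq `sslmode` values. The sketch below checks a candidate value against that list before it is saved. `is_valid_sslmode` is a hypothetical helper, not part of SOC.

```bash
# Hypothetical validator: accept only the sslMode values listed in the
# setting's description (the standard libpq sslmode values). Case-sensitive.
is_valid_sslmode() {
  case "$1" in
    disable|allow|prefer|require|verify-ca|verify-full) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the default `allow`, libpq tries an unencrypted connection first and falls back to SSL only if that fails. Where the server supports TLS, `require` or `verify-full` is the stricter choice.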
@@ -818,6 +854,7 @@ soc:
          description: List of available external tools visible in the SOC UI. Each tool is defined in JSON object notation, and must include the "name" key and "link" key, where the link is the tool's URL.
          global: True
          advanced: True
+          multiline: True
          forcedType: "[]{}"
        exportNodeId:
          description: The node ID on which export jobs will be executed.
@@ -261,7 +261,7 @@ strelka:
              priority: 5
              options:
                limit: 1000
-          'ScanLNK':
+          'ScanLnk':
            - positive:
                flavors:
                  - 'lnk_file'
@@ -99,7 +99,7 @@ strelka:
          'ScanJpeg': *scannerOptions
          'ScanJson': *scannerOptions
          'ScanLibarchive': *scannerOptions
-          'ScanLNK': *scannerOptions
+          'ScanLnk': *scannerOptions
          'ScanLsb': *scannerOptions
          'ScanLzma': *scannerOptions
          'ScanMacho': *scannerOptions
@@ -1,6 +1,6 @@
 telegraf:
  enabled: False
-  output: BOTH
+  output: INFLUXDB
  config:
    interval: '30s'
    metric_batch_size: 1000
@@ -119,7 +119,7 @@ base:
    - kafka
    - pcap.cleanup

-  '*_manager or *_managerhype and G@saltversion:{{saltversion}} and not I@node_data:False':
+  '*_manager and G@saltversion:{{saltversion}} and not I@node_data:False':
    - match: compound
    - salt.master
    - registry
@@ -146,6 +146,32 @@ base:
    - stig
    - kafka

+  '*_managerhype and G@saltversion:{{saltversion}} and not I@node_data:False':
+    - match: compound
+    - salt.master
+    - registry
+    - nginx
+    - influxdb
+    - postgres
+    - strelka.manager
+    - soc
+    - kratos
+    - hydra
+    - firewall
+    - manager
+    - sensoroni
+    - telegraf
+    - backup.config_backup
+    - elasticsearch
+    - logstash
+    - redis
+    - elastic-fleet-package-registry
+    - kibana
+    - elastalert
+    - utility
+    - elasticfleet
+    - kafka
+
  '*_managerhype and I@features:vrt and G@saltversion:{{saltversion}}':
    - match: compound
    - manager.hypervisor
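Splitting the target works because a minion-ID glob term must match the whole ID. `*_manager` matches an ID ending in `_manager` but not one ending in `_managerhype`, so managerhype nodes now pick up only their own block, which omits `stig`.

The snippet below shows the anchoring. It uses bash's `[[ == ]]` pattern match as a stand-in for Salt's fnmatch-style glob matcher, and the minion IDs are made up.

```bash
# Illustration only: bash pattern matching, like the glob term in the
# compound targets above, must match the entire minion ID.
matches_glob() {
  local minion_id="$1" pattern="$2"
  # The unquoted $pattern on the right of == is treated as a glob.
  [[ "$minion_id" == $pattern ]]
}
```

For example, `matches_glob grid01_managerhype '*_manager'` fails, while `matches_glob grid01_managerhype '*_managerhype'` succeeds.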
@@ -286,7 +312,6 @@ base:
    - libvirt
    - libvirt.images
    - elasticfleet.install_agent_grid
-    - stig
  
  '*_desktop and G@saltversion:{{saltversion}}':
    - sensoroni
@@ -539,16 +539,19 @@ configure_minion() {
 		"  x509_v2: true"\
 		"log_level: info"\
 		"log_level_logfile: info"\
-		"log_file: /opt/so/log/salt/minion"\
-		"#startup_states: highstate" >> "$minion_config"
+		"log_file: /opt/so/log/salt/minion" >> "$minion_config"

 }

-checkin_at_boot() {
-	local minion_config=/etc/salt/minion
+mark_setup_complete() {
+	# Writes the setup-complete marker. Salt's so-boot-highstate.service
+	# (boot-time oneshot) and the so-user_sync cron gate in
+	# salt/manager/sync_es_users.sls both key off this file.
+	local marker=/opt/so/state/setup-complete

-	info "Enabling checkin at boot"
-	sed -i 's/#startup_states: highstate/startup_states: highstate/' "$minion_config"
+	info "Marking setup as complete"
+	mkdir -p "$(dirname "$marker")"
+	touch "$marker"
 }

 check_requirements() {
@@ -977,6 +980,8 @@ docker_seed_registry() {
 		docker_seed_update_percent=25

 		update_docker_containers 'netinstall' '' 'docker_seed_update' '/dev/stdout' 2>&1 | tee -a "$setup_log"
+		# Use the pipe exit status of 'update_docker_containers' as the return code
+		return "${PIPESTATUS[0]}"
 	fi
 }
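Without the added `return`, `docker_seed_registry` would report the status of the last command in the pipeline, `tee`. That command succeeds even when `update_docker_containers` fails, so the new `if ! docker_seed_registry` check in setup would never trip.

The demonstration below uses a hypothetical wrapper, `run_logged`, with `false` standing in for a failing command.

```bash
# Demonstrates the PIPESTATUS pattern: without pipefail, a pipeline's status
# is that of its last command (tee), so read the first element explicitly.
run_logged() {
  local log="$1"
  shift
  "$@" 2>&1 | tee -a "$log"
  # PIPESTATUS must be read immediately; any later command overwrites it.
  return "${PIPESTATUS[0]}"
}
```

`set -o pipefail` is the other common option, and per its commit message so-postgres-backup uses it. The trade-off is that it changes the behavior of every pipeline in the script rather than just this one.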

@@ -223,6 +223,8 @@ if [ -n "$test_profile" ]; then
 	WEBPASSWD1=0n10nus3r
 	WEBPASSWD2=0n10nus3r
 	NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}"
+	# opt out of telemetry for automated testing
+	telemetry=1

 	update_sudoers_for_testing
 fi
@@ -767,7 +769,10 @@ if ! [[ -f $install_opt_file ]]; then
 		title "Applying the registry state"
 		logCmd "salt-call state.apply -l info registry"
 		title "Seeding the docker registry"
-		docker_seed_registry
+		if ! docker_seed_registry; then
+			error "Failed to seed the docker registry"
+			fail_setup
+		fi
 		title "Applying the manager state"
 		logCmd "salt-call state.apply -l info manager"
 		logCmd "salt-call state.apply influxdb -l info"
@@ -792,7 +797,7 @@ if ! [[ -f $install_opt_file ]]; then
 			error "Failed to run so-elastic-fleet-setup"
 			fail_setup
 		fi
-		checkin_at_boot
+		mark_setup_complete
 		set_initial_firewall_access
        initialize_elasticsearch_indices "so-case so-casehistory so-assistant-session so-assistant-chat"
 		# run a final highstate before enabling scheduled highstates.
Author	SHA1	Message	Date
Mike Reeves	80c39d612c	Pin NIC names by MAC via udev (run-once) from the common state Add so-nic-pin, which writes by-MAC persistent-net udev rules pinning each physical NIC to its current name so a kernel upgrade can't renumber the interfaces Security Onion binds by name (host:mainint, sensor:mainint, bond0). Gated by the drop file /opt/so/state/nic_names_pinned: run-once on highstate, and an admin can pre-create the marker to opt out. Wired into common/init.sls as pin_nic_names, guarded by a matching unless.	2026-06-11 18:40:43 -04:00
Jorge Reyes	f03f0155f4	Merge pull request #15966 from Security-Onion-Solutions/reyesj2-patch-8 update so-elastic-fleet-package-upgrade script	2026-06-11 14:36:03 -05:00
Jason Ertel	0cc94980af	Merge pull request #15967 from Security-Onion-Solutions/jertel/wip Jertel/wip	2026-06-11 08:22:14 -04:00
Jason Ertel	b8bf684077	ver	2026-06-11 08:18:38 -04:00
Jason Ertel	f083db67e4	disable telemetry for automated tests	2026-06-11 08:17:39 -04:00
reyesj2	4741cc92bd	fleet manager start kibana if it isn't already running and wait for healthly status	2026-06-10 17:52:08 -05:00
reyesj2	46655860e9	http	2026-06-10 17:27:23 -05:00
reyesj2	289ddda5e8	kibana health check for fleet scripts	2026-06-10 17:06:22 -05:00
reyesj2	f905afbc6f	logging	2026-06-10 15:01:22 -05:00
reyesj2	bd5e77afc5	increase delay in so-elastic-fleet-package-upgrade attempts	2026-06-10 14:59:29 -05:00
reyesj2	944e773759	save exit until all packages have been attempted	2026-06-10 14:58:49 -05:00
Josh Patterson	3ba96da3b7	Merge pull request #15965 from Security-Onion-Solutions/nostartupstates remove startup states from salt config	2026-06-09 16:26:47 -04:00
Jorge Reyes	f0712bd780	Merge pull request #15964 from Security-Onion-Solutions/reyesj2-patch-8 use pipe exit status for update_docker_containers	2026-06-09 13:49:24 -05:00
Josh Patterson	448668a72e	Merge remote-tracking branch 'origin/3/dev' into nostartupstates	2026-06-09 14:02:00 -04:00
Josh Patterson	f088a27159	so-boot-mine-update: warm master pillar cache before highstate A complete mine is not enough: elasticsearch:nodes, redis:nodes, logstash:nodes (tgt_type=pillar) and hypervisor:nodes (tgt_type=compound) resolve their target against the master's per-minion data cache (grains+pillar in data.p), which is populated only when a minion's pillar is recompiled -- separately from the mine. After a reboot a node can be in the mine (so node_data/glob sees it) yet absent from that cache, so it fails the elasticsearch:enabled:true pillar match and is dropped from elasticsearch:nodes -> so-elasticsearch ExtraHosts -> container recreate. After the mine-completeness wait, run salt '*' saltutil.refresh_pillar wait=True to synchronously cache every up node's pillar (the same lever deploy_newnode.sls uses), then verify with salt-run cache.pillar and retry stragglers, bounded by MINE_UPDATE_MAX_WAIT. Also log elasticsearch:nodes alongside node_data for inspection.	2026-06-09 13:52:19 -04:00
reyesj2	9f5a9616a5	use pipe exit status for update_docker_containers	2026-06-09 12:51:58 -05:00
Josh Patterson	27c7702325	so-boot-mine-update: wait for a complete mine before highstate Mine-backed pillars (node_data, elasticsearch:nodes, redis:nodes, logstash:nodes, hypervisor:nodes) include a node only if it returned an IP from the mine, and the configs they build are rebuilt fresh every highstate. After a manager reboot with a flushed mine, the first boot highstate could run before an up node re-reported network.ip_addrs, dropping it from e.g. so-elasticsearch ExtraHosts and forcing a container recreate. After the initial broad mine.update, poll until every currently-up minion actually has network.ip_addrs in the mine, re-pushing mine.update to stragglers, before releasing the boot highstate. Shares the existing MINE_UPDATE_MAX_WAIT backstop so a slow/down node never blocks boot, and still logs the rendered node_data for inspection.	2026-06-09 10:10:32 -04:00
Josh Patterson	8c306eb37d	so-boot-mine-update: log the rendered node_data content Dump the actual rendered node_data pillar (pretty-printed JSON) to the journal instead of just a rendered/empty verdict, so the boot-time render attempt is fully inspectable. Empty renders print false/null and still emit the WARNING.	2026-06-09 09:49:19 -04:00
Josh Patterson	e536ffa363	so-boot-mine-update: render node_data after mine.update before highstate After the boot-time mine.update, have the manager actually render the node_data pillar and log whether it came back populated. node_data: False makes salt/top.sls apply the bootstrap recovery branch instead of the manager's real config, so surfacing this in the journal makes the condition visible before so-boot-highstate runs. Best-effort and non-blocking: always exits 0 so highstate proceeds regardless.	2026-06-09 09:35:24 -04:00
Jason Ertel	eb82f9ea9d	kilo version	2026-06-08 16:53:35 -04:00
Jorge Reyes	d7aa7ab228	Merge pull request #15961 from Security-Onion-Solutions/reyesj2/fleet-autoconfigure respect elasticfleet enable_auto_configuration setting for so-elastic…	2026-06-08 15:09:58 -05:00
Jorge Reyes	fe0b68d24c	Merge pull request #15958 from Security-Onion-Solutions/reyesj2-patch-template fix elasticsearch template generation issue	2026-06-08 15:07:49 -05:00
reyesj2	6ad345730b	respect elasticfleet enable_auto_configuration setting for so-elastic-fleet-urls-update	2026-06-08 15:02:57 -05:00
Josh Patterson	9580976ba2	Add manager boot-time grid mine.update oneshot before highstate so-boot-mine-update.service is a manager-only Type=oneshot unit that runs once per boot after salt-master/salt-minion start and before so-boot-highstate.service. It pushes mine.update to all reachable minions so mine-backed pillars (node IPs, ES/Redis/Logstash discovery) are fresh before the boot highstate renders them. The helper waits for the responsive minion set to settle (plateau) rather than for every accepted key to report up, so an intentionally powered-off minion doesn't block the update; MAX_WAIT remains as a backstop.	2026-06-08 11:05:13 -04:00
reyesj2	ac907ba45f	fix elasticsearch template generation issue	2026-06-05 16:42:08 -05:00
Josh Patterson	f957954abf	Merge pull request #15956 from Security-Onion-Solutions/nostartupstates higstate on host start, not salt-minion start	2026-06-04 16:51:10 -04:00
Josh Patterson	cb3631da81	Move setup-complete marker from /opt/so/conf to /opt/so/state The setup-complete marker is a runtime-state file, not config, so move it to /opt/so/state/setup-complete. Updates both writers (mark_setup_complete in setup/so-functions and the upgrade-path state in minion/init.sls) and the three readers (so-boot-highstate.service ConditionPathExists, boot_highstate.sls enable gate, and the so-user_sync cron gate).	2026-06-04 15:07:27 -04:00
Josh Patterson	f5d63f585e	Merge remote-tracking branch 'origin/3/dev' into nostartupstates	2026-06-04 09:19:01 -04:00
Josh Patterson	13f8be40b5	so-boot-highstate: wait for docker before running highstate Add docker.service to After= and Wants= so the boot-time highstate starts after docker is up. Uses Wants (soft) so highstate still runs if docker fails to start.	2026-06-04 08:46:35 -04:00
Jason Ertel	9ee90a5bc0	Merge pull request #15955 from Security-Onion-Solutions/jertel/wip config updates	2026-06-03 17:26:51 -04:00
Jason Ertel	ca85c5d900	fix version	2026-06-03 17:26:08 -04:00
Josh Patterson	2d653b6f1b	does not need to be jinja template	2026-06-03 15:46:58 -04:00
Josh Patterson	34fee25b0c	Merge remote-tracking branch 'origin/3/dev' into nostartupstates	2026-06-03 15:44:41 -04:00
Jason Ertel	1d3d98f759	kilo	2026-06-03 12:24:41 -04:00
Jason Ertel	a767c79641	restore soup db init	2026-06-03 10:39:37 -04:00
Jason Ertel	61e72c89e4	postgres updates	2026-06-03 09:49:53 -04:00
Jason Ertel	d9fb7313f9	merge	2026-06-03 09:30:05 -04:00
Jason Ertel	7ca2313255	move to securityonion db	2026-06-03 09:05:23 -04:00
Jorge Reyes	534f0e639d	Merge pull request #15954 from Security-Onion-Solutions/reyesj2-patch-4 run elastic agent regen installer script in post_to_3.2.0	2026-06-02 15:25:55 -05:00
reyesj2	559465b407	run elastic agent gen installers script in post_to_3.2.0	2026-06-02 15:18:00 -05:00
reyesj2	f9c2579261	remove logstash pipeline rename from hotfix moving to up_to_3.2.0	2026-06-02 15:18:00 -05:00
Jorge Reyes	33699a914b	Merge pull request #15952 from Security-Onion-Solutions/reyesj2-patch-3 use so-config-backup script in soup	2026-06-02 15:02:27 -05:00
Jorge Reyes	0c2d8f8973	Merge pull request #15951 from Security-Onion-Solutions/reyesj2-patch-2 check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility	2026-06-02 15:02:10 -05:00
reyesj2	f2996fb888	use so-config-backup script in soup	2026-06-01 11:52:35 -05:00
reyesj2	3c533cccbc	and after free space check	2026-06-01 11:28:59 -05:00
reyesj2	79da9f9f2c	check if there is a version or hotfix to upgrade to before verifiying elasticsearch compatibility	2026-06-01 11:26:52 -05:00
Mike Reeves	99a027589b	Merge pull request #15949 from Security-Onion-Solutions/jertel/wip fix version	2026-05-30 09:50:14 -04:00
Jason Ertel	68a82a425b	fix version	2026-05-30 08:12:50 -04:00
Jason Ertel	d86a3c5cc9	Merge pull request #15947 from Security-Onion-Solutions/jertel/wip refactored soc config	2026-05-29 14:07:06 -04:00
Jason Ertel	86edc5aaba	version	2026-05-28 22:57:59 -04:00
Josh Patterson	9a70a06b3b	Merge remote-tracking branch 'origin/3/dev' into jertel/wip	2026-05-28 13:55:12 -04:00
Mike Reeves	526d739b3b	Merge pull request #15940 from Security-Onion-Solutions/TOoSmOotH-patch-4 Remove outdated HOTFIX version number	2026-05-28 10:25:28 -04:00
Mike Reeves	68d783e760	Remove outdated HOTFIX version number	2026-05-28 10:24:47 -04:00
Mike Reeves	1e9b6b0975	Merge pull request #15939 from Security-Onion-Solutions/3/main main to dev for hotfix	2026-05-28 10:24:21 -04:00
Mike Reeves	2131e7d450	Merge pull request #15937 from Security-Onion-Solutions/hotfix/3.1.0 Hotfix/3.1.0	2026-05-28 10:20:53 -04:00
Mike Reeves	2a2d853ac4	Merge pull request #15936 from Security-Onion-Solutions/hotfix310 3.1.0 hotfix	2026-05-28 09:53:00 -04:00
Mike Reeves	5abd6de4b5	3.1.0 hotfix	2026-05-28 09:34:17 -04:00
Josh Patterson	bb8ae91d91	fix so-soc postgres bootstrap	2026-05-27 16:39:52 -04:00
Josh Patterson	93ffce98d7	add onionconfig and postgres modules to soc config	2026-05-27 15:07:25 -04:00
Jorge Reyes	5599cce22c	Merge pull request #15934 from Security-Onion-Solutions/reyesj2-patch-1 keep logstash lumberjack pipeline name update unified	2026-05-27 13:37:41 -05:00
reyesj2	b2a82fec29	fix_logstash_0013_lumberjack_pipeline_name Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting	2026-05-27 13:24:23 -05:00
reyesj2	613eca52fc	update hotfix date	2026-05-27 13:24:10 -05:00
Josh Patterson	79987f3659	bootstrap so-soc db in postgres during soup	2026-05-27 13:55:30 -04:00
reyesj2	bf609a112e	LF	2026-05-27 12:21:44 -05:00
reyesj2	0b4a4de609	always run logstash pipeline rename	2026-05-27 12:21:22 -05:00
Jorge Reyes	ad376d2a43	Merge pull request #15930 from Security-Onion-Solutions/reyesj2-patch-1 check for stale logstash pipeline name in local pillar	2026-05-27 10:16:39 -05:00
reyesj2	0834998cca	usuable for next soup	2026-05-27 09:52:29 -05:00
reyesj2	473f93f0ee	check for stale logstash pipeline name in pillars	2026-05-27 09:33:15 -05:00
Josh Patterson	16055c4d88	Merge remote-tracking branch 'origin/3/dev' into jertel/wip	2026-05-27 09:18:33 -04:00
Jorge Reyes	7cc2e045fb	Merge pull request #15925 from Security-Onion-Solutions/reyesj2/soup-heavynode use multiple or combined input	2026-05-26 08:34:33 -05:00
Mike Reeves	6955ee73bf	Merge pull request #15924 from Security-Onion-Solutions/TOoSmOotH-patch-3 Add version number to HOTFIX file	2026-05-26 09:28:41 -04:00
Mike Reeves	c0272ddb81	Add version number to HOTFIX file	2026-05-26 09:24:10 -04:00
reyesj2	d72219c586	use multiple or combined input	2026-05-22 20:04:21 -05:00
Mike Reeves	ffd34d4e0e	Merge pull request #15919 from Security-Onion-Solutions/TOoSmOotH-patch-2 Add 3.2.0 option to discussion template	2026-05-21 15:58:28 -04:00
Mike Reeves	aa78978740	Add 3.2.0 option to discussion template	2026-05-21 15:57:57 -04:00
Mike Reeves	75d4f5e496	Merge pull request #15918 from Security-Onion-Solutions/TOoSmOotH-patch-1 Bump version from 3.1.0 to 3.2.0	2026-05-21 15:49:08 -04:00
Mike Reeves	89a28d2cfe	Bump version from 3.1.0 to 3.2.0	2026-05-21 15:45:58 -04:00
Mike Reeves	c1d187599b	Merge pull request #15912 from Security-Onion-Solutions/3/dev 3.1.0	2026-05-21 15:41:50 -04:00
Mike Reeves	d87313db27	Merge pull request #15911 from Security-Onion-Solutions/3.1.0 3.1.0	2026-05-21 13:50:23 -04:00
Mike Reeves	141a61f5b5	3.1.0	2026-05-21 13:47:03 -04:00
Jorge Reyes	901cbf03e4	Merge pull request #15907 from Security-Onion-Solutions/reyesj2/es-verify-compat Verify compatibility for all ES nodes in the cluster	2026-05-20 14:16:41 -05:00
reyesj2	b485be4602	separate salt-key command from main es version compatiblity loop	2026-05-20 14:12:58 -05:00
reyesj2	7d13007aa9	block soup if all ES nodes are not online and reporting their ES version for compatibility check	2026-05-20 10:03:37 -05:00
reyesj2	d7a1b67095	use pipefail on heavynode versino command to pass through error	2026-05-20 09:16:57 -05:00
reyesj2	6c8997b28a	verify all heavynodes and all searchnodes are at compatible ES version before attempting an elasticsearch upgrade	2026-05-19 22:27:31 -05:00
Jorge Reyes	58f1d08ebe	Merge pull request #15902 from Security-Onion-Solutions/reyesj2/ea-fleet-sync sync elastic agent packages to fleet nodes	2026-05-19 11:08:48 -05:00
reyesj2	d0aa33a255	sync elastic agent packages to fleet nodes	2026-05-19 10:50:17 -05:00
Jorge Reyes	74b50f6009	Merge pull request #15899 from Security-Onion-Solutions/revert-15895-reyesj2/agentinstall Revert "use -verify flag during grid agent install to ensure agent health"	2026-05-16 10:01:58 -05:00
Jorge Reyes	e89c820b65	Revert "use -verify flag during grid agent install to ensure agent health"	2026-05-16 09:59:14 -05:00
Jorge Reyes	9ac05a6ad1	Merge pull request #15895 from Security-Onion-Solutions/reyesj2/agentinstall use -verify flag during grid agent install to ensure agent health	2026-05-15 12:58:09 -05:00
Jason Ertel	24ee3318bc	Merge pull request #15898 from Security-Onion-Solutions/jertel/logcheck exclude fps	2026-05-15 11:38:20 -04:00
Jason Ertel	ce566ba174	exclude fps	2026-05-15 11:36:46 -04:00
Mike Reeves	2635a60a8c	Merge pull request #15896 from Security-Onion-Solutions/quickfixes2 Make so-postgres-backup fail-safe against silent corruption	2026-05-15 09:32:15 -04:00
Mike Reeves	244a73b7a2	Make so-postgres-backup fail-safe against silent corruption The dump pipeline returned gzip's exit status, so a pg_dumpall that died mid-stream still produced a valid .gz holding a truncated dump, written straight to the final filename. The idempotency check then blocked retries for the day and the corrupt file counted toward retention, evicting a good backup each day until none remained. - set -o pipefail so a failed pg_dumpall fails the pipeline - dump to a .tmp file and atomically rename only after success, so the final filename appears only for a complete backup - gzip -t integrity check before publishing - trap-based cleanup of the temp file; sweep stale temps at startup - run retention only after a successful backup, with a glob restricted to finished backups - log timestamped OK/ERROR outcomes to /opt/so/log/postgres/backup.log	2026-05-15 08:48:54 -04:00
Jason Ertel	e45ad45d73	Merge branch '3/dev' into jertel/wip	2026-05-14 18:33:40 -04:00
Mike Reeves	1189621ec5	Merge pull request #15893 from Security-Onion-Solutions/quickfixes2	2026-05-14 18:21:30 -04:00
reyesj2	d2524a593f	use -verify flag during grid agent install to ensure agent health	2026-05-14 17:12:02 -05:00
Josh Brower	f2ab2354fd	Merge pull request #15894 from Security-Onion-Solutions/3/nginx-fix Tweak for nginx upgrade	2026-05-14 23:20:57 +02:00
Mike Reeves	64731c73ba	Fix psql :var substitution in telegraf role and retention SQL psql does not substitute :var references inside dollar-quoted strings, so the DO blocks in the user and retention subcommands were receiving literal colons and failing (silently for user, via hide_output: True). Rewrite the conditional CREATE/ALTER ROLE with SELECT format(...) \\gexec and guard the retention UPDATE with \\gset + \\if.	2026-05-14 17:17:49 -04:00
Josh Brower	024fece607	Tweak for nginx upgrade	2026-05-14 17:08:57 -04:00
Mike Reeves	249b126312	Quote telegraf role env vars to survive YAML-special chars in passwords	2026-05-14 17:08:51 -04:00
Mike Reeves	8e38bff0c3	Rename telegraf_postgres.sh to so-telegraf-postgres	2026-05-14 16:55:53 -04:00
Mike Reeves	b9f2d56932	Consolidate telegraf postgres SQL into multi-mode script Replace inline psql heredocs in telegraf_users.sls with subcommand dispatcher telegraf_postgres.sh: create_db, group_role, user, retention.	2026-05-14 16:37:08 -04:00
Mike Reeves	03fa01a705	Move telegraf_role.sh to postgres tools/sbin	2026-05-14 16:18:01 -04:00
Mike Reeves	450eacca41	Move telegraf role provisioning to external script with env vars	2026-05-14 16:15:54 -04:00
Mike Reeves	b7a13899f7	Suppress output logging for postgres telegraf role provisioning	2026-05-14 15:56:04 -04:00
Mike Reeves	6f273d7d97	Rename init-users.sh to init-db.sh and update all references	2026-05-14 15:53:00 -04:00
Josh Patterson	fabecb8288	remove highstate from startup_states. highstate on system start	2026-05-14 13:57:40 -04:00
Jason Ertel	907f699721	state rename	2026-05-14 11:03:08 -04:00
Jason Ertel	e7a7047f71	Merge branch '3/dev' into jertel/wip	2026-05-14 11:01:36 -04:00
Josh Brower	b328820c01	Merge pull request #15792 from Security-Onion-Solutions/3/strelkalnk Fix module name	2026-05-14 13:06:26 +02:00
Jason Ertel	936295f1c4	Merge branch '3/dev' into jertel/wip	2026-05-13 17:28:25 -04:00
Jason Ertel	61ca60a94c	prep for soc db config	2026-05-13 17:28:07 -04:00
Jorge Reyes	638aca97c8	Merge pull request #15877 from Security-Onion-Solutions/reyesj2-patch-1 update redis index template	2026-05-13 13:44:04 -05:00
Jorge Reyes	74a5c895e8	Merge pull request #15889 from Security-Onion-Solutions/reyesj2/zeek-ja4d add zeek.ja4d ingest pipeline	2026-05-13 13:43:56 -05:00
reyesj2	d56bf01823	add zeek.ja4d ingest pipeline	2026-05-13 12:32:54 -05:00
Mike Reeves	d29267d9c2	Merge pull request #15888 from Security-Onion-Solutions/TOoSmOotH-patch-1 Change Telegraf output from BOTH to INFLUXDB	2026-05-13 12:47:55 -04:00
Mike Reeves	72327285b2	Change Telegraf output from BOTH to INFLUXDB	2026-05-13 11:58:21 -04:00
Josh Patterson	cc7a237457	Merge pull request #15887 from Security-Onion-Solutions/m0duspwnens-patch-1 remove stig from hypervisor and managerhype	2026-05-13 10:57:58 -04:00
Josh Patterson	b068ad2b35	remove stig from hypervisor and managerhype	2026-05-13 10:53:11 -04:00
Jorge Reyes	4a2177c827	update redis index template missing redis integration component templates	2026-05-11 16:15:56 -05:00
Josh Brower	affede7f0a	Rename 'ScanLNK' to 'ScanLnk' in YAML config	2026-04-20 10:01:10 -04:00
Josh Brower	97366c0496	Rename 'ScanLNK' to 'ScanLnk' in defaults.yaml	2026-04-20 10:00:29 -04:00
@@ -1 +1 @@
 .1.0
 .2.0