Compare commits

...

57 Commits

Author SHA1 Message Date
Mike Reeves 3d11694d51 make so-yaml PG-canonical and add pillar-change reactor stack
Two coupled changes that together let so_pillar.* be the canonical
config store, with config edits driving service reloads automatically:

so-yaml PG-canonical mode
- Adds /opt/so/conf/so-yaml/mode (and SO_YAML_BACKEND env override) with
  three values: dual (legacy), postgres (PG-only for managed paths),
  disk (emergency rollback). Bootstrap files (secrets.sls, ca/init.sls,
  *.nodes.sls, top.sls, ...) stay disk-only regardless via the existing
  SkipPath allowlist in so_yaml_postgres.locate.
- loadYaml/writeYaml/purgeFile now route to so_pillar.* in postgres
  mode: replace/add/get all read+write the database with no disk file
  ever appearing. PG failure is fatal in postgres mode (no silent
  fallback); dual mode preserves the prior best-effort mirror.
- so_yaml_postgres gains read_yaml(path), is_pg_managed(path), and
  is_enabled() so so-yaml can answer "is this path PG-managed and is
  PG up" without reaching into private helpers.
- schema_pillar.sls writes /opt/so/conf/so-yaml/mode = postgres after
  the importer succeeds, so flipping postgres:so_pillar:enabled flips
  so-yaml's behavior in lockstep with the schema being live.

pg_notify-driven change fan-out
- 008_change_notify.sql adds so_pillar.change_queue + an AFTER trigger
  on pillar_entry that enqueues the locator and pg_notifies
  'so_pillar_change'. Queue is drained at-least-once so engine restarts
  don't lose events; pg_notify is just the wakeup signal.
- New salt-master engine pg_notify_pillar.py LISTENs on the channel,
  drains the queue with FOR UPDATE SKIP LOCKED, debounces bursts, and
  fires 'so/pillar/changed' events grouped by (scope, role, minion).
- Reactor so_pillar_changed.sls catches the tag and dispatches to
  orch.so_pillar_reload, which carries a DISPATCH map of pillar-path
  prefix -> (state sls, role grain set) so adding a new service to
  the auto-reload list is a one-line edit instead of a new reactor.
- Engine + reactor wiring is gated on the same postgres:so_pillar:enabled
  flag as the schema and ext_pillar config so the whole stack flips
  on/off together.
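
A rough sketch of the consumer side described above, assuming a psycopg2
connection and queue columns named (id, scope, role_name, minion_id,
pillar_path); the shipped pg_notify_pillar.py, its Salt event plumbing, and
its debounce window may differ:

  import select
  import psycopg2

  def run(fire, dsn="dbname=securityonion"):
      """fire(rows) stands in for grouping rows by (scope, role, minion) and
      emitting 'so/pillar/changed' on the Salt master event bus; dsn is a
      placeholder for the engine's real connection settings."""
      conn = psycopg2.connect(dsn)
      with conn.cursor() as cur:
          cur.execute("LISTEN so_pillar_change;")
      conn.commit()
      while True:
          while drain(conn, fire):            # keep draining until the queue is empty
              pass
          select.select([conn], [], [], 30)   # sleep until pg_notify (or 30s)
          conn.poll()
          del conn.notifies[:]                # notifications are only the wakeup signal

  def drain(conn, fire):
      """Claim rows with SKIP LOCKED, emit the event, then delete them in the
      same transaction; a crash before COMMIT re-delivers (at-least-once)."""
      with conn, conn.cursor() as cur:
          cur.execute(
              "SELECT id, scope, role_name, minion_id, pillar_path"
              " FROM so_pillar.change_queue"
              " ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 500")
          rows = cur.fetchall()
          if rows:
              fire(rows)
              cur.execute("DELETE FROM so_pillar.change_queue WHERE id = ANY(%s)",
                          ([r[0] for r in rows],))
          return len(rows)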

Tests: 21 new cases (112 total, all passing) covering mode resolution,
PG-managed detection, and PG-canonical read/write/purge routing with
the PG client stubbed.
2026-05-01 09:31:48 -04:00
Mike Reeves 23255f88e0 add so-yaml dual-write to so_pillar.* + purge verb
Hooks every so-yaml.py write through a new so_yaml_postgres helper that
mirrors disk YAML mutations into so_pillar.pillar_entry via docker exec
psql. Disk remains canonical during the transition; PG mirror failures
are logged only when a real write error occurs (skipped paths and
postgres-unreachable cases stay silent so existing callers don't see
new noise on stderr).

Adds a `purge YAML_FILE` verb on so-yaml that deletes the file from
disk and removes the matching pillar_entry rows. For minion files it
also drops the so_pillar.minion row, which CASCADEs to pillar_entry +
role_member. Designed for so-minion's delete path (replaces rm -f) so
the audit log captures the deletion.
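
A typical invocation from that delete path would look like the following
(path shown for illustration; so-minion passes its own pillar file variable):

  so-yaml.py purge /opt/so/saltstack/local/pillar/minions/<id>.sls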

setup/so-functions::generate_passwords + secrets_pillar generate
secrets:pillar_master_pass and /opt/so/conf/postgres/so_pillar.key on
fresh installs, and append the password to existing secrets.sls files
on upgrade.

- salt/manager/tools/sbin/so_yaml_postgres.py: locate(), write_yaml(),
  purge_yaml(), and a small CLI for diagnostics. Skips bootstrap and
  mine-driven paths via the same allowlist used by so-pillar-import.
- salt/manager/tools/sbin/so-yaml.py: import the helper, hook
  writeYaml() to mirror after every disk write, add purgeFile() and
  the purge verb.
- salt/manager/tools/sbin/so-yaml_test.py: 16 new tests covering the
  purge verb and the path-locator / write contract of so_yaml_postgres
  without contacting Postgres. All 91 tests pass.
- setup/so-functions: generate_passwords adds PILLARMASTERPASS and
  SO_PILLAR_KEY; secrets_pillar writes pillar_master_pass and the
  pgcrypto master key file.
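
A minimal sketch of how a test can exercise that write contract without
contacting Postgres, assuming so_yaml_postgres shells out through
subprocess.run the way so-pillar-import does (the minion id, pillar content,
and assertions are illustrative, not the shipped test cases):

  import unittest
  from unittest import mock

  import so_yaml_postgres  # helper under test

  class TestWriteContract(unittest.TestCase):
      @mock.patch("so_yaml_postgres.subprocess.run")
      def test_write_routes_through_psql(self, run):
          # Pretend `docker exec so-postgres psql` succeeded; no database needed.
          run.return_value = mock.Mock(returncode=0, stdout=b"", stderr=b"")
          ok, msg = so_yaml_postgres.write_yaml(
              "/opt/so/saltstack/local/pillar/minions/sensor1_sensor.sls",
              {"sensor": {"interface": "bond0"}},
              reason="unit test")
          run.assert_called()        # the mutation reached the stubbed psql call
          self.assertTrue(ok, msg)

  if __name__ == "__main__":
      unittest.main()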
2026-04-30 17:09:58 -04:00
Mike Reeves d30b52b327 add so-pillar-import — seeds so_pillar.* from on-disk pillar tree
Idempotent importer that schema_pillar.sls runs once at end of postgres
state on first install, and that so-minion can call per-minion on add /
delete. UPSERTs into so_pillar.pillar_entry; the audit trigger handles
versioning so re-runs without SLS edits produce no version bumps.

Connects via docker exec so-postgres psql, so no DSN config is required
at first-install time. Skips bootstrap files (secrets.sls, postgres/
auth.sls, etc.), mine-driven nodes.sls files, and any file containing
Jinja templates — those stay disk-authoritative and ext_pillar_first:
False means they render before the PG overlay.

Auto-syncs to /usr/sbin via the existing manager_sbin file.recurse.
2026-04-30 16:34:05 -04:00
Mike Reeves 3fad895d6a add so_pillar schema + ext_pillar wiring (postsalt foundation)
Lays the database-backed pillar foundation for the postsalt branch. Salt
continues to read on-disk SLS first; the new ext_pillar config overlays
values from the so_pillar.* schema in so-postgres.

- salt/postgres/files/schema/pillar/00{1..7}_*.sql: idempotent DDL for
  scope/role/role_member/minion/pillar_entry/pillar_entry_history/
  drift_log, secret pgcrypto helpers, RLS, pg_cron retention.
- salt/postgres/schema_pillar.sls: applies the SQL files inside the
  so-postgres container after it's healthy, configures the master_key
  GUC, and runs so-pillar-import once. Gated on
  postgres:so_pillar:enabled feature flag (default false).
- salt/salt/master/ext_pillar_postgres.{sls,conf.jinja}: drops
  /etc/salt/master.d/ext_pillar_postgres.conf with list-form ext_pillar
  queries (global/role/minion/secrets) and ext_pillar_first: False so
  bootstrap pillars on disk render before the PG overlay.
- salt/postgres/init.sls + salt/salt/master.sls: include the new states.

Both new state branches are guarded so a default install with the flag
off is a no-op.
2026-04-30 16:30:57 -04:00
Mike Reeves fa8162de02 Merge pull request #15749 from Security-Onion-Solutions/feature/postgres
Add so-postgres Salt states and infrastructure
2026-04-28 10:15:47 -04:00
Josh Patterson 33abc429d1 Merge pull request #15835 from Security-Onion-Solutions/fix/reactor/sominon_setup
fix sominion_setup reactor
2026-04-28 08:55:58 -04:00
Jorge Reyes b22585ca90 Merge pull request #15833 from Security-Onion-Solutions/reyesj2-es933
exclude more transform job errors
2026-04-27 15:05:11 -05:00
reyesj2 9f2ca7012f exclude more transform job errors 2026-04-27 15:02:13 -05:00
Josh Patterson 21aeb68188 fix sominion_setup reactor 2026-04-27 14:30:41 -04:00
Josh Patterson 81e60ec5bf Merge pull request #15829 from Security-Onion-Solutions/fix/reinstall2
fix reinstall
2026-04-24 16:20:53 -04:00
Josh Patterson 199c2746f1 stop salt-minion and salt-master regardless of install type. display reinstall on console and save to logfile 2026-04-24 15:24:11 -04:00
Josh Patterson 8eca465ef6 uninstall elastic-agent before stopping dockers on reinstall 2026-04-24 14:35:11 -04:00
Jorge Reyes a45e59239f Merge pull request #15826 from Security-Onion-Solutions/reyesj2-es933
heavynode should run es cluster state
2026-04-24 13:07:48 -05:00
Josh Patterson 2ad0bcab7c Merge pull request #15828 from Security-Onion-Solutions/fix/annotations
readonly soc and kratos enabled
2026-04-24 14:00:02 -04:00
Josh Patterson 070d150420 readonly soc and kratos enabled 2026-04-24 13:56:35 -04:00
reyesj2 90ecbe90d8 allow heavynodes to run elasticsearch/cluster state 2026-04-24 12:56:27 -05:00
Josh Patterson 813fa03dc3 Merge pull request #15824 from Security-Onion-Solutions/fix/reinstall2
fix reinstall issue with salt
2026-04-24 12:22:54 -04:00
Josh Patterson 02381fbbe9 stop salt-cloud, belt-and-suspenders against a broken/incomplete salt RPM 2026-04-24 11:33:21 -04:00
Josh Patterson 0722b681b1 redo service stop on reinstall 2026-04-24 11:04:46 -04:00
Josh Patterson 564815e836 redo how services are stopped during reinstall 2026-04-24 10:46:29 -04:00
Jorge Reyes 88b30adf7f Merge pull request #15823 from Security-Onion-Solutions/reyesj2-es933
typo
2026-04-24 09:27:08 -05:00
reyesj2 b6acf3b522 typo 2026-04-24 09:24:58 -05:00
Jason Ertel ba55468da8 Merge pull request #15822 from Security-Onion-Solutions/jertel/wip
numeric test description
2026-04-24 08:26:55 -04:00
Jason Ertel cdd217283d numeric test description 2026-04-24 08:13:36 -04:00
Jorge Reyes 810a582717 Merge pull request #15813 from Security-Onion-Solutions/reyesj2-es933
split up Elastic Fleet state
2026-04-23 14:51:32 -05:00
Mike Reeves a6948e8dcb Remove helpLink for influxdb in soc_global.yaml
Removed helpLink for influxdb from endgamehost configuration.
2026-04-23 13:56:41 -04:00
Mike Reeves 5f35554fdc Merge pull request #15712 from Security-Onion-Solutions/soupfix
Fix soup
2026-04-23 12:39:50 -04:00
Mike Reeves 0ecc7ae594 soup: drop --local from postgres.telegraf_users reconcile
The manager's /etc/salt/minion (written by so-functions:configure_minion)
has no file_roots, so salt-call --local falls back to Salt's default
/srv/salt and fails with "No matching sls found for 'postgres.telegraf_users'
in env 'base'". || true was silently swallowing the error, which meant the
DB roles for the pillar entries just populated by the so-telegraf-cred
backfill loop never actually got created.

Route through salt-master instead; its file_roots already points at the
default/local salt trees.
2026-04-23 11:25:44 -04:00
reyesj2 fdfca469cc prevent non-manager nodes from running elasticsearch.cluster state manually 2026-04-23 09:53:07 -05:00
reyesj2 5f2ec76ba8 prevent fleetnode from being able to run elasticfleet.manager state manually 2026-04-23 09:50:45 -05:00
reyesj2 b015c8ff14 remove docker import 2026-04-23 09:31:30 -05:00
reyesj2 7e70870a9e remove globals import 2026-04-23 09:25:36 -05:00
Mike Reeves eadad6c163 soup: bootstrap postgres pillar stubs and secret on 3.0.0 upgrade
pillar/top.sls now references postgres.soc_postgres / postgres.adv_postgres
unconditionally, but make_some_dirs only runs at install time so managers
upgrading from 3.0.0 have no local/pillar/postgres/ and salt-master fails
pillar render on the first post-upgrade restart. Similarly, secrets_pillar
is a no-op on upgrade (secrets.sls already exists), so secrets:postgres_pass
never gets seeded and the postgres container's POSTGRES_PASSWORD_FILE and
SOC's PG_ADMIN_PASS would land empty after highstate.

Add ensure_postgres_local_pillar and ensure_postgres_secret to up_to_3.1.0
so the stubs and secret exist before masterlock/salt-master restart. Both
are idempotent and safe to re-run.
2026-04-23 10:01:38 -04:00
reyesj2 22b32a16dd include elasticfleet.config 2026-04-23 08:30:47 -05:00
reyesj2 22f869734e add check for files before attempting to use file pattern to load templates 2026-04-22 23:11:31 -05:00
reyesj2 398bc9e4ed update kibana discardCorruptObjects version 2026-04-22 20:38:13 -05:00
reyesj2 72dbb69a1c fix searchnodes running elasticsearch/cluster state 2026-04-22 20:37:48 -05:00
reyesj2 339959d1c0 split up elasticfleet/enabled state 2026-04-22 20:30:40 -05:00
Mike Reeves d5c0ec4404 so-yaml_test: cover loadYaml error paths
Exercises the FileNotFoundError and generic-exception branches added to
loadYaml in the previous commit, restoring 100% coverage required by
the build.
2026-04-22 14:30:51 -04:00
Mike Reeves e616b4c120 so-telegraf-cred: make executable and harden error handling
so-telegraf-cred was committed with mode 644, causing
`so-telegraf-cred add "$MINION_ID"` in so-minion's add_telegraf_to_minion
to fail with "Permission denied" and log "Failed to provision postgres
telegraf cred for <minion>". Mark it executable.

Also bail early in seed_creds_file if mkdir/printf/chmod fail, and in
so-yaml.py loadYaml surface a clear stderr message with the filename
instead of an unhandled FileNotFoundError traceback.
2026-04-22 14:25:19 -04:00
Mike Reeves f240a99e22 so-telegraf-cred: thin bash wrapper around so-yaml.py
Swap the ~150-line Python implementation for a 48-line bash script that
delegates YAML mutation to so-yaml.py — the same helper so-minion and
soup already use. Same semantics: seed the creds pillar on first use,
idempotent add, silent remove.

SO minion ids are dot-free by construction (setup/so-functions:1884
strips everything after the first '.'), so using the raw id as the
so-yaml.py key path is safe.
2026-04-22 11:09:53 -04:00
Mike Reeves 614f32c5e0 Split postgres auth from per-minion telegraf creds
The old flow had two writers for each per-minion Telegraf password
(so-minion wrote the minion pillar; postgres.auth regenerated any
missing aggregate entries). They drifted on first-boot and there was
no trigger to create DB roles when a new minion joined.

Split responsibilities:

- pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres
  admin cred.
- pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user,
  pass}} map, shadowed per-install by the local-pillar copy.
- salt/manager/tools/sbin/so-telegraf-cred is the single writer:
  flock, atomic YAML write, PyYAML safe_dump so passwords never
  round-trip through so-yaml.py's type coercion. Idempotent add, quiet
  remove.
- so-minion's add/remove hooks now shell out to so-telegraf-cred
  instead of editing pillar files directly.
- postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs
  roles from it; telegraf.conf reads its own entry via grains.id.
- orch.deploy_newnode runs postgres.telegraf_users on the manager and
  refreshes the new minion's pillar before the new node highstates,
  so the DB role is in place the first time telegraf tries to connect.
- soup's post_to_3.1.0 backfills the creds pillar from accepted salt
  keys (idempotent) and runs postgres.telegraf_users once to reconcile
  the DB.
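
The grid-wide creds pillar ends up as a flat map keyed by minion id, shaped
roughly like this (values illustrative):

  telegraf:
    postgres_creds:
      sensor1_sensor:
        user: so_telegraf_sensor1_sensor
        pass: <random 72-char password>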
2026-04-22 10:55:15 -04:00
Josh Patterson cd6707a566 Merge pull request #15800 from Security-Onion-Solutions/feature/vm-raid-status
monitor raid for vms
2026-04-22 09:42:44 -04:00
Josh Patterson edd207a9d5 soup update socloud.conf 2026-04-22 09:20:53 -04:00
Mike Reeves 724d76965f soup: update postgres backfill comment to reflect reactor removal
The reactor path is gone; so-minion now owns add/delete for new
minions. The backfill itself is unchanged — postgres.auth's up_minions
fallback fills the aggregate, postgres.telegraf_users creates the
roles, and the bash loop fans to per-minion pillar files — so the
pre-feature upgrade story still works end-to-end. Just refresh the
comment so it isn't misleading.
2026-04-21 15:45:05 -04:00
Mike Reeves dbf4fb66a4 Clean up postgres telegraf cred on so-minion delete
Paired with the add path in add_telegraf_to_minion: when a minion is
removed, drop its entry from the aggregate postgres pillar and drop the
matching so_telegraf_<safe> role from the database. Without this, stale
entries and DB roles accumulate over time.

Makes rotate-password and compromise-recovery both a clean delete+add:

  so-minion -o=delete -m=<id>
  so-minion -o=add    -m=<id>

The first call drops the role and clears the aggregate pillar; the
second generates a brand-new password.

The cleanup is best-effort — if so-postgres isn't running or the DROP
ROLE fails (e.g., the role owns unexpected objects), we log a warning
and continue so the minion delete itself never gets blocked by postgres
state. Admins can mop up stray roles manually if that happens.
2026-04-21 15:43:01 -04:00
Mike Reeves 5f28e9b191 Move per-minion telegraf cred provisioning into so-minion
Simpler, race-free replacement for the reactor + orch + fan-out chain.

- salt/manager/tools/sbin/so-minion: expand add_telegraf_to_minion to
  generate a random 72-char password, reuse any existing password from
  the aggregate pillar, write postgres.telegraf.{user,pass} into the
  minion's own pillar file, and update the aggregate pillar so
  postgres.telegraf_users can CREATE ROLE on the next manager apply.
  Every create<ROLE> function already calls this hook, so add / addVM /
  setup dispatches are all covered identically and synchronously.
- salt/postgres/auth.sls: strip the fanout_targets loop and the
  postgres_telegraf_minion_pillar_<safe> cmd.run block — it's now
  redundant. The state still manages the so_postgres admin user and
  writes the aggregate pillar for postgres.telegraf_users to consume.
- salt/reactor/telegraf_user_sync.sls: deleted.
- salt/orch/telegraf_postgres_sync.sls: deleted.
- salt/salt/master.sls: drop the reactor_config_telegraf block that
  registered the reactor on /etc/salt/master.d/reactor_telegraf.conf.
- salt/orch/deploy_newnode.sls: drop the manager_fanout_postgres_telegraf
  step and the require: it added to the newnode highstate. Back to its
  original 3/dev shape.

No more ephemeral postgres_fanout_minion pillar, no more async salt/key
reactor, no more so-minion setupMinionFiles race: the pillar write
happens inline inside setupMinionFiles itself.
2026-04-21 15:34:15 -04:00
Jorge Reyes 01bd3b6e06 Merge pull request #15807 from Security-Onion-Solutions/reyesj2-es933
urlencode elasticsearch version
2026-04-21 14:11:04 -05:00
Mike Reeves 1abfd77351 Hide telegraf password from console and close so-minion race
Two fixes on the postgres telegraf fan-out path:

1. postgres.auth cmd.run leaked the password to the console because
   Salt always prints the Name: field and `show_changes: False` does
   not apply to cmd.run. Move the user and password into the `env:`
   attribute so the shell body still sees them via $PG_USER / $PG_PASS
   but Salt's state reporter never renders them.

2. so-minion's addMinion -> setupMinionFiles sequence removes the
   minion pillar file and rewrites it from scratch, which wipes the
   postgres.telegraf.* entries the reactor may have already written on
   salt-key accept. Add a postgres.auth fan-out step to
   orch.deploy_newnode (the orch so-minion kicks off after
   setupMinionFiles) and require it from the new minion's highstate.
   Idempotent via the existing unless: guard in postgres.auth.
2026-04-21 15:10:57 -04:00
reyesj2 06a555fafb urlencode elasticsearch version 2026-04-21 14:01:31 -05:00
Jason Ertel 7411031e11 Merge pull request #15803 from Security-Onion-Solutions/jertel/wip
more error handling during image updates
2026-04-21 10:21:56 -04:00
Jason Ertel 247091766c more error handling during image updates 2026-04-21 10:18:05 -04:00
Josh Patterson 7f93110d68 Merge remote-tracking branch 'origin/3/dev' into feature/vm-raid-status 2026-04-21 10:10:38 -04:00
Jason Ertel 33ef138866 Merge pull request #15797 from Security-Onion-Solutions/jertel/wip
fix template annotation
2026-04-20 17:14:53 -04:00
Jason Ertel 71da27dc8e fix template annotation 2026-04-20 17:02:25 -04:00
Josh Patterson ee437265fc monitor raid for vms 2026-04-20 12:00:02 -04:00
Mike Reeves 664f3fd18a Fix soup 2026-04-01 14:47:05 -04:00
52 changed files with 3094 additions and 369 deletions
+12
@@ -0,0 +1,12 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Per-minion Telegraf Postgres credentials. so-telegraf-cred on the manager is
# the single writer; it mutates /opt/so/saltstack/local/pillar/telegraf/creds.sls
# under flock. Pillar_roots order (local before default) means the populated
# copy shadows this default on any real grid; this file exists so the pillar
# key is always defined on fresh installs and when no minions have creds yet.
telegraf:
postgres_creds: {}
+1
@@ -17,6 +17,7 @@ base:
- sensoroni.adv_sensoroni
- telegraf.soc_telegraf
- telegraf.adv_telegraf
- telegraf.creds
- versionlock.soc_versionlock
- versionlock.adv_versionlock
- soc.license
+3 -1
@@ -35,6 +35,8 @@
'kratos',
'hydra',
'elasticfleet',
'elasticfleet.manager',
'elasticsearch.cluster',
'elastic-fleet-package-registry',
'utility'
] %}
@@ -79,7 +81,7 @@
),
'so-heavynode': (
sensor_states +
['elasticagent', 'elasticsearch', 'logstash', 'redis', 'nginx']
['elasticagent', 'elasticsearch', 'elasticsearch.cluster', 'logstash', 'redis', 'nginx']
),
'so-idh': (
['idh']
+8 -2
@@ -188,8 +188,14 @@ update_docker_containers() {
if [ -z "$HOSTNAME" ]; then
HOSTNAME=$(hostname)
fi
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker push $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 || {
echo "Unable to tag $image" >> "$LOG_FILE" 2>&1
exit 1
}
docker push $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 || {
echo "Unable to push $image" >> "$LOG_FILE" 2>&1
exit 1
}
fi
else
echo "There is a problem downloading the $image image. Details: " >> "$LOG_FILE" 2>&1
+1 -1
@@ -227,7 +227,7 @@ if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|from NIC checksum offloading" # zeek reporter.log
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|marked for removal" # docker container getting recycled
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|tcp 127.0.0.1:6791: bind: address already in use" # so-elastic-fleet agent restarting. Seen starting w/ 8.18.8 https://github.com/elastic/kibana/issues/201459
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint).*user so_kibana lacks the required permissions \[logs-\1" # Known issue with 3 integrations using kibana_system role vs creating unique api creds with proper permissions.
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint|armis|o365_metrics|microsoft_sentinel|snyk).*user so_kibana lacks the required permissions \[(logs|metrics)-\1" # Known issue with integrations starting transform jobs that are explicitly not allowed to start as a system user. (installed as so_elastic / so_kibana)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|manifest unknown" # appears in so-dockerregistry log for so-tcpreplay following docker upgrade to 29.2.1-1
fi
+9 -2
@@ -9,7 +9,7 @@
. /usr/sbin/so-common
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02")
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02" "HVGUEST")
hardware_raid=("SOS1000" "SOS1000F" "SOSSN7200" "SOS5000" "SOS4000")
{%- if salt['grains.get']('sosmodel', '') %}
@@ -87,6 +87,11 @@ check_boss_raid() {
}
check_software_raid() {
if [[ ! -f /proc/mdstat ]]; then
SWRAID=0
return
fi
SWRC=$(grep "_" /proc/mdstat)
if [[ -n $SWRC ]]; then
# RAID is failed in some way
@@ -107,7 +112,9 @@ if [[ "$is_hwraid" == "true" ]]; then
fi
if [[ "$is_softwareraid" == "true" ]]; then
check_software_raid
check_boss_raid
if [ "$model" != "HVGUEST" ]; then
check_boss_raid
fi
fi
sum=$(($SWRAID + $BOSSRAID + $HWRAID))
+4 -103
@@ -17,65 +17,17 @@ include:
- logstash.ssl
- elasticfleet.config
- elasticfleet.sostatus
{%- if GLOBALS.role != "so-fleet" %}
- elasticfleet.manager
{%- endif %}
{% if grains.role not in ['so-fleet'] %}
{% if GLOBALS.role != "so-fleet" %}
# Wait for Elasticsearch to be ready - no reason to try running Elastic Fleet server if ES is not ready
wait_for_elasticsearch_elasticfleet:
cmd.run:
- name: so-elasticsearch-wait
{% endif %}
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval', 'so-fleet'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
so-elastic-fleet-auto-configure-logstash-outputs-force:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update --certs
- retry:
attempts: 4
interval: 30
- onchanges:
- x509: etc_elasticfleet_logstash_crt
- x509: elasticfleet_kafka_crt
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
{% endif %}
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
{% if grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
{% endif %}
# Sync Elastic Agent artifacts to Fleet Node
{% if grains.role in ['so-fleet'] %}
elasticagent_syncartifacts:
file.recurse:
- name: /nsm/elastic-fleet/artifacts/beats
@@ -149,57 +101,6 @@ so-elastic-fleet:
- x509: etc_elasticfleet_crt
{% endif %}
{% if GLOBALS.role != "so-fleet" %}
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% endif %}
delete_so-elastic-fleet_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
+112
@@ -0,0 +1,112 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls in allowed_states %}
{% from 'elasticfleet/map.jinja' import ELASTICFLEETMERGED %}
include:
- elasticfleet.config
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
so-elastic-fleet-auto-configure-logstash-outputs-force:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update --certs
- retry:
attempts: 4
interval: 30
- onchanges:
- x509: etc_elasticfleet_logstash_crt
- x509: elasticfleet_kafka_crt
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
@@ -5,11 +5,12 @@
# this file except in compliance with the Elastic License 2.0.
. /usr/sbin/so-common
. /usr/sbin/so-elastic-fleet-common
{%- import_yaml 'elasticsearch/defaults.yaml' as ELASTICSEARCHDEFAULTS %}
{%- import_yaml 'elasticfleet/defaults.yaml' as ELASTICFLEETDEFAULTS %}
{# Optionally override Elasticsearch version for Elastic Agent patch releases #}
{%- if ELASTICFLEETDEFAULTS.elasticfleet.patch_version is defined %}
{%- do ELASTICSEARCHDEFAULTS.update({'elasticsearch': {'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}}) %}
{%- do ELASTICSEARCHDEFAULTS.elasticsearch.update({'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}) %}
{%- endif %}
# Only run on Managers
@@ -19,13 +20,10 @@ if ! is_manager_node; then
fi
# Get current list of Grid Node Agents that need to be upgraded
RAW_JSON=$(curl -K /opt/so/conf/elasticsearch/curl.config -L "http://localhost:5601/api/fleet/agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" --retry 3 --retry-delay 30 --fail 2>/dev/null)
if ! RAW_JSON=$(fleet_api "agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version | urlencode }}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" -H 'kbn-xsrf: true' -H 'Content-Type: application/json'); then
# Check to make sure that the server responded with good data - else, bail from script
CHECKSUM=$(jq -r '.page' <<< "$RAW_JSON")
if [ "$CHECKSUM" -ne 1 ]; then
printf "Failed to query for current Grid Agents...\n"
exit 1
printf "Failed to query for current Grid Agents...\n"
exit 1
fi
# Generate list of Node Agents that need updates
@@ -36,10 +34,12 @@ if [ "$OUTDATED_LIST" != '[]' ]; then
printf "Initiating upgrades for $AGENTNUMBERS Agents to Elastic {{ELASTICSEARCHDEFAULTS.elasticsearch.version}}...\n\n"
# Generate updated JSON payload
JSON_STRING=$(jq -n --arg ELASTICVERSION {{ELASTICSEARCHDEFAULTS.elasticsearch.version}} --arg UPDATELIST $OUTDATED_LIST '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
JSON_STRING=$(jq -n --arg ELASTICVERSION "{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}" --argjson UPDATELIST "$OUTDATED_LIST" '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
# Update Node Agents
curl -K /opt/so/conf/elasticsearch/curl.config -L -X POST "http://localhost:5601/api/fleet/agents/bulk_upgrade" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"
if ! fleet_api "agents/bulk_upgrade" -XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"; then
printf "Failed to initiate Agent upgrades...\n"
fi
else
printf "No Agents need updates... Exiting\n\n"
exit 0
+1 -1
@@ -4,7 +4,7 @@
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% if sls in allowed_states %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
{% from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS, SO_MANAGED_INDICES %}
+6 -7
@@ -17,7 +17,7 @@ include:
- elasticsearch.ssl
- elasticsearch.config
- elasticsearch.sostatus
{%- if GLOBALS.role != 'so-searchode' %}
{%- if GLOBALS.role != "so-searchnode" %}
- elasticsearch.cluster
{%- endif%}
@@ -102,11 +102,6 @@ so-elasticsearch:
- cmd: auth_users_roles_inode
- cmd: auth_users_inode
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
wait_for_so-elasticsearch:
http.wait_for_successful_query:
- name: "https://localhost:9200/"
@@ -117,10 +112,14 @@ wait_for_so-elasticsearch:
- status: 200
- wait_for: 300
- request_interval: 15
- backend: requests
- require:
- docker_container: so-elasticsearch
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
{% else %}
{{sls}}_state_not_allowed:
@@ -103,11 +103,13 @@ load_component_templates() {
local pattern="${ELASTICSEARCH_TEMPLATES_DIR}/component/$2"
local append_mappings="${3:-"false"}"
# current state of nullglob shell option
shopt -q nullglob && nullglob_set=1 || nullglob_set=0
shopt -s nullglob
echo -e "\nLoading $printed_name component templates...\n"
if ! compgen -G "${pattern}/*.json" > /dev/null; then
echo "No $printed_name component templates found in ${pattern}, skipping."
return
fi
for component in "$pattern"/*.json; do
tmpl_name=$(basename "${component%.json}")
@@ -121,11 +123,6 @@ load_component_templates() {
SO_LOAD_FAILURES_NAMES+=("$component")
fi
done
# restore nullglob shell option if needed
if [[ $nullglob_set -eq 1 ]]; then
shopt -u nullglob
fi
}
check_elasticsearch_responsive() {
@@ -136,7 +133,32 @@ check_elasticsearch_responsive() {
fail "Elasticsearch is not responding. Please review Elasticsearch logs /opt/so/log/elasticsearch/securityonion.log for more details. Additionally, consider running so-elasticsearch-troubleshoot."
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
index_templates_exist() {
local templates_dir="$1"
if [[ ! -d "$templates_dir" ]]; then
return 1
fi
compgen -G "${templates_dir}/*.json" > /dev/null
}
should_load_addon_templates() {
if [[ "$IS_HEAVYNODE" == "true" ]]; then
return 1
fi
# Skip statefile checks when forcing template load
if [[ "$FORCE" != "true" ]]; then
if [[ ! -f "$SO_STATEFILE_SUCCESS" || -f "$ADDON_STATEFILE_SUCCESS" ]]; then
return 1
fi
fi
index_templates_exist "$ADDON_TEMPLATES_DIR"
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_exist "$SO_TEMPLATES_DIR"; then
check_elasticsearch_responsive
if [[ "$IS_HEAVYNODE" == "false" ]]; then
@@ -201,13 +223,14 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
fail "Failed to load all Security Onion core templates successfully."
fi
fi
else
elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then
echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping."
elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then
echo "Security Onion core templates already loaded"
fi
# Start loading addon templates
if [[ (-d "$ADDON_TEMPLATES_DIR" && -f "$SO_STATEFILE_SUCCESS" && "$IS_HEAVYNODE" == "false" && ! -f "$ADDON_STATEFILE_SUCCESS") || (-d "$ADDON_TEMPLATES_DIR" && "$IS_HEAVYNODE" == "false" && "$FORCE" == "true") ]]; then
if should_load_addon_templates; then
check_elasticsearch_responsive
-1
@@ -59,5 +59,4 @@ global:
description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
global: True
advanced: True
helpLink: influxdb
+1 -1
@@ -22,7 +22,7 @@ kibana:
- default
- file
migrations:
discardCorruptObjects: "8.18.8"
discardCorruptObjects: "9.3.3"
telemetry:
enabled: False
xpack:
+1 -1
@@ -3,8 +3,8 @@ kratos:
description: Enables or disables the Kratos authentication system. WARNING - Disabling this process will cause the grid to malfunction. Re-enabling this setting will require manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
helpLink: kratos
oidc:
enabled:
description: Set to True to enable OIDC / Single Sign-On (SSO) to SOC. Requires a valid Security Onion license key.
+46 -1
@@ -273,7 +273,7 @@ function deleteMinionFiles () {
log "ERROR" "Failed to delete $PILLARFILE"
return 1
fi
rm -f $ADVPILLARFILE
if [ $? -ne 0 ]; then
log "ERROR" "Failed to delete $ADVPILLARFILE"
@@ -281,6 +281,39 @@ function deleteMinionFiles () {
fi
}
# Remove this minion's postgres Telegraf credential from the shared creds
# pillar and drop the matching role in Postgres. Always returns 0 so a dead
# or unreachable so-postgres doesn't block minion deletion — in that case we
# log a warning and leave the role behind for manual cleanup.
function remove_postgres_telegraf_from_minion() {
local MINION_SAFE
MINION_SAFE=$(echo "$MINION_ID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
local PG_USER="so_telegraf_${MINION_SAFE}"
log "INFO" "Removing postgres telegraf cred for $MINION_ID"
so-telegraf-cred remove "$MINION_ID" >/dev/null 2>&1 || true
if docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^so-postgres$'; then
if ! docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf >/dev/null 2>&1 <<EOSQL
DO \$\$
BEGIN
IF EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '$PG_USER') THEN
EXECUTE format('REASSIGN OWNED BY %I TO so_telegraf', '$PG_USER');
EXECUTE format('DROP OWNED BY %I', '$PG_USER');
EXECUTE format('DROP ROLE %I', '$PG_USER');
END IF;
END
\$\$;
EOSQL
then
log "WARN" "Failed to drop postgres role $PG_USER; pillar entry was removed — drop manually if the role persists"
fi
else
log "WARN" "so-postgres container is not running; skipping DB role cleanup for $PG_USER"
fi
}
# Create the minion file
function ensure_socore_ownership() {
log "INFO" "Setting socore ownership on minion files"
@@ -542,6 +575,17 @@ function add_telegraf_to_minion() {
log "ERROR" "Failed to add telegraf configuration to $PILLARFILE"
return 1
fi
# Provision the per-minion postgres Telegraf credential in the shared
# telegraf/creds.sls pillar. so-telegraf-cred is the only writer; it
# generates a password on first add and is a no-op on re-add so the cred
# is stable across repeated so-minion runs. postgres.telegraf_users on the
# manager creates/updates the DB role from the same pillar.
so-telegraf-cred add "$MINION_ID"
if [ $? -ne 0 ]; then
log "ERROR" "Failed to provision postgres telegraf cred for $MINION_ID"
return 1
fi
}
function add_influxdb_to_minion() {
@@ -1069,6 +1113,7 @@ case "$OPERATION" in
"delete")
log "INFO" "Removing minion $MINION_ID"
remove_postgres_telegraf_from_minion
deleteMinionFiles || {
log "ERROR" "Failed to delete minion files for $MINION_ID"
exit 1
+329
@@ -0,0 +1,329 @@
#!/usr/bin/env python3
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
"""
so-pillar-import — populate the so_pillar.* schema in so-postgres from the
on-disk Salt pillar tree.
Reads /opt/so/saltstack/local/pillar/, decomposes each .sls file into a
(scope, role|minion_id, pillar_path, data) tuple, and UPSERTs it into
so_pillar.pillar_entry. Idempotent — re-running with no SLS edits produces
no version bumps because the audit trigger only writes a row when data
actually changes.
Bootstrap and mine-driven files are skipped (see EXCLUDE_BASENAMES /
EXCLUDE_PATH_FRAGMENTS below). Files containing Jinja templates ({% or {{) are
also skipped — those stay disk-authoritative and ext_pillar_first: False
means they render before the PG overlay anyway.
All SQL goes through `docker exec so-postgres psql` so no separate DSN
config is required at first-install time. Designed to be called by
salt/postgres/schema_pillar.sls (initial seed) and by salt/manager/tools/
sbin/so-minion (per-minion sync on add/delete).
"""
import argparse
import json
import os
import shlex
import subprocess
import sys
from pathlib import Path
import yaml
PILLAR_LOCAL_ROOT = Path("/opt/so/saltstack/local/pillar")
PILLAR_DEFAULT_ROOT = Path("/opt/so/saltstack/default/pillar")
DOCKER_CONTAINER = "so-postgres"
PG_SUPERUSER = "postgres"
PG_DATABASE = "securityonion"
# Files that must NEVER move to Postgres. These are read by Salt before
# Postgres is reachable, or contain renderer-time computed values (mine, etc.).
EXCLUDE_BASENAMES = {
"secrets.sls",
"auth.sls", # postgres/auth.sls bootstrap
"top.sls",
}
# Path fragments to skip — these are renderer-time computed pillars
# (Salt mine, file_exists guards, etc.) that have to stay on disk.
EXCLUDE_PATH_FRAGMENTS = (
"/elasticsearch/nodes.sls",
"/redis/nodes.sls",
"/kafka/nodes.sls",
"/hypervisor/nodes.sls",
"/logstash/nodes.sls",
"/node_data/ips.sls",
"/postgres/auth.sls",
"/elasticsearch/auth.sls",
"/kibana/secrets.sls",
)
def log(level, msg):
print(f"[{level}] {msg}", file=sys.stderr)
def is_jinja_templated(content_bytes):
return b"{%" in content_bytes or b"{{" in content_bytes
def classify(path):
"""Return (scope, role_name, minion_id, pillar_path) for a pillar file
or None to skip it. role_name is None for now — the importer leaves role
membership to the so_pillar.minion trigger and the salt/auth reactor."""
rel_str = str(path)
if path.name in EXCLUDE_BASENAMES:
return None
for frag in EXCLUDE_PATH_FRAGMENTS:
if frag in rel_str:
return None
# /local/pillar/minions/<id>.sls or adv_<id>.sls
if path.parent.name == "minions":
stem = path.stem # filename without .sls
if stem.startswith("adv_"):
mid = stem[4:]
return ("minion", None, mid, f"minions.adv_{mid}")
return ("minion", None, stem, f"minions.{stem}")
# /local/pillar/<section>/<file>.sls
if path.parent.parent == PILLAR_LOCAL_ROOT or path.parent.parent == PILLAR_DEFAULT_ROOT:
section = path.parent.name
stem = path.stem
# Only soc_<section>.sls and adv_<section>.sls are SOC-managed pillar
# surfaces. Other files (e.g. nodes.sls, auth.sls, *.token) are
# either covered by EXCLUDE_PATH_FRAGMENTS or are bootstrap surfaces
# we leave alone for now.
if stem.startswith("soc_") or stem.startswith("adv_"):
return ("global", None, None, f"{section}.{stem}")
return None
return None
def parse_yaml_file(path):
with open(path, "rb") as f:
content = f.read()
if not content.strip():
return {}
if is_jinja_templated(content):
return None
data = yaml.safe_load(content)
if data is None:
return {}
if not isinstance(data, dict):
return {"_raw": data}
return data
def derive_node_type(minion_id):
"""Conventional Security Onion minion ids are <host>_<role>. Take the
last underscore-delimited token as the canonical role suffix."""
parts = minion_id.rsplit("_", 1)
if len(parts) == 2:
return parts[1]
return None
def docker_psql(sql, *, db=PG_DATABASE, user=PG_SUPERUSER, on_error_stop=True, capture=True):
"""Run sql via docker exec ... psql. Returns stdout as str."""
args = [
"docker", "exec", "-i", DOCKER_CONTAINER,
"psql", "-U", user, "-d", db, "-tA", "-q",
]
if on_error_stop:
args += ["-v", "ON_ERROR_STOP=1"]
proc = subprocess.run(
args, input=sql.encode(),
capture_output=capture, check=False,
)
if proc.returncode != 0:
sys.stderr.write(proc.stderr.decode(errors="replace"))
raise RuntimeError(f"docker exec psql failed (rc={proc.returncode})")
return proc.stdout.decode(errors="replace")
def upsert_minion(minion_id, node_type):
sql = (
"INSERT INTO so_pillar.minion (minion_id, node_type) "
f"VALUES ({pg_str(minion_id)}, {pg_str(node_type) if node_type else 'NULL'}) "
"ON CONFLICT (minion_id) DO UPDATE SET node_type = EXCLUDED.node_type;"
)
docker_psql(sql)
def delete_minion(minion_id):
"""CASCADE removes pillar_entry + role_member rows."""
sql = f"DELETE FROM so_pillar.minion WHERE minion_id = {pg_str(minion_id)};"
docker_psql(sql)
def upsert_pillar_entry(scope, role_name, minion_id, pillar_path, data, reason):
"""Insert or update the row keyed by the partial unique index that
matches scope. Audit trigger handles history; versioning trigger bumps
version only when data changes."""
data_json = json.dumps(data)
role_sql = pg_str(role_name) if role_name else "NULL"
minion_sql = pg_str(minion_id) if minion_id else "NULL"
reason_sql = pg_str(reason)
if scope == "global":
conflict = "(pillar_path) WHERE scope='global'"
elif scope == "role":
conflict = "(role_name, pillar_path) WHERE scope='role'"
elif scope == "minion":
conflict = "(minion_id, pillar_path) WHERE scope='minion'"
else:
raise ValueError(f"unknown scope {scope!r}")
sql = (
"BEGIN;\n"
f"SELECT set_config('so_pillar.change_reason', {reason_sql}, true);\n"
f"INSERT INTO so_pillar.pillar_entry "
f"(scope, role_name, minion_id, pillar_path, data, change_reason) "
f"VALUES ({pg_str(scope)}, {role_sql}, {minion_sql}, {pg_str(pillar_path)}, {pg_jsonb(data_json)}, {reason_sql}) "
f"ON CONFLICT {conflict} DO UPDATE "
f"SET data = EXCLUDED.data, change_reason = EXCLUDED.change_reason;\n"
"COMMIT;\n"
)
docker_psql(sql)
def pg_str(s):
"""Escape a Python str for inclusion in literal SQL. Pillar content has
already been validated as YAML; we just need standard SQL escaping."""
if s is None:
return "NULL"
return "'" + str(s).replace("'", "''") + "'"
def pg_jsonb(json_str):
return pg_str(json_str) + "::jsonb"
def walk_pillar_root(root, paths):
if not root.is_dir():
return
for path in root.rglob("*.sls"):
if path.is_file():
paths.append(path)
def import_minion(minion_id, node_type, dry_run, reason):
"""Re-import every pillar file for a single minion."""
if not minion_id:
raise ValueError("minion_id required for --scope minion")
upsert_minion(minion_id, node_type)
log("INFO", f"Upserted minion row {minion_id} (node_type={node_type})")
targets = [
PILLAR_LOCAL_ROOT / "minions" / f"{minion_id}.sls",
PILLAR_LOCAL_ROOT / "minions" / f"adv_{minion_id}.sls",
]
for path in targets:
if not path.exists():
log("INFO", f" (no file at {path})")
continue
klass = classify(path)
if not klass:
log("INFO", f" skip {path} (excluded)")
continue
scope, role, mid, pillar_path = klass
data = parse_yaml_file(path)
if data is None:
log("WARN", f" skip {path} (Jinja-templated; stays disk-only)")
continue
if dry_run:
log("DRY", f" would upsert {scope}/{pillar_path} = {len(json.dumps(data))} bytes")
continue
upsert_pillar_entry(scope, role, mid, pillar_path, data, reason)
log("INFO", f" imported {scope}/{pillar_path}")
def import_all(dry_run, reason):
"""Walk the entire local pillar tree and import every eligible file."""
paths = []
walk_pillar_root(PILLAR_LOCAL_ROOT, paths)
imported = 0
skipped = 0
minions_seen = set()
for path in sorted(paths):
klass = classify(path)
if not klass:
skipped += 1
continue
scope, role, minion_id, pillar_path = klass
data = parse_yaml_file(path)
if data is None:
log("WARN", f"skip {path} (Jinja-templated; stays disk-only)")
skipped += 1
continue
if scope == "minion" and minion_id not in minions_seen:
node_type = derive_node_type(minion_id)
if not dry_run:
upsert_minion(minion_id, node_type)
minions_seen.add(minion_id)
if dry_run:
log("DRY", f"would upsert {scope}/{pillar_path} ({len(json.dumps(data))} bytes)")
else:
upsert_pillar_entry(scope, role, minion_id, pillar_path, data, reason)
log("INFO", f"imported {scope}/{pillar_path}")
imported += 1
log("INFO", f"done: {imported} imported, {skipped} skipped")
def main():
ap = argparse.ArgumentParser(description=__doc__)
ap.add_argument("--scope", choices=("global", "role", "minion", "all"), default="all")
ap.add_argument("--minion-id")
ap.add_argument("--node-type", help="override node_type for --scope minion (default: derived from minion_id)")
ap.add_argument("--delete", action="store_true",
help="With --scope minion, remove the minion row (and its pillar rows via CASCADE)")
ap.add_argument("--dry-run", action="store_true")
ap.add_argument("--diff", action="store_true",
help="(reserved) print structural diffs vs current DB content")
ap.add_argument("--yes", action="store_true",
help="Skip confirmation prompts (currently unused; reserved)")
ap.add_argument("--reason", default="so-pillar-import",
help="change_reason recorded in pillar_entry_history")
args = ap.parse_args()
try:
if args.scope == "minion":
if not args.minion_id:
ap.error("--minion-id required when --scope minion")
if args.delete:
if args.dry_run:
log("DRY", f"would delete {args.minion_id}")
else:
delete_minion(args.minion_id)
log("INFO", f"deleted {args.minion_id}")
else:
node_type = args.node_type or derive_node_type(args.minion_id)
import_minion(args.minion_id, node_type, args.dry_run, args.reason)
elif args.scope == "all":
import_all(args.dry_run, args.reason)
else:
log("ERROR", f"--scope {args.scope} not yet implemented; use --scope all or --scope minion")
return 2
except Exception as e:
log("ERROR", str(e))
return 1
return 0
if __name__ == "__main__":
sys.exit(main())
+54
@@ -0,0 +1,54 @@
#!/bin/bash
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Single writer for the Telegraf Postgres credentials pillar. Thin wrapper
# around so-yaml.py that generates a password on first add and no-ops on
# re-add so the cred is stable across repeated so-minion runs.
#
# Note: so-yaml.py splits keys on '.' with no escape. SO minion ids are
# dot-free by construction (setup/so-functions:1884 takes the short_name
# before the first '.'), so using the raw minion id as the key is safe.
CREDS=/opt/so/saltstack/local/pillar/telegraf/creds.sls
usage() {
echo "Usage: $0 <add|remove> <minion_id>" >&2
exit 2
}
seed_creds_file() {
mkdir -p "$(dirname "$CREDS")" || return 1
if [[ ! -f "$CREDS" ]]; then
(umask 027 && printf 'telegraf:\n postgres_creds: {}\n' > "$CREDS") || return 1
chown socore:socore "$CREDS" 2>/dev/null || true
chmod 640 "$CREDS" || return 1
fi
}
OP=$1
MID=$2
[[ -z "$OP" || -z "$MID" ]] && usage
case "$OP" in
add)
SAFE=$(echo "$MID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
seed_creds_file || exit 1
if so-yaml.py get -r "$CREDS" "telegraf.postgres_creds.${MID}.user" >/dev/null 2>&1; then
exit 0
fi
PASS=$(tr -dc 'A-Za-z0-9~!@#^&*()_=+[]|;:,.<>?-' < /dev/urandom | head -c 72)
so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.user" "so_telegraf_${SAFE}" >/dev/null
so-yaml.py replace "$CREDS" "telegraf.postgres_creds.${MID}.pass" "$PASS" >/dev/null
;;
remove)
[[ -f "$CREDS" ]] || exit 0
so-yaml.py remove "$CREDS" "telegraf.postgres_creds.${MID}" >/dev/null 2>&1 || true
;;
*)
usage
;;
esac
+195 -4
@@ -13,6 +13,64 @@ import json
lockFile = "/tmp/so-yaml.lock"
# postsalt: so-yaml supports three backend modes for PG-managed pillar paths:
#
# dual — write disk + mirror to so_pillar.*. Reads from disk.
# Used during the migration transition when disk is still
# canonical and PG runs as a shadow.
# postgres — write to so_pillar.* only. Reads from so_pillar.*. No disk
# file is touched. The end state once cutover is complete.
# disk — disk only, no PG. Emergency rollback escape hatch.
#
# Bootstrap and mine-driven files (secrets.sls, ca/init.sls, */nodes.sls,
# top.sls, etc.) are always handled on disk regardless of mode — those paths
# are explicitly excluded by so_yaml_postgres.locate() raising SkipPath.
#
# Mode resolution: SO_YAML_BACKEND env var, then /opt/so/conf/so-yaml/mode,
# then default 'dual' (safe upgrade behavior — flipping to 'postgres' is
# done by schema_pillar.sls after the schema is in place and the importer
# has run at least once).
MODE_FILE = "/opt/so/conf/so-yaml/mode"
VALID_MODES = ("dual", "postgres", "disk")
DEFAULT_MODE = "dual"
try:
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import so_yaml_postgres
_SO_YAML_PG_AVAILABLE = True
except Exception as _exc:
_SO_YAML_PG_AVAILABLE = False
def _resolveBackendMode():
env = os.environ.get("SO_YAML_BACKEND")
if env and env in VALID_MODES:
return env
try:
with open(MODE_FILE, "r") as fh:
value = fh.read().strip()
if value in VALID_MODES:
return value
except (IOError, OSError):
pass
return DEFAULT_MODE
_BACKEND_MODE = _resolveBackendMode()
def _isPgManaged(filename):
"""True when so-yaml should route this file's reads/writes through
so_pillar.*. False for bootstrap/mine-driven files that always live on
disk, and for arbitrary YAML paths outside the pillar tree."""
if not _SO_YAML_PG_AVAILABLE:
return False
try:
return so_yaml_postgres.is_pg_managed(filename)
except Exception:
return False
def showUsage(args):
print('Usage: {} <COMMAND> <YAML_FILE> [ARGS...]'.format(sys.argv[0]), file=sys.stderr)
@@ -25,8 +83,14 @@ def showUsage(args):
print(' get [-r] - Displays (to stdout) the value stored in the given key. Requires KEY arg. Use -r for raw output without YAML formatting.', file=sys.stderr)
print(' remove - Removes a yaml key, if it exists. Requires KEY arg.', file=sys.stderr)
print(' replace - Replaces (or adds) a new key and set its value. Requires KEY and VALUE args.', file=sys.stderr)
print(' purge - Delete the YAML file from disk and remove its rows from so_pillar.* (no KEY arg).', file=sys.stderr)
print(' help - Prints this usage information.', file=sys.stderr)
print('', file=sys.stderr)
print(' Backend mode:', file=sys.stderr)
print(' Resolved from $SO_YAML_BACKEND, then /opt/so/conf/so-yaml/mode, default "dual".', file=sys.stderr)
print(' Valid values: dual | postgres | disk. Bootstrap pillar files (secrets, ca, *.nodes.sls)', file=sys.stderr)
print(' are always handled on disk regardless of mode.', file=sys.stderr)
print('', file=sys.stderr)
print(' Where:', file=sys.stderr)
print(' YAML_FILE - Path to the file that will be modified. Ex: /opt/so/conf/service/conf.yaml', file=sys.stderr)
print(' KEY - YAML key, does not support \' or " characters at this time. Ex: level1.level2', file=sys.stderr)
@@ -39,14 +103,128 @@ def showUsage(args):
def loadYaml(filename):
file = open(filename, "r")
content = file.read()
return yaml.safe_load(content)
"""Load a YAML file's content as a dict.
PG-canonical mode (`postgres`): for PG-managed paths, read from
so_pillar.pillar_entry. A missing row is treated as an empty dict so
that `replace`/`add` on a fresh path can populate it from scratch.
Other modes / non-PG-managed paths: read from disk as today.
"""
if _BACKEND_MODE == "postgres" and _isPgManaged(filename):
try:
data = so_yaml_postgres.read_yaml(filename)
except so_yaml_postgres.SkipPath:
data = None
except Exception as e:
print(f"so-yaml: pg read failed for {filename}: {e}", file=sys.stderr)
sys.exit(1)
return data if data is not None else {}
try:
with open(filename, "r") as file:
content = file.read()
return yaml.safe_load(content)
except FileNotFoundError:
print(f"File not found: {filename}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error reading file {filename}: {e}", file=sys.stderr)
sys.exit(1)
def writeYaml(filename, content):
"""Persist `content` for `filename`.
PG-canonical mode + PG-managed path: write only to so_pillar.*. A PG
failure is fatal (no disk fallback) — caller must retry.
Dual mode: write disk, then mirror to PG (failures are warnings).
Disk mode or non-PG-managed path: write disk only.
"""
if _BACKEND_MODE == "postgres" and _isPgManaged(filename):
if not _SO_YAML_PG_AVAILABLE:
print("so-yaml: PG-canonical mode requires so_yaml_postgres module", file=sys.stderr)
sys.exit(1)
ok, msg = so_yaml_postgres.write_yaml(
filename, content,
reason="so-yaml " + " ".join(sys.argv[1:2]))
if not ok:
print(f"so-yaml: pg write failed for {filename}: {msg}", file=sys.stderr)
sys.exit(1)
return None
file = open(filename, "w")
return yaml.safe_dump(content, file)
result = yaml.safe_dump(content, file)
file.close()
if _BACKEND_MODE == "dual":
_mirrorToPostgres(filename, content)
return result
def _mirrorToPostgres(filename, content):
"""Best-effort dual-write of a YAML mutation into so_pillar.*. Skips
files outside the PG-managed pillar surface (secrets.sls,
elasticsearch/nodes.sls, etc.) and silently degrades when so-postgres
is unreachable. Disk write is canonical in dual mode; this never
raises.
Only real PG failures (`pg write failed: ...`) are logged so the
common cases (skipped path, postgres not running) don't pollute
stderr."""
if not _SO_YAML_PG_AVAILABLE:
return
try:
ok, msg = so_yaml_postgres.write_yaml(filename, content,
reason="so-yaml " + " ".join(sys.argv[1:2]))
if not ok and msg.startswith("pg write failed"):
print(f"so-yaml: {msg}", file=sys.stderr)
except Exception as e: # pragma: no cover — defensive: never break disk write
print(f"so-yaml: pg mirror exception: {e}", file=sys.stderr)
def purgeFile(filename):
"""Delete a YAML file from disk and remove the matching rows from
so_pillar.*. Idempotent — missing file/row counts as success.
PG-canonical mode + PG-managed path: PG delete is canonical. If a stale
disk file from the dual-write era happens to still exist, it's removed
too as a cleanup courtesy. PG failure is fatal in this mode.
Dual / disk modes: remove disk first; PG cleanup is best-effort."""
if _BACKEND_MODE == "postgres" and _isPgManaged(filename):
if not _SO_YAML_PG_AVAILABLE:
print("so-yaml: PG-canonical mode requires so_yaml_postgres module", file=sys.stderr)
return 1
ok, msg = so_yaml_postgres.purge_yaml(filename, reason="so-yaml purge")
if not ok:
print(f"so-yaml: pg purge failed for {filename}: {msg}", file=sys.stderr)
return 1
if os.path.exists(filename):
try:
os.remove(filename)
except Exception as e:
print(f"so-yaml: warn — could not remove stale disk file {filename}: {e}", file=sys.stderr)
return 0
if os.path.exists(filename):
try:
os.remove(filename)
except Exception as e:
print(f"Failed to remove {filename}: {e}", file=sys.stderr)
return 1
if _BACKEND_MODE == "dual" and _SO_YAML_PG_AVAILABLE:
try:
ok, msg = so_yaml_postgres.purge_yaml(filename,
reason="so-yaml purge")
if not ok and msg.startswith("pg purge failed"):
print(f"so-yaml: {msg}", file=sys.stderr)
except Exception as e:
print(f"so-yaml: pg purge exception: {e}", file=sys.stderr)
return 0
def appendItem(content, key, listItem):
@@ -364,6 +542,18 @@ def get(args):
return 0
def purge(args):
"""purge YAML_FILE — delete the file from disk and remove the matching
rows from so_pillar.* in so-postgres. Used by so-minion's delete path
(in place of `rm -f`) so the audit log captures the deletion and
role_member rows get cleaned up via FK CASCADE on so_pillar.minion."""
if len(args) != 1:
print('Missing filename arg', file=sys.stderr)
showUsage(None)
return 1
return purgeFile(args[0])
def main():
args = sys.argv[1:]
@@ -381,6 +571,7 @@ def main():
"get": get,
"remove": remove,
"replace": replace,
"purge": purge,
}
code = 1
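For context, a hypothetical caller along the lines of so-minion's delete path could swap its `rm -f` for the new verb; the helper name and error handling below are illustrative only:

# Hypothetical caller sketch (not part of this diff): purge a minion's pillar
# file through so-yaml so the deletion is captured in so_pillar's audit history.
import subprocess

def purge_minion_pillar(minion_id):
    pillar_file = f"/opt/so/saltstack/local/pillar/minions/{minion_id}.sls"
    proc = subprocess.run(["/usr/sbin/so-yaml.py", "purge", pillar_file],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"purge failed for {pillar_file}: {proc.stderr.strip()}")
    return proc.returncode == 0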
@@ -973,3 +973,347 @@ class TestReplaceListObject(unittest.TestCase):
expected = "key1:\n- id: '1'\n status: updated\n- id: '2'\n status: inactive\n"
self.assertEqual(actual, expected)
class TestLoadYaml(unittest.TestCase):
def test_load_yaml_missing_file(self):
with patch('sys.exit', new=MagicMock()) as sysmock:
with patch('sys.stderr', new=StringIO()) as mock_stderr:
soyaml.loadYaml("/tmp/so-yaml_test-does-not-exist.yaml")
sysmock.assert_called_with(1)
self.assertIn("File not found:", mock_stderr.getvalue())
def test_load_yaml_read_error(self):
with patch('sys.exit', new=MagicMock()) as sysmock:
with patch('sys.stderr', new=StringIO()) as mock_stderr:
with patch('builtins.open', side_effect=PermissionError("denied")):
soyaml.loadYaml("/tmp/so-yaml_test-unreadable.yaml")
sysmock.assert_called_with(1)
self.assertIn("Error reading file", mock_stderr.getvalue())
class TestPurge(unittest.TestCase):
def test_purge_missing_arg(self):
# showUsage calls sys.exit(1); patch it like the other tests do.
with patch('sys.exit', new=MagicMock()):
with patch('sys.stderr', new=StringIO()) as mock_stderr:
rc = soyaml.purge([])
self.assertEqual(rc, 1)
self.assertIn("Missing filename", mock_stderr.getvalue())
def test_purge_existing_file(self):
filename = "/tmp/so-yaml_test_purge.yaml"
with open(filename, "w") as f:
f.write("key: value\n")
# Disable PG mirror so the test doesn't shell out to docker.
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', False):
rc = soyaml.purge([filename])
self.assertEqual(rc, 0)
import os as _os
self.assertFalse(_os.path.exists(filename))
def test_purge_missing_file_idempotent(self):
filename = "/tmp/so-yaml_test_purge_missing.yaml"
import os as _os
if _os.path.exists(filename):
_os.remove(filename)
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', False):
rc = soyaml.purge([filename])
self.assertEqual(rc, 0)
class TestSoYamlPostgres(unittest.TestCase):
"""Tests the path-locator and write/purge contract of the dual-write
backend module without actually contacting Postgres."""
def setUp(self):
import importlib
self.mod = importlib.import_module("so_yaml_postgres")
def test_locate_global_soc(self):
scope, role, mid, path = self.mod.locate(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls")
self.assertEqual(scope, "global")
self.assertIsNone(role)
self.assertIsNone(mid)
self.assertEqual(path, "soc.soc_soc")
def test_locate_global_advanced(self):
scope, role, mid, path = self.mod.locate(
"/opt/so/saltstack/local/pillar/soc/adv_soc.sls")
self.assertEqual(scope, "global")
self.assertEqual(path, "soc.adv_soc")
def test_locate_minion(self):
scope, role, mid, path = self.mod.locate(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls")
self.assertEqual(scope, "minion")
self.assertEqual(mid, "h1_sensor")
self.assertEqual(path, "minions.h1_sensor")
def test_locate_minion_advanced(self):
scope, role, mid, path = self.mod.locate(
"/opt/so/saltstack/local/pillar/minions/adv_h1_sensor.sls")
self.assertEqual(scope, "minion")
self.assertEqual(mid, "h1_sensor")
self.assertEqual(path, "minions.adv_h1_sensor")
def test_locate_skip_secrets(self):
with self.assertRaises(self.mod.SkipPath):
self.mod.locate("/opt/so/saltstack/local/pillar/secrets.sls")
def test_locate_skip_postgres_auth(self):
with self.assertRaises(self.mod.SkipPath):
self.mod.locate("/opt/so/saltstack/local/pillar/postgres/auth.sls")
def test_locate_skip_mine_driven(self):
with self.assertRaises(self.mod.SkipPath):
self.mod.locate("/opt/so/saltstack/local/pillar/elasticsearch/nodes.sls")
def test_locate_skip_top(self):
with self.assertRaises(self.mod.SkipPath):
self.mod.locate("/opt/so/saltstack/local/pillar/top.sls")
def test_locate_skip_unrelated(self):
with self.assertRaises(self.mod.SkipPath):
self.mod.locate("/etc/hostname")
def test_pg_str_escapes(self):
self.assertEqual(self.mod._pg_str("a'b"), "'a''b'")
self.assertEqual(self.mod._pg_str(None), "NULL")
def test_conflict_target(self):
self.assertIn("scope='global'", self.mod._conflict_target("global"))
self.assertIn("scope='role'", self.mod._conflict_target("role"))
self.assertIn("scope='minion'", self.mod._conflict_target("minion"))
with self.assertRaises(ValueError):
self.mod._conflict_target("bogus")
def test_write_yaml_skips_disk_only_path(self):
with patch.object(self.mod, '_is_enabled', return_value=True):
ok, msg = self.mod.write_yaml(
"/opt/so/saltstack/local/pillar/secrets.sls",
{"secrets": {"foo": "bar"}})
self.assertFalse(ok)
self.assertIn("disk-only", msg)
def test_write_yaml_unreachable(self):
with patch.object(self.mod, '_is_enabled', return_value=False):
ok, msg = self.mod.write_yaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls",
{"soc": {"foo": "bar"}})
self.assertFalse(ok)
self.assertEqual(msg, "postgres unreachable")
def test_is_pg_managed_true(self):
self.assertTrue(self.mod.is_pg_managed(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls"))
self.assertTrue(self.mod.is_pg_managed(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls"))
def test_is_pg_managed_false_for_bootstrap(self):
self.assertFalse(self.mod.is_pg_managed(
"/opt/so/saltstack/local/pillar/secrets.sls"))
self.assertFalse(self.mod.is_pg_managed(
"/opt/so/saltstack/local/pillar/postgres/auth.sls"))
self.assertFalse(self.mod.is_pg_managed(
"/opt/so/saltstack/local/pillar/elasticsearch/nodes.sls"))
def test_read_yaml_unreachable(self):
with patch.object(self.mod, '_is_enabled', return_value=False):
self.assertIsNone(self.mod.read_yaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls"))
def test_read_yaml_skips_disk_only(self):
with patch.object(self.mod, '_is_enabled', return_value=True):
with self.assertRaises(self.mod.SkipPath):
self.mod.read_yaml(
"/opt/so/saltstack/local/pillar/secrets.sls")
def test_read_yaml_returns_data(self):
with patch.object(self.mod, '_is_enabled', return_value=True):
with patch.object(self.mod, '_docker_psql',
return_value='{"soc": {"foo": "bar"}}\n'):
data = self.mod.read_yaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls")
self.assertEqual(data, {"soc": {"foo": "bar"}})
def test_read_yaml_returns_none_when_no_row(self):
with patch.object(self.mod, '_is_enabled', return_value=True):
with patch.object(self.mod, '_docker_psql', return_value=''):
data = self.mod.read_yaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls")
self.assertIsNone(data)
def test_read_yaml_minion_query_shape(self):
captured = {}
def fake_psql(sql):
captured['sql'] = sql
return '{"host": {"mainip": "10.0.0.1"}}'
with patch.object(self.mod, '_is_enabled', return_value=True):
with patch.object(self.mod, '_docker_psql', side_effect=fake_psql):
data = self.mod.read_yaml(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls")
self.assertEqual(data, {"host": {"mainip": "10.0.0.1"}})
self.assertIn("scope='minion'", captured['sql'])
self.assertIn("'h1_sensor'", captured['sql'])
self.assertIn("'minions.h1_sensor'", captured['sql'])
def test_is_enabled_public_alias(self):
with patch.object(self.mod, '_is_enabled', return_value=True):
self.assertTrue(self.mod.is_enabled())
with patch.object(self.mod, '_is_enabled', return_value=False):
self.assertFalse(self.mod.is_enabled())
class TestSoYamlBackendMode(unittest.TestCase):
"""Tests so-yaml's backend-mode resolution and PG-canonical routing
for read/write/purge. The PG calls themselves are stubbed; what we're
asserting is that the right backend is chosen for each (mode, path)
combination."""
def test_resolve_mode_env_overrides_file(self):
with patch.dict('os.environ', {'SO_YAML_BACKEND': 'postgres'}):
self.assertEqual(soyaml._resolveBackendMode(), 'postgres')
with patch.dict('os.environ', {'SO_YAML_BACKEND': 'disk'}):
self.assertEqual(soyaml._resolveBackendMode(), 'disk')
def test_resolve_mode_invalid_env_falls_back(self):
with patch.dict('os.environ', {'SO_YAML_BACKEND': 'garbage'}, clear=False):
with patch('builtins.open', side_effect=IOError):
self.assertEqual(soyaml._resolveBackendMode(), 'dual')
def test_resolve_mode_default_dual(self):
env = {k: v for k, v in __import__('os').environ.items()
if k != 'SO_YAML_BACKEND'}
with patch.dict('os.environ', env, clear=True):
with patch('builtins.open', side_effect=IOError):
self.assertEqual(soyaml._resolveBackendMode(), 'dual')
def test_is_pg_managed_proxies(self):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
self.assertTrue(soyaml._isPgManaged(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls"))
self.assertFalse(soyaml._isPgManaged(
"/opt/so/saltstack/local/pillar/secrets.sls"))
def test_is_pg_managed_false_when_module_unavailable(self):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', False):
self.assertFalse(soyaml._isPgManaged(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls"))
def test_load_yaml_postgres_mode_reads_pg(self):
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'read_yaml',
return_value={"a": 1}):
result = soyaml.loadYaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls")
self.assertEqual(result, {"a": 1})
def test_load_yaml_postgres_mode_returns_empty_when_no_row(self):
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'read_yaml',
return_value=None):
result = soyaml.loadYaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls")
self.assertEqual(result, {})
def test_load_yaml_postgres_mode_reads_disk_for_bootstrap(self):
import tempfile, os as _os
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
f.write("foo: bar\n")
tmp = f.name
try:
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres,
'is_pg_managed', return_value=False):
result = soyaml.loadYaml(tmp)
self.assertEqual(result, {"foo": "bar"})
finally:
_os.unlink(tmp)
def test_write_yaml_postgres_mode_skips_disk(self):
import tempfile, os as _os
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
tmp = f.name
_os.unlink(tmp)
try:
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'write_yaml',
return_value=(True, 'ok')) as mock_w:
soyaml.writeYaml(tmp, {"x": 1})
self.assertFalse(_os.path.exists(tmp))
mock_w.assert_called_once()
finally:
if _os.path.exists(tmp):
_os.unlink(tmp)
def test_write_yaml_postgres_mode_failure_is_fatal(self):
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'write_yaml',
return_value=(False, 'pg write failed: connection refused')):
with patch('sys.exit', new=MagicMock()) as sysmock:
with patch('sys.stderr', new=StringIO()) as mock_err:
soyaml.writeYaml(
"/opt/so/saltstack/local/pillar/soc/soc_soc.sls",
{"x": 1})
sysmock.assert_called_with(1)
def test_write_yaml_disk_mode_skips_pg(self):
import tempfile, os as _os
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
tmp = f.name
try:
with patch.object(soyaml, '_BACKEND_MODE', 'disk'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'write_yaml') as mock_w:
soyaml.writeYaml(tmp, {"x": 1})
mock_w.assert_not_called()
with open(tmp) as f:
self.assertIn('x: 1', f.read())
finally:
_os.unlink(tmp)
def test_purge_postgres_mode_calls_pg_only(self):
import tempfile, os as _os
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
tmp = f.name
_os.unlink(tmp)
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'purge_yaml',
return_value=(True, 'ok')) as mock_p:
rc = soyaml.purgeFile(tmp)
self.assertEqual(rc, 0)
mock_p.assert_called_once()
def test_purge_postgres_mode_failure_returns_nonzero(self):
with patch.object(soyaml, '_BACKEND_MODE', 'postgres'):
with patch.object(soyaml, '_SO_YAML_PG_AVAILABLE', True):
with patch.object(soyaml.so_yaml_postgres, 'is_pg_managed',
return_value=True):
with patch.object(soyaml.so_yaml_postgres, 'purge_yaml',
return_value=(False, 'pg purge failed: x')):
with patch('sys.stderr', new=StringIO()):
rc = soyaml.purgeFile(
"/opt/so/saltstack/local/pillar/minions/h1_sensor.sls")
self.assertEqual(rc, 1)
@@ -0,0 +1,320 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
"""
so_yaml_postgres - Postgres-backed dual-write helpers for so-yaml.py.
so-yaml.py writes YAML pillar files on disk; this module mirrors those
writes into so_pillar.* in so-postgres so ext_pillar and the SOC
PostgresConfigstore see the same data. During the transition away from
flat-file pillars, disk remains canonical; PG writes are best-effort and
never fail the disk operation.
Connection: shells out to `docker exec so-postgres psql -U postgres -d
securityonion`. Same pattern so-pillar-import uses; avoids needing a
separate DSN config at install time. Performance is fine because so-yaml
is invoked from infrequent code paths (setup scripts, so-minion,
so-firewall); SOC's hot path uses the in-process pgxpool in
PostgresConfigstore, not so-yaml.
Path-to-row mapping mirrors PostgresConfigstore.locateSetting in
securityonion-soc:
/opt/so/saltstack/local/pillar/<section>/soc_<section>.sls
-> scope=global, pillar_path=<section>.soc_<section>
/opt/so/saltstack/local/pillar/<section>/adv_<section>.sls
-> scope=global, pillar_path=<section>.adv_<section>
/opt/so/saltstack/local/pillar/minions/<id>.sls
-> scope=minion, minion_id=<id>, pillar_path=minions.<id>
/opt/so/saltstack/local/pillar/minions/adv_<id>.sls
-> scope=minion, minion_id=<id>, pillar_path=minions.adv_<id>
Files outside that mapping (notably secrets.sls, postgres/auth.sls,
elasticsearch/nodes.sls, etc.) are skipped: they stay disk-only forever
or render dynamically and don't belong in PG.
"""
import json
import os
import shlex
import subprocess
import sys
DOCKER_CONTAINER = os.environ.get("SO_PILLAR_PG_CONTAINER", "so-postgres")
PG_DATABASE = os.environ.get("SO_PILLAR_PG_DATABASE", "securityonion")
PG_USER = os.environ.get("SO_PILLAR_PG_USER", "postgres")
# File paths whose mutations stay disk-only forever. Mirrors EXCLUDE_*
# in so-pillar-import.
DISK_ONLY_PATHS = (
"/opt/so/saltstack/local/pillar/secrets.sls",
"/opt/so/saltstack/local/pillar/postgres/auth.sls",
"/opt/so/saltstack/local/pillar/elasticsearch/auth.sls",
"/opt/so/saltstack/local/pillar/kibana/secrets.sls",
)
DISK_ONLY_FRAGMENTS = (
"/elasticsearch/nodes.sls",
"/redis/nodes.sls",
"/kafka/nodes.sls",
"/hypervisor/nodes.sls",
"/logstash/nodes.sls",
"/node_data/ips.sls",
"/top.sls",
)
class SkipPath(Exception):
"""Raised when a file path is intentionally not mirrored to PG."""
def is_enabled():
"""Public alias for callers that want to probe PG reachability without
relying on a leading-underscore private name."""
return _is_enabled()
def _is_enabled():
"""PG dual-write only fires if so-postgres is reachable. Cheap probe.
Returns True when docker exec succeeds, False otherwise. We never
want a PG hiccup to fail a disk write on a manager whose Postgres is
momentarily unreachable."""
try:
proc = subprocess.run(
["docker", "exec", DOCKER_CONTAINER,
"pg_isready", "-h", "127.0.0.1", "-U", PG_USER, "-q"],
capture_output=True, timeout=5, check=False,
)
return proc.returncode == 0
except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
return False
def locate(path):
"""Translate a so-yaml file path to (scope, role_name, minion_id, pillar_path).
Raises SkipPath when the file is not part of the PG-managed surface."""
norm = os.path.normpath(path)
if norm in DISK_ONLY_PATHS:
raise SkipPath(f"{path}: explicit disk-only allowlist")
for frag in DISK_ONLY_FRAGMENTS:
if frag in norm:
raise SkipPath(f"{path}: matches disk-only fragment {frag}")
parent = os.path.basename(os.path.dirname(norm))
grandparent = os.path.basename(os.path.dirname(os.path.dirname(norm)))
name = os.path.basename(norm)
if not name.endswith(".sls"):
raise SkipPath(f"{path}: not a .sls file")
stem = name[:-4]
if parent == "minions":
if stem.startswith("adv_"):
mid = stem[4:]
return ("minion", None, mid, f"minions.adv_{mid}")
return ("minion", None, stem, f"minions.{stem}")
# /local/pillar/<section>/<file>.sls
if grandparent == "pillar" and parent and parent != "":
if stem.startswith("soc_") or stem.startswith("adv_"):
return ("global", None, None, f"{parent}.{stem}")
raise SkipPath(f"{path}: <section>/{stem}.sls is not a soc_/adv_ file")
raise SkipPath(f"{path}: unrecognised pillar layout")
def _pg_str(s):
if s is None:
return "NULL"
return "'" + str(s).replace("'", "''") + "'"
def _docker_psql(sql):
"""Run sql via docker exec ... psql. Returns stdout. Caller catches
exceptions and downgrades to a warning."""
proc = subprocess.run(
["docker", "exec", "-i", DOCKER_CONTAINER,
"psql", "-U", PG_USER, "-d", PG_DATABASE,
"-tA", "-q", "-v", "ON_ERROR_STOP=1"],
input=sql.encode(), capture_output=True, check=False, timeout=30,
)
if proc.returncode != 0:
raise RuntimeError(proc.stderr.decode(errors="replace") or
f"docker exec psql exit {proc.returncode}")
return proc.stdout.decode(errors="replace")
def _conflict_target(scope):
if scope == "global":
return "(pillar_path) WHERE scope='global'"
if scope == "role":
return "(role_name, pillar_path) WHERE scope='role'"
if scope == "minion":
return "(minion_id, pillar_path) WHERE scope='minion'"
raise ValueError(f"unknown scope {scope!r}")
def is_pg_managed(path):
"""True if this path maps to a so_pillar.* row (locate() succeeds).
Bootstrap and mine-driven files return False; they always live on
disk regardless of so-yaml's backend mode."""
try:
locate(path)
return True
except SkipPath:
return False
def read_yaml(path):
"""Return the content dict stored in so_pillar.pillar_entry for `path`,
or None when no row exists. Raises SkipPath when `path` is not part of
the PG-managed surface (caller should read disk in that case).
Used by so-yaml.py PG-canonical mode so `replace`, `get`, etc. resolve
against the database rather than a stale (or absent) disk file."""
if not _is_enabled():
return None
scope, role, minion_id, pillar_path = locate(path)
if scope == "minion":
sql = ("SELECT data FROM so_pillar.pillar_entry "
"WHERE scope='minion' "
f"AND minion_id={_pg_str(minion_id)} "
f"AND pillar_path={_pg_str(pillar_path)}")
elif scope == "role":
sql = ("SELECT data FROM so_pillar.pillar_entry "
"WHERE scope='role' "
f"AND role_name={_pg_str(role)} "
f"AND pillar_path={_pg_str(pillar_path)}")
else:
sql = ("SELECT data FROM so_pillar.pillar_entry "
"WHERE scope='global' "
f"AND pillar_path={_pg_str(pillar_path)}")
try:
out = _docker_psql(sql).strip()
except Exception:
return None
if not out:
return None
try:
return json.loads(out)
except (ValueError, TypeError):
return None
def write_yaml(path, content_dict, *, reason="so-yaml dual-write"):
"""Mirror the disk write at `path` (whose content was just rendered as
`content_dict`) into so_pillar.pillar_entry. Best-effort: any failure
is swallowed so the caller (so-yaml.py) does not see it as a fatal."""
if not _is_enabled():
return False, "postgres unreachable"
try:
scope, role, minion_id, pillar_path = locate(path)
except SkipPath as e:
return False, str(e)
data_json = json.dumps(content_dict if content_dict is not None else {})
role_sql = _pg_str(role)
minion_sql = _pg_str(minion_id)
reason_sql = _pg_str(reason)
conflict = _conflict_target(scope)
sql_parts = []
if scope == "minion":
# FK requires the minion row before pillar_entry can reference it.
sql_parts.append(
f"INSERT INTO so_pillar.minion (minion_id) VALUES ({minion_sql}) "
"ON CONFLICT (minion_id) DO NOTHING;"
)
sql_parts.append(
"BEGIN;\n"
f"SELECT set_config('so_pillar.change_reason', {reason_sql}, true);\n"
"INSERT INTO so_pillar.pillar_entry "
"(scope, role_name, minion_id, pillar_path, data, change_reason) "
f"VALUES ({_pg_str(scope)}, {role_sql}, {minion_sql}, "
f"{_pg_str(pillar_path)}, {_pg_str(data_json)}::jsonb, {reason_sql}) "
f"ON CONFLICT {conflict} DO UPDATE "
"SET data = EXCLUDED.data, change_reason = EXCLUDED.change_reason;\n"
"COMMIT;\n"
)
try:
_docker_psql("\n".join(sql_parts))
except Exception as e:
return False, f"pg write failed: {e}"
return True, "ok"
def purge_yaml(path, *, reason="so-yaml purge"):
"""Mirror the disk file deletion at `path` by deleting the matching
pillar_entry rows. For minion files also deletes the so_pillar.minion
row (CASCADE removes pillar_entry + role_member rows)."""
if not _is_enabled():
return False, "postgres unreachable"
try:
scope, role, minion_id, pillar_path = locate(path)
except SkipPath as e:
return False, str(e)
reason_sql = _pg_str(reason)
parts = ["BEGIN;",
f"SELECT set_config('so_pillar.change_reason', {reason_sql}, true);"]
if scope == "minion":
# If both <id>.sls and adv_<id>.sls are gone the trigger / CASCADE
# cleans up role_member; otherwise we just remove this one row.
parts.append(
f"DELETE FROM so_pillar.pillar_entry "
f"WHERE scope='minion' AND minion_id={_pg_str(minion_id)} "
f"AND pillar_path={_pg_str(pillar_path)};"
)
parts.append(
f"DELETE FROM so_pillar.minion WHERE minion_id={_pg_str(minion_id)} "
"AND NOT EXISTS (SELECT 1 FROM so_pillar.pillar_entry "
f"WHERE minion_id={_pg_str(minion_id)});"
)
else:
parts.append(
f"DELETE FROM so_pillar.pillar_entry "
f"WHERE scope={_pg_str(scope)} AND pillar_path={_pg_str(pillar_path)};"
)
parts.append("COMMIT;")
try:
_docker_psql("\n".join(parts))
except Exception as e:
return False, f"pg purge failed: {e}"
return True, "ok"
# CLI for diagnostics. Not exercised by so-yaml.py itself.
def _main(argv):
import argparse
ap = argparse.ArgumentParser()
ap.add_argument("op", choices=("locate", "ping"))
ap.add_argument("path", nargs="?")
args = ap.parse_args(argv)
if args.op == "ping":
ok = _is_enabled()
print("ok" if ok else "unreachable")
return 0 if ok else 1
if args.op == "locate":
if not args.path:
ap.error("locate requires PATH")
try:
scope, role, minion_id, pillar_path = locate(args.path)
print(f"scope={scope} role={role} minion_id={minion_id} pillar_path={pillar_path}")
return 0
except SkipPath as e:
print(f"SKIP: {e}", file=sys.stderr)
return 2
return 1
if __name__ == "__main__":
sys.exit(_main(sys.argv[1:]))
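The same checks the diagnostic CLI performs can also be driven from Python:

# Diagnostics sketch: ping reports reachability, locate prints the row mapping.
from so_yaml_postgres import _main

_main(["ping"])    # prints "ok" or "unreachable"; returns 0 or 1
_main(["locate", "/opt/so/saltstack/local/pillar/minions/h1_sensor.sls"])
# -> scope=minion role=None minion_id=h1_sensor pillar_path=minions.h1_sensor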
@@ -24,6 +24,14 @@ BACKUPTOPFILE=/opt/so/saltstack/default/salt/top.sls.backup
SALTUPGRADED=false
SALT_CLOUD_INSTALLED=false
SALT_CLOUD_CONFIGURED=false
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
# used to display messages to the user at the end of soup
declare -a FINAL_MESSAGE_QUEUE=()
@@ -477,7 +485,44 @@ elasticsearch_backup_index_templates() {
tar -czf /nsm/backup/3.0.0_elasticsearch_index_templates.tar.gz -C /opt/so/conf/elasticsearch/templates/index/ .
}
ensure_postgres_local_pillar() {
# Postgres was added as a service after 3.0.0, so the new pillar/top.sls
# references postgres.soc_postgres / postgres.adv_postgres unconditionally.
# Managers upgrading from 3.0.0 have no /opt/so/saltstack/local/pillar/postgres/
# (make_some_dirs only runs at install time), so the stubs must be created
# here before salt-master restarts against the new top.sls.
echo "Ensuring postgres local pillar stubs exist."
local dir=/opt/so/saltstack/local/pillar/postgres
mkdir -p "$dir"
[[ -f "$dir/soc_postgres.sls" ]] || touch "$dir/soc_postgres.sls"
[[ -f "$dir/adv_postgres.sls" ]] || touch "$dir/adv_postgres.sls"
chown -R socore:socore "$dir"
}
ensure_postgres_secret() {
# On a fresh install, generate_passwords + secrets_pillar seed
# secrets:postgres_pass in /opt/so/saltstack/local/pillar/secrets.sls. That
# code path is skipped on upgrade (secrets.sls already exists from 3.0.0
# with import_pass/influx_pass but no postgres_pass), so the postgres
# container's POSTGRES_PASSWORD_FILE and SOC's PG_ADMIN_PASS would be empty
# after highstate. Generate one now if missing.
local secrets_file=/opt/so/saltstack/local/pillar/secrets.sls
if [[ ! -f "$secrets_file" ]]; then
echo "WARNING: $secrets_file missing; skipping postgres_pass backfill."
return 0
fi
if so-yaml.py get -r "$secrets_file" secrets.postgres_pass >/dev/null 2>&1; then
echo "secrets.postgres_pass already set; leaving as-is."
return 0
fi
echo "Seeding secrets.postgres_pass in $secrets_file."
so-yaml.py add "$secrets_file" secrets.postgres_pass "$(get_random_value)"
chown socore:socore "$secrets_file"
}
up_to_3.1.0() {
ensure_postgres_local_pillar
ensure_postgres_secret
determine_elastic_agent_upgrade
elasticsearch_backup_index_templates
# Clear existing component template state file.
@@ -489,33 +534,25 @@ up_to_3.1.0() {
post_to_3.1.0() {
/usr/sbin/so-kibana-space-defaults
# One-time backfill for minions that existed before the postgres Telegraf
# feature shipped. Generate the aggregate pillar on the manager and create
# the per-minion DB roles, then fan each minion's cred into its own pillar
# file. Going forward the reactor handles each new salt-key accept with a
# targeted fan-out, so a manager highstate no longer needs to iterate.
echo "Provisioning Telegraf Postgres users for existing minions."
salt-call --local state.apply postgres.auth,postgres.telegraf_users queue=True || true
AGGREGATE_PILLAR=/opt/so/saltstack/local/pillar/postgres/auth.sls
MINIONS_DIR=/opt/so/saltstack/local/pillar/minions
if [[ -f "$AGGREGATE_PILLAR" && -d "$MINIONS_DIR" ]]; then
for pillar_file in "$MINIONS_DIR"/*.sls; do
[[ -f "$pillar_file" ]] || continue
mid=$(basename "$pillar_file" .sls)
[[ "$mid" == adv_* ]] && continue
safe=$(echo "$mid" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
existing_user=$(so-yaml.py get -r "$pillar_file" postgres.telegraf.user 2>/dev/null || true)
[[ "$existing_user" == "so_telegraf_${safe}" ]] && continue
user=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.user" 2>/dev/null || true)
pass=$(so-yaml.py get -r "$AGGREGATE_PILLAR" "postgres.auth.users.telegraf_${safe}.pass" 2>/dev/null || true)
[[ -z "$user" || -z "$pass" ]] && continue
so-yaml.py replace "$pillar_file" postgres.telegraf.user "$user" >/dev/null
so-yaml.py replace "$pillar_file" postgres.telegraf.pass "$pass" >/dev/null
done
# ensure manager has new version of socloud.conf
if [[ $SALT_CLOUD_CONFIGURED == true ]]; then
salt-call state.apply salt.cloud.config concurrent=True
fi
# Backfill the Telegraf creds pillar for every accepted minion. so-telegraf-cred
# add is idempotent — it no-ops when an entry already exists — so this is safe
# to run on every soup. The subsequent state.apply creates/updates the matching
# Postgres roles from the reconciled pillar.
echo "Reconciling Telegraf Postgres creds for accepted minions."
for mid in $(salt-key --out=json --list=accepted 2>/dev/null | jq -r '.minions[]?' 2>/dev/null); do
[[ -n "$mid" ]] || continue
/usr/sbin/so-telegraf-cred add "$mid" || echo " warning: so-telegraf-cred add $mid failed" >&2
done
# Run through the master (not --local) so state compilation uses the
# master's configured file_roots; the manager's /etc/salt/minion has no
# file_roots of its own and --local would fail with "No matching sls found".
salt-call state.apply postgres.telegraf_users queue=True || true
POSTVERSION=3.1.0
}
@@ -689,15 +726,6 @@ upgrade_check_salt() {
upgrade_salt() {
echo "Performing upgrade of Salt from $INSTALLEDSALTVERSION to $NEWSALTVERSION."
echo ""
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
echo "Removing yum versionlock for Salt."
echo ""
yum versionlock delete "salt"
@@ -25,8 +25,33 @@ manager_run_es_soc:
- salt: {{NEWNODE}}_update_mine
{% endif %}
# so-minion has already added the new minion's entry to telegraf/creds.sls
# via so-telegraf-cred before this orch fires. Reconcile the Postgres role
# on the manager so the new minion can authenticate on its first highstate,
# then refresh the minion's pillar so its telegraf.conf renders with the
# freshly-written cred.
manager_create_postgres_telegraf_role:
salt.state:
- tgt: {{ MANAGER }}
- sls:
- postgres.telegraf_users
- queue: True
- require:
- salt: {{NEWNODE}}_update_mine
{{NEWNODE}}_refresh_pillar:
salt.function:
- name: saltutil.refresh_pillar
- tgt: {{ NEWNODE }}
- kwarg:
wait: True
- require:
- salt: manager_create_postgres_telegraf_role
{{NEWNODE}}_run_highstate:
salt.state:
- tgt: {{ NEWNODE }}
- highstate: True
- queue: True
- require:
- salt: {{NEWNODE}}_refresh_pillar
@@ -0,0 +1,112 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Driven by the so_pillar_changed reactor. Translates a so_pillar.pillar_entry
# change into (cache.clear_pillar -> saltutil.refresh_pillar -> state.apply)
# on the appropriate target.
#
# Routing rules live in the DISPATCH map below — one entry per
# (pillar_path prefix) -> (state sls, role grain). Add new services here
# rather than wiring more reactors.
#
# Idempotent: state.apply is idempotent; if the pillar value didn't actually
# change anything observable, the affected state runs a no-op. Bulk imports
# and replays are safe.
{% set change = salt['pillar.get']('so_pillar_change', {}) %}
{% set scope = change.get('scope') %}
{% set role = change.get('role_name') %}
{% set minion = change.get('minion_id') %}
{% set changes = change.get('changes', []) %}
{# (pillar_path prefix) -> {sls: <state to apply>, roles: <role values that run it>}
roles are role names (e.g. 'so-sensor'), used to build compound targets
when the change is global or role-scoped. #}
{% set DISPATCH = {
'suricata.': {'sls': 'suricata.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
'sensor.': {'sls': 'suricata.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
'zeek.': {'sls': 'zeek.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
'stenographer.': {'sls': 'stenographer.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
'pcap.': {'sls': 'pcap.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
'logstash.': {'sls': 'logstash.config', 'roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-receiver']},
'redis.': {'sls': 'redis.config', 'roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-standalone']},
'kafka.': {'sls': 'kafka.config', 'roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-receiver', 'so-searchnode']},
'elasticsearch.': {'sls': 'elasticsearch.config','roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-searchnode', 'so-heavynode', 'so-standalone']},
'kibana.': {'sls': 'kibana.config', 'roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-standalone']},
'soc.': {'sls': 'soc.config', 'roles': ['so-manager', 'so-managersearch', 'so-managerhype', 'so-standalone']},
'telegraf.': {'sls': 'telegraf.config', 'roles': ['*']},
'fleet.': {'sls': 'fleet.config', 'roles': ['so-fleet']},
'strelka.': {'sls': 'strelka.config', 'roles': ['so-sensor', 'so-heavynode', 'so-standalone']},
} %}
{# Collect a deduplicated set of (sls, target_kind) actions. target_kind is
either 'minion:<id>' (scope=minion) or 'roles:so-x,so-y' (scope=role/global). #}
{% set actions = {} %}
{% for c in changes %}
{% set path = c.get('pillar_path', '') %}
{% for prefix, action in DISPATCH.items() %}
{% if path.startswith(prefix) %}
{% set sls = action['sls'] %}
{% if scope == 'minion' and minion %}
{% set key = sls ~ '|minion|' ~ minion %}
{% set _ = actions.update({key: {'sls': sls, 'tgt': minion, 'tgt_type': 'glob'}}) %}
{% else %}
{% set role_targets = action['roles'] %}
{% if '*' in role_targets %}
{% set tgt = '*' %}
{% set tgt_type = 'glob' %}
{% else %}
{% set tgt = ('I@role:' ~ role_targets|join(' or I@role:')) %}
{% set tgt_type = 'compound' %}
{% endif %}
{% set key = sls ~ '|' ~ tgt %}
{% set _ = actions.update({key: {'sls': sls, 'tgt': tgt, 'tgt_type': tgt_type}}) %}
{% endif %}
{% endif %}
{% endfor %}
{% endfor %}
{% if actions %}
{% for key, action in actions.items() %}
{% set safe_id = loop.index0 | string %}
so_pillar_reload_clear_cache_{{ safe_id }}:
salt.runner:
- name: cache.clear_pillar
- tgt: '{{ action.tgt }}'
- tgt_type: '{{ action.tgt_type }}'
so_pillar_reload_refresh_pillar_{{ safe_id }}:
salt.function:
- name: saltutil.refresh_pillar
- tgt: '{{ action.tgt }}'
- tgt_type: '{{ action.tgt_type }}'
- kwarg:
wait: True
- require:
- salt: so_pillar_reload_clear_cache_{{ safe_id }}
so_pillar_reload_apply_state_{{ safe_id }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: '{{ action.tgt_type }}'
- sls:
- {{ action.sls }}
- queue: True
- require:
- salt: so_pillar_reload_refresh_pillar_{{ safe_id }}
{% endfor %}
{% else %}
{# No DISPATCH entry matched. Pillar still gets refreshed so any other states
read fresh values, but no service-specific reload is invoked. #}
so_pillar_reload_unmapped_path_noop:
test.nop
{% do salt.log.info('orch.so_pillar_reload: no dispatch match for %s' % changes) %}
{% endif %}
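For reference, the pillar payload this orchestration expects looks roughly like the following; the field names come from the pillar.get calls above, while the concrete values are made up:

# Hypothetical so_pillar_change payload, shaped to match the pillar.get calls above.
so_pillar_change = {
    "scope": "global",
    "role_name": None,
    "minion_id": None,
    "changes": [
        {"pillar_path": "suricata.soc_suricata"},   # example path only
    ],
}
# The 'suricata.' prefix matches, so the orchestration clears the pillar cache,
# refreshes pillar, and applies suricata.config on
# 'I@role:so-sensor or I@role:so-heavynode or I@role:so-standalone'.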
@@ -1,28 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Fired by salt/reactor/telegraf_user_sync.sls when salt-key accepts a new
# minion. Only provisions the per-minion pillar entry and DB role on the
# manager; the minion itself will pick up its telegraf config on its first
# highstate during onboarding, so there's no need to push the telegraf state
# from here.
#
# Target the manager via role grains — same pattern as orch/delete_hypervisor.sls.
# The reactor doesn't know the manager's minion id, and grains.master on the
# runner is a hostname, not a targetable id.
{% set FANOUT_MINION = salt['pillar.get']('postgres_fanout_minion', '') %}
manager_sync_telegraf_pg_users:
salt.state:
- tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone or G@role:so-eval'
- tgt_type: compound
- sls:
- postgres.auth
- postgres.telegraf_users
- queue: True
{% if FANOUT_MINION %}
- pillar:
postgres_fanout_minion: {{ FANOUT_MINION }}
{% endif %}
@@ -13,24 +13,8 @@
{% set CHARS = DIGITS~LOWERCASE~UPPERCASE~SYMBOLS %}
{% set so_postgres_user_pass = salt['pillar.get']('postgres:auth:users:so_postgres_user:pass', salt['random.get_str'](72, chars=CHARS)) %}
{# Per-minion Telegraf Postgres credentials. Merge currently-up minions with any #}
{# previously-known entries in pillar so existing passwords persist across runs. #}
{% set existing = salt['pillar.get']('postgres:auth:users', {}) %}
{% set up_minions = salt['saltutil.runner']('manage.up') or [] %}
{% set telegraf_users = {} %}
{% for key, entry in existing.items() %}
{%- if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
{%- do telegraf_users.update({key: entry}) %}
{%- endif %}
{% endfor %}
{% for mid in up_minions %}
{%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
{%- set key = 'telegraf_' ~ safe %}
{%- if key not in telegraf_users %}
{%- do telegraf_users.update({key: {'user': 'so_telegraf_' ~ safe, 'pass': salt['random.get_str'](72, chars=CHARS)}}) %}
{%- endif %}
{% endfor %}
# Admin cred only. Per-minion Telegraf creds live in telegraf/creds.sls,
# managed by /usr/sbin/so-telegraf-cred (called from so-minion).
postgres_auth_pillar:
file.managed:
- name: /opt/so/saltstack/local/pillar/postgres/auth.sls
@@ -43,57 +27,7 @@ postgres_auth_pillar:
so_postgres_user:
user: so_postgres
pass: "{{ so_postgres_user_pass }}"
{% for key, entry in telegraf_users.items() %}
{{ key }}:
user: {{ entry.user }}
pass: "{{ entry.pass }}"
{% endfor %}
- show_changes: False
{# Fan a specific minion's telegraf cred out to its own pillar file.
Two triggers populate the target list:
- grains.id (always) so the manager's own pillar is populated on every
postgres.auth run — otherwise the manager's telegraf has no cred on
a fresh install and can't write to its own postgres.
- pillar postgres_fanout_minion (when the reactor fires on a new
minion's salt-key accept).
The `unless` guard keeps re-runs idempotent, so this is one so-yaml.py
check per target, not per minion in the grid. Bulk backfill for
already-accepted minions lives in soup. #}
{% set fanout_targets = [] %}
{% if grains.id %}
{%- do fanout_targets.append(grains.id) %}
{% endif %}
{% set fanout_mid = salt['pillar.get']('postgres_fanout_minion') %}
{% if fanout_mid and fanout_mid not in fanout_targets %}
{%- do fanout_targets.append(fanout_mid) %}
{% endif %}
{% for mid in fanout_targets %}
{%- set safe = mid | replace('.','_') | replace('-','_') | lower %}
{%- set key = 'telegraf_' ~ safe %}
{%- set entry = telegraf_users.get(key) %}
{%- if entry %}
postgres_telegraf_minion_pillar_{{ safe }}:
cmd.run:
- name: |
set -e
PILLAR_FILE=/opt/so/saltstack/local/pillar/minions/{{ mid }}.sls
if [ ! -f "$PILLAR_FILE" ]; then
echo '{}' > "$PILLAR_FILE"
chown socore:socore "$PILLAR_FILE" 2>/dev/null || true
chmod 640 "$PILLAR_FILE"
fi
/usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.user '{{ entry.user }}'
/usr/sbin/so-yaml.py replace "$PILLAR_FILE" postgres.telegraf.pass '{{ entry.pass }}'
- unless: |
[ "$(/usr/sbin/so-yaml.py get -r /opt/so/saltstack/local/pillar/minions/{{ mid }}.sls postgres.telegraf.user 2>/dev/null)" = '{{ entry.user }}' ]
- require:
- file: postgres_auth_pillar
{%- endif %}
{% endfor %}
{% else %}
{{sls}}_state_not_allowed:
@@ -0,0 +1,124 @@
-- so_pillar schema: queryable, versioned, audited pillar config store.
-- Replaces flat-file Salt pillar consumed via salt.pillar.postgres ext_pillar.
-- Idempotent. Run via salt/postgres/schema_pillar.sls inside the so-postgres container.
CREATE SCHEMA IF NOT EXISTS so_pillar;
CREATE TABLE IF NOT EXISTS so_pillar.scope (
scope_kind text PRIMARY KEY,
precedence int NOT NULL,
description text
);
INSERT INTO so_pillar.scope(scope_kind, precedence, description) VALUES
('global', 100, 'Applies to every minion'),
('role', 200, 'Applies to minions whose minion_id matches a top.sls compound role match'),
('minion', 300, 'Applies only to a single minion (per-minion overlay)')
ON CONFLICT (scope_kind) DO NOTHING;
CREATE TABLE IF NOT EXISTS so_pillar.role (
role_name text PRIMARY KEY,
match_kind text NOT NULL CHECK (match_kind IN ('compound','grain','glob','list')),
match_expr text NOT NULL,
description text
);
CREATE TABLE IF NOT EXISTS so_pillar.minion (
minion_id text PRIMARY KEY,
node_type text,
hostname text,
extra_roles text[] NOT NULL DEFAULT '{}',
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE IF NOT EXISTS so_pillar.role_member (
role_name text NOT NULL REFERENCES so_pillar.role(role_name) ON DELETE CASCADE,
minion_id text NOT NULL REFERENCES so_pillar.minion(minion_id) ON DELETE CASCADE,
source text NOT NULL DEFAULT 'computed' CHECK (source IN ('computed','manual','imported')),
PRIMARY KEY (role_name, minion_id)
);
CREATE INDEX IF NOT EXISTS ix_role_member_minion ON so_pillar.role_member(minion_id);
-- pillar_entry holds the actual data. as_json=True ext_pillar reads `data` directly.
CREATE TABLE IF NOT EXISTS so_pillar.pillar_entry (
id bigserial PRIMARY KEY,
scope text NOT NULL REFERENCES so_pillar.scope(scope_kind),
role_name text REFERENCES so_pillar.role(role_name) ON DELETE CASCADE,
minion_id text REFERENCES so_pillar.minion(minion_id) ON DELETE CASCADE,
pillar_path text NOT NULL,
data jsonb NOT NULL,
is_secret boolean NOT NULL DEFAULT false,
sort_key int NOT NULL DEFAULT 0,
version int NOT NULL DEFAULT 1,
updated_at timestamptz NOT NULL DEFAULT now(),
updated_by text NOT NULL DEFAULT current_user,
change_reason text,
CONSTRAINT pillar_entry_scope_target CHECK (
(scope='global' AND role_name IS NULL AND minion_id IS NULL)
OR (scope='role' AND role_name IS NOT NULL AND minion_id IS NULL)
OR (scope='minion' AND role_name IS NULL AND minion_id IS NOT NULL)
),
-- Reserved namespaces that MUST stay rendered from SLS (mine-driven). Nothing
-- under these prefixes is allowed in the database; the merge logic relies on
-- ext_pillar leaving these subtrees alone.
CONSTRAINT pillar_entry_reserved_paths CHECK (
pillar_path NOT LIKE 'elasticsearch.nodes%'
AND pillar_path NOT LIKE 'redis.nodes%'
AND pillar_path NOT LIKE 'kafka.nodes%'
AND pillar_path NOT LIKE 'hypervisor.nodes%'
AND pillar_path NOT LIKE 'logstash.nodes%'
AND pillar_path NOT LIKE 'node_data.ips%'
)
);
CREATE UNIQUE INDEX IF NOT EXISTS ux_pillar_entry_global ON so_pillar.pillar_entry(pillar_path)
WHERE scope = 'global';
CREATE UNIQUE INDEX IF NOT EXISTS ux_pillar_entry_role ON so_pillar.pillar_entry(role_name, pillar_path)
WHERE scope = 'role';
CREATE UNIQUE INDEX IF NOT EXISTS ux_pillar_entry_minion ON so_pillar.pillar_entry(minion_id, pillar_path)
WHERE scope = 'minion';
CREATE INDEX IF NOT EXISTS ix_pillar_entry_minion_hot ON so_pillar.pillar_entry(minion_id)
WHERE scope = 'minion';
CREATE INDEX IF NOT EXISTS ix_pillar_entry_role_hot ON so_pillar.pillar_entry(role_name)
WHERE scope = 'role';
-- Append-only audit log for every change to pillar_entry. No FK to entry so DELETE
-- history survives the row removal.
CREATE TABLE IF NOT EXISTS so_pillar.pillar_entry_history (
history_id bigserial PRIMARY KEY,
entry_id bigint,
op text NOT NULL CHECK (op IN ('INSERT','UPDATE','DELETE')),
scope text NOT NULL,
role_name text,
minion_id text,
pillar_path text NOT NULL,
old_data jsonb,
new_data jsonb,
is_secret boolean,
version int,
changed_at timestamptz NOT NULL DEFAULT now(),
changed_by text NOT NULL DEFAULT current_user,
change_reason text
);
CREATE INDEX IF NOT EXISTS ix_pillar_history_entry ON so_pillar.pillar_entry_history(entry_id, changed_at DESC);
CREATE INDEX IF NOT EXISTS ix_pillar_history_minion ON so_pillar.pillar_entry_history(minion_id, changed_at DESC);
CREATE INDEX IF NOT EXISTS ix_pillar_history_role ON so_pillar.pillar_entry_history(role_name, changed_at DESC);
-- Drift watch — populated by a pg_cron job that re-renders the on-disk SLS files
-- and compares them to pillar_entry. Cleared once cutover completes.
CREATE TABLE IF NOT EXISTS so_pillar.drift_log (
id bigserial PRIMARY KEY,
scope text NOT NULL,
role_name text,
minion_id text,
pillar_path text NOT NULL,
disk_data jsonb,
db_data jsonb,
detected_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS ix_drift_log_detected ON so_pillar.drift_log(detected_at DESC);
@@ -0,0 +1,49 @@
-- Views consumed by the Salt master's salt.pillar.postgres ext_pillar with
-- as_json=True. Each view exposes data ordered by (sort_key, pillar_path) so
-- the deep-merge in ext_pillar resolves precedence deterministically.
--
-- ext_pillar always binds exactly one parameter to the query: (minion_id,).
-- Master-config queries reference these views and add WHERE clauses, e.g.:
-- SELECT data FROM so_pillar.v_pillar_role WHERE minion_id = %s
-- SELECT data FROM so_pillar.v_pillar_minion WHERE minion_id = %s
-- For v_pillar_global the binding is satisfied with `WHERE %s IS NOT NULL`.
CREATE OR REPLACE VIEW so_pillar.v_pillar_global AS
SELECT pillar_path, sort_key, data
FROM so_pillar.pillar_entry
WHERE scope = 'global'
AND is_secret = false
ORDER BY sort_key, pillar_path;
-- Role view exposes minion_id so the master-config WHERE clause can filter to
-- the rows that apply to the requesting minion. JOIN to role_member fans out
-- one row per (role assignment, pillar entry) tuple.
CREATE OR REPLACE VIEW so_pillar.v_pillar_role AS
SELECT rm.minion_id,
pe.role_name,
pe.pillar_path,
pe.sort_key,
pe.data
FROM so_pillar.pillar_entry pe
JOIN so_pillar.role_member rm ON rm.role_name = pe.role_name
WHERE pe.scope = 'role'
AND pe.is_secret = false;
CREATE OR REPLACE VIEW so_pillar.v_pillar_minion AS
SELECT minion_id,
pillar_path,
sort_key,
data
FROM so_pillar.pillar_entry
WHERE scope = 'minion'
AND is_secret = false;
-- v_pillar_secrets is filled in by 004_secrets.sql once pgcrypto is available;
-- placeholder here returns no rows so initial schema deploy succeeds even on a
-- container that has not yet loaded pgcrypto.
CREATE OR REPLACE VIEW so_pillar.v_pillar_secrets AS
SELECT NULL::text AS minion_id,
NULL::text AS pillar_path,
NULL::int AS sort_key,
'{}'::jsonb AS data
WHERE false;
@@ -0,0 +1,120 @@
-- Audit trigger: every INSERT/UPDATE/DELETE on so_pillar.pillar_entry writes a
-- row to pillar_entry_history. Captures the actor (current_user), reason
-- (passed via SET LOCAL so_pillar.change_reason), and full before/after data.
CREATE OR REPLACE FUNCTION so_pillar.fn_pillar_entry_audit() RETURNS trigger
LANGUAGE plpgsql AS $fn$
DECLARE
v_reason text := current_setting('so_pillar.change_reason', true);
BEGIN
IF (TG_OP = 'INSERT') THEN
INSERT INTO so_pillar.pillar_entry_history(
entry_id, op, scope, role_name, minion_id, pillar_path,
old_data, new_data, is_secret, version, changed_by, change_reason)
VALUES (NEW.id, 'INSERT', NEW.scope, NEW.role_name, NEW.minion_id, NEW.pillar_path,
NULL, NEW.data, NEW.is_secret, NEW.version, NEW.updated_by, v_reason);
RETURN NEW;
ELSIF (TG_OP = 'UPDATE') THEN
IF OLD.data IS DISTINCT FROM NEW.data
OR OLD.is_secret IS DISTINCT FROM NEW.is_secret THEN
INSERT INTO so_pillar.pillar_entry_history(
entry_id, op, scope, role_name, minion_id, pillar_path,
old_data, new_data, is_secret, version, changed_by, change_reason)
VALUES (NEW.id, 'UPDATE', NEW.scope, NEW.role_name, NEW.minion_id, NEW.pillar_path,
OLD.data, NEW.data, NEW.is_secret, NEW.version, NEW.updated_by, v_reason);
END IF;
RETURN NEW;
ELSIF (TG_OP = 'DELETE') THEN
INSERT INTO so_pillar.pillar_entry_history(
entry_id, op, scope, role_name, minion_id, pillar_path,
old_data, new_data, is_secret, version, changed_by, change_reason)
VALUES (OLD.id, 'DELETE', OLD.scope, OLD.role_name, OLD.minion_id, OLD.pillar_path,
OLD.data, NULL, OLD.is_secret, OLD.version, current_user, v_reason);
RETURN OLD;
END IF;
RETURN NULL;
END
$fn$;
DROP TRIGGER IF EXISTS pillar_entry_audit ON so_pillar.pillar_entry;
CREATE TRIGGER pillar_entry_audit
AFTER INSERT OR UPDATE OR DELETE ON so_pillar.pillar_entry
FOR EACH ROW EXECUTE FUNCTION so_pillar.fn_pillar_entry_audit();
-- updated_at + version maintenance: bump version on every UPDATE that changes data.
CREATE OR REPLACE FUNCTION so_pillar.fn_pillar_entry_versioning() RETURNS trigger
LANGUAGE plpgsql AS $fn$
BEGIN
IF (TG_OP = 'UPDATE') THEN
IF OLD.data IS DISTINCT FROM NEW.data
OR OLD.is_secret IS DISTINCT FROM NEW.is_secret THEN
NEW.version := OLD.version + 1;
NEW.updated_at := now();
ELSE
NEW.version := OLD.version;
NEW.updated_at := OLD.updated_at;
END IF;
END IF;
RETURN NEW;
END
$fn$;
DROP TRIGGER IF EXISTS pillar_entry_versioning ON so_pillar.pillar_entry;
CREATE TRIGGER pillar_entry_versioning
BEFORE UPDATE ON so_pillar.pillar_entry
FOR EACH ROW EXECUTE FUNCTION so_pillar.fn_pillar_entry_versioning();
-- Recompute role_member rows for a minion based on node_type.
-- Compound matchers in pillar/top.sls are pure suffix patterns of the form
-- '*_<rolename>' plus the special multi-role 'manager/managersearch/managerhype'
-- bucket. lower(node_type) supplies the main role; extra buckets arrive via
-- extra_roles (set by the importer / reactor), and each value that matches a
-- known role_name produces a role_member row.
CREATE OR REPLACE FUNCTION so_pillar.fn_recompute_role_members(p_minion_id text)
RETURNS void LANGUAGE plpgsql AS $fn$
DECLARE
v_node_type text;
v_extra text[];
v_role text;
BEGIN
SELECT node_type, extra_roles INTO v_node_type, v_extra
FROM so_pillar.minion WHERE minion_id = p_minion_id;
IF v_node_type IS NULL THEN
RETURN;
END IF;
DELETE FROM so_pillar.role_member
WHERE minion_id = p_minion_id AND source = 'computed';
-- Main role from node_type.
IF EXISTS (SELECT 1 FROM so_pillar.role WHERE role_name = lower(v_node_type)) THEN
INSERT INTO so_pillar.role_member(role_name, minion_id, source)
VALUES (lower(v_node_type), p_minion_id, 'computed')
ON CONFLICT DO NOTHING;
END IF;
-- Extra roles supplied by the importer / reactor for compound matchers
-- that need to apply multiple buckets (e.g. managersearch also gets the
-- 'manager' bucket per top.sls line 36 grouping).
FOREACH v_role IN ARRAY COALESCE(v_extra, '{}'::text[]) LOOP
IF EXISTS (SELECT 1 FROM so_pillar.role WHERE role_name = v_role) THEN
INSERT INTO so_pillar.role_member(role_name, minion_id, source)
VALUES (v_role, p_minion_id, 'computed')
ON CONFLICT DO NOTHING;
END IF;
END LOOP;
END
$fn$;
CREATE OR REPLACE FUNCTION so_pillar.fn_minion_after_change() RETURNS trigger
LANGUAGE plpgsql AS $fn$
BEGIN
PERFORM so_pillar.fn_recompute_role_members(COALESCE(NEW.minion_id, OLD.minion_id));
RETURN COALESCE(NEW, OLD);
END
$fn$;
DROP TRIGGER IF EXISTS minion_role_sync ON so_pillar.minion;
CREATE TRIGGER minion_role_sync
AFTER INSERT OR UPDATE OF node_type, extra_roles ON so_pillar.minion
FOR EACH ROW EXECUTE FUNCTION so_pillar.fn_minion_after_change();
@@ -0,0 +1,130 @@
-- pgcrypto-backed secret storage for pillar_entry rows where is_secret = true.
-- The plaintext value is encrypted with a symmetric key held in a server-side
-- GUC (so_pillar.master_key) which is set per-role via ALTER ROLE so the key
-- never touches a flat file readable by Salt itself.
CREATE EXTENSION IF NOT EXISTS pgcrypto WITH SCHEMA public;
-- Encrypt a JSONB value using the configured master key. Stored as a JSONB
-- envelope {"_enc": "<armored ciphertext>"} so the same column type is reused.
CREATE OR REPLACE FUNCTION so_pillar.fn_encrypt_jsonb(p_value jsonb)
RETURNS jsonb LANGUAGE plpgsql AS $fn$
DECLARE
v_key text := current_setting('so_pillar.master_key', true);
BEGIN
IF v_key IS NULL OR v_key = '' THEN
RAISE EXCEPTION 'so_pillar.master_key GUC not configured';
END IF;
RETURN jsonb_build_object(
'_enc',
encode(pgp_sym_encrypt(p_value::text, v_key), 'base64')
);
END
$fn$;
-- Decrypt the envelope produced by fn_encrypt_jsonb. SECURITY DEFINER so callers
-- with no direct access to pgcrypto/master_key can still pull plaintext via the
-- v_pillar_secrets view.
CREATE OR REPLACE FUNCTION so_pillar.fn_decrypt_jsonb(p_envelope jsonb)
RETURNS jsonb LANGUAGE plpgsql SECURITY DEFINER AS $fn$
DECLARE
v_key text := current_setting('so_pillar.master_key', true);
v_ct text;
BEGIN
IF v_key IS NULL OR v_key = '' THEN
RAISE EXCEPTION 'so_pillar.master_key GUC not configured';
END IF;
v_ct := p_envelope->>'_enc';
IF v_ct IS NULL THEN
RETURN p_envelope; -- not encrypted; pass through
END IF;
RETURN pgp_sym_decrypt(decode(v_ct, 'base64'), v_key)::jsonb;
END
$fn$;
REVOKE ALL ON FUNCTION so_pillar.fn_decrypt_jsonb(jsonb) FROM PUBLIC;
-- Secrets are served to ext_pillar by fn_pillar_secrets(minion_id), defined
-- below. It decrypts at the boundary so Salt sees plaintext JSONB and filters
-- the rows to those that apply to the requesting minion. Views can't take
-- parameters and ext_pillar binds exactly one parameter per query, so the
-- master config calls the function directly:
--
--   SELECT data FROM so_pillar.fn_pillar_secrets(%s) AS s
--
-- The bound parameter is the requesting minion's id.
CREATE OR REPLACE FUNCTION so_pillar.fn_pillar_secrets(p_minion_id text)
RETURNS TABLE(data jsonb)
LANGUAGE sql STABLE SECURITY DEFINER AS $fn$
SELECT so_pillar.fn_decrypt_jsonb(pe.data)
FROM so_pillar.pillar_entry pe
WHERE pe.is_secret = true
AND ( pe.scope = 'global'
OR (pe.scope = 'role'
AND pe.role_name IN (
SELECT role_name FROM so_pillar.role_member
WHERE minion_id = p_minion_id))
OR (pe.scope = 'minion' AND pe.minion_id = p_minion_id))
ORDER BY pe.sort_key, pe.pillar_path;
$fn$;
-- The 002 placeholder view is recreated below as a deprecated, empty stub;
-- callers should use fn_pillar_secrets(minion_id) instead.
DROP VIEW IF EXISTS so_pillar.v_pillar_secrets;
CREATE OR REPLACE VIEW so_pillar.v_pillar_secrets AS
SELECT NULL::text AS minion_id,
NULL::text AS pillar_path,
NULL::int AS sort_key,
'{}'::jsonb AS data
WHERE false;
COMMENT ON VIEW so_pillar.v_pillar_secrets IS
'Deprecated placeholder; use SELECT data FROM so_pillar.fn_pillar_secrets(minion_id) instead';
-- Convenience helper for so-yaml.py and the importer to set a secret without
-- ever exposing the master_key to the caller. SECURITY DEFINER means the
-- caller does not need read access to so_pillar.master_key.
CREATE OR REPLACE FUNCTION so_pillar.fn_set_secret(
p_scope text,
p_role_name text,
p_minion_id text,
p_pillar_path text,
p_value jsonb,
p_change_reason text DEFAULT NULL
) RETURNS bigint LANGUAGE plpgsql SECURITY DEFINER AS $fn$
DECLARE
v_envelope jsonb := so_pillar.fn_encrypt_jsonb(p_value);
v_id bigint;
BEGIN
PERFORM set_config('so_pillar.change_reason',
COALESCE(p_change_reason, 'fn_set_secret'),
true);
INSERT INTO so_pillar.pillar_entry(
scope, role_name, minion_id, pillar_path, data, is_secret, change_reason)
VALUES (p_scope, p_role_name, p_minion_id, p_pillar_path, v_envelope, true, p_change_reason)
ON CONFLICT (pillar_path) WHERE scope='global' DO UPDATE
SET data = EXCLUDED.data, is_secret = true, change_reason = EXCLUDED.change_reason
RETURNING id INTO v_id;
IF v_id IS NULL THEN
UPDATE so_pillar.pillar_entry
SET data = v_envelope, is_secret = true, change_reason = p_change_reason
WHERE scope = p_scope
AND COALESCE(role_name,'') = COALESCE(p_role_name,'')
AND COALESCE(minion_id,'') = COALESCE(p_minion_id,'')
AND pillar_path = p_pillar_path
RETURNING id INTO v_id;
IF v_id IS NULL THEN
INSERT INTO so_pillar.pillar_entry(
scope, role_name, minion_id, pillar_path, data, is_secret, change_reason)
VALUES (p_scope, p_role_name, p_minion_id, p_pillar_path, v_envelope, true, p_change_reason)
RETURNING id INTO v_id;
END IF;
END IF;
RETURN v_id;
END
$fn$;
REVOKE ALL ON FUNCTION so_pillar.fn_set_secret(text,text,text,text,jsonb,text) FROM PUBLIC;
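-- Usage sketch (path, values, and minion id hypothetical; run from a session
-- whose role has so_pillar.master_key set, e.g. so_pillar_writer):
--   SELECT so_pillar.fn_set_secret('global', NULL, NULL,
--          'elasticsearch:auth:pass', '"s3cr3t"'::jsonb, 'example reason');
--   SELECT data FROM so_pillar.fn_pillar_secrets('manager_standalone');  -- decrypted rows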
@@ -0,0 +1,39 @@
-- Seed the so_pillar.role table with the role buckets defined in pillar/top.sls.
-- The match_expr column preserves the original Salt compound expression purely
-- as documentation; PG-side membership is materialised in role_member.
-- Idempotent: ON CONFLICT lets re-application leave existing rows untouched.
INSERT INTO so_pillar.role(role_name, match_kind, match_expr, description) VALUES
('manager', 'compound', '*_manager or *_managersearch or *_managerhype',
'Manager-class node. Includes managersearch and managerhype subtypes.'),
('managersearch', 'compound', '*_managersearch',
'Combined manager + searchnode role.'),
('managerhype', 'compound', '*_managerhype',
'Combined manager + hypervisor role.'),
('sensor', 'compound', '*_sensor',
'Sensor node running zeek/suricata/strelka.'),
('eval', 'compound', '*_eval',
'Single-node evaluation install (manager + sensor + storage on one host).'),
('standalone', 'compound', '*_standalone',
'Single-node production install (no distributed cluster).'),
('heavynode', 'compound', '*_heavynode',
'Distributed manager node carrying logstash + ES.'),
('idh', 'compound', '*_idh',
'Intrusion-detection-honeypot node.'),
('searchnode', 'compound', '*_searchnode',
'Distributed Elasticsearch search node.'),
('receiver', 'compound', '*_receiver',
'Kafka receiver node.'),
('import', 'compound', '*_import',
'Single-node import-only install.'),
('fleet', 'compound', '*_fleet',
'Elastic Fleet server node.'),
('hypervisor', 'compound', '*_hypervisor',
'Hypervisor host (libvirt). Hosts VM minions.'),
('desktop', 'compound', '*_desktop',
'Desktop minion (no firewall/nginx pillars apply).'),
('not_desktop', 'compound', '* and not *_desktop',
'Pseudo-role; matches every minion that is not a desktop. Used for global firewall/nginx.'),
('libvirt', 'grain', 'salt-cloud:driver:libvirt',
'Pseudo-role; matches any minion with grain salt-cloud.driver = libvirt.')
ON CONFLICT (role_name) DO NOTHING;
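-- Once a minion row exists, the minion_role_sync trigger above materialises
-- its membership; a quick check (hypothetical minion id):
--   SELECT role_name FROM so_pillar.role_member WHERE minion_id = 'sensor01_sensor';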
@@ -0,0 +1,106 @@
-- Roles + Row-Level Security policies for the so_pillar schema.
-- Three roles:
-- so_pillar_master — connected by salt-master ext_pillar. Read-only.
-- RLS forces it to skip is_secret rows; reads
-- encrypted secrets only via fn_pillar_secrets().
-- so_pillar_writer — connected by so-yaml dual-write and the SOC
-- PostgresConfigstore. Read+write on pillar_entry,
-- minion, role_member.
-- so_pillar_secret_owner — owns the master encryption key GUC; sole role
-- allowed to call fn_set_secret directly. Other
-- writers reach this function only via grants.
--
-- The existing app role so_postgres_user (created by init-users.sh) is granted
-- membership in so_pillar_writer, so SOC keeps using its existing connection
-- while inheriting pillar-write capability.
DO $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'so_pillar_master') THEN
CREATE ROLE so_pillar_master NOLOGIN;
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'so_pillar_writer') THEN
CREATE ROLE so_pillar_writer NOLOGIN;
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'so_pillar_secret_owner') THEN
CREATE ROLE so_pillar_secret_owner NOLOGIN;
END IF;
END
$$;
GRANT USAGE ON SCHEMA so_pillar TO so_pillar_master, so_pillar_writer, so_pillar_secret_owner;
-- Read access for ext_pillar through the views only.
GRANT SELECT ON so_pillar.v_pillar_global,
so_pillar.v_pillar_role,
so_pillar.v_pillar_minion
TO so_pillar_master;
GRANT EXECUTE ON FUNCTION so_pillar.fn_pillar_secrets(text) TO so_pillar_master;
-- Engine reads + drains the change queue from the salt-master process. It
-- needs SELECT to find unprocessed rows and UPDATE to mark them processed.
-- The queue contains only locator metadata (no pillar data), so the master
-- role's existing privilege footprint is unchanged in practice.
GRANT SELECT, UPDATE ON so_pillar.change_queue TO so_pillar_master;
GRANT USAGE ON SEQUENCE so_pillar.change_queue_id_seq TO so_pillar_master;
-- Writer gets INSERT on the queue only for direct testing / manual replays from
-- psql; the normal enqueue path is the AFTER trigger, which runs as the table owner.
GRANT INSERT ON so_pillar.change_queue TO so_pillar_writer;
-- Writer needs CRUD on pillar_entry/minion/role_member plus access to seed tables.
GRANT SELECT, INSERT, UPDATE, DELETE
ON so_pillar.pillar_entry,
so_pillar.minion,
so_pillar.role_member
TO so_pillar_writer;
GRANT SELECT ON so_pillar.role, so_pillar.scope TO so_pillar_writer;
GRANT SELECT, INSERT, UPDATE, DELETE ON so_pillar.drift_log TO so_pillar_writer;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA so_pillar TO so_pillar_writer;
GRANT SELECT ON so_pillar.pillar_entry_history TO so_pillar_writer;
-- Secret owner can call fn_set_secret directly; writer goes through it via the
-- function's SECURITY DEFINER attribute, which executes as the function owner.
GRANT EXECUTE ON FUNCTION so_pillar.fn_set_secret(text,text,text,text,jsonb,text)
TO so_pillar_writer, so_pillar_secret_owner;
-- so_postgres_user (SOC's existing app user, created by init-users.sh) inherits
-- writer privilege so the PostgresConfigstore in SOC can mutate pillars without
-- a second connection pool. Role inheritance is PostgreSQL's default behavior
-- (NOINHERIT must be requested explicitly), so the membership grant is enough.
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_roles WHERE rolname = current_setting('so_pillar.app_role', true))
THEN
EXECUTE format('GRANT so_pillar_writer TO %I',
current_setting('so_pillar.app_role', true));
ELSIF EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'so_postgres_user') THEN
GRANT so_pillar_writer TO so_postgres_user;
END IF;
END
$$;
-- RLS on pillar_entry: master sees only non-secret rows. Writer sees all
-- (it must, to UPDATE secret rows when so-yaml replaces them). Secret rows
-- still require fn_decrypt_jsonb to read plaintext.
ALTER TABLE so_pillar.pillar_entry ENABLE ROW LEVEL SECURITY;
ALTER TABLE so_pillar.pillar_entry FORCE ROW LEVEL SECURITY;
DROP POLICY IF EXISTS pillar_entry_master_read ON so_pillar.pillar_entry;
DROP POLICY IF EXISTS pillar_entry_writer_all ON so_pillar.pillar_entry;
DROP POLICY IF EXISTS pillar_entry_owner_all ON so_pillar.pillar_entry;
CREATE POLICY pillar_entry_master_read ON so_pillar.pillar_entry
FOR SELECT TO so_pillar_master
USING (NOT is_secret);
CREATE POLICY pillar_entry_writer_all ON so_pillar.pillar_entry
FOR ALL TO so_pillar_writer
USING (true)
WITH CHECK (true);
CREATE POLICY pillar_entry_owner_all ON so_pillar.pillar_entry
FOR ALL TO so_pillar_secret_owner
USING (true)
WITH CHECK (true);
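-- A quick check from psql (sketch; run as the postgres superuser, which may
-- SET ROLE into any of the roles above):
--   SET ROLE so_pillar_writer;
--   SELECT count(*) FROM so_pillar.pillar_entry WHERE is_secret;  -- writer policy: every row visible
--   RESET ROLE;
--   SET ROLE so_pillar_master;
--   SELECT count(*) FROM so_pillar.pillar_entry WHERE is_secret;  -- fails: master has no table grant,
--                                                                 -- and its RLS policy hides secret rows anyway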
-- minion / role_member do not need RLS — they hold no secrets.
@@ -0,0 +1,43 @@
-- Drift detection + retention via pg_cron. Optional — the schema_pillar.sls
-- state guards this file behind the postgres:so_pillar:drift_check_enabled
-- pillar flag because pg_cron may not be loaded on every install.
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Retention: trim pillar_entry_history older than a year. Adjustable via the
-- so_pillar.history_retention_days GUC (default 365 if unset).
CREATE OR REPLACE FUNCTION so_pillar.fn_history_retain()
RETURNS void LANGUAGE plpgsql AS $fn$
DECLARE
v_days int := COALESCE(current_setting('so_pillar.history_retention_days', true)::int, 365);
BEGIN
DELETE FROM so_pillar.pillar_entry_history
WHERE changed_at < (now() - (v_days::text || ' days')::interval);
END
$fn$;
-- Drift retention: keep two weeks of drift_log.
CREATE OR REPLACE FUNCTION so_pillar.fn_drift_retain()
RETURNS void LANGUAGE plpgsql AS $fn$
BEGIN
DELETE FROM so_pillar.drift_log
WHERE detected_at < (now() - interval '14 days');
END
$fn$;
-- pg_cron schedules (idempotent — unschedule any existing same-named job first).
DO $$
DECLARE
v_jobid bigint;
BEGIN
SELECT jobid INTO v_jobid FROM cron.job WHERE jobname = 'so_pillar_history_retain';
IF v_jobid IS NOT NULL THEN PERFORM cron.unschedule(v_jobid); END IF;
PERFORM cron.schedule('so_pillar_history_retain', '15 3 * * *',
'SELECT so_pillar.fn_history_retain();');
SELECT jobid INTO v_jobid FROM cron.job WHERE jobname = 'so_pillar_drift_retain';
IF v_jobid IS NOT NULL THEN PERFORM cron.unschedule(v_jobid); END IF;
PERFORM cron.schedule('so_pillar_drift_retain', '20 3 * * *',
'SELECT so_pillar.fn_drift_retain();');
END
$$;
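-- To change the history window (sketch):
--   ALTER DATABASE securityonion SET so_pillar.history_retention_days = '180';
-- pg_cron runs each job in a fresh backend, so the database-level setting takes
-- effect on the next scheduled sweep.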
@@ -0,0 +1,77 @@
-- pg_notify-driven change fan-out for so_pillar.pillar_entry.
--
-- Two layers:
-- 1. so_pillar.change_queue — durable, drained by the salt-master
-- engine. Survives engine downtime,
-- de-duplicated by id, processed once.
-- 2. pg_notify('so_pillar_change') — wakeup signal. Payload is the
-- change_queue row id and locator
-- (no secret data — channels are
-- snoopable by anyone with LISTEN).
--
-- The salt-master engine LISTENs on the channel for low-latency wakeup,
-- then SELECTs unprocessed change_queue rows so a missed notification
-- (engine restart, network blip) self-heals on the next event.
CREATE TABLE IF NOT EXISTS so_pillar.change_queue (
id bigserial PRIMARY KEY,
scope text NOT NULL,
role_name text,
minion_id text,
pillar_path text NOT NULL,
op text NOT NULL CHECK (op IN ('INSERT','UPDATE','DELETE')),
enqueued_at timestamptz NOT NULL DEFAULT now(),
processed_at timestamptz
);
-- Hot index for the engine's drain query.
CREATE INDEX IF NOT EXISTS ix_change_queue_unprocessed
ON so_pillar.change_queue (id)
WHERE processed_at IS NULL;
-- Retention index: pg_cron job in 007 sweeps processed rows older than 7d.
CREATE INDEX IF NOT EXISTS ix_change_queue_processed_at
ON so_pillar.change_queue (processed_at)
WHERE processed_at IS NOT NULL;
CREATE OR REPLACE FUNCTION so_pillar.fn_pillar_entry_notify()
RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
v_row record;
v_id bigint;
BEGIN
IF TG_OP = 'DELETE' THEN
v_row := OLD;
ELSE
v_row := NEW;
END IF;
INSERT INTO so_pillar.change_queue
(scope, role_name, minion_id, pillar_path, op)
VALUES
(v_row.scope, v_row.role_name, v_row.minion_id, v_row.pillar_path, TG_OP)
RETURNING id INTO v_id;
-- Payload is the queue id + locator only. Engine joins back to
-- pillar_entry if it needs the data — keeps secrets off the wire.
PERFORM pg_notify('so_pillar_change', json_build_object(
'queue_id', v_id,
'scope', v_row.scope,
'role_name', v_row.role_name,
'minion_id', v_row.minion_id,
'pillar_path', v_row.pillar_path,
'op', TG_OP
)::text);
RETURN NULL;
END;
$$;
DROP TRIGGER IF EXISTS tg_pillar_entry_notify ON so_pillar.pillar_entry;
CREATE TRIGGER tg_pillar_entry_notify
AFTER INSERT OR UPDATE OR DELETE
ON so_pillar.pillar_entry
FOR EACH ROW
EXECUTE FUNCTION so_pillar.fn_pillar_entry_notify();
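-- An illustrative notification payload (values hypothetical) as seen by any
-- LISTENing session:
--   {"queue_id": 42, "scope": "minion", "role_name": null,
--    "minion_id": "sensor01_sensor", "pillar_path": "sensoroni.config", "op": "UPDATE"}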
+1
View File
@@ -8,6 +8,7 @@
include:
{% if PGMERGED.enabled %}
- postgres.enabled
- postgres.schema_pillar
{% else %}
- postgres.disabled
{% endif %}
+140
View File
@@ -0,0 +1,140 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
# Deploys the so_pillar schema (tables, views, audit triggers, secrets,
# RLS, pg_cron retention) inside the so-postgres container. Idempotent —
# every CREATE / GRANT is wrapped in IF NOT EXISTS / ON CONFLICT or DO
# blocks so re-running the state is a no-op when the schema is current.
#
# Gated on the postgres:so_pillar:enabled feature flag (default false).
# Flip to true once the postsalt branch is ready to bring ext_pillar live.
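# A matching pillar sketch (location illustrative):
#   postgres:
#     so_pillar:
#       enabled: true
#       drift_check_enabled: false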
include:
- postgres.enabled
{% set so_pillar_enabled = salt['pillar.get']('postgres:so_pillar:enabled', False) %}
{% if so_pillar_enabled %}
{% set drift_enabled = salt['pillar.get']('postgres:so_pillar:drift_check_enabled', False) %}
{% set schema_dir = '/opt/so/saltstack/default/salt/postgres/files/schema/pillar' %}
# Wait for postgres to actually accept TCP connections. Same idiom as
# telegraf_users.sls: the docker_container.running state returns before the
# database is ready to accept connections on first init.
so_pillar_postgres_wait_ready:
cmd.run:
- name: |
for i in $(seq 1 60); do
if docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q 2>/dev/null; then
exit 0
fi
sleep 2
done
echo "so-postgres did not accept TCP connections within 120s" >&2
exit 1
- require:
- docker_container: so-postgres
{% set sql_files = [
'001_schema.sql',
'002_views.sql',
'003_history_trigger.sql',
'004_secrets.sql',
'005_seed_roles.sql',
'006_rls.sql',
] %}
{% if drift_enabled %}
{% do sql_files.append('007_drift_pgcron.sql') %}
{% endif %}
# 008 always applies — pg_notify-driven change fan-out is what the salt-master
# pg_notify_pillar engine consumes. Without it reactor wiring sees no events.
{% do sql_files.append('008_change_notify.sql') %}
{% for sql_file in sql_files %}
so_pillar_apply_{{ sql_file | replace('.', '_') }}:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d securityonion \
< {{ schema_dir }}/{{ sql_file }}
- require:
- cmd: so_pillar_postgres_wait_ready
{% if not loop.first %}
- cmd: so_pillar_apply_{{ sql_files[loop.index0 - 1] | replace('.', '_') }}
{% endif %}
{% endfor %}
# Set the master encryption key GUC on the secret-owner role. The key itself
# is generated by setup/so-functions::secrets_pillar() (extended for postsalt)
# and lives in /opt/so/conf/postgres/so_pillar.key (mode 0400) — never read by
# Salt itself; the value flows into PG via ALTER ROLE so it sits only in the
# server's role catalog.
so_pillar_master_key_configure:
cmd.run:
- name: |
if [ -r /opt/so/conf/postgres/so_pillar.key ]; then
KEY="$(< /opt/so/conf/postgres/so_pillar.key)"
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d securityonion <<EOSQL
ALTER ROLE so_pillar_secret_owner SET so_pillar.master_key = '$KEY';
ALTER ROLE so_pillar_master SET so_pillar.master_key = '$KEY';
ALTER ROLE so_pillar_writer SET so_pillar.master_key = '$KEY';
EOSQL
else
echo "so_pillar.key not present yet; setup/so-functions must generate it before schema_pillar.sls" >&2
exit 1
fi
- require:
- cmd: so_pillar_apply_{{ sql_files[-1] | replace('.', '_') }}
# Run the importer once after the schema is in place. Idempotent — re-runs
# with no SLS edits produce zero row changes.
so_pillar_initial_import:
cmd.run:
- name: /usr/sbin/so-pillar-import --yes --reason 'schema_pillar.sls initial import'
- require:
- cmd: so_pillar_master_key_configure
# Flip so-yaml from dual-write to PG-canonical for managed paths now that
# the schema and importer are both in place. Bootstrap files (secrets.sls,
# postgres/auth.sls, ca/init.sls, *.nodes.sls, top.sls, ...) remain on disk
# regardless because so_yaml_postgres.locate() raises SkipPath for them.
so_pillar_so_yaml_mode_dir:
file.directory:
- name: /opt/so/conf/so-yaml
- user: socore
- group: socore
- mode: '0755'
- makedirs: True
so_pillar_so_yaml_mode_postgres:
file.managed:
- name: /opt/so/conf/so-yaml/mode
- contents: postgres
- user: socore
- group: socore
- mode: '0644'
- require:
- file: so_pillar_so_yaml_mode_dir
- cmd: so_pillar_initial_import
{% else %}
so_pillar_disabled_noop:
test.nop
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
+4 -4
View File
@@ -10,7 +10,7 @@
{# postgres_wait_ready below requires `docker_container: so-postgres`, which is
declared in postgres.enabled. Include it here so state.apply postgres.telegraf_users
on its own (from the reactor orch or from soup) still has that ID in scope. Salt
on its own (e.g. from orch.deploy_newnode) still has that ID in scope. Salt
de-duplicates the circular include. #}
include:
- postgres.enabled
@@ -96,9 +96,9 @@ postgres_telegraf_group_role:
- require:
- cmd: postgres_create_telegraf_db
{% set users = salt['pillar.get']('postgres:auth:users', {}) %}
{% for key, entry in users.items() %}
{% if key.startswith('telegraf_') and entry.get('user') and entry.get('pass') %}
{% set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
{% for mid, entry in creds.items() %}
{% if entry.get('user') and entry.get('pass') %}
{% set u = entry.user %}
{% set p = entry.pass | replace("'", "''") %}
+27
View File
@@ -0,0 +1,27 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Fires for every event tagged 'so/pillar/changed'. Source of those events
# is the pg_notify_pillar engine on the salt-master, which in turn drains
# so_pillar.change_queue (populated by the AFTER trigger on
# so_pillar.pillar_entry — see 008_change_notify.sql).
#
# All routing logic — which pillar paths reload which services on which
# targets — lives in orch.so_pillar_reload so it stays editable as one
# YAML table without touching reactor wiring.
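# An illustrative engine payload (values hypothetical) as extracted into
# `payload` below and forwarded to the orch pillar:
#   {'scope': 'role', 'role_name': 'sensor', 'minion_id': None,
#    'changes': [{'queue_id': 42, 'pillar_path': 'suricata', 'op': 'UPDATE'}]}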
{% set payload = data.get('data', {}) %}
{% do salt.log.info('so_pillar_changed reactor: %s' % payload) %}
so_pillar_dispatch_reload:
runner.state.orchestrate:
- args:
- mods: orch.so_pillar_reload
- pillar:
so_pillar_change:
scope: {{ payload.get('scope') | json }}
role_name: {{ payload.get('role_name') | json }}
minion_id: {{ payload.get('minion_id') | json }}
changes: {{ payload.get('changes', []) | json }}
+53 -18
View File
@@ -6,39 +6,74 @@
# Elastic License 2.0.
import logging
from subprocess import call
import yaml
import os
import re
import shlex
import subprocess
log = logging.getLogger(__name__)
SO_MINION = '/usr/sbin/so-minion'
_NODETYPE_RE = re.compile(r'^[A-Z][A-Z0-9_]{0,31}$')
_MINIONID_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_HOSTPART_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_IPV4_RE = re.compile(
r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}'
r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$'
)
_HEAP_RE = re.compile(r'^\d{1,6}[kKmMgG]?$')
def _check(name, value, pattern):
s = str(value)
if not pattern.match(s):
raise ValueError("sominion_setup_reactor: refusing unsafe %s=%r" % (name, value))
return s
def run():
log.info('sominion_setup_reactor: Running')
minionid = data['id']
DATA = data['data']
hv_name = DATA['HYPERVISOR_HOST']
log.info('sominion_setup_reactor: DATA: %s' % DATA)
# Build the base command
cmd = "NODETYPE=" + DATA['NODETYPE'] + " /usr/sbin/so-minion -o=addVM -m=" + minionid + " -n=" + DATA['MNIC'] + " -i=" + DATA['MAINIP'] + " -c=" + str(DATA['CPUCORES']) + " -d='" + DATA['NODE_DESCRIPTION'] + "'"
# Add optional arguments only if they exist in DATA
nodetype = _check('NODETYPE', DATA['NODETYPE'], _NODETYPE_RE)
argv = [
SO_MINION,
'-o=addVM',
'-m=' + _check('minionid', minionid, _MINIONID_RE),
'-n=' + _check('MNIC', DATA['MNIC'], _HOSTPART_RE),
'-i=' + _check('MAINIP', DATA['MAINIP'], _IPV4_RE),
'-c=' + str(int(DATA['CPUCORES'])),
'-d=' + str(DATA['NODE_DESCRIPTION']),
]
if 'CORECOUNT' in DATA:
cmd += " -C=" + str(DATA['CORECOUNT'])
argv.append('-C=' + str(int(DATA['CORECOUNT'])))
if 'INTERFACE' in DATA:
cmd += " -a=" + DATA['INTERFACE']
argv.append('-a=' + _check('INTERFACE', DATA['INTERFACE'], _HOSTPART_RE))
if 'ES_HEAP_SIZE' in DATA:
cmd += " -e=" + DATA['ES_HEAP_SIZE']
argv.append('-e=' + _check('ES_HEAP_SIZE', DATA['ES_HEAP_SIZE'], _HEAP_RE))
if 'LS_HEAP_SIZE' in DATA:
cmd += " -l=" + DATA['LS_HEAP_SIZE']
argv.append('-l=' + _check('LS_HEAP_SIZE', DATA['LS_HEAP_SIZE'], _HEAP_RE))
if 'LSHOSTNAME' in DATA:
cmd += " -L=" + DATA['LSHOSTNAME']
log.info('sominion_setup_reactor: Command: %s' % cmd)
rc = call(cmd, shell=True)
argv.append('-L=' + _check('LSHOSTNAME', DATA['LSHOSTNAME'], _HOSTPART_RE))
env = os.environ.copy()
env['NODETYPE'] = nodetype
log.info(
'sominion_setup_reactor: argv: %s (NODETYPE=%s)',
' '.join(shlex.quote(a) for a in argv),
shlex.quote(nodetype),
)
rc = subprocess.call(argv, shell=False, env=env)
log.info('sominion_setup_reactor: rc: %s' % rc)
-18
View File
@@ -1,18 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{# Fires on salt/key. Only act on successful key acceptance — not reauth. #}
{% if data.get('act') == 'accept' and data.get('result') == True and data.get('id') %}
{{ data['id'] }}_telegraf_pg_sync:
runner.state.orchestrate:
- args:
- mods: orch.telegraf_postgres_sync
- pillar:
postgres_fanout_minion: {{ data['id'] }}
{% do salt.log.info('telegraf_user_sync reactor: syncing telegraf PG user for minion %s' % data['id']) %}
{% endif %}
@@ -27,6 +27,7 @@ sool9_{{host}}:
log_file: /opt/so/log/salt/minion
grains:
hypervisor_host: {{host ~ "_" ~ role}}
sosmodel: HVGUEST
preflight_cmds:
- |
{%- set hostnames = [MANAGERHOSTNAME] %}
@@ -0,0 +1,200 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# -*- coding: utf-8 -*-
"""
pg_notify_pillar Salt master engine that bridges so_pillar.change_queue
into the Salt event bus.
Architecture (see 008_change_notify.sql):
pillar_entry -- AFTER trigger --> change_queue (durable)
+ pg_notify('so_pillar_change') (wakeup)
|
LISTEN <-- this engine <-+
SELECT/UPDATE change_queue
|
fire_event('so/pillar/changed', ...)
|
reactor matches tag --> orch
Why a queue + notify rather than just notify: pg_notify is fire-and-forget
within a session. If the engine is down or the LISTEN connection is broken
when a write happens, the notification is lost forever. The change_queue
lets us recover: on (re)connect we drain everything still flagged
processed_at IS NULL.
Debounce: bulk operations (so-pillar-import, fresh installs) can fire
hundreds of notifications per second. The engine collects whatever lands in
a short window and emits one event per (scope, role, minion) tuple so the
reactor isn't stampeded.
"""
import json
import logging
import os
import select
import time
import salt.utils.event
log = logging.getLogger(__name__)
__virtualname__ = 'pg_notify_pillar'
DEFAULT_CHANNEL = 'so_pillar_change'
DEFAULT_DEBOUNCE_MS = 500
DEFAULT_RECONNECT_BACKOFF = 5
DEFAULT_BACKLOG_INTERVAL = 30
DEFAULT_BATCH_LIMIT = 500
EVENT_TAG = 'so/pillar/changed'
def __virtual__():
try:
import psycopg2 # noqa: F401
return __virtualname__
except ImportError:
return False, 'pg_notify_pillar engine requires psycopg2'
def start(dsn=None,
host='127.0.0.1',
port=5432,
dbname='securityonion',
user='so_pillar_master',
password=None,
channel=DEFAULT_CHANNEL,
debounce_ms=DEFAULT_DEBOUNCE_MS,
reconnect_backoff=DEFAULT_RECONNECT_BACKOFF,
backlog_interval=DEFAULT_BACKLOG_INTERVAL,
batch_limit=DEFAULT_BATCH_LIMIT,
password_file=None):
"""
Run the change-queue bridge until the master shuts the engine down.
Either pass a full ``dsn`` string, or supply discrete kwargs. The
password may also be read from ``password_file`` (mode 0400) so the
engine config in ``/etc/salt/master.d/`` doesn't have to embed it
inline, only the file path.
"""
import psycopg2
import psycopg2.extensions
if dsn is None:
if password is None and password_file:
try:
with open(password_file, 'r') as fh:
password = fh.read().strip()
except (IOError, OSError) as exc:
log.error('pg_notify_pillar: cannot read password_file %s: %s',
password_file, exc)
return
dsn = _build_dsn(host=host, port=port, dbname=dbname,
user=user, password=password)
bus = salt.utils.event.get_master_event(
__opts__, __opts__['sock_dir'], listen=False)
log.info('pg_notify_pillar: starting (channel=%s debounce=%dms)',
channel, debounce_ms)
while True:
conn = None
try:
conn = psycopg2.connect(dsn)
conn.set_isolation_level(
psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute('LISTEN {0}'.format(channel))
log.info('pg_notify_pillar: connected; LISTEN %s', channel)
_drain(cur, bus, batch_limit)
while True:
ready, _, _ = select.select([conn], [], [], backlog_interval)
if not ready:
_drain(cur, bus, batch_limit)
continue
conn.poll()
_consume_notifies(conn)
if debounce_ms > 0:
time.sleep(debounce_ms / 1000.0)
conn.poll()
_consume_notifies(conn)
_drain(cur, bus, batch_limit)
except Exception as exc: # psycopg2.Error subclasses + OS errors
log.error('pg_notify_pillar: %s; reconnecting in %ds',
exc, reconnect_backoff)
finally:
if conn is not None:
try:
conn.close()
except Exception:
pass
time.sleep(reconnect_backoff)
def _build_dsn(host, port, dbname, user, password):
parts = ['host={0}'.format(host),
'port={0}'.format(port),
'dbname={0}'.format(dbname),
'user={0}'.format(user)]
if password:
parts.append('password={0}'.format(password))
return ' '.join(parts)
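# A sketch of the DSN this produces (illustrative values):
#   _build_dsn('127.0.0.1', 5432, 'securityonion', 'so_pillar_master', 'pw')
#     -> 'host=127.0.0.1 port=5432 dbname=securityonion user=so_pillar_master password=pw'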
def _consume_notifies(conn):
# We don't use the payload directly — the queue table is the source of
# truth, and draining it covers any notifications we missed. So just
# discard them; their presence already proved there's something to drain.
while conn.notifies:
conn.notifies.pop(0)
def _drain(cur, bus, batch_limit):
"""Mark unprocessed change_queue rows processed and emit one event per
(scope, role_name, minion_id) group. SKIP LOCKED so multiple masters
sharing a Postgres don't double-process."""
cur.execute("""
UPDATE so_pillar.change_queue
SET processed_at = now()
WHERE id IN (
SELECT id FROM so_pillar.change_queue
WHERE processed_at IS NULL
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT %s)
RETURNING id, scope, role_name, minion_id, pillar_path, op
""", (batch_limit,))
rows = cur.fetchall()
if not rows:
return
groups = {}
for row_id, scope, role_name, minion_id, pillar_path, op in rows:
key = (scope, role_name, minion_id)
groups.setdefault(key, []).append({
'queue_id': row_id,
'pillar_path': pillar_path,
'op': op,
})
for (scope, role_name, minion_id), changes in groups.items():
payload = {
'scope': scope,
'role_name': role_name,
'minion_id': minion_id,
'changes': changes,
}
log.debug('pg_notify_pillar: firing %s for %s',
EVENT_TAG, payload)
bus.fire_event(payload, EVENT_TAG)
+2 -13
View File
@@ -14,6 +14,8 @@
include:
- salt.minion
- salt.master.ext_pillar_postgres
- salt.master.pg_notify_pillar_engine
{% if 'vrt' in salt['pillar.get']('features', []) %}
- salt.cloud
- salt.cloud.reactor_config_hypervisor
@@ -62,19 +64,6 @@ engines_config:
- name: /etc/salt/master.d/engines.conf
- source: salt://salt/files/engines.conf
reactor_config_telegraf:
file.managed:
- name: /etc/salt/master.d/reactor_telegraf.conf
- contents: |
reactor:
- 'salt/key':
- /opt/so/saltstack/default/salt/reactor/telegraf_user_sync.sls
- user: root
- group: root
- mode: 644
- watch_in:
- service: salt_master_service
# update the bootstrap script when used for salt-cloud
salt_bootstrap_cloud:
file.managed:
+46
View File
@@ -0,0 +1,46 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Drops /etc/salt/master.d/ext_pillar_postgres.conf so the salt-master loads
# pillar overlays from the so_pillar.* schema in so-postgres alongside the
# on-disk SLS pillar tree. Gated on the postgres:so_pillar:enabled feature
# flag (default false) so the file only appears once the schema is deployed
# and the importer has run at least once.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% if salt['pillar.get']('postgres:so_pillar:enabled', False) %}
ext_pillar_postgres_config:
file.managed:
- name: /etc/salt/master.d/ext_pillar_postgres.conf
- source: salt://salt/master/files/ext_pillar_postgres.conf.jinja
- template: jinja
- mode: '0640'
- user: root
- group: salt
- watch_in:
- service: salt_master_service
{% else %}
# When the flag is off make sure any previously-deployed config is removed
# so a rollback flips behavior cleanly.
ext_pillar_postgres_config_absent:
file.absent:
- name: /etc/salt/master.d/ext_pillar_postgres.conf
- watch_in:
- service: salt_master_service
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
@@ -0,0 +1,38 @@
# /etc/salt/master.d/ext_pillar_postgres.conf
# Rendered by salt/salt/master/ext_pillar_postgres.sls.
# Reads the so_pillar.* schema in so-postgres and overlays it onto SLS pillar.
# SLS still renders first (ext_pillar_first: False) so bootstrap and mine-driven
# pillars work before Postgres is reachable; PG values overlay/override on top.
postgres:
host: {{ pillar.get('postgres', {}).get('host', '127.0.0.1') }}
port: {{ pillar.get('postgres', {}).get('port', 5432) }}
db: securityonion
user: so_pillar_master
pass: {{ pillar['secrets']['pillar_master_pass'] }}
ext_pillar_first: False
pillar_source_merging_strategy: smart
pillar_merge_lists: False
pillar_cache: True
pillar_cache_backend: disk
pillar_cache_ttl: {{ pillar.get('postgres', {}).get('so_pillar', {}).get('pillar_cache_ttl', 60) }}
# List form (not mapping form) so result rows merge into the pillar root rather
# than under a named subtree. Verified against salt/pillar/sql_base.py: list
# entries pass root=None to enter_root() which sets self.focus = self.result.
ext_pillar:
- postgres:
- query: "SELECT data FROM so_pillar.v_pillar_global WHERE %s IS NOT NULL ORDER BY sort_key, pillar_path"
as_json: True
ignore_null: True
- query: "SELECT data FROM so_pillar.v_pillar_role WHERE minion_id = %s ORDER BY sort_key, pillar_path"
as_json: True
ignore_null: True
- query: "SELECT data FROM so_pillar.v_pillar_minion WHERE minion_id = %s ORDER BY sort_key, pillar_path"
as_json: True
ignore_null: True
- query: "SELECT data FROM so_pillar.fn_pillar_secrets(%s)"
as_json: True
ignore_null: True
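# Overlay sketch (illustrative key): a v_pillar_global row whose data is
#   {"nginx": {"enabled": true}}
# merges into the pillar root, so salt['pillar.get']('nginx:enabled') is True
# on every minion unless a role- or minion-scoped row overrides it.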
@@ -0,0 +1,20 @@
# /etc/salt/master.d/pg_notify_pillar_engine.conf
# Rendered by salt/salt/master/pg_notify_pillar_engine.sls.
#
# Subscribes the salt-master to so_pillar.change_queue via LISTEN
# so_pillar_change. The engine drains queued changes and re-publishes
# them on the event bus as 'so/pillar/changed'. Reactor wiring is in
# so_pillar_reactor.conf.
engines:
- pg_notify_pillar:
host: {{ pillar.get('postgres', {}).get('host', '127.0.0.1') }}
port: {{ pillar.get('postgres', {}).get('port', 5432) }}
dbname: securityonion
user: so_pillar_master
password: {{ pillar['secrets']['pillar_master_pass'] }}
channel: so_pillar_change
debounce_ms: {{ pillar.get('postgres', {}).get('so_pillar', {}).get('engine_debounce_ms', 500) }}
reconnect_backoff: {{ pillar.get('postgres', {}).get('so_pillar', {}).get('engine_reconnect_backoff', 5) }}
backlog_interval: {{ pillar.get('postgres', {}).get('so_pillar', {}).get('engine_backlog_interval', 30) }}
batch_limit: {{ pillar.get('postgres', {}).get('so_pillar', {}).get('engine_batch_limit', 500) }}
@@ -0,0 +1,12 @@
# /etc/salt/master.d/so_pillar_reactor.conf
# Wires the so/pillar/changed event tag — emitted by the pg_notify_pillar
# engine — to the so_pillar_changed reactor, which dispatches to
# orch.so_pillar_reload.
#
# Lives in its own file (rather than appended to reactor_hypervisor.conf)
# so the postgres:so_pillar:enabled flag can flip it on/off independently
# of hypervisor reactor wiring.
reactor:
- 'so/pillar/changed':
- /opt/so/saltstack/default/salt/reactor/so_pillar_changed.sls
@@ -0,0 +1,81 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Deploys the pg_notify_pillar engine module + its master.d config so the
# salt-master subscribes to so_pillar.change_queue and republishes changes
# on the salt event bus as so/pillar/changed. Reactor (so_pillar_changed.sls)
# matches that tag and dispatches the appropriate orch.
#
# Gated on the same postgres:so_pillar:enabled flag as the schema and
# ext_pillar config so the three components flip together.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% if salt['pillar.get']('postgres:so_pillar:enabled', False) %}
pg_notify_pillar_engine_module:
file.managed:
- name: /etc/salt/engines/pg_notify_pillar.py
- source: salt://salt/engines/master/pg_notify_pillar.py
- mode: '0644'
- user: root
- group: root
- makedirs: True
- watch_in:
- service: salt_master_service
pg_notify_pillar_engine_config:
file.managed:
- name: /etc/salt/master.d/pg_notify_pillar_engine.conf
- source: salt://salt/master/files/pg_notify_pillar_engine.conf.jinja
- template: jinja
- mode: '0640'
- user: root
- group: salt
- watch_in:
- service: salt_master_service
pg_notify_pillar_reactor_config:
file.managed:
- name: /etc/salt/master.d/so_pillar_reactor.conf
- source: salt://salt/master/files/so_pillar_reactor.conf
- mode: '0644'
- user: root
- group: root
- watch_in:
- service: salt_master_service
{% else %}
# When the flag flips off, peel everything back so a rollback returns to
# pure-disk pillar with no orphan engine churning on a dead listen socket.
pg_notify_pillar_engine_module_absent:
file.absent:
- name: /etc/salt/engines/pg_notify_pillar.py
- watch_in:
- service: salt_master_service
pg_notify_pillar_engine_config_absent:
file.absent:
- name: /etc/salt/master.d/pg_notify_pillar_engine.conf
- watch_in:
- service: salt_master_service
pg_notify_pillar_reactor_config_absent:
file.absent:
- name: /etc/salt/master.d/so_pillar_reactor.conf
- watch_in:
- service: salt_master_service
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
+5
View File
@@ -3,6 +3,7 @@ soc:
description: Enables or disables SOC. WARNING - Disabling this setting is unsupported and will cause the grid to malfunction. Re-enabling this setting is a manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
telemetryEnabled:
title: SOC Telemetry
description: When this setting is enabled and the grid is not in airgap mode, SOC will provide feature usage data to the Security Onion development team via Google Analytics. This data helps Security Onion developers determine which product features are being used and can also provide insight into improving the user interface. When changing this setting, wait for the grid to fully synchronize and then perform a hard browser refresh on SOC, to force the browser cache to update and reflect the new setting.
@@ -890,12 +891,16 @@ soc:
suricata:
description: The template used when creating a new Suricata detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
strelka:
description: The template used when creating a new Strelka detection.
multiline: True
forcedType: string
elastalert:
description: The template used when creating a new ElastAlert detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
grid:
maxUploadSize:
description: The maximum number of bytes for an uploaded PCAP import file.
+6 -6
View File
@@ -10,12 +10,12 @@
{%- set LOGSTASH_ENABLED = LOGSTASH_MERGED.enabled %}
{%- set TG_OUT = TELEGRAFMERGED.output | upper %}
{%- set PG_HOST = GLOBALS.manager_ip %}
{#- Per-minion telegraf creds are written into the minion's own pillar file
(/opt/so/saltstack/local/pillar/minions/<id>.sls) by postgres.auth on the
manager. Each minion only sees its own password — the aggregate map in
postgres:auth:users is manager-scoped. #}
{%- set PG_USER = salt['pillar.get']('postgres:telegraf:user', '') %}
{%- set PG_PASS = salt['pillar.get']('postgres:telegraf:pass', '') %}
{#- Per-minion telegraf creds live in the grid-wide telegraf/creds.sls pillar,
written by /usr/sbin/so-telegraf-cred on the manager. Each minion looks up
its own entry by grains.id. #}
{%- set PG_ENTRY = salt['pillar.get']('telegraf:postgres_creds:' ~ grains.id, {}) %}
{%- set PG_USER = PG_ENTRY.get('user', '') %}
{%- set PG_PASS = PG_ENTRY.get('pass', '') %}
# Global tags can be specified here in key="value" format.
[global_tags]
role = "{{ GLOBALS.role.split('-') | last }}"
+76 -32
View File
@@ -202,10 +202,10 @@ check_service_status() {
systemctl status $service_name > /dev/null 2>&1
local status=$?
if [ $status -gt 0 ]; then
info " $service_name is not running"
info "$service_name is not running"
return 1;
else
info " $service_name is running"
info "$service_name is running"
return 0;
fi
@@ -1057,6 +1057,11 @@ generate_passwords(){
POSTGRESPASS=$(get_random_value)
SOCSRVKEY=$(get_random_value 64)
IMPORTPASS=$(get_random_value)
# postsalt: salt-master connects to so_pillar.* as so_pillar_master, and the
# so-postgres container needs a symmetric key for pgcrypto-encrypted secrets.
# Both are generated here so they survive reinstall like the other secrets.
PILLARMASTERPASS=$(get_random_value)
SO_PILLAR_KEY=$(get_random_value 64)
}
generate_interface_vars() {
@@ -1549,13 +1554,8 @@ clear_previous_setup_results() {
reinstall_init() {
info "Putting system in state to run setup again"
if [[ $install_type =~ ^(MANAGER|EVAL|MANAGERSEARCH|MANAGERHYPE|STANDALONE|FLEET|IMPORT)$ ]]; then
local salt_services=( "salt-master" "salt-minion" )
else
local salt_services=( "salt-minion" )
fi
local service_retry_count=20
# Always include both services. check_service_status skips units that aren't present.
local salt_services=( "salt-master" "salt-minion" )
{
# remove all of root's cronjobs
@@ -1571,31 +1571,51 @@ reinstall_init() {
salt-call state.apply ca.remove -linfo --local --file-root=../salt
# Kill any salt processes (safely)
# Stop salt services and force-kill any lingering salt processes (including orphans
# from an earlier reinstall attempt where the unit file is gone but processes survive)
# so dnf remove salt can run cleanly
for service in "${salt_services[@]}"; do
# Stop the service in the background so we can exit after a certain amount of time
if check_service_status "$service"; then
systemctl stop "$service" &
info "Stopping $service via systemctl"
systemctl stop "$service"
fi
local pid=$!
local count=0
while check_service_status "$service"; do
if [[ $count -gt $service_retry_count ]]; then
echo "Could not stop $service after 1 minute, exiting setup."
# Stop the systemctl process trying to kill the service, show user a message, then exit setup
kill -9 $pid
fail_setup
fi
sleep 5
((count++))
done
done
# Unconditionally force-kill any remaining salt binaries — these may be orphaned
# from a prior aborted reinstall (no unit file, so systemctl can't see them).
for salt_bin in salt-master salt-minion salt-call salt-cloud; do
if pgrep -f "/usr/bin/${salt_bin}" > /dev/null 2>&1; then
info "Force-killing lingering $salt_bin processes"
pkill -9 -ef "/usr/bin/${salt_bin}" 2>/dev/null
fi
done
# Catch stray `salt` CLI children from saltutil.kill_all_jobs / state.apply invocations
pkill -9 -ef "/usr/bin/python3 /bin/salt" 2>/dev/null
# Give the kernel a moment to reap the killed processes before dnf removes the binaries
local kill_wait=0
while pgrep -f "/usr/bin/salt-" > /dev/null 2>&1; do
if [[ $kill_wait -gt 10 ]]; then
info "Salt processes still present after SIGKILL + 10s wait; proceeding anyway"
pgrep -af "/usr/bin/salt-" | while read -r line; do info " lingering: $line"; done
break
fi
sleep 1
((kill_wait++))
done
# Clear the 'failed' state SIGKILL left on the units before removing the package
systemctl reset-failed salt-master.service salt-minion.service 2>/dev/null || true
# Remove all salt configs
rm -rf /etc/salt/engines/* /etc/salt/grains /etc/salt/master /etc/salt/master.d/* /etc/salt/minion /etc/salt/minion.d/* /etc/salt/pki/* /etc/salt/proxy /etc/salt/proxy.d/* /var/cache/salt/
dnf -y remove salt
rm -rf /etc/salt/ /var/cache/salt/
# Drop systemd's in-memory references to the now-removed units
systemctl daemon-reload
# Uninstall local Elastic Agent, if installed
elastic-agent uninstall -f
if command -v docker &> /dev/null; then
# Stop and remove all so-* containers so files can be changed with more safety
@@ -1619,10 +1639,7 @@ reinstall_init() {
backup_dir /nsm/hydra "$date_string"
backup_dir /nsm/influxdb "$date_string"
# Uninstall local Elastic Agent, if installed
elastic-agent uninstall -f
} >> "$setup_log" 2>&1
} 2>&1 | tee -a "$setup_log"
info "System reinstall init has been completed."
}
@@ -1841,7 +1858,34 @@ secrets_pillar(){
"secrets:"\
" import_pass: $IMPORTPASS"\
" influx_pass: $INFLUXPASS"\
" pillar_master_pass: $PILLARMASTERPASS"\
" postgres_pass: $POSTGRESPASS" > $local_salt_dir/pillar/secrets.sls
elif ! grep -q '^[[:space:]]*pillar_master_pass:' $local_salt_dir/pillar/secrets.sls; then
# Existing install pre-postsalt — append the new key without disturbing
# the values already on disk. Keys we already wrote stay; only the new
# pillar_master_pass is added.
info "Appending pillar_master_pass to existing Secrets Pillar"
if [ -z "$PILLARMASTERPASS" ]; then
PILLARMASTERPASS=$(get_random_value)
fi
printf ' pillar_master_pass: %s\n' "$PILLARMASTERPASS" >> $local_salt_dir/pillar/secrets.sls
fi
# postsalt: write the so_pillar pgcrypto master key to a 0400 file owned by
# root. The key itself is never read by Salt — schema_pillar.sls loads it
# into the so-postgres container via ALTER ROLE so_pillar_secret_owner SET
# so_pillar.master_key = '<key>'; the file just lets the value survive
# container restarts.
if [ ! -f /opt/so/conf/postgres/so_pillar.key ]; then
info "Generating so_pillar pgcrypto master key"
mkdir -p /opt/so/conf/postgres
if [ -z "$SO_PILLAR_KEY" ]; then
SO_PILLAR_KEY=$(get_random_value 64)
fi
umask 077
printf '%s' "$SO_PILLAR_KEY" > /opt/so/conf/postgres/so_pillar.key
chmod 0400 /opt/so/conf/postgres/so_pillar.key
chown root:root /opt/so/conf/postgres/so_pillar.key
fi
}
+1 -1
View File
@@ -219,7 +219,7 @@ if [ -n "$test_profile" ]; then
WEBUSER=onionuser@somewhere.invalid
WEBPASSWD1=0n10nus3r
WEBPASSWD2=0n10nus3r
NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MAINIP}"
NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}"
update_sudoers_for_testing
fi