mirror of
https://github.com/Security-Onion-Solutions/securityonion.git
synced 2026-04-11 15:22:34 +02:00
Compare commits
6 Commits
fix/surica
...
saltthangs
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
a0cf0489d6 | ||
|
|
81afbd32d4 | ||
|
|
e9c4f40735 | ||
|
|
9ec4a26f97 | ||
|
|
ef3cfc8722 | ||
|
|
28d31f4840 |
@@ -1,5 +1,3 @@
|
||||
{% import_yaml 'salt/minion.defaults.yaml' as SALT_MINION_DEFAULTS -%}
|
||||
|
||||
#!/bin/bash
|
||||
#
|
||||
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
|
||||
@@ -25,7 +23,8 @@ SYSTEM_START_TIME=$(date -d "$(</proc/uptime awk '{print $1}') seconds ago" +%s)
|
||||
LAST_HIGHSTATE_END=$([ -e "/opt/so/log/salt/lasthighstate" ] && date -r /opt/so/log/salt/lasthighstate +%s || echo 0)
|
||||
LAST_HEALTHCHECK_STATE_APPLY=$([ -e "/opt/so/log/salt/state-apply-test" ] && date -r /opt/so/log/salt/state-apply-test +%s || echo 0)
|
||||
# SETTING THRESHOLD TO ANYTHING UNDER 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
|
||||
THRESHOLD={{SALT_MINION_DEFAULTS.salt.minion.check_threshold}} #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
|
||||
# THRESHOLD is derived from the global push highstate interval + 1 hour, so the minion-check grace period tracks the schedule automatically.
|
||||
THRESHOLD=$(( ({{ salt['pillar.get']('global:push:highstate_interval_hours', 2) }} + 1) * 3600 )) #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
|
||||
THRESHOLD_DATE=$((LAST_HEALTHCHECK_STATE_APPLY+THRESHOLD))
|
||||
|
||||
logCmd() {
|
||||
|
||||
@@ -19,6 +19,7 @@ wait_for_elasticsearch:
|
||||
so-elastalert:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastalert:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: elastalert
|
||||
- name: so-elastalert
|
||||
- user: so-elastalert
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
so-elastic-fleet-package-registry:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-fleet-package-registry:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-elastic-fleet-package-registry
|
||||
- hostname: Fleet-package-reg-{{ GLOBALS.hostname }}
|
||||
- detach: True
|
||||
|
||||
@@ -16,6 +16,7 @@ include:
|
||||
so-elastic-agent:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-elastic-agent
|
||||
- hostname: {{ GLOBALS.hostname }}
|
||||
- detach: True
|
||||
|
||||
@@ -88,6 +88,7 @@ elasticagent_syncartifacts:
|
||||
so-elastic-fleet:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-elastic-fleet
|
||||
- hostname: FleetServer-{{ GLOBALS.hostname }}
|
||||
- detach: True
|
||||
|
||||
@@ -23,6 +23,7 @@ include:
|
||||
so-elasticsearch:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elasticsearch:{{ ELASTICSEARCHMERGED.version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: elasticsearch
|
||||
- name: so-elasticsearch
|
||||
- user: elasticsearch
|
||||
|
||||
@@ -1,3 +1,10 @@
|
||||
global:
|
||||
pcapengine: SURICATA
|
||||
pipeline: REDIS
|
||||
pipeline: REDIS
|
||||
push:
|
||||
enabled: true
|
||||
highstate_interval_hours: 2
|
||||
debounce_seconds: 30
|
||||
drain_interval: 15
|
||||
batch: '25%'
|
||||
batch_wait: 15
|
||||
|
||||
@@ -11,18 +11,14 @@ global:
|
||||
regexFailureMessage: You must enter a valid IP address or CIDR.
|
||||
mdengine:
|
||||
description: Which engine to use for meta data generation. Options are ZEEK and SURICATA.
|
||||
regex: ^(ZEEK|SURICATA)$
|
||||
options:
|
||||
- ZEEK
|
||||
- SURICATA
|
||||
regexFailureMessage: You must enter either ZEEK or SURICATA.
|
||||
global: True
|
||||
pcapengine:
|
||||
description: Which engine to use for generating pcap. Currently only SURICATA is supported.
|
||||
regex: ^(SURICATA)$
|
||||
options:
|
||||
- SURICATA
|
||||
regexFailureMessage: You must enter SURICATA.
|
||||
global: True
|
||||
ids:
|
||||
description: Which IDS engine to use. Currently only Suricata is supported.
|
||||
@@ -42,11 +38,9 @@ global:
|
||||
advanced: True
|
||||
pipeline:
|
||||
description: Sets which pipeline technology for events to use. The use of Kafka requires a Security Onion Pro license.
|
||||
regex: ^(REDIS|KAFKA)$
|
||||
options:
|
||||
- REDIS
|
||||
- KAFKA
|
||||
regexFailureMessage: You must enter either REDIS or KAFKA.
|
||||
global: True
|
||||
advanced: True
|
||||
repo_host:
|
||||
@@ -65,4 +59,41 @@ global:
|
||||
description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
|
||||
global: True
|
||||
advanced: True
|
||||
push:
|
||||
enabled:
|
||||
description: Master kill-switch for the active push feature. When disabled, rule and pillar changes are picked up at the next scheduled highstate instead of being pushed immediately.
|
||||
forcedType: bool
|
||||
helpLink: push
|
||||
global: True
|
||||
highstate_interval_hours:
|
||||
description: How often every minion in the grid runs a scheduled state.highstate, in hours. Lower values keep minions closer in sync at the cost of more load; higher values reduce load but increase worst-case latency for non-pushed changes. The salt-minion health check restarts a minion if its last highstate is older than this value plus one hour.
|
||||
forcedType: int
|
||||
helpLink: push
|
||||
global: True
|
||||
advanced: True
|
||||
debounce_seconds:
|
||||
description: Trailing-edge debounce window in seconds. A push intent must be quiet for this long before the drainer dispatches. Rapid bursts of edits within this window coalesce into one dispatch.
|
||||
forcedType: int
|
||||
helpLink: push
|
||||
global: True
|
||||
advanced: True
|
||||
drain_interval:
|
||||
description: How often the push drainer checks for ready intents, in seconds. Small values lower dispatch latency at the cost of more background work on the manager.
|
||||
forcedType: int
|
||||
helpLink: push
|
||||
global: True
|
||||
advanced: True
|
||||
batch:
|
||||
description: "Host batch size for push orchestrations. A number (e.g. '10') or a percentage (e.g. '25%'). Limits how many minions run the push state at once so large fleets don't thundering-herd."
|
||||
helpLink: push
|
||||
global: True
|
||||
advanced: True
|
||||
regex: '^([0-9]+%?)$'
|
||||
regexFailureMessage: Enter a whole number or a whole-number percentage (e.g. 10 or 25%).
|
||||
batch_wait:
|
||||
description: Seconds to wait between host batches in a push orchestration. Gives the fleet time to breathe between waves.
|
||||
forcedType: int
|
||||
helpLink: push
|
||||
global: True
|
||||
advanced: True
|
||||
|
||||
|
||||
@@ -58,6 +58,7 @@ so-hydra:
|
||||
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
# Intentionally unless-stopped -- matches the fleet default.
|
||||
- restart_policy: unless-stopped
|
||||
- watch:
|
||||
- file: hydraconfig
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
so-idh:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-idh:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-idh
|
||||
- detach: True
|
||||
- network_mode: host
|
||||
|
||||
@@ -18,6 +18,7 @@ include:
|
||||
so-influxdb:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-influxdb:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: influxdb
|
||||
- networks:
|
||||
- sobridge:
|
||||
|
||||
@@ -85,7 +85,10 @@ influxdb:
|
||||
description: The log level to use for outputting log statements. Allowed values are debug, info, or error.
|
||||
global: True
|
||||
advanced: false
|
||||
regex: ^(info|debug|error)$
|
||||
options:
|
||||
- info
|
||||
- debug
|
||||
- error
|
||||
helpLink: influxdb
|
||||
metrics-disabled:
|
||||
description: If true, the HTTP endpoint that exposes internal InfluxDB metrics will be inaccessible.
|
||||
@@ -140,7 +143,9 @@ influxdb:
|
||||
description: Determines the type of storage used for secrets. Allowed values are bolt or vault.
|
||||
global: True
|
||||
advanced: True
|
||||
regex: ^(bolt|vault)$
|
||||
options:
|
||||
- bolt
|
||||
- vault
|
||||
helpLink: influxdb
|
||||
session-length:
|
||||
description: Number of minutes that a user login session can remain authenticated.
|
||||
@@ -260,7 +265,9 @@ influxdb:
|
||||
description: The type of data store to use for HTTP resources. Allowed values are disk or memory. Memory should not be used for production Security Onion installations.
|
||||
global: True
|
||||
advanced: True
|
||||
regex: ^(disk|memory)$
|
||||
options:
|
||||
- disk
|
||||
- memory
|
||||
helpLink: influxdb
|
||||
tls-cert:
|
||||
description: The container path to the certificate to use for TLS encryption of the HTTP requests and responses.
|
||||
|
||||
@@ -27,6 +27,7 @@ include:
|
||||
so-kafka:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kafka:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: so-kafka
|
||||
- name: so-kafka
|
||||
- networks:
|
||||
|
||||
@@ -128,10 +128,13 @@ kafka:
|
||||
title: ssl.keystore.password
|
||||
sensitive: True
|
||||
helpLink: kafka
|
||||
ssl_x_keystore_x_type:
|
||||
ssl_x_keystore_x_type:
|
||||
description: The key store file format.
|
||||
title: ssl.keystore.type
|
||||
regex: ^(JKS|PKCS12|PEM)$
|
||||
options:
|
||||
- JKS
|
||||
- PKCS12
|
||||
- PEM
|
||||
helpLink: kafka
|
||||
ssl_x_truststore_x_location:
|
||||
description: The trust store file location within the Docker container.
|
||||
@@ -160,7 +163,11 @@ kafka:
|
||||
security_x_protocol:
|
||||
description: 'Broker communication protocol. Options are: SASL_SSL, PLAINTEXT, SSL, SASL_PLAINTEXT'
|
||||
title: security.protocol
|
||||
regex: ^(SASL_SSL|PLAINTEXT|SSL|SASL_PLAINTEXT)
|
||||
options:
|
||||
- SASL_SSL
|
||||
- PLAINTEXT
|
||||
- SSL
|
||||
- SASL_PLAINTEXT
|
||||
helpLink: kafka
|
||||
ssl_x_keystore_x_location:
|
||||
description: The key store file location within the Docker container.
|
||||
@@ -174,7 +181,10 @@ kafka:
|
||||
ssl_x_keystore_x_type:
|
||||
description: The key store file format.
|
||||
title: ssl.keystore.type
|
||||
regex: ^(JKS|PKCS12|PEM)$
|
||||
options:
|
||||
- JKS
|
||||
- PKCS12
|
||||
- PEM
|
||||
helpLink: kafka
|
||||
ssl_x_truststore_x_location:
|
||||
description: The trust store file location within the Docker container.
|
||||
|
||||
@@ -16,6 +16,7 @@ include:
|
||||
so-kibana:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kibana:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: kibana
|
||||
- user: kibana
|
||||
- networks:
|
||||
|
||||
@@ -51,6 +51,7 @@ so-kratos:
|
||||
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
# Intentionally unless-stopped -- matches the fleet default.
|
||||
- restart_policy: unless-stopped
|
||||
- watch:
|
||||
- file: kratosschema
|
||||
|
||||
@@ -21,8 +21,12 @@ kratos:
|
||||
description: "Specify the provider type. Required. Valid values are: auth0, generic, github, google, microsoft"
|
||||
global: True
|
||||
forcedType: string
|
||||
regex: "auth0|generic|github|google|microsoft"
|
||||
regexFailureMessage: "Valid values are: auth0, generic, github, google, microsoft"
|
||||
options:
|
||||
- auth0
|
||||
- generic
|
||||
- github
|
||||
- google
|
||||
- microsoft
|
||||
helpLink: oidc
|
||||
client_id:
|
||||
description: Specify the client ID, also referenced as the application ID. Required.
|
||||
@@ -43,8 +47,9 @@ kratos:
|
||||
description: The source of the subject identifier. Typically 'userinfo'. Only used when provider is 'microsoft'.
|
||||
global: True
|
||||
forcedType: string
|
||||
regex: me|userinfo
|
||||
regexFailureMessage: "Valid values are: me, userinfo"
|
||||
options:
|
||||
- me
|
||||
- userinfo
|
||||
helpLink: oidc
|
||||
auth_url:
|
||||
description: Provider's auth URL. Required when provider is 'generic'.
|
||||
|
||||
@@ -28,6 +28,7 @@ include:
|
||||
so-logstash:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-logstash:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: so-logstash
|
||||
- name: so-logstash
|
||||
- networks:
|
||||
|
||||
233
salt/manager/tools/sbin/so-push-drainer
Normal file
233
salt/manager/tools/sbin/so-push-drainer
Normal file
@@ -0,0 +1,233 @@
|
||||
#!/usr/bin/env python3
|
||||
|
||||
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
|
||||
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
|
||||
# https://securityonion.net/license; you may not use this file except in compliance with the
|
||||
# Elastic License 2.0.
|
||||
|
||||
"""
|
||||
so-push-drainer
|
||||
===============
|
||||
|
||||
Scheduled drainer for the active-push feature. Runs on the manager every
|
||||
drain_interval seconds (default 15) via a salt schedule in salt/schedule.sls.
|
||||
|
||||
For each intent file under /opt/so/state/push_pending/*.json whose last_touch
|
||||
is older than debounce_seconds, this script:
|
||||
* concatenates the actions lists from every ready intent
|
||||
* dedupes by (state or __highstate__, tgt, tgt_type)
|
||||
* dispatches a single `salt-run state.orchestrate orch.push_batch --async`
|
||||
with the deduped actions list passed as pillar kwargs
|
||||
* deletes the contributed intent files on successful dispatch
|
||||
|
||||
Reactor sls files (push_suricata, push_strelka, push_pillar) write intents
|
||||
but never dispatch directly -- see the active-push design notes for the
full design.
|
||||
"""
|
||||
|
||||
import fcntl
|
||||
import glob
|
||||
import json
|
||||
import logging
|
||||
import logging.handlers
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
|
||||
sys.path.append('/opt/saltstack/salt/lib/python3.10/site-packages/')
|
||||
import salt.client
|
||||
|
||||
PENDING_DIR = '/opt/so/state/push_pending'
|
||||
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
|
||||
LOG_FILE = '/opt/so/log/salt/so-push-drainer.log'
|
||||
|
||||
HIGHSTATE_SENTINEL = '__highstate__'
|
||||
|
||||
|
||||
def _make_logger():
    """Return the drainer's logger, wiring up a rotating file handler once.

    Log output goes to LOG_FILE (5 MiB per file, 3 backups). Repeated calls
    reuse the already-configured logger instead of stacking duplicate
    handlers.
    """
    log = logging.getLogger('so-push-drainer')
    log.setLevel(logging.INFO)
    if log.handlers:
        # Already configured on a previous call -- don't add a second handler.
        return log
    os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
    file_handler = logging.handlers.RotatingFileHandler(
        LOG_FILE, maxBytes=5 * 1024 * 1024, backupCount=3,
    )
    fmt = logging.Formatter('%(asctime)s | %(levelname)s | %(message)s')
    file_handler.setFormatter(fmt)
    log.addHandler(file_handler)
    return log
|
||||
|
||||
|
||||
def _load_push_cfg():
    """Fetch the ``global:push`` pillar subtree on the local minion.

    Returns the subtree as a dict. Any non-dict result (missing key or an
    unexpected pillar shape) collapses to an empty dict so callers can use
    ``.get()`` with defaults unconditionally.
    """
    result = salt.client.Caller().cmd('pillar.get', 'global:push', {})
    if isinstance(result, dict):
        return result
    return {}
|
||||
|
||||
|
||||
def _read_intent(path, log):
|
||||
try:
|
||||
with open(path, 'r') as f:
|
||||
return json.load(f)
|
||||
except (IOError, ValueError) as exc:
|
||||
log.warning('cannot read intent %s: %s', path, exc)
|
||||
return None
|
||||
except Exception:
|
||||
log.exception('unexpected error reading %s', path)
|
||||
return None
|
||||
|
||||
|
||||
def _dedupe_actions(actions):
    """Collapse duplicate push actions, preserving first-seen order.

    Two actions are duplicates when they share the same state key
    (HIGHSTATE_SENTINEL for highstate actions, otherwise the 'state'
    field), target expression, and target type. Non-dict entries and
    entries missing a state or target are dropped outright.
    """
    keys_seen = set()
    kept = []
    for entry in actions:
        if not isinstance(entry, dict):
            continue
        if entry.get('highstate'):
            state_key = HIGHSTATE_SENTINEL
        else:
            state_key = entry.get('state')
        target = entry.get('tgt')
        if not state_key or not target:
            # Unusable action -- can't target anything with it.
            continue
        dedupe_key = (state_key, target, entry.get('tgt_type', 'compound'))
        if dedupe_key in keys_seen:
            continue
        keys_seen.add(dedupe_key)
        kept.append(entry)
    return kept
|
||||
|
||||
|
||||
def _dispatch(actions, log):
    """Fire one async ``orch.push_batch`` run carrying ``actions`` as pillar.

    Returns True when salt-run accepts the job, False on any failure
    (non-zero exit, 60s timeout, or unexpected exception). Callers leave
    the intent files in place on False so the next drain pass can retry.
    """
    payload = json.dumps({'actions': actions})
    argv = [
        'salt-run',
        'state.orchestrate',
        'orch.push_batch',
        'pillar={}'.format(payload),
        '--async',
    ]
    # Log the command without the full pillar blob (it can be large).
    summary = ' '.join(argv[:3]) + ' pillar=<{} actions>'.format(len(actions))
    log.info('dispatching: %s', summary)
    try:
        proc = subprocess.run(
            argv, check=True, capture_output=True, text=True, timeout=60,
        )
    except subprocess.CalledProcessError as exc:
        log.error('dispatch failed (rc=%s): stdout=%s stderr=%s',
                  exc.returncode, exc.stdout, exc.stderr)
        return False
    except subprocess.TimeoutExpired:
        log.error('dispatch timed out after 60s')
        return False
    except Exception:
        log.exception('dispatch raised')
        return False
    log.info('dispatch accepted: %s', (proc.stdout or '').strip())
    return True
|
||||
|
||||
|
||||
def main():
    """One drain pass: gather debounced intents, dispatch one batch, clean up.

    Returns a process exit code: 0 for nothing-to-do or success, 1 when the
    pillar read or the dispatch fails (intent files are left for retry).
    """
    log = _make_logger()

    if not os.path.isdir(PENDING_DIR):
        # Nothing to do; reactors create the dir on first use.
        return 0

    try:
        push = _load_push_cfg()
    except Exception:
        log.exception('failed to read global:push pillar; aborting drain pass')
        return 1

    # Fail open: an absent 'enabled' key means the feature is on.
    if not push.get('enabled', True):
        log.debug('push disabled; exiting')
        return 0

    debounce_seconds = int(push.get('debounce_seconds', 30))

    # Hold the same flock the reactors use while scanning/removing intents,
    # so a reactor never writes an intent file we are halfway through reading.
    os.makedirs(PENDING_DIR, exist_ok=True)
    lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)

        # Sorted for deterministic processing order; '.lock' can't match
        # '*.json' but the guard is kept for safety.
        intent_files = [
            p for p in sorted(glob.glob(os.path.join(PENDING_DIR, '*.json')))
            if os.path.basename(p) != '.lock'
        ]
        if not intent_files:
            return 0

        now = time.time()
        ready = []
        skipped = 0
        broken = []
        for path in intent_files:
            intent = _read_intent(path, log)
            if not isinstance(intent, dict):
                # Unreadable/non-dict intents are removed below, not retried.
                broken.append(path)
                continue
            last_touch = intent.get('last_touch', 0)
            if now - last_touch < debounce_seconds:
                # Still inside the quiet window -- leave for a later pass.
                skipped += 1
                continue
            ready.append((path, intent))

        for path in broken:
            try:
                os.unlink(path)
            except OSError:
                pass

        if not ready:
            if skipped:
                log.debug('no ready intents (%d still in debounce window)', skipped)
            return 0

        # Merge every ready intent into one action list; track the oldest
        # first_touch (for the debounce-duration log) and contributing paths.
        combined_actions = []
        oldest_first_touch = now
        all_paths = []
        for path, intent in ready:
            combined_actions.extend(intent.get('actions', []) or [])
            first = intent.get('first_touch', now)
            if first < oldest_first_touch:
                oldest_first_touch = first
            all_paths.extend(intent.get('paths', []) or [])

        deduped = _dedupe_actions(combined_actions)
        if not deduped:
            # Intents existed but none had a usable (state, tgt) -- clear them
            # so they don't accumulate forever.
            log.warning('%d intent(s) had no usable actions; clearing', len(ready))
            for path, _ in ready:
                try:
                    os.unlink(path)
                except OSError:
                    pass
            return 0

        debounce_duration = now - oldest_first_touch
        log.info(
            'draining %d intent(s): %d action(s) after dedupe (raw=%d), '
            'debounce_duration=%.1fs, paths=%s',
            len(ready), len(deduped), len(combined_actions),
            debounce_duration, all_paths[:20],
        )

        if not _dispatch(deduped, log):
            # Intents deliberately NOT removed here -- the next pass retries.
            log.warning('dispatch failed; leaving intent files in place for retry')
            return 1

        # Dispatch accepted: drained intents are consumed.
        for path, _ in ready:
            try:
                os.unlink(path)
            except OSError:
                log.exception('failed to remove drained intent %s', path)

        return 0
    finally:
        # Always release the flock before closing the descriptor.
        try:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
        finally:
            os.close(lock_fd)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main())
|
||||
@@ -34,6 +34,7 @@ make-rule-dir-nginx:
|
||||
so-nginx:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-nginx:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: so-nginx
|
||||
- networks:
|
||||
- sobridge:
|
||||
|
||||
37
salt/orch/push_batch.sls
Normal file
37
salt/orch/push_batch.sls
Normal file
@@ -0,0 +1,37 @@
|
||||
{% from 'global/map.jinja' import GLOBALMERGED %}
|
||||
{% set actions = salt['pillar.get']('actions', []) %}
|
||||
{% set BATCH = GLOBALMERGED.push.batch %}
|
||||
{% set BATCH_WAIT = GLOBALMERGED.push.batch_wait %}
|
||||
|
||||
{% for action in actions %}
|
||||
{% if action.get('highstate') %}
|
||||
apply_highstate_{{ loop.index }}:
|
||||
salt.state:
|
||||
- tgt: '{{ action.tgt }}'
|
||||
- tgt_type: {{ action.get('tgt_type', 'compound') }}
|
||||
- highstate: True
|
||||
- batch: {{ action.get('batch', BATCH) }}
|
||||
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
|
||||
- kwarg:
|
||||
queue: 2
|
||||
{% else %}
|
||||
refresh_pillar_{{ loop.index }}:
|
||||
salt.function:
|
||||
- name: saltutil.refresh_pillar
|
||||
- tgt: '{{ action.tgt }}'
|
||||
- tgt_type: {{ action.get('tgt_type', 'compound') }}
|
||||
|
||||
apply_{{ action.state | replace('.', '_') }}_{{ loop.index }}:
|
||||
salt.state:
|
||||
- tgt: '{{ action.tgt }}'
|
||||
- tgt_type: {{ action.get('tgt_type', 'compound') }}
|
||||
- sls:
|
||||
- {{ action.state }}
|
||||
- batch: {{ action.get('batch', BATCH) }}
|
||||
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
|
||||
- kwarg:
|
||||
queue: 2
|
||||
- require:
|
||||
- salt: refresh_pillar_{{ loop.index }}
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
128
salt/reactor/pillar_push_map.yaml
Normal file
128
salt/reactor/pillar_push_map.yaml
Normal file
@@ -0,0 +1,128 @@
|
||||
# One pillar directory can map to multiple (state, tgt) actions.
|
||||
# tgt is a raw salt compound expression. tgt_type is always "compound".
|
||||
# Per-action `batch` / `batch_wait` override the orch defaults (25% / 15s).
|
||||
#
|
||||
# Notes:
|
||||
# - `bpf` is a pillar-only dir (no state of its own) consumed by both
|
||||
# zeek and suricata via macros, so a bpf pillar change re-applies both.
|
||||
# - suricata/strelka/zeek/elasticsearch/redis/kafka/logstash etc. have
|
||||
# their own pillar dirs AND their own state, so they map 1:1 (or 1:2
|
||||
# in strelka's case, because of the split init.sls / manager.sls).
|
||||
# - `data` and `node_data` pillar dirs are intentionally omitted --
|
||||
# they're pillar-only data consumed by many states; trying to handle
|
||||
# them generically would amount to a highstate.
|
||||
#
|
||||
# The role sets here were verified line-by-line against salt/top.sls. If
|
||||
# salt/top.sls changes how an app is targeted, update the corresponding
|
||||
# compound here.
|
||||
|
||||
# firewall: the one pillar everyone touches. Applied everywhere intentionally
|
||||
# because every host's iptables needs to know about every other host in the
|
||||
# grid. Salt's firewall state is idempotent (file.managed + iptables-restore
|
||||
# onchanges in salt/firewall/init.sls), so hosts whose rendered firewall is
|
||||
# unchanged do a file comparison and no-op without touching iptables -- actual
|
||||
# reload happens only on the hosts whose rules actually changed. Fleetwide
|
||||
# blast radius is intentional and matches the pre-plan behavior via highstate.
|
||||
# Adding N sensors in a burst coalesces into one dispatch via the drainer.
|
||||
firewall:
|
||||
- state: firewall
|
||||
tgt: '*'
|
||||
|
||||
# bpf is pillar-only (no state); consumed by both zeek and suricata as macros.
|
||||
# Both states run on sensor_roles + so-import per salt/top.sls.
|
||||
bpf:
|
||||
- state: zeek
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
|
||||
- state: suricata
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
|
||||
|
||||
# ca is applied universally.
|
||||
ca:
|
||||
- state: ca
|
||||
tgt: '*'
|
||||
|
||||
# elastalert: eval, standalone, manager, managerhype, managersearch (NOT import).
|
||||
elastalert:
|
||||
- state: elastalert
|
||||
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
|
||||
|
||||
# elasticsearch: 8 roles.
|
||||
elasticsearch:
|
||||
- state: elasticsearch
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-standalone'
|
||||
|
||||
# elasticagent: so-heavynode only.
|
||||
elasticagent:
|
||||
- state: elasticagent
|
||||
tgt: 'G@role:so-heavynode'
|
||||
|
||||
# elasticfleet: base state only on pillar change. elasticfleet.install_agent_grid
|
||||
# is a deploy/enrollment step, not a config reload; leave it to the next highstate.
|
||||
elasticfleet:
|
||||
- state: elasticfleet
|
||||
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
|
||||
|
||||
# healthcheck: eval, sensor, standalone only.
|
||||
healthcheck:
|
||||
- state: healthcheck
|
||||
tgt: 'G@role:so-eval or G@role:so-sensor or G@role:so-standalone'
|
||||
|
||||
# influxdb: manager_roles exactly.
|
||||
influxdb:
|
||||
- state: influxdb
|
||||
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
|
||||
|
||||
# kafka: standalone, manager, managerhype, managersearch, searchnode, receiver.
|
||||
kafka:
|
||||
- state: kafka
|
||||
tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
|
||||
|
||||
# kibana: manager_roles exactly.
|
||||
kibana:
|
||||
- state: kibana
|
||||
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
|
||||
|
||||
# logstash: 8 roles, no eval/import.
|
||||
logstash:
|
||||
- state: logstash
|
||||
tgt: 'G@role:so-fleet or G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
|
||||
|
||||
# nginx: 10 specific roles. NOT receiver, idh, hypervisor, desktop.
|
||||
nginx:
|
||||
- state: nginx
|
||||
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
|
||||
|
||||
# redis: 6 roles. standalone, manager, managerhype, managersearch, heavynode, receiver.
|
||||
# (NOT eval, NOT import, NOT searchnode.)
|
||||
redis:
|
||||
- state: redis
|
||||
tgt: 'G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-standalone'
|
||||
|
||||
# soc: manager_roles exactly.
|
||||
soc:
|
||||
- state: soc
|
||||
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
|
||||
|
||||
# strelka: sensor-side only on pillar change (sensor_roles). strelka.manager is
|
||||
# intentionally NOT fired on pillar changes -- YARA rule and strelka config
|
||||
# pillar changes are consumed by the sensor-side strelka backend, and re-running
|
||||
# strelka.manager on managers is both unnecessary and disruptive. strelka.manager
|
||||
# is left to the 2-hour highstate.
|
||||
strelka:
|
||||
- state: strelka
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-sensor or G@role:so-standalone'
|
||||
|
||||
# suricata: sensor_roles + so-import (5 roles).
|
||||
suricata:
|
||||
- state: suricata
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
|
||||
|
||||
# telegraf: universal.
|
||||
telegraf:
|
||||
- state: telegraf
|
||||
tgt: '*'
|
||||
|
||||
# zeek: sensor_roles + so-import (5 roles).
|
||||
zeek:
|
||||
- state: zeek
|
||||
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
|
||||
170
salt/reactor/push_pillar.sls
Normal file
170
salt/reactor/push_pillar.sls
Normal file
@@ -0,0 +1,170 @@
|
||||
#!py
|
||||
|
||||
# Reactor invoked by the inotify beacon on pillar file changes under
|
||||
# /opt/so/saltstack/local/pillar/.
|
||||
#
|
||||
# Two branches:
|
||||
# A) per-minion override under pillar/minions/<id>.sls or adv_<id>.sls
|
||||
# -> write an intent that runs state.highstate on just that minion.
|
||||
# B) shared app pillar (pillar/<app>/...) -> look up <app> in
|
||||
# pillar_push_map.yaml and write an intent with the entry's actions.
|
||||
#
|
||||
# Reactors never dispatch directly. The so-push-drainer schedule picks up
|
||||
# ready intents, dedupes across pending files, and dispatches orch.push_batch.
|
||||
# See the active-push design notes for the full design.
|
||||
|
||||
import fcntl
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
|
||||
import salt.client
|
||||
import yaml
|
||||
|
||||
LOG = logging.getLogger(__name__)
|
||||
|
||||
PENDING_DIR = '/opt/so/state/push_pending'
|
||||
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
|
||||
MAX_PATHS = 20
|
||||
|
||||
PILLAR_ROOT = '/opt/so/saltstack/local/pillar/'
|
||||
MINIONS_PREFIX = PILLAR_ROOT + 'minions/'
|
||||
|
||||
# The pillar_push_map.yaml is shipped via salt:// but the reactor runs on the
|
||||
# master, which mounts the default saltstack tree at this path.
|
||||
PUSH_MAP_PATH = '/opt/so/saltstack/default/salt/reactor/pillar_push_map.yaml'
|
||||
|
||||
_PUSH_MAP_CACHE = {'mtime': 0, 'data': None}
|
||||
|
||||
|
||||
def _load_push_map():
    """Return the pillar->actions map, reloading only when the file's mtime changes.

    Returns {} when the map file is missing or fails to parse. The parsed map
    is cached in _PUSH_MAP_CACHE keyed by mtime, so the YAML is re-read only
    after the file is modified.
    """
    try:
        st = os.stat(PUSH_MAP_PATH)
    except OSError:
        LOG.warning('push_pillar: %s not found', PUSH_MAP_PATH)
        return {}
    if _PUSH_MAP_CACHE['mtime'] != st.st_mtime:
        try:
            with open(PUSH_MAP_PATH, 'r') as f:
                # safe_load avoids arbitrary-object construction; empty file -> {}.
                _PUSH_MAP_CACHE['data'] = yaml.safe_load(f) or {}
        except Exception:
            LOG.exception('push_pillar: failed to load %s', PUSH_MAP_PATH)
            _PUSH_MAP_CACHE['data'] = {}
        # NOTE(review): mtime is recorded even when parsing failed, so a broken
        # map caches as {} until the file is touched again -- confirm intended.
        _PUSH_MAP_CACHE['mtime'] = st.st_mtime
    return _PUSH_MAP_CACHE['data'] or {}
|
||||
|
||||
|
||||
def _push_enabled():
    """Return the ``global:push:enabled`` pillar flag as a bool.

    Fails open: if the pillar lookup raises for any reason, the traceback is
    logged and True is returned so pushes keep flowing rather than silently
    stalling on a transient pillar error.
    """
    try:
        value = salt.client.Caller().cmd('pillar.get', 'global:push:enabled', True)
    except Exception:
        LOG.exception('push_pillar: pillar.get global:push:enabled failed, assuming enabled')
        return True
    return bool(value)
|
||||
|
||||
|
||||
def _write_intent(key, actions, path):
    """Create or update the pending push intent file for ``key``.

    Intent files live under PENDING_DIR as ``<key>.json`` and are merged
    under an exclusive flock on LOCK_FILE so concurrent reactor runs cannot
    clobber each other. Each intent records:

    * ``first_touch`` -- epoch seconds of the first event (set once),
    * ``last_touch``  -- epoch seconds of the most recent event,
    * ``actions``     -- dispatch actions for the drainer (always overwritten),
    * ``paths``       -- up to MAX_PATHS most recent triggering file paths.

    All failures are logged and swallowed; a reactor must never raise.
    """
    now = time.time()
    try:
        os.makedirs(PENDING_DIR, exist_ok=True)
    except OSError:
        LOG.exception('push_pillar: cannot create %s', PENDING_DIR)
        return

    intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
    lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)

        intent = {}
        if os.path.exists(intent_path):
            try:
                with open(intent_path, 'r') as f:
                    intent = json.load(f)
            except (IOError, ValueError):
                intent = {}
        # A corrupted-but-valid JSON payload (e.g. a list or bare string)
        # would crash the merge below with AttributeError; treat anything
        # that is not a dict as an empty intent.
        if not isinstance(intent, dict):
            intent = {}

        intent.setdefault('first_touch', now)
        intent['last_touch'] = now
        intent['actions'] = actions

        paths = intent.get('paths', [])
        # Same defensive reset if 'paths' was corrupted to a non-list.
        if not isinstance(paths, list):
            paths = []
        if path and path not in paths:
            paths.append(path)
        paths = paths[-MAX_PATHS:]
        intent['paths'] = paths

        # Write-then-rename so the drainer never reads a half-written file.
        tmp_path = intent_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump(intent, f)
        os.rename(tmp_path, intent_path)
    except Exception:
        LOG.exception('push_pillar: failed to write intent %s', intent_path)
    finally:
        try:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
        finally:
            os.close(lock_fd)
|
||||
|
||||
|
||||
def _minion_id_from_path(path):
|
||||
# path is e.g. /opt/so/saltstack/local/pillar/minions/sensor1.sls
|
||||
# or /opt/so/saltstack/local/pillar/minions/adv_sensor1.sls
|
||||
filename = os.path.basename(path)
|
||||
if not filename.endswith('.sls'):
|
||||
return None
|
||||
stem = filename[:-4]
|
||||
if stem.startswith('adv_'):
|
||||
stem = stem[4:]
|
||||
return stem or None
|
||||
|
||||
|
||||
def _app_from_path(path):
    """Return the first directory segment under the pillar root, or None.

    e.g. /opt/so/saltstack/local/pillar/zeek/soc_zeek.sls -> 'zeek'.
    Returns None when the path has no directory segment under the root.
    """
    rel = path[len(PILLAR_ROOT):]
    head, sep, _rest = rel.partition('/')
    if not sep:
        return None
    return head or None
|
||||
|
||||
|
||||
def run():
    """Reactor entry point: record a push intent for a changed pillar file.

    Always returns {} -- dispatching is handled by the drainer schedule, not
    by the reactor itself.
    """
    if not _push_enabled():
        LOG.info('push_pillar: push disabled, skipping')
        return {}

    path = data.get('data', {}).get('path', '')  # noqa: F821 -- data provided by reactor
    if not (path and path.startswith(PILLAR_ROOT)):
        LOG.debug('push_pillar: ignoring path outside pillar root: %s', path)
        return {}

    # Branch A: per-minion override -> highstate just that minion.
    if path.startswith(MINIONS_PREFIX):
        minion_id = _minion_id_from_path(path)
        if minion_id:
            actions = [{'highstate': True, 'tgt': minion_id, 'tgt_type': 'glob'}]
            _write_intent('minion_{}'.format(minion_id), actions, path)
            LOG.info('push_pillar: per-minion intent updated for %s (path=%s)', minion_id, path)
        else:
            LOG.debug('push_pillar: ignoring non-sls path under minions/: %s', path)
        return {}

    # Branch B: shared app pillar -> allowlist lookup in pillar_push_map.yaml.
    app = _app_from_path(path)
    if not app:
        LOG.debug('push_pillar: ignoring path with no app segment: %s', path)
        return {}

    entry = _load_push_map().get(app)
    if not entry:
        LOG.warning(
            'push_pillar: pillar dir "%s" is not in pillar_push_map.yaml; '
            'change will be picked up at the next scheduled highstate (path=%s)',
            app, path,
        )
        return {}

    # Copy the actions so later mutation cannot corrupt the cached push map.
    _write_intent('pillar_{}'.format(app), list(entry), path)
    LOG.info('push_pillar: app intent updated for %s (path=%s)', app, path)
    return {}
|
||||
96
salt/reactor/push_strelka.sls
Normal file
96
salt/reactor/push_strelka.sls
Normal file
@@ -0,0 +1,96 @@
|
||||
#!py
|
||||
|
||||
# Reactor invoked by the inotify beacon on rule file changes under
|
||||
# /opt/so/saltstack/local/salt/strelka/rules/compiled/.
|
||||
#
|
||||
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_strelka.json
|
||||
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
|
||||
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
|
||||
# directly -- see the push-state design plan for background.
|
||||
|
||||
import fcntl
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
|
||||
import salt.client
|
||||
|
||||
LOG = logging.getLogger(__name__)
|
||||
|
||||
PENDING_DIR = '/opt/so/state/push_pending'
|
||||
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
|
||||
MAX_PATHS = 20
|
||||
|
||||
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Sensor-side
|
||||
# strelka runs on exactly these four roles; so-import gets strelka.manager
|
||||
# instead, which is not fired on pillar changes.
|
||||
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
|
||||
|
||||
|
||||
def _sensor_compound():
    """Build a compound-match target covering every sensor-side strelka role."""
    parts = ['G@role:{}'.format(role) for role in SENSOR_ROLES]
    return ' or '.join(parts)
|
||||
|
||||
|
||||
def _push_enabled():
    """Return True unless the pillar explicitly disables push dispatching.

    Any failure querying the pillar is treated as enabled, so a transient
    salt-call error cannot silently stop pushes.
    """
    try:
        enabled = salt.client.Caller().cmd('pillar.get', 'global:push:enabled', True)
    except Exception:
        LOG.exception('push_strelka: pillar.get global:push:enabled failed, assuming enabled')
        return True
    return bool(enabled)
|
||||
|
||||
|
||||
def _write_intent(key, actions, path):
    """Create or update the pending push intent file for ``key``.

    Intent files live under PENDING_DIR as ``<key>.json`` and are merged
    under an exclusive flock on LOCK_FILE so concurrent reactor runs cannot
    clobber each other. Each intent records:

    * ``first_touch`` -- epoch seconds of the first event (set once),
    * ``last_touch``  -- epoch seconds of the most recent event,
    * ``actions``     -- dispatch actions for the drainer (always overwritten),
    * ``paths``       -- up to MAX_PATHS most recent triggering file paths.

    All failures are logged and swallowed; a reactor must never raise.
    """
    now = time.time()
    try:
        os.makedirs(PENDING_DIR, exist_ok=True)
    except OSError:
        LOG.exception('push_strelka: cannot create %s', PENDING_DIR)
        return

    intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
    lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)

        intent = {}
        if os.path.exists(intent_path):
            try:
                with open(intent_path, 'r') as f:
                    intent = json.load(f)
            except (IOError, ValueError):
                intent = {}
        # A corrupted-but-valid JSON payload (e.g. a list or bare string)
        # would crash the merge below with AttributeError; treat anything
        # that is not a dict as an empty intent.
        if not isinstance(intent, dict):
            intent = {}

        intent.setdefault('first_touch', now)
        intent['last_touch'] = now
        intent['actions'] = actions

        paths = intent.get('paths', [])
        # Same defensive reset if 'paths' was corrupted to a non-list.
        if not isinstance(paths, list):
            paths = []
        if path and path not in paths:
            paths.append(path)
        paths = paths[-MAX_PATHS:]
        intent['paths'] = paths

        # Write-then-rename so the drainer never reads a half-written file.
        tmp_path = intent_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump(intent, f)
        os.rename(tmp_path, intent_path)
    except Exception:
        LOG.exception('push_strelka: failed to write intent %s', intent_path)
    finally:
        try:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
        finally:
            os.close(lock_fd)
|
||||
|
||||
|
||||
def run():
    """Reactor entry point: record a strelka rules push intent.

    Always returns {} -- dispatching is handled by the drainer schedule, not
    by the reactor itself.
    """
    if not _push_enabled():
        LOG.info('push_strelka: push disabled, skipping')
        return {}

    path = data.get('data', {}).get('path', '')  # noqa: F821 -- data provided by reactor
    _write_intent('rules_strelka', [{'state': 'strelka', 'tgt': _sensor_compound()}], path)
    LOG.info('push_strelka: intent updated for path=%s', path)
    return {}
|
||||
95
salt/reactor/push_suricata.sls
Normal file
95
salt/reactor/push_suricata.sls
Normal file
@@ -0,0 +1,95 @@
|
||||
#!py
|
||||
|
||||
# Reactor invoked by the inotify beacon on rule file changes under
|
||||
# /opt/so/saltstack/local/salt/suricata/rules/.
|
||||
#
|
||||
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_suricata.json
|
||||
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
|
||||
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
|
||||
# directly -- see the push-state design plan for background.
|
||||
|
||||
import fcntl
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
|
||||
import salt.client
|
||||
|
||||
LOG = logging.getLogger(__name__)
|
||||
|
||||
PENDING_DIR = '/opt/so/state/push_pending'
|
||||
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
|
||||
MAX_PATHS = 20
|
||||
|
||||
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Suricata also
|
||||
# runs on so-import per salt/top.sls, so that role is appended below.
|
||||
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
|
||||
|
||||
|
||||
def _sensor_compound_plus_import():
    """Compound-match target for all sensor roles plus so-import.

    so-import also runs suricata (per salt/top.sls), so it is appended to
    the sensor-role compound.
    """
    base = ' or '.join('G@role:{}'.format(role) for role in SENSOR_ROLES)
    return base + ' or G@role:so-import'
|
||||
|
||||
|
||||
def _push_enabled():
    """Return True unless the pillar explicitly disables push dispatching.

    Any failure querying the pillar is treated as enabled, so a transient
    salt-call error cannot silently stop pushes.
    """
    try:
        enabled = salt.client.Caller().cmd('pillar.get', 'global:push:enabled', True)
    except Exception:
        LOG.exception('push_suricata: pillar.get global:push:enabled failed, assuming enabled')
        return True
    return bool(enabled)
|
||||
|
||||
|
||||
def _write_intent(key, actions, path):
    """Create or update the pending push intent file for ``key``.

    Intent files live under PENDING_DIR as ``<key>.json`` and are merged
    under an exclusive flock on LOCK_FILE so concurrent reactor runs cannot
    clobber each other. Each intent records:

    * ``first_touch`` -- epoch seconds of the first event (set once),
    * ``last_touch``  -- epoch seconds of the most recent event,
    * ``actions``     -- dispatch actions for the drainer (always overwritten),
    * ``paths``       -- up to MAX_PATHS most recent triggering file paths.

    All failures are logged and swallowed; a reactor must never raise.
    """
    now = time.time()
    try:
        os.makedirs(PENDING_DIR, exist_ok=True)
    except OSError:
        LOG.exception('push_suricata: cannot create %s', PENDING_DIR)
        return

    intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
    lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)

        intent = {}
        if os.path.exists(intent_path):
            try:
                with open(intent_path, 'r') as f:
                    intent = json.load(f)
            except (IOError, ValueError):
                intent = {}
        # A corrupted-but-valid JSON payload (e.g. a list or bare string)
        # would crash the merge below with AttributeError; treat anything
        # that is not a dict as an empty intent.
        if not isinstance(intent, dict):
            intent = {}

        intent.setdefault('first_touch', now)
        intent['last_touch'] = now
        intent['actions'] = actions

        paths = intent.get('paths', [])
        # Same defensive reset if 'paths' was corrupted to a non-list.
        if not isinstance(paths, list):
            paths = []
        if path and path not in paths:
            paths.append(path)
        paths = paths[-MAX_PATHS:]
        intent['paths'] = paths

        # Write-then-rename so the drainer never reads a half-written file.
        tmp_path = intent_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump(intent, f)
        os.rename(tmp_path, intent_path)
    except Exception:
        LOG.exception('push_suricata: failed to write intent %s', intent_path)
    finally:
        try:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
        finally:
            os.close(lock_fd)
|
||||
|
||||
|
||||
def run():
    """Reactor entry point: record a suricata rules push intent.

    Always returns {} -- dispatching is handled by the drainer schedule, not
    by the reactor itself.
    """
    if not _push_enabled():
        LOG.info('push_suricata: push disabled, skipping')
        return {}

    path = data.get('data', {}).get('path', '')  # noqa: F821 -- data provided by reactor
    _write_intent('rules_suricata', [{'state': 'suricata', 'tgt': _sensor_compound_plus_import()}], path)
    LOG.info('push_suricata: intent updated for path=%s', path)
    return {}
|
||||
@@ -17,6 +17,7 @@ include:
|
||||
so-redis:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: so-redis
|
||||
- user: socore
|
||||
- networks:
|
||||
|
||||
@@ -21,6 +21,9 @@ so-dockerregistry:
|
||||
- networks:
|
||||
- sobridge:
|
||||
- ipv4_address: {{ DOCKERMERGED.containers['so-dockerregistry'].ip }}
|
||||
# Intentionally `always` (not unless-stopped) -- registry is critical infra
|
||||
# and must come back up even if it was manually stopped. Do not homogenize
|
||||
# to unless-stopped; see the container auto-restart section of the plan.
|
||||
- restart_policy: always
|
||||
- port_bindings:
|
||||
{% for BINDING in DOCKERMERGED.containers['so-dockerregistry'].port_bindings %}
|
||||
|
||||
@@ -1,3 +1,5 @@
|
||||
{% from 'vars/globals.map.jinja' import GLOBALS %}
|
||||
{% from 'global/map.jinja' import GLOBALMERGED %}
|
||||
{% set CHECKS = salt['pillar.get']('healthcheck:checks', {}) %}
|
||||
{% set ENABLED = salt['pillar.get']('healthcheck:enabled', False) %}
|
||||
{% set SCHEDULE = salt['pillar.get']('healthcheck:schedule', 30) %}
|
||||
@@ -14,12 +16,28 @@ salt_beacons:
|
||||
- defaults:
|
||||
CHECKS: {{ CHECKS }}
|
||||
SCHEDULE: {{ SCHEDULE }}
|
||||
- watch_in:
|
||||
- watch_in:
|
||||
- service: salt_minion_service
|
||||
{% else %}
|
||||
salt_beacons:
|
||||
file.absent:
|
||||
- name: /etc/salt/minion.d/beacons.conf
|
||||
- watch_in:
|
||||
- watch_in:
|
||||
- service: salt_minion_service
|
||||
{% endif %}
|
||||
|
||||
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
|
||||
salt_beacons_pushstate:
|
||||
file.managed:
|
||||
- name: /etc/salt/minion.d/beacons_pushstate.conf
|
||||
- source: salt://salt/files/beacons_pushstate.conf.jinja
|
||||
- template: jinja
|
||||
- watch_in:
|
||||
- service: salt_minion_service
|
||||
{% else %}
|
||||
salt_beacons_pushstate:
|
||||
file.absent:
|
||||
- name: /etc/salt/minion.d/beacons_pushstate.conf
|
||||
- watch_in:
|
||||
- service: salt_minion_service
|
||||
{% endif %}
|
||||
|
||||
26
salt/salt/files/beacons_pushstate.conf.jinja
Normal file
26
salt/salt/files/beacons_pushstate.conf.jinja
Normal file
@@ -0,0 +1,26 @@
|
||||
beacons:
|
||||
inotify:
|
||||
- disable_during_state_run: True
|
||||
- coalesce: True
|
||||
- files:
|
||||
/opt/so/saltstack/local/salt/suricata/rules/:
|
||||
mask:
|
||||
- close_write
|
||||
- moved_to
|
||||
- delete
|
||||
recurse: True
|
||||
auto_add: True
|
||||
/opt/so/saltstack/local/salt/strelka/rules/compiled/:
|
||||
mask:
|
||||
- close_write
|
||||
- moved_to
|
||||
- delete
|
||||
recurse: True
|
||||
auto_add: True
|
||||
/opt/so/saltstack/local/pillar/:
|
||||
mask:
|
||||
- close_write
|
||||
- moved_to
|
||||
- delete
|
||||
recurse: True
|
||||
auto_add: True
|
||||
7
salt/salt/files/reactor_pushstate.conf
Normal file
7
salt/salt/files/reactor_pushstate.conf
Normal file
@@ -0,0 +1,7 @@
|
||||
reactor:
|
||||
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/suricata/rules/':
|
||||
- salt://reactor/push_suricata.sls
|
||||
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/strelka/rules/compiled/':
|
||||
- salt://reactor/push_strelka.sls
|
||||
- 'salt/beacon/*/inotify//opt/so/saltstack/local/pillar/':
|
||||
- salt://reactor/push_pillar.sls
|
||||
@@ -10,6 +10,7 @@
|
||||
# software that is protected by the license key."
|
||||
|
||||
{% from 'allowed_states.map.jinja' import allowed_states %}
|
||||
{% from 'global/map.jinja' import GLOBALMERGED %}
|
||||
{% if sls in allowed_states %}
|
||||
|
||||
include:
|
||||
@@ -62,6 +63,22 @@ engines_config:
|
||||
- name: /etc/salt/master.d/engines.conf
|
||||
- source: salt://salt/files/engines.conf
|
||||
|
||||
{% if GLOBALMERGED.push.enabled %}
|
||||
reactor_pushstate_config:
|
||||
file.managed:
|
||||
- name: /etc/salt/master.d/reactor_pushstate.conf
|
||||
- source: salt://salt/files/reactor_pushstate.conf
|
||||
- watch_in:
|
||||
- service: salt_master_service
|
||||
- order: last
|
||||
{% else %}
|
||||
reactor_pushstate_config:
|
||||
file.absent:
|
||||
- name: /etc/salt/master.d/reactor_pushstate.conf
|
||||
- watch_in:
|
||||
- service: salt_master_service
|
||||
{% endif %}
|
||||
|
||||
# update the bootstrap script when used for salt-cloud
|
||||
salt_bootstrap_cloud:
|
||||
file.managed:
|
||||
|
||||
@@ -2,4 +2,3 @@
|
||||
salt:
|
||||
minion:
|
||||
version: '3006.19'
|
||||
check_threshold: 3600 # in seconds, threshold used for so-salt-minion-check. any value less than 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
|
||||
|
||||
@@ -1,10 +1,26 @@
|
||||
{% from 'vars/globals.map.jinja' import GLOBALS %}
|
||||
{% from 'vars/globals.map.jinja' import GLOBALS %}
|
||||
{% from 'global/map.jinja' import GLOBALMERGED %}
|
||||
|
||||
highstate_schedule:
|
||||
schedule.present:
|
||||
- function: state.highstate
|
||||
- minutes: 15
|
||||
- hours: {{ GLOBALMERGED.push.highstate_interval_hours }}
|
||||
- maxrunning: 1
|
||||
{% if not GLOBALS.is_manager %}
|
||||
- splay: 120
|
||||
- splay: 1800
|
||||
{% endif %}
|
||||
|
||||
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
|
||||
push_drain_schedule:
|
||||
schedule.present:
|
||||
- function: cmd.run
|
||||
- job_args:
|
||||
- /usr/sbin/so-push-drainer
|
||||
- seconds: {{ GLOBALMERGED.push.drain_interval }}
|
||||
- maxrunning: 1
|
||||
- return_job: False
|
||||
{% elif GLOBALS.is_manager %}
|
||||
push_drain_schedule:
|
||||
schedule.absent:
|
||||
- name: push_drain_schedule
|
||||
{% endif %}
|
||||
|
||||
@@ -14,6 +14,7 @@ include:
|
||||
so-sensoroni:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- network_mode: host
|
||||
- binds:
|
||||
- /nsm/import:/nsm/import:rw
|
||||
|
||||
@@ -2687,4 +2687,5 @@ soc:
|
||||
lowBalanceColorAlert: 500000
|
||||
enabled: true
|
||||
adapter: SOAI
|
||||
charsPerTokenEstimate: 4
|
||||
|
||||
|
||||
@@ -18,6 +18,7 @@ include:
|
||||
so-soc:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- hostname: soc
|
||||
- name: so-soc
|
||||
- networks:
|
||||
|
||||
@@ -761,7 +761,7 @@ soc:
|
||||
required: True
|
||||
- field: origin
|
||||
label: Country of Origin for the Model Training
|
||||
required: false
|
||||
required: False
|
||||
- field: contextLimitSmall
|
||||
label: Context Limit (Small)
|
||||
forcedType: int
|
||||
@@ -779,6 +779,10 @@ soc:
|
||||
- field: enabled
|
||||
label: Enabled
|
||||
forcedType: bool
|
||||
- field: charsPerTokenEstimate
|
||||
label: Characters per Token Estimate
|
||||
forcedType: float
|
||||
required: False
|
||||
apiTimeoutMs:
|
||||
description: Duration (in milliseconds) to wait for a response from the SOC server API before giving up and showing an error on the SOC UI.
|
||||
global: True
|
||||
|
||||
@@ -47,6 +47,10 @@ strelka_backend:
|
||||
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
# Intentionally `on-failure` (not unless-stopped) -- strelka backend shuts
|
||||
# down cleanly during rule reloads and we do not want those clean exits to
|
||||
# trigger an auto-restart. Do not homogenize; see the container
|
||||
# auto-restart section of the plan.
|
||||
- restart_policy: on-failure
|
||||
- watch:
|
||||
- file: strelkasensorcompiledrules
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
strelka_coordinator:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-strelka-coordinator
|
||||
- networks:
|
||||
- sobridge:
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
strelka_filestream:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- binds:
|
||||
- /opt/so/conf/strelka/filestream/:/etc/strelka/:ro
|
||||
- /nsm/strelka:/nsm/strelka
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
strelka_frontend:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- binds:
|
||||
- /opt/so/conf/strelka/frontend/:/etc/strelka/:ro
|
||||
- /nsm/strelka/log/:/var/log/strelka/:rw
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
strelka_gatekeeper:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-strelka-gatekeeper
|
||||
- networks:
|
||||
- sobridge:
|
||||
|
||||
@@ -15,6 +15,7 @@ include:
|
||||
strelka_manager:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- binds:
|
||||
- /opt/so/conf/strelka/manager/:/etc/strelka/:ro
|
||||
{% if DOCKERMERGED.containers['so-strelka-manager'].custom_bind_mounts %}
|
||||
|
||||
@@ -18,6 +18,7 @@ so-suricata:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-suricata:{{ GLOBALS.so_version }}
|
||||
- privileged: True
|
||||
- restart_policy: unless-stopped
|
||||
- environment:
|
||||
- INTERFACE={{ GLOBALS.sensor.interface }}
|
||||
{% if DOCKERMERGED.containers['so-suricata'].extra_env %}
|
||||
|
||||
@@ -64,8 +64,10 @@ suricata:
|
||||
helpLink: suricata
|
||||
conditional:
|
||||
description: Set to "all" to record PCAP for all flows. Set to "alerts" to only record PCAP for Suricata alerts. Set to "tag" to only record PCAP for tagged rules.
|
||||
regex: ^(all|alerts|tag)$
|
||||
regexFailureMessage: You must enter either all, alert or tag.
|
||||
options:
|
||||
- all
|
||||
- alerts
|
||||
- tag
|
||||
helpLink: suricata
|
||||
dir:
|
||||
description: Parent directory to store PCAP.
|
||||
@@ -83,7 +85,9 @@ suricata:
|
||||
advanced: True
|
||||
cluster-type:
|
||||
advanced: True
|
||||
regex: ^(cluster_flow|cluster_qm)$
|
||||
options:
|
||||
- cluster_flow
|
||||
- cluster_qm
|
||||
defrag:
|
||||
description: Enable defragmentation of IP packets before processing.
|
||||
forcedType: bool
|
||||
|
||||
@@ -7,6 +7,7 @@ so-tcpreplay:
|
||||
docker_container.running:
|
||||
- network_mode: "host"
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-tcpreplay:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- name: so-tcpreplay
|
||||
- user: root
|
||||
- interactive: True
|
||||
|
||||
@@ -18,6 +18,7 @@ include:
|
||||
so-telegraf:
|
||||
docker_container.running:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-telegraf:{{ GLOBALS.so_version }}
|
||||
- restart_policy: unless-stopped
|
||||
- user: 939
|
||||
- group_add: 939,920
|
||||
- environment:
|
||||
|
||||
@@ -18,6 +18,7 @@ so-zeek:
|
||||
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-zeek:{{ GLOBALS.so_version }}
|
||||
- start: True
|
||||
- privileged: True
|
||||
- restart_policy: unless-stopped
|
||||
{% if DOCKERMERGED.containers['so-zeek'].ulimits %}
|
||||
- ulimits:
|
||||
{% for ULIMIT in DOCKERMERGED.containers['so-zeek'].ulimits %}
|
||||
|
||||
Reference in New Issue
Block a user