Compare commits

..

14 Commits

Author SHA1 Message Date
Josh Patterson 1ee555957a Speed up so-elastic-fleet-integration-upgrade
Fetch each agent policy once and extract integration name/package/version/id
locally via a single jq pass instead of re-fetching the identical policy JSON
1+3N times. Memoize epm/packages latest-version lookups so each package is
queried once instead of per (policy, integration). Dispatch the per-integration
dry-run+upgrade as throttled background jobs (MAX_FLEET_JOBS) with
flock-serialized output and a FAIL_FILE marker, mirroring
elastic_fleet_load_integrations_dir.

Behavior preserved: same elastic-defend-endpoints/fleet_server skips, same
AUTO_UPGRADE_INTEGRATIONS default-package gating (moved into jq, using $defaults
to avoid the jq $def keyword collision), and exit 1 on any failure so salt
retries.
2026-06-12 15:23:43 -04:00
Josh Patterson 43f72c1f9f Parallelize so-elasticsearch-templates-load template PUTs
Load component and index templates as throttled background jobs (max 10
concurrent) instead of sequential curl PUTs, matching the bounded-concurrency
+ flock-serialized-output pattern used by the fleet/ILM load scripts. Keeps a
wait barrier between the component phase and the index phase so index
templates never load before their referenced component templates. Failures are
tracked via per-job marker files since counter increments can't escape
background subshells.
2026-06-12 15:11:34 -04:00
Josh Patterson ae6a705ce1 Speed up so-elastic-fleet-integration-policy-load
Fetch each agent policy once per group instead of refetching the full
policy (plus a fresh Kibana session cookie) for every integration file,
and dispatch the create/update writes as throttled background jobs.

Adds elastic_fleet_load_integrations_dir and elastic_fleet_throttle to
so-elastic-fleet-common, reusing the bounded-concurrency pattern from
so-elasticsearch-ilm-policy-load. Replaces the four serial loops in the
loader with one call per agent policy.
2026-06-12 09:38:41 -04:00
Josh Patterson b1273573ed Fix jq $def keyword collision in optional-integrations-load
The agent-policy enumeration passed --argjson def, creating a jq
variable $def. 'def' is a reserved keyword in jq and the deployed jq
version rejects it, so the program failed to compile and
in_use_integrations was left empty (silently disabling the in-use
upgrade guard). Rename the arg to $defaults.
2026-06-11 15:50:53 -04:00
Josh Patterson 6c42c419e2 Serialize ILM policy-load output with flock to stop interleaving
A single printf per block was not actually one write() call, so
concurrent jobs still occasionally interleaved their label and response
lines. Hold an flock around just the printf (curl still runs in
parallel) so each policy's block prints intact, keeping live
completion-order streaming.
2026-06-11 15:42:41 -04:00
Josh Patterson f23652397c Speed up so-elastic-fleet-optional-integrations-load decision logic
Replace the per-package decision loop (which forked ~10 processes per
package and rebuilt a growing JSON file on every add -> O(n^2)) with two
jq passes: one prints the status messages, one builds the bulk install
list. A vnum/needs() jq definition reproduces the previous
version_conversion/compare_versions and excluded/subscription/installed/
upgrade/in-use logic exactly. Also fetch each agent policy once and
extract non-default package names locally instead of re-fetching the
policy per integration (1+K -> 1 GET per policy). Install behavior is
unchanged.
2026-06-11 13:57:56 -04:00
Josh Patterson 07d3b148b5 fix output 2026-06-11 13:37:26 -04:00
Josh Patterson 780d9faf0d Parallelize so-elasticsearch-ilm-policy-load PUTs
Run the ~300 ILM policy PUTs concurrently (bounded to 10 in flight via a
throttle gate) instead of one serial curl per policy. Adds a put_policy
helper and waits for all background jobs before exiting. Preserves policy
parity; only the scheduling changes. Drops the dead empty sid cookie arg
(falls back to basic auth from curl.config as before).
2026-06-11 12:08:32 -04:00
Josh Patterson d2fe51d5fe Merge remote-tracking branch 'origin/3/dev' into soupmod2 2026-06-11 09:26:14 -04:00
Jason Ertel 0cc94980af Merge pull request #15967 from Security-Onion-Solutions/jertel/wip
Jertel/wip
2026-06-11 08:22:14 -04:00
Jason Ertel b8bf684077 ver 2026-06-11 08:18:38 -04:00
Jason Ertel f083db67e4 disable telemetry for automated tests 2026-06-11 08:17:39 -04:00
Josh Patterson 83aaa76f98 allow full highstate on manager when locked 2026-06-10 16:34:10 -04:00
Jason Ertel eb82f9ea9d kilo version 2026-06-08 16:53:35 -04:00
60 changed files with 383 additions and 1597 deletions
-142
View File
@@ -1,142 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Custom salt beacon that watches the SOC audit_settings table in postgres for
# new settings changes and emits a beacon event per new row. This replaces the
# inotify watch on /opt/so/saltstack/local/pillar -- instead of monitoring pillar
# files on disk, we monitor the so_soc.audit_settings table that SOC writes to.
#
# Detection is poll-based with a monotonic `id` watermark persisted to
# WATERMARK_FILE: each pass selects rows with id greater than the last id seen,
# which makes it self-healing (a missed poll simply catches up on the next one).
#
# Each emitted event carries setting_id and node_id; the push_pillar reactor maps
# setting_id -> app via pillar_push_map.yaml and writes a push intent, after which
# the existing so-push-drainer / orch.push_batch pipeline takes over unchanged.
import logging
import os
import subprocess
log = logging.getLogger(__name__)
WATERMARK_FILE = '/opt/so/state/pillar_db_watch.id'
CONTAINER = 'so-postgres'
DATABASE = 'so_soc'
# Unaligned, tuples-only psql output with a field separator that cannot appear in
# an id/setting_id/node_id, so we can split each row reliably.
FIELD_SEP = '\x1f'
def __virtual__():
return True
def validate(config):
return True, 'valid'
def _read_watermark():
# Returns the last processed id, or None if the watermark has not been seeded.
try:
with open(WATERMARK_FILE, 'r') as f:
return int((f.read() or '').strip())
except (IOError, ValueError):
return None
def _write_watermark(value):
try:
os.makedirs(os.path.dirname(WATERMARK_FILE), exist_ok=True)
tmp = WATERMARK_FILE + '.tmp'
with open(tmp, 'w') as f:
f.write(str(int(value)))
os.rename(tmp, WATERMARK_FILE)
except OSError:
log.exception('pillar_db beacon: failed to persist watermark to %s', WATERMARK_FILE)
def _query(sql):
# Run a query against so_soc inside the so-postgres container over the unix
# socket (trust auth, no password). Returns stdout on success, or None on any
# failure so the caller can no-op and retry on the next interval.
cmd = [
'docker', 'exec', CONTAINER,
'psql', '-U', 'postgres', '-d', DATABASE,
'-tA', '-F', FIELD_SEP, '-c', sql,
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
except subprocess.TimeoutExpired:
log.warning('pillar_db beacon: psql timed out')
return None
except Exception:
log.exception('pillar_db beacon: failed to exec psql')
return None
if result.returncode != 0:
log.warning('pillar_db beacon: psql failed (rc=%s): %s',
result.returncode, (result.stderr or '').strip())
return None
return result.stdout
def beacon(config):
retval = []
watermark = _read_watermark()
# First run / missing watermark: seed to the current MAX(id) and emit nothing
# so we never replay the entire settings history into a fleetwide push.
if watermark is None:
seed = _query('SELECT COALESCE(MAX(id), 0) FROM audit_settings;')
if seed is None:
return retval # postgres not ready yet; retry next interval
try:
_write_watermark(int((seed or '0').strip() or 0))
except ValueError:
log.warning('pillar_db beacon: could not parse MAX(id) seed: %r', seed)
return retval
rows = _query(
"SELECT id, setting_id, COALESCE(node_id, '') FROM audit_settings "
"WHERE id > %d ORDER BY id;" % watermark
)
if rows is None:
return retval
max_id = watermark
for line in rows.splitlines():
# Do NOT str.strip() the whole line: Python treats the \x1f field
# separator (and \x1c-\x1e) as whitespace, so stripping would eat an
# empty trailing node_id field and make the row look malformed.
if not line.strip():
continue
parts = line.split(FIELD_SEP)
if len(parts) < 3:
log.warning('pillar_db beacon: skipping malformed row: %r', line)
continue
try:
row_id = int(parts[0])
except ValueError:
log.warning('pillar_db beacon: skipping row with non-int id: %r', line)
continue
setting_id = parts[1]
node_id = parts[2]
retval.append({
'tag': 'audit_settings',
'id': row_id,
'setting_id': setting_id,
'node_id': node_id,
})
if row_id > max_id:
max_id = row_id
if max_id > watermark:
_write_watermark(max_id)
log.info('pillar_db beacon: emitted %d change(s), watermark %d -> %d',
len(retval), watermark, max_id)
return retval
@@ -1,3 +1,5 @@
{% import_yaml 'salt/minion.defaults.yaml' as SALT_MINION_DEFAULTS -%}
#!/bin/bash #!/bin/bash
# #
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one # Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
@@ -23,8 +25,7 @@ SYSTEM_START_TIME=$(date -d "$(</proc/uptime awk '{print $1}') seconds ago" +%s)
LAST_HIGHSTATE_END=$([ -e "/opt/so/log/salt/lasthighstate" ] && date -r /opt/so/log/salt/lasthighstate +%s || echo 0) LAST_HIGHSTATE_END=$([ -e "/opt/so/log/salt/lasthighstate" ] && date -r /opt/so/log/salt/lasthighstate +%s || echo 0)
LAST_HEALTHCHECK_STATE_APPLY=$([ -e "/opt/so/log/salt/state-apply-test" ] && date -r /opt/so/log/salt/state-apply-test +%s || echo 0) LAST_HEALTHCHECK_STATE_APPLY=$([ -e "/opt/so/log/salt/state-apply-test" ] && date -r /opt/so/log/salt/state-apply-test +%s || echo 0)
# SETTING THRESHOLD TO ANYTHING UNDER 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default # SETTING THRESHOLD TO ANYTHING UNDER 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
# THRESHOLD is derived from the global push highstate interval + 1 hour, so the minion-check grace period tracks the schedule automatically. THRESHOLD={{SALT_MINION_DEFAULTS.salt.minion.check_threshold}} #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
THRESHOLD=$(( ({{ salt['pillar.get']('global:push:highstate_interval_hours', 2) }} + 1) * 3600 )) #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
THRESHOLD_DATE=$((LAST_HEALTHCHECK_STATE_APPLY+THRESHOLD)) THRESHOLD_DATE=$((LAST_HEALTHCHECK_STATE_APPLY+THRESHOLD))
logCmd() { logCmd() {
+1 -2
View File
@@ -9,8 +9,7 @@
prune_images: prune_images:
cmd.run: cmd.run:
- name: so-docker-prune - name: so-docker-prune
- onlyif: command -v /usr/sbin/so-docker-prune >/dev/null 2>&1 - order: last
- order: 9000
{% else %} {% else %}
-1
View File
@@ -19,7 +19,6 @@ wait_for_elasticsearch:
so-elastalert: so-elastalert:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastalert:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastalert:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: elastalert - hostname: elastalert
- name: so-elastalert - name: so-elastalert
- user: so-elastalert - user: so-elastalert
@@ -15,7 +15,6 @@ include:
so-elastic-fleet-package-registry: so-elastic-fleet-package-registry:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-fleet-package-registry:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-fleet-package-registry:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet-package-registry - name: so-elastic-fleet-package-registry
- hostname: Fleet-package-reg-{{ GLOBALS.hostname }} - hostname: Fleet-package-reg-{{ GLOBALS.hostname }}
- detach: True - detach: True
-1
View File
@@ -16,7 +16,6 @@ include:
so-elastic-agent: so-elastic-agent:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-agent - name: so-elastic-agent
- hostname: {{ GLOBALS.hostname }} - hostname: {{ GLOBALS.hostname }}
- detach: True - detach: True
-1
View File
@@ -42,7 +42,6 @@ elasticagent_syncartifacts:
so-elastic-fleet: so-elastic-fleet:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet - name: so-elastic-fleet
- hostname: FleetServer-{{ GLOBALS.hostname }} - hostname: FleetServer-{{ GLOBALS.hostname }}
- detach: True - detach: True
@@ -30,6 +30,70 @@ fleet_api() {
curl -sK /opt/so/conf/elasticsearch/curl.config -L "localhost:5601/api/fleet/${QUERYPATH}" "$@" --retry 3 --retry-delay 10 --fail 2>/dev/null curl -sK /opt/so/conf/elasticsearch/curl.config -L "localhost:5601/api/fleet/${QUERYPATH}" "$@" --retry 3 --retry-delay 10 --fail 2>/dev/null
} }
# Max number of concurrent Fleet write jobs (create/update). Override via env if needed.
MAX_FLEET_JOBS=${MAX_FLEET_JOBS:-10}
# Block until fewer than MAX_FLEET_JOBS background jobs are running.
elastic_fleet_throttle() {
while (( $(jobs -rp | wc -l) >= MAX_FLEET_JOBS )); do
wait -n
done
}
# Load every integration JSON in a directory into a single agent policy.
# The agent policy is fetched ONCE (not per file), and the create/update writes
# are dispatched as throttled background jobs.
# $1 AGENT_POLICY - the agent policy id/name to load integrations into
# $2 DIR - directory of integration *.json files
# $3 LABEL - human-readable label for log output
# $4 SKIP_CREATE_NAME - (optional) integration name to skip when creating (still updated if present)
# Returns 1 if any integration failed to create/update.
elastic_fleet_load_integrations_dir() {
local AGENT_POLICY=$1
local DIR=$2
local LABEL=$3
local SKIP_CREATE_NAME=$4
local POLICY_JSON FAIL_FILE INTEGRATION NAME ID
FAIL_FILE=$(mktemp)
# Fetch the agent policy a single time; we look up integration ids locally below.
POLICY_JSON=$(fleet_api "agent_policies/$AGENT_POLICY")
for INTEGRATION in "$DIR"/*.json; do
[ -e "$INTEGRATION" ] || continue
NAME=$(jq -r .name "$INTEGRATION")
ID=$(jq -r --arg n "$NAME" '.item.package_policies[]? | select(.name==$n) | .id' <<<"$POLICY_JSON")
elastic_fleet_throttle
{
if [ -n "$ID" ]; then
printf "\n\n%s - Updating integration %s\n" "$LABEL" "$NAME"
if ! elastic_fleet_integration_update "$ID" "@$INTEGRATION"; then
flock 9; echo "update ${INTEGRATION##*/}" >&9
fi
elif [ -n "$SKIP_CREATE_NAME" ] && [ "$NAME" == "$SKIP_CREATE_NAME" ]; then
printf "\n\n%s - Skipping creation of %s\n" "$LABEL" "$NAME"
else
printf "\n\n%s - Creating integration %s\n" "$LABEL" "$NAME"
if ! elastic_fleet_integration_create "@$INTEGRATION"; then
flock 9; echo "create ${INTEGRATION##*/}" >&9
fi
fi
} 9>>"$FAIL_FILE" &
done
wait
local rc=0
if [ -s "$FAIL_FILE" ]; then
printf "\n%s: failed integrations:\n" "$LABEL"
cat "$FAIL_FILE"
rc=1
fi
rm -f "$FAIL_FILE"
return $rc
}
elastic_fleet_integration_check() { elastic_fleet_integration_check() {
AGENT_POLICY=$1 AGENT_POLICY=$1
@@ -18,93 +18,26 @@ if [ ! -f /opt/so/state/eaintegrations.txt ]; then
# Third, configure Elastic Defend Integration seperately # Third, configure Elastic Defend Integration seperately
/usr/sbin/so-elastic-fleet-integration-policy-elastic-defend /usr/sbin/so-elastic-fleet-integration-policy-elastic-defend
# Each group fetches its agent policy once and dispatches create/update writes concurrently.
# Initial Endpoints # Initial Endpoints
for INTEGRATION in /opt/so/conf/elastic-fleet/integrations/endpoints-initial/*.json; do elastic_fleet_load_integrations_dir "endpoints-initial" \
printf "\n\nInitial Endpoints Policy - Loading $INTEGRATION\n" /opt/so/conf/elastic-fleet/integrations/endpoints-initial "Initial Endpoints Policy" || RETURN_CODE=1
elastic_fleet_integration_check "endpoints-initial" "$INTEGRATION"
if [ -n "$INTEGRATION_ID" ]; then
printf "\n\nIntegration $NAME exists - Updating integration\n"
if ! elastic_fleet_integration_update "$INTEGRATION_ID" "@$INTEGRATION"; then
echo -e "\nFailed to update integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
else
printf "\n\nIntegration does not exist - Creating integration\n"
if ! elastic_fleet_integration_create "@$INTEGRATION"; then
echo -e "\nFailed to create integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
fi
done
# Grid Nodes - General # Grid Nodes - General
for INTEGRATION in /opt/so/conf/elastic-fleet/integrations/grid-nodes_general/*.json; do elastic_fleet_load_integrations_dir "so-grid-nodes_general" \
printf "\n\nGrid Nodes Policy_General - Loading $INTEGRATION\n" /opt/so/conf/elastic-fleet/integrations/grid-nodes_general "Grid Nodes Policy_General" || RETURN_CODE=1
elastic_fleet_integration_check "so-grid-nodes_general" "$INTEGRATION"
if [ -n "$INTEGRATION_ID" ]; then
printf "\n\nIntegration $NAME exists - Updating integration\n"
if ! elastic_fleet_integration_update "$INTEGRATION_ID" "@$INTEGRATION"; then
echo -e "\nFailed to update integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
else
printf "\n\nIntegration does not exist - Creating integration\n"
if ! elastic_fleet_integration_create "@$INTEGRATION"; then
echo -e "\nFailed to create integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
fi
done
# Grid Nodes - Heavy # Grid Nodes - Heavy
for INTEGRATION in /opt/so/conf/elastic-fleet/integrations/grid-nodes_heavy/*.json; do elastic_fleet_load_integrations_dir "so-grid-nodes_heavy" \
printf "\n\nGrid Nodes Policy_Heavy - Loading $INTEGRATION\n" /opt/so/conf/elastic-fleet/integrations/grid-nodes_heavy "Grid Nodes Policy_Heavy" || RETURN_CODE=1
elastic_fleet_integration_check "so-grid-nodes_heavy" "$INTEGRATION"
if [ -n "$INTEGRATION_ID" ]; then
printf "\n\nIntegration $NAME exists - Updating integration\n"
if ! elastic_fleet_integration_update "$INTEGRATION_ID" "@$INTEGRATION"; then
echo -e "\nFailed to update integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
else
printf "\n\nIntegration does not exist - Creating integration\n"
if ! elastic_fleet_integration_create "@$INTEGRATION"; then
echo -e "\nFailed to create integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
fi
done
# Fleet Server - Optional integrations # Fleet Server - Optional integrations (one agent policy per FleetServer_* directory)
for INTEGRATION in /opt/so/conf/elastic-fleet/integrations-optional/FleetServer*/*.json; do for FLEET_DIR in /opt/so/conf/elastic-fleet/integrations-optional/FleetServer*/; do
if ! [ "$INTEGRATION" == "/opt/so/conf/elastic-fleet/integrations-optional/FleetServer*/*.json" ]; then [ -d "$FLEET_DIR" ] || continue
FLEET_POLICY=`echo "$INTEGRATION"| cut -d'/' -f7` FLEET_POLICY=$(basename "$FLEET_DIR")
printf "\n\nFleet Server Policy - Loading $INTEGRATION\n" elastic_fleet_load_integrations_dir "$FLEET_POLICY" \
elastic_fleet_integration_check "$FLEET_POLICY" "$INTEGRATION" "${FLEET_DIR%/}" "Fleet Server Policy" "elasticsearch-logs" || RETURN_CODE=1
if [ -n "$INTEGRATION_ID" ]; then
printf "\n\nIntegration $NAME exists - Updating integration\n"
if ! elastic_fleet_integration_update "$INTEGRATION_ID" "@$INTEGRATION"; then
echo -e "\nFailed to update integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
else
printf "\n\nIntegration does not exist - Creating integration\n"
if [ "$NAME" != "elasticsearch-logs" ]; then
if ! elastic_fleet_integration_create "@$INTEGRATION"; then
echo -e "\nFailed to create integration for ${INTEGRATION##*/}"
RETURN_CODE=1
continue
fi
fi
fi
fi
done done
# Only create the state file if all policies were created/updated successfully # Only create the state file if all policies were created/updated successfully
@@ -23,73 +23,90 @@ if [ $? -ne 0 ]; then
fi fi
default_packages=({% for pkg in SUPPORTED_PACKAGES %}"{{ pkg }}"{% if not loop.last %} {% endif %}{% endfor %}) default_packages=({% for pkg in SUPPORTED_PACKAGES %}"{{ pkg }}"{% if not loop.last %} {% endif %}{% endfor %})
# JSON array of the default packages, used by the jq filter below.
default_packages_json=$(printf '%s\n' "${default_packages[@]}" | jq -R . | jq -s '.')
# Output lock (serializes concurrent job output) and failure file (one marker line per
# failed integration). Mirrors the pattern used by elastic_fleet_load_integrations_dir.
OUTPUT_LOCK=$(mktemp)
FAIL_FILE=$(mktemp)
trap 'rm -f "$OUTPUT_LOCK" "$FAIL_FILE"' EXIT
# Cache of package name -> latest available version, so the same package is only looked up
# once instead of once per (policy, integration).
declare -A LATEST_VERSION_CACHE
ERROR=false
for AGENT_POLICY in $agent_policies; do for AGENT_POLICY in $agent_policies; do
if ! integrations=$(elastic_fleet_integration_policy_names "$AGENT_POLICY"); then # Fetch the agent policy a single time; package name/version and integration id are all
# extracted locally below instead of re-fetching the same policy per integration.
if ! POLICY_JSON=$(fleet_api "agent_policies/$AGENT_POLICY"); then
# this script upgrades default integration packages, exit 1 and let salt handle retrying # this script upgrades default integration packages, exit 1 and let salt handle retrying
exit 1 exit 1
fi fi
for INTEGRATION in $integrations; do
if ! [[ "$INTEGRATION" == "elastic-defend-endpoints" ]] && ! [[ "$INTEGRATION" == "fleet_server-"* ]]; then # One jq pass emits name/package.name/package.version/id for every eligible integration.
# Get package name so we know what package to look for when checking the current and latest available version # The endpoint/fleet_server skips and the default-package gate are applied here in jq.
if ! PACKAGE_NAME=$(elastic_fleet_integration_policy_package_name "$AGENT_POLICY" "$INTEGRATION"); then # $defaults (not $def, a jq reserved keyword) holds the default package list.
while IFS=$'\t' read -r INTEGRATION PACKAGE_NAME PACKAGE_VERSION INTEGRATION_ID; do
[ -n "$INTEGRATION" ] || continue
# Look up the latest available version once per package, then memoize it.
if [[ -z "${LATEST_VERSION_CACHE[$PACKAGE_NAME]+set}" ]]; then
if ! AVAILABLE_VERSION=$(elastic_fleet_package_latest_version_check "$PACKAGE_NAME"); then
echo "Error: Failed getting latest version for $PACKAGE_NAME"
exit 1 exit 1
fi fi
{%- if not AUTO_UPGRADE_INTEGRATIONS %} LATEST_VERSION_CACHE[$PACKAGE_NAME]=$AVAILABLE_VERSION
if [[ " ${default_packages[@]} " =~ " $PACKAGE_NAME " ]]; then
{%- endif %}
# Get currently installed version of package
attempt=0
max_attempts=3
while [ $attempt -lt $max_attempts ]; do
if PACKAGE_VERSION=$(elastic_fleet_integration_policy_package_version "$AGENT_POLICY" "$INTEGRATION") && AVAILABLE_VERSION=$(elastic_fleet_package_latest_version_check "$PACKAGE_NAME"); then
break
fi
attempt=$((attempt + 1))
done
if [ $attempt -eq $max_attempts ]; then
echo "Error: Failed getting $PACKAGE_VERSION or $AVAILABLE_VERSION"
exit 1
fi
# Get integration ID
if ! INTEGRATION_ID=$(elastic_fleet_integration_id "$AGENT_POLICY" "$INTEGRATION"); then
exit 1
fi
if [[ "$PACKAGE_VERSION" != "$AVAILABLE_VERSION" ]]; then
# Dry run of the upgrade
echo ""
echo "Current $PACKAGE_NAME package version ($PACKAGE_VERSION) is not the same as the latest available package ($AVAILABLE_VERSION)..."
echo "Upgrading $INTEGRATION..."
echo "Starting dry run..."
if ! DRYRUN_OUTPUT=$(elastic_fleet_integration_policy_dryrun_upgrade "$INTEGRATION_ID"); then
exit 1
fi
DRYRUN_ERRORS=$(echo "$DRYRUN_OUTPUT" | jq .[].hasErrors)
# If no errors with dry run, proceed with actual upgrade
if [[ "$DRYRUN_ERRORS" == "false" ]]; then
echo "No errors detected. Proceeding with upgrade..."
if ! elastic_fleet_integration_policy_upgrade "$INTEGRATION_ID"; then
echo "Error: Upgrade failed for $PACKAGE_NAME with integration ID '$INTEGRATION_ID'."
ERROR=true
continue
fi
else
echo "Errors detected during dry run for $PACKAGE_NAME policy upgrade..."
ERROR=true
continue
fi
fi
{%- if not AUTO_UPGRADE_INTEGRATIONS %}
fi
{%- endif %}
fi fi
done AVAILABLE_VERSION=${LATEST_VERSION_CACHE[$PACKAGE_NAME]}
if [[ "$PACKAGE_VERSION" != "$AVAILABLE_VERSION" ]]; then
# Dry run, then (if clean) the actual upgrade, dispatched as a throttled background
# job. Each job builds its full log into one block, then flushes it under a single
# shared lock (OUTPUT_LOCK) so concurrent jobs never interleave on stdout; a failed
# job also appends a marker line to FAIL_FILE while holding that same lock.
elastic_fleet_throttle
{
block=$'\n'"Current $PACKAGE_NAME package version ($PACKAGE_VERSION) is not the same as the latest available package ($AVAILABLE_VERSION)..."$'\n'
block+="Upgrading $INTEGRATION..."$'\n'"Starting dry run..."$'\n'
fail=""
if ! DRYRUN_OUTPUT=$(elastic_fleet_integration_policy_dryrun_upgrade "$INTEGRATION_ID"); then
block+="Error: Failed to complete dry run for '$INTEGRATION_ID'."$'\n'
fail="dryrun $INTEGRATION"
elif [[ "$(jq .[].hasErrors <<<"$DRYRUN_OUTPUT")" == "false" ]]; then
block+="No errors detected. Proceeding with upgrade..."$'\n'
if ! elastic_fleet_integration_policy_upgrade "$INTEGRATION_ID"; then
block+="Error: Upgrade failed for $PACKAGE_NAME with integration ID '$INTEGRATION_ID'."$'\n'
fail="upgrade $INTEGRATION"
fi
else
block+="Errors detected during dry run for $PACKAGE_NAME policy upgrade..."$'\n'
fail="dryrun-errors $INTEGRATION"
fi
{
flock 9
printf '%s' "$block"
[ -n "$fail" ] && printf '%s\n' "$fail" >>"$FAIL_FILE"
} 9>>"$OUTPUT_LOCK"
} &
fi
done < <(jq -r --argjson defaults "$default_packages_json" '
.item.package_policies[]
| select(.name != "elastic-defend-endpoints")
| select(.name | startswith("fleet_server-") | not)
{%- if not AUTO_UPGRADE_INTEGRATIONS %}
| select(.package.name | IN($defaults[]))
{%- endif %}
| [.name, .package.name, .package.version, .id] | @tsv
' <<<"$POLICY_JSON")
done done
if [[ "$ERROR" == "true" ]]; then
# Barrier: wait for every dispatched dry-run/upgrade job to finish.
wait
if [ -s "$FAIL_FILE" ]; then
printf '\nFailed integration upgrades:\n'
cat "$FAIL_FILE"
exit 1 exit 1
fi fi
echo echo
@@ -16,7 +16,6 @@
STATE_FILE_SUCCESS=/opt/so/state/estemplates.txt STATE_FILE_SUCCESS=/opt/so/state/estemplates.txt
INSTALLED_PACKAGE_LIST=/tmp/esfleet_installed_packages.json INSTALLED_PACKAGE_LIST=/tmp/esfleet_installed_packages.json
BULK_INSTALL_PACKAGE_LIST=/tmp/esfleet_bulk_install.json BULK_INSTALL_PACKAGE_LIST=/tmp/esfleet_bulk_install.json
BULK_INSTALL_PACKAGE_TMP=/tmp/esfleet_bulk_install_tmp.json
BULK_INSTALL_OUTPUT=/opt/so/state/esfleet_bulk_install_results.json BULK_INSTALL_OUTPUT=/opt/so/state/esfleet_bulk_install_results.json
INTEGRATION_PACKAGE_COMPONENTS=/opt/so/state/esfleet_package_components.json INTEGRATION_PACKAGE_COMPONENTS=/opt/so/state/esfleet_package_components.json
INPUT_PACKAGE_COMPONENTS=/opt/so/state/esfleet_input_package_components.json INPUT_PACKAGE_COMPONENTS=/opt/so/state/esfleet_input_package_components.json
@@ -29,29 +28,6 @@ PENDING_UPDATE=false
# Requiring some level of manual Elastic Stack configuration before installation # Requiring some level of manual Elastic Stack configuration before installation
EXCLUDED_INTEGRATIONS=('apm') EXCLUDED_INTEGRATIONS=('apm')
version_conversion(){
version=$1
echo "$version" | awk -F '.' '{ printf("%d%03d%03d\n", $1, $2, $3); }'
}
compare_versions() {
version1=$1
version2=$2
# Convert versions to numbers
num1=$(version_conversion "$version1")
num2=$(version_conversion "$version2")
# Compare using bc
if (( $(echo "$num1 < $num2" | bc -l) )); then
echo "less"
elif (( $(echo "$num1 > $num2" | bc -l) )); then
echo "greater"
else
echo "equal"
fi
}
IFS=$'\n' IFS=$'\n'
agent_policies=$(elastic_fleet_agent_policy_ids) agent_policies=$(elastic_fleet_agent_policy_ids)
if [ $? -ne 0 ]; then if [ $? -ne 0 ]; then
@@ -63,23 +39,23 @@ default_packages=({% for pkg in SUPPORTED_PACKAGES %}"{{ pkg }}"{% if not loop.l
in_use_integrations=() in_use_integrations=()
# Fetch each agent policy once; its package_policies[] already contain both the integration name
# and the .package.name, so extract all non-default package names locally in a single jq instead
# of re-fetching the same policy per integration.
default_packages_json=$(printf '%s\n' "${default_packages[@]}" | jq -R . | jq -s '.')
for AGENT_POLICY in $agent_policies; do for AGENT_POLICY in $agent_policies; do
if ! integrations=$(elastic_fleet_integration_policy_names "$AGENT_POLICY"); then if ! policy_json=$(fleet_api "agent_policies/$AGENT_POLICY"); then
# skip the agent policy if we can't get required info, let salt retry. Integrations loaded by this script are non-default integrations. # skip the agent policy if we can't get required info, let salt retry. Integrations loaded by this script are non-default integrations.
echo "Skipping $AGENT_POLICY.. " echo "Skipping $AGENT_POLICY.. "
continue continue
fi fi
for INTEGRATION in $integrations; do # non-default integrations that are in-use in any policy
if ! PACKAGE_NAME=$(elastic_fleet_integration_policy_package_name "$AGENT_POLICY" "$INTEGRATION"); then while IFS= read -r PACKAGE_NAME; do
echo "Not adding $INTEGRATION, couldn't get package name" [ -n "$PACKAGE_NAME" ] && in_use_integrations+=("$PACKAGE_NAME")
continue done < <(jq -r --argjson defaults "$default_packages_json" \
fi '.item.package_policies[].package.name | select(. as $n | ($defaults | index($n)) | not)' \
# non-default integrations that are in-use in any policy <<<"$policy_json")
if ! [[ " ${default_packages[@]} " =~ " $PACKAGE_NAME " ]]; then
in_use_integrations+=("$PACKAGE_NAME")
fi
done
done done
if [[ -f $STATE_FILE_SUCCESS ]]; then if [[ -f $STATE_FILE_SUCCESS ]]; then
@@ -90,72 +66,55 @@ if [[ -f $STATE_FILE_SUCCESS ]]; then
rm -f $INSTALLED_PACKAGE_LIST rm -f $INSTALLED_PACKAGE_LIST
echo $latest_package_list | jq '{packages: [.items[] | {name: .name, latest_version: .version, installed_version: .installationInfo.version, subscription: .conditions.elastic.subscription }]}' >> $INSTALLED_PACKAGE_LIST echo $latest_package_list | jq '{packages: [.items[] | {name: .name, latest_version: .version, installed_version: .installationInfo.version, subscription: .conditions.elastic.subscription }]}' >> $INSTALLED_PACKAGE_LIST
while read -r package; do # Build the bulk install list and the per-package status messages with two jq passes
# get package details # instead of a per-package bash loop. The old loop forked ~10 processes per package
package_name=$(echo "$package" | jq -r '.name') # (5 jq + awk/bc for the version compare) and re-parsed/rewrote a growing JSON file on
latest_version=$(echo "$package" | jq -r '.latest_version') # every add (O(n^2)). Selection and messages below are identical to that logic.
installed_version=$(echo "$package" | jq -r '.installed_version') SUB={% if SUB %}true{% else %}false{% endif %}
subscription=$(echo "$package" | jq -r '.subscription') AUTOUP={% if AUTO_UPGRADE_INTEGRATIONS %}true{% else %}false{% endif %}
bulk_package=$(echo "$package" | jq '{name: .name, version: .latest_version}' ) EXCLUDED_JSON=$(printf '%s\n' "${EXCLUDED_INTEGRATIONS[@]}" | jq -R 'select(length>0)' | jq -s '.')
INUSE_JSON=$(printf '%s\n' "${in_use_integrations[@]}" | jq -R 'select(length>0)' | jq -s 'unique')
if [[ ! "${EXCLUDED_INTEGRATIONS[@]}" =~ "$package_name" ]]; then # vnum replicates the previous version_conversion (%d%03d%03d of the first three dotted
{% if not SUB %} # fields); needs() replicates the excluded/subscription/installed/upgrade/in-use logic.
if [[ "$subscription" != "basic" && "$subscription" != "null" && -n "$subscription" ]]; then JQ_DECISION='
# pass over integrations that require non-basic elastic license def vnum:
echo "$package_name integration requires an Elastic license of $subscription or greater... skipping" [ (split(".")|.[0:3][] | gsub("[^0-9].*";"") | (if .=="" then "0" else . end) | tonumber) ]
continue | (.[0]//0)*1000000 + (.[1]//0)*1000 + (.[2]//0);
else def needs($sub;$autoup;$excluded;$inuse):
if [[ "$installed_version" == "null" || -z "$installed_version" ]]; then .name as $n
echo "$package_name is not installed... Adding to next update." | ($n | IN($excluded[]) | not)
jq --argjson package "$bulk_package" '.packages += [$package]' $BULK_INSTALL_PACKAGE_LIST > $BULK_INSTALL_PACKAGE_TMP && mv $BULK_INSTALL_PACKAGE_TMP $BULK_INSTALL_PACKAGE_LIST and ( $sub or (.subscription==null or .subscription=="basic" or .subscription=="") )
and ( (.installed_version==null or .installed_version=="")
or ( ((.latest_version|vnum) > (.installed_version|vnum))
and ( $autoup or ($n | IN($inuse[]) | not) ) ) );'
PENDING_UPDATE=true JQ_ARGS=(--argjson sub "$SUB" --argjson autoup "$AUTOUP" --argjson excluded "$EXCLUDED_JSON" --argjson inuse "$INUSE_JSON")
else
results=$(compare_versions "$latest_version" "$installed_version")
if [ $results == "greater" ]; then
{#- When auto_upgrade_integrations is false, skip upgrading in_use_integrations #}
{%- if not AUTO_UPGRADE_INTEGRATIONS %}
if ! [[ " ${in_use_integrations[@]} " =~ " $package_name " ]]; then
{%- endif %}
echo "$package_name is at version $installed_version latest version is $latest_version... Adding to next update."
jq --argjson package "$bulk_package" '.packages += [$package]' $BULK_INSTALL_PACKAGE_LIST > $BULK_INSTALL_PACKAGE_TMP && mv $BULK_INSTALL_PACKAGE_TMP $BULK_INSTALL_PACKAGE_LIST
PENDING_UPDATE=true # (a) Per-package status messages (parity with the previous echo output).
{%- if not AUTO_UPGRADE_INTEGRATIONS %} jq -r "${JQ_ARGS[@]}" "$JQ_DECISION"'
else .packages[]
echo "skipping available upgrade for in use integration - $package_name." | .name as $n
fi | if ($n|IN($excluded[])) then "Skipping \($n)..."
{%- endif %} elif (($sub|not) and (.subscription!=null and .subscription!="basic" and .subscription!="")) then
fi "\($n) integration requires an Elastic license of \(.subscription) or greater... skipping"
fi elif (.installed_version==null or .installed_version=="") then
fi "\($n) is not installed... Adding to next update."
{% else %} elif ((.latest_version|vnum) > (.installed_version|vnum)) then
if [[ "$installed_version" == "null" || -z "$installed_version" ]]; then (if ($autoup or ($n|IN($inuse[])|not))
echo "$package_name is not installed... Adding to next update." then "\($n) is at version \(.installed_version) latest version is \(.latest_version)... Adding to next update."
jq --argjson package "$bulk_package" '.packages += [$package]' $BULK_INSTALL_PACKAGE_LIST > $BULK_INSTALL_PACKAGE_TMP && mv $BULK_INSTALL_PACKAGE_TMP $BULK_INSTALL_PACKAGE_LIST else "skipping available upgrade for in use integration - \($n)." end)
PENDING_UPDATE=true else empty end
else ' "$INSTALLED_PACKAGE_LIST"
results=$(compare_versions "$latest_version" "$installed_version")
if [ $results == "greater" ]; then # (b) The bulk install list, built in a single pass.
{#- When auto_upgrade_integrations is false, skip upgrading in_use_integrations #} jq "${JQ_ARGS[@]}" "$JQ_DECISION"'
{%- if not AUTO_UPGRADE_INTEGRATIONS %} {packages: [ .packages[] | select(needs($sub;$autoup;$excluded;$inuse)) | {name, version: .latest_version} ]}
if ! [[ " ${in_use_integrations[@]} " =~ " $package_name " ]]; then ' "$INSTALLED_PACKAGE_LIST" > "$BULK_INSTALL_PACKAGE_LIST"
{%- endif %}
echo "$package_name is at version $installed_version latest version is $latest_version... Adding to next update." if jq -e '.packages | length > 0' "$BULK_INSTALL_PACKAGE_LIST" >/dev/null; then
jq --argjson package "$bulk_package" '.packages += [$package]' $BULK_INSTALL_PACKAGE_LIST > $BULK_INSTALL_PACKAGE_TMP && mv $BULK_INSTALL_PACKAGE_TMP $BULK_INSTALL_PACKAGE_LIST PENDING_UPDATE=true
PENDING_UPDATE=true fi
{%- if not AUTO_UPGRADE_INTEGRATIONS %}
else
echo "skipping available upgrade for in use integration - $package_name."
fi
{%- endif %}
fi
fi
{% endif %}
else
echo "Skipping $package_name..."
fi
done <<< "$(jq -c '.packages[]' "$INSTALLED_PACKAGE_LIST")"
if [ "$PENDING_UPDATE" = true ]; then if [ "$PENDING_UPDATE" = true ]; then
# Run chunked install of packages # Run chunked install of packages
-1
View File
@@ -24,7 +24,6 @@ include:
so-elasticsearch: so-elasticsearch:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elasticsearch:{{ ELASTICSEARCHMERGED.version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elasticsearch:{{ ELASTICSEARCHMERGED.version }}
- restart_policy: unless-stopped
- hostname: elasticsearch - hostname: elasticsearch
- name: so-elasticsearch - name: so-elasticsearch
- user: elasticsearch - user: elasticsearch
@@ -11,10 +11,8 @@ ADDON_STATEFILE_SUCCESS=/opt/so/state/addon_estemplates.txt
ELASTICSEARCH_TEMPLATES_DIR="/opt/so/conf/elasticsearch/templates" ELASTICSEARCH_TEMPLATES_DIR="/opt/so/conf/elasticsearch/templates"
SO_TEMPLATES_DIR="${ELASTICSEARCH_TEMPLATES_DIR}/index" SO_TEMPLATES_DIR="${ELASTICSEARCH_TEMPLATES_DIR}/index"
ADDON_TEMPLATES_DIR="${ELASTICSEARCH_TEMPLATES_DIR}/addon-index" ADDON_TEMPLATES_DIR="${ELASTICSEARCH_TEMPLATES_DIR}/addon-index"
SO_LOAD_FAILURES=0 FAILED_NAMES=()
ADDON_LOAD_FAILURES=0 FAILED_COUNT=0
SO_LOAD_FAILURES_NAMES=()
ADDON_LOAD_FAILURES_NAMES=()
IS_HEAVYNODE="false" IS_HEAVYNODE="false"
FORCE="false" FORCE="false"
VERBOSE="false" VERBOSE="false"
@@ -46,20 +44,86 @@ while [[ $# -gt 0 ]]; do
shift shift
done done
# Max number of concurrent template PUT jobs. Override via env if needed.
MAX_TEMPLATE_JOBS=${MAX_TEMPLATE_JOBS:-10}
# Block until fewer than MAX_TEMPLATE_JOBS background jobs are running.
template_throttle() {
while (( $(jobs -rp | wc -l) >= MAX_TEMPLATE_JOBS )); do
wait -n
done
}
# Per-job failure markers and an output lock for serializing parallel job output.
# Each failed load drops one file (named after the template) into FAIL_DIR; the
# output of each job is flushed as a single block under flock so concurrent jobs
# never interleave their (chatty) retry output.
FAIL_DIR=$(mktemp -d)
OUTPUT_LOCK="${FAIL_DIR}/.output.lock"
: > "$OUTPUT_LOCK"
trap 'rm -rf "$FAIL_DIR"' EXIT
# Record a failure: $1 = the template name/path to report later. Slashes are
# encoded so the path becomes a safe single filename.
record_failure() {
local marker="${1//\//__}"
: > "${FAIL_DIR}/fail.${marker}"
}
# Populate FAILED_NAMES and FAILED_COUNT from the current phase's markers.
# Must run in the current shell (not a command substitution) so the array sticks.
collect_failures() {
FAILED_NAMES=()
FAILED_COUNT=0
local f name
shopt -s nullglob
for f in "${FAIL_DIR}"/fail.*; do
name="${f##*/fail.}"
name="${name//__//}"
FAILED_NAMES+=("$name")
FAILED_COUNT=$((FAILED_COUNT + 1))
done
shopt -u nullglob
}
# Clear markers and names between phases so SO and addon counts stay independent.
reset_failures() {
shopt -s nullglob
rm -f "${FAIL_DIR}"/fail.*
shopt -u nullglob
FAILED_NAMES=()
FAILED_COUNT=0
}
# Print a block of text atomically (under the shared output lock) so the output
# of concurrent background jobs is not interleaved.
locked_echo() {
{ flock 9; printf '%s\n' "$1"; } 9>>"$OUTPUT_LOCK"
}
# Loads one template file via PUT. Intended to be dispatched as a background job.
# $1 uri - e.g. _component_template/foo or _index_template/foo
# $2 file - path to the template JSON
# $3 report_name - name/path to record if this load fails
load_template() { load_template() {
local uri="$1" local uri="$1"
local file="$2" local file="$2"
local report_name="$3"
local out rc=0 block
echo "Loading template file $file" # Capture everything (including retry's diagnostic chatter) into one block so
if ! output=$(retry 3 3 "so-elasticsearch-query $uri -d@$file -XPUT" "{\"acknowledged\":true}"); then # concurrent jobs never interleave; the whole block is flushed under one flock.
echo "$output" block="Loading template file $file"$'\n'
if ! out=$(retry 3 3 "so-elasticsearch-query $uri -d@$file -XPUT" "{\"acknowledged\":true}" 2>&1); then
return 1 block+="$out"$'\n'
rc=1
elif [[ "$VERBOSE" == "true" ]]; then elif [[ "$VERBOSE" == "true" ]]; then
echo "$output" block+="$out"$'\n'
fi fi
{ flock 9; printf '%s' "$block"; } 9>>"$OUTPUT_LOCK"
(( rc != 0 )) && record_failure "$report_name"
} }
check_required_component_template_exists() { check_required_component_template_exists() {
@@ -110,6 +174,9 @@ load_component_templates() {
return return
fi fi
# Dispatch loads as throttled background jobs. The barrier (wait) happens in
# the caller after all component groups have been dispatched, since index
# templates must not load until every component template is in place.
for component in "$pattern"/*.json; do for component in "$pattern"/*.json; do
tmpl_name=$(basename "${component%.json}") tmpl_name=$(basename "${component%.json}")
@@ -118,10 +185,8 @@ load_component_templates() {
tmpl_name="${tmpl_name%-mappings}-mappings" tmpl_name="${tmpl_name%-mappings}-mappings"
fi fi
if ! load_template "_component_template/${tmpl_name}" "$component"; then template_throttle
SO_LOAD_FAILURES=$((SO_LOAD_FAILURES + 1)) load_template "_component_template/${tmpl_name}" "$component" "$component" &
SO_LOAD_FAILURES_NAMES+=("$component")
fi
done done
} }
@@ -180,6 +245,9 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_e
load_component_templates "Elastic Agent" "elastic-agent" load_component_templates "Elastic Agent" "elastic-agent"
load_component_templates "Security Onion" "so" load_component_templates "Security Onion" "so"
# Barrier: every component template PUT must complete before we snapshot the
# component template list and start loading index templates that depend on them.
wait
component_templates=$(so-elasticsearch-component-templates-list) component_templates=$(so-elasticsearch-component-templates-list)
echo -e "Loading Security Onion index templates...\n" echo -e "Loading Security Onion index templates...\n"
for so_idx_tmpl in "${SO_TEMPLATES_DIR}"/*.json; do for so_idx_tmpl in "${SO_TEMPLATES_DIR}"/*.json; do
@@ -189,7 +257,7 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_e
# TODO: Better way to load only heavynode specific templates # TODO: Better way to load only heavynode specific templates
if ! check_heavynode_compatiable_index_template "$tmpl_name"; then if ! check_heavynode_compatiable_index_template "$tmpl_name"; then
if [[ "$VERBOSE" == "true" ]]; then if [[ "$VERBOSE" == "true" ]]; then
echo "Skipping over $so_idx_tmpl, template is not a heavynode specific index template." locked_echo "Skipping over $so_idx_tmpl, template is not a heavynode specific index template."
fi fi
continue continue
@@ -197,32 +265,34 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_e
fi fi
if check_required_component_template_exists "$so_idx_tmpl"; then if check_required_component_template_exists "$so_idx_tmpl"; then
if ! load_template "_index_template/$tmpl_name" "$so_idx_tmpl"; then template_throttle
SO_LOAD_FAILURES=$((SO_LOAD_FAILURES + 1)) load_template "_index_template/$tmpl_name" "$so_idx_tmpl" "$so_idx_tmpl" &
SO_LOAD_FAILURES_NAMES+=("$so_idx_tmpl")
fi
else else
echo "Skipping over $so_idx_tmpl due to missing required component template(s)." locked_echo "Skipping over $so_idx_tmpl due to missing required component template(s)."
SO_LOAD_FAILURES=$((SO_LOAD_FAILURES + 1)) record_failure "$so_idx_tmpl"
SO_LOAD_FAILURES_NAMES+=("$so_idx_tmpl")
continue continue
fi fi
done done
if [[ $SO_LOAD_FAILURES -eq 0 ]]; then # Barrier: all SO index template PUTs must finish before tallying failures.
wait
collect_failures
if [[ $FAILED_COUNT -eq 0 ]]; then
echo "All Security Onion core templates loaded successfully." echo "All Security Onion core templates loaded successfully."
touch "$SO_STATEFILE_SUCCESS" touch "$SO_STATEFILE_SUCCESS"
else else
echo "Encountered $SO_LOAD_FAILURES failure(s) loading templates:" echo "Encountered $FAILED_COUNT failure(s) loading templates:"
for failed_template in "${SO_LOAD_FAILURES_NAMES[@]}"; do for failed_template in "${FAILED_NAMES[@]}"; do
echo " - $failed_template" echo " - $failed_template"
done done
if [[ "$SHOULD_EXIT_ON_FAILURE" == "true" ]]; then if [[ "$SHOULD_EXIT_ON_FAILURE" == "true" ]]; then
fail "Failed to load all Security Onion core templates successfully." fail "Failed to load all Security Onion core templates successfully."
fi fi
fi fi
reset_failures
elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then
echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping." echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping."
elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then
@@ -241,26 +311,27 @@ if should_load_addon_templates; then
tmpl_name=$(basename "${addon_idx_tmpl%-template.json}") tmpl_name=$(basename "${addon_idx_tmpl%-template.json}")
if check_required_component_template_exists "$addon_idx_tmpl"; then if check_required_component_template_exists "$addon_idx_tmpl"; then
if ! load_template "_index_template/${tmpl_name}" "$addon_idx_tmpl"; then template_throttle
ADDON_LOAD_FAILURES=$((ADDON_LOAD_FAILURES + 1)) load_template "_index_template/${tmpl_name}" "$addon_idx_tmpl" "$addon_idx_tmpl" &
ADDON_LOAD_FAILURES_NAMES+=("$addon_idx_tmpl")
fi
else else
echo "Skipping over $addon_idx_tmpl due to missing required component template(s)." locked_echo "Skipping over $addon_idx_tmpl due to missing required component template(s)."
ADDON_LOAD_FAILURES=$((ADDON_LOAD_FAILURES + 1)) record_failure "$addon_idx_tmpl"
ADDON_LOAD_FAILURES_NAMES+=("$addon_idx_tmpl")
continue continue
fi fi
done done
if [[ $ADDON_LOAD_FAILURES -eq 0 ]]; then # Barrier: all addon index template PUTs must finish before tallying failures.
wait
collect_failures
if [[ $FAILED_COUNT -eq 0 ]]; then
echo "All addon integration templates loaded successfully." echo "All addon integration templates loaded successfully."
touch "$ADDON_STATEFILE_SUCCESS" touch "$ADDON_STATEFILE_SUCCESS"
else else
echo "Encountered $ADDON_LOAD_FAILURES failure(s) loading addon integration templates:" echo "Encountered $FAILED_COUNT failure(s) loading addon integration templates:"
for failed_template in "${ADDON_LOAD_FAILURES_NAMES[@]}"; do for failed_template in "${FAILED_NAMES[@]}"; do
echo " - $failed_template" echo " - $failed_template"
done done
if [[ "$SHOULD_EXIT_ON_FAILURE" == "true" ]]; then if [[ "$SHOULD_EXIT_ON_FAILURE" == "true" ]]; then
@@ -6,6 +6,37 @@
. /usr/sbin/so-common . /usr/sbin/so-common
MAX_JOBS=10
# Lock used to serialize block writes so concurrent jobs never interleave their output.
ILM_OUTPUT_LOCK=$(mktemp)
trap 'rm -f "$ILM_OUTPUT_LOCK"' EXIT
# Policies are loaded concurrently (up to MAX_JOBS at a time) for speed. Each policy's block is
# printed the moment its curl returns, so output appears in COMPLETION ORDER, not the order
# policies are defined in configuration.
echo "Loading ILM policies concurrently; output below appears in completion order, not configuration order."
echo
put_policy() {
local desc="$1" policyname="$2" data="$3" result
result=$(curl -K /opt/so/conf/elasticsearch/curl.config -s -k -L \
-X PUT "https://localhost:9200/_ilm/policy/${policyname}" \
-H 'Content-Type: application/json' -d"${data}")
# curl above ran in parallel; serialize just this block write so concurrent jobs never interleave.
{
flock 200
printf 'Setting up %s policy...\n%s\n\n' "${desc}" "${result}"
} 200>>"${ILM_OUTPUT_LOCK}"
}
# Block until fewer than MAX_JOBS background curls are running.
throttle() {
while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
wait -n
done
}
{%- from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS %} {%- from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS %}
{%- if GLOBALS.role != "so-heavynode" %} {%- if GLOBALS.role != "so-heavynode" %}
{%- from 'elasticsearch/template.map.jinja' import ALL_ADDON_SETTINGS %} {%- from 'elasticsearch/template.map.jinja' import ALL_ADDON_SETTINGS %}
@@ -14,35 +45,26 @@
{%- for index, settings in ES_INDEX_SETTINGS.items() %} {%- for index, settings in ES_INDEX_SETTINGS.items() %}
{%- if settings.policy is defined %} {%- if settings.policy is defined %}
{%- if index == 'so-logs-detections.alerts' %} {%- if index == 'so-logs-detections.alerts' %}
echo throttle
echo "Setting up so-logs-detections.alerts-so policy..." put_policy "so-logs-detections.alerts-so" "{{ index }}-so" '{ "policy": {{ settings.policy | tojson(true) }} }' &
curl -K /opt/so/conf/elasticsearch/curl.config -b "sid=$SESSIONCOOKIE" -s -k -L -X PUT "https://localhost:9200/_ilm/policy/{{ index }}-so" -H 'Content-Type: application/json' -d'{ "policy": {{ settings.policy | tojson(true) }} }'
echo
{%- elif index == 'so-logs-soc' %} {%- elif index == 'so-logs-soc' %}
echo throttle
echo "Setting up so-soc-logs policy..." put_policy "so-soc-logs" "so-soc-logs" '{ "policy": {{ settings.policy | tojson(true) }} }' &
curl -K /opt/so/conf/elasticsearch/curl.config -b "sid=$SESSIONCOOKIE" -s -k -L -X PUT "https://localhost:9200/_ilm/policy/so-soc-logs" -H 'Content-Type: application/json' -d'{ "policy": {{ settings.policy | tojson(true) }} }' throttle
echo put_policy "{{ index }}-logs" "{{ index }}-logs" '{ "policy": {{ settings.policy | tojson(true) }} }' &
echo
echo "Setting up {{ index }}-logs policy..."
curl -K /opt/so/conf/elasticsearch/curl.config -b "sid=$SESSIONCOOKIE" -s -k -L -X PUT "https://localhost:9200/_ilm/policy/{{ index }}-logs" -H 'Content-Type: application/json' -d'{ "policy": {{ settings.policy | tojson(true) }} }'
echo
{%- else %} {%- else %}
echo throttle
echo "Setting up {{ index }}-logs policy..." put_policy "{{ index }}-logs" "{{ index }}-logs" '{ "policy": {{ settings.policy | tojson(true) }} }' &
curl -K /opt/so/conf/elasticsearch/curl.config -b "sid=$SESSIONCOOKIE" -s -k -L -X PUT "https://localhost:9200/_ilm/policy/{{ index }}-logs" -H 'Content-Type: application/json' -d'{ "policy": {{ settings.policy | tojson(true) }} }'
echo
{%- endif %} {%- endif %}
{%- endif %} {%- endif %}
{%- endfor %} {%- endfor %}
echo
{%- if GLOBALS.role != "so-heavynode" %} {%- if GLOBALS.role != "so-heavynode" %}
{%- for index, settings in ALL_ADDON_SETTINGS.items() %} {%- for index, settings in ALL_ADDON_SETTINGS.items() %}
{%- if settings.policy is defined %} {%- if settings.policy is defined %}
echo throttle
echo "Setting up {{ index }}-logs policy..." put_policy "{{ index }}-logs" "{{ index }}-logs" '{ "policy": {{ settings.policy | tojson(true) }} }' &
curl -K /opt/so/conf/elasticsearch/curl.config -b "sid=$SESSIONCOOKIE" -s -k -L -X PUT "https://localhost:9200/_ilm/policy/{{ index }}-logs" -H 'Content-Type: application/json' -d'{ "policy": {{ settings.policy | tojson(true) }} }'
echo
{%- endif %} {%- endif %}
{%- endfor %} {%- endfor %}
{%- endif %} {%- endif %}
wait
-7
View File
@@ -1,10 +1,3 @@
global: global:
pcapengine: SURICATA pcapengine: SURICATA
pipeline: REDIS pipeline: REDIS
push:
enabled: true
highstate_interval_hours: 2
debounce_seconds: 30
drain_interval: 15
batch: '25%'
batch_wait: 15
-37
View File
@@ -59,41 +59,4 @@ global:
description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame. description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
global: True global: True
advanced: True advanced: True
push:
enabled:
description: Master kill-switch for the active push feature. When disabled, rule and pillar changes are picked up at the next scheduled highstate instead of being pushed immediately.
forcedType: bool
helpLink: push
global: True
highstate_interval_hours:
description: How often every minion in the grid runs a scheduled state.highstate, in hours. Lower values keep minions closer in sync at the cost of more load; higher values reduce load but increase worst-case latency for non-pushed changes. The salt-minion health check restarts a minion if its last highstate is older than this value plus one hour.
forcedType: int
helpLink: push
global: True
advanced: True
debounce_seconds:
description: Trailing-edge debounce window in seconds. A push intent must be quiet for this long before the drainer dispatches. Rapid bursts of edits within this window coalesce into one dispatch.
forcedType: int
helpLink: push
global: True
advanced: True
drain_interval:
description: How often the push drainer checks for ready intents, in seconds. Small values lower dispatch latency at the cost of more background work on the manager.
forcedType: int
helpLink: push
global: True
advanced: True
batch:
description: "Host batch size for push orchestrations. A number (e.g. '10') or a percentage (e.g. '25%'). Limits how many minions run the push state at once so large fleets don't thundering-herd."
helpLink: push
global: True
advanced: True
regex: '^([0-9]+%?)$'
regexFailureMessage: Enter a whole number or a whole-number percentage (e.g. 10 or 25%).
batch_wait:
description: Seconds to wait between host batches in a push orchestration. Gives the fleet time to breathe between waves.
forcedType: int
helpLink: push
global: True
advanced: True
-1
View File
@@ -58,7 +58,6 @@ so-hydra:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }} - {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %} {% endfor %}
{% endif %} {% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped - restart_policy: unless-stopped
- watch: - watch:
- file: hydraconfig - file: hydraconfig
-1
View File
@@ -15,7 +15,6 @@ include:
so-idh: so-idh:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-idh:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-idh:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-idh - name: so-idh
- detach: True - detach: True
- network_mode: host - network_mode: host
-1
View File
@@ -18,7 +18,6 @@ include:
so-influxdb: so-influxdb:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-influxdb:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-influxdb:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: influxdb - hostname: influxdb
- networks: - networks:
- sobridge: - sobridge:
-1
View File
@@ -27,7 +27,6 @@ include:
so-kafka: so-kafka:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kafka:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kafka:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-kafka - hostname: so-kafka
- name: so-kafka - name: so-kafka
- networks: - networks:
-1
View File
@@ -16,7 +16,6 @@ include:
so-kibana: so-kibana:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kibana:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kibana:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: kibana - hostname: kibana
- user: kibana - user: kibana
- networks: - networks:
-1
View File
@@ -51,7 +51,6 @@ so-kratos:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }} - {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %} {% endfor %}
{% endif %} {% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped - restart_policy: unless-stopped
- watch: - watch:
- file: kratosschema - file: kratosschema
-1
View File
@@ -28,7 +28,6 @@ include:
so-logstash: so-logstash:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-logstash:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-logstash:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-logstash - hostname: so-logstash
- name: so-logstash - name: so-logstash
- networks: - networks:
-21
View File
@@ -1,21 +0,0 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'global/map.jinja' import GLOBALMERGED %}
include:
- salt.minion
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
salt_beacons_pushstate:
file.managed:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- source: salt://manager/files/beacons_pushstate.conf.jinja
- template: jinja
- watch_in:
- service: salt_minion_service
{% else %}
salt_beacons_pushstate:
file.absent:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- watch_in:
- service: salt_minion_service
{% endif %}
@@ -1,41 +0,0 @@
{% from 'global/map.jinja' import GLOBALMERGED %}
beacons:
pillar_db:
- interval: {{ GLOBALMERGED.push.drain_interval }}
- disable_during_state_run: True
inotify:
- disable_during_state_run: True
- coalesce: True
- files:
/opt/so/saltstack/local/salt/suricata/rules:
mask:
- close_write
- moved_to
- delete
recurse: True
auto_add: True
exclude:
- '\.sw[a-z]$':
regex: True
- '~$':
regex: True
- '/4913$':
regex: True
- '/\.#':
regex: True
/opt/so/saltstack/local/salt/strelka/rules/compiled:
mask:
- close_write
- moved_to
- delete
recurse: True
auto_add: True
exclude:
- '\.sw[a-z]$':
regex: True
- '~$':
regex: True
- '/4913$':
regex: True
- '/\.#':
regex: True
-2
View File
@@ -15,7 +15,6 @@ include:
- manager.elasticsearch - manager.elasticsearch
- manager.kibana - manager.kibana
- manager.managed_soc_annotations - manager.managed_soc_annotations
- manager.beacons
repo_log_dir: repo_log_dir:
file.directory: file.directory:
@@ -232,7 +231,6 @@ surifiltersrules:
- user: 939 - user: 939
- group: 939 - group: 939
{% else %} {% else %}
{{sls}}_state_not_allowed: {{sls}}_state_not_allowed:
-232
View File
@@ -1,232 +0,0 @@
#!/opt/saltstack/salt/bin/python3
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
"""
so-push-drainer
===============
Scheduled drainer for the active-push feature. Runs on the manager every
drain_interval seconds (default 15) via a salt schedule in salt/schedule.sls.
For each intent file under /opt/so/state/push_pending/*.json whose last_touch
is older than debounce_seconds, this script:
* concatenates the actions lists from every ready intent
* dedupes by (state or __highstate__, tgt, tgt_type)
* dispatches a single `salt-run state.orchestrate orch.push_batch --async`
with the deduped actions list passed as pillar kwargs
* deletes the contributed intent files on successful dispatch
Reactor sls files (push_suricata, push_strelka, push_pillar) write intents
but never dispatch directly -- see plan
/home/mreeves/.claude/plans/goofy-marinating-hummingbird.md for the full design.
"""
import fcntl
import glob
import json
import logging
import logging.handlers
import os
import subprocess
import sys
import time
import salt.client
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
LOG_FILE = '/opt/so/log/salt/so-push-drainer.log'
HIGHSTATE_SENTINEL = '__highstate__'
def _make_logger():
logger = logging.getLogger('so-push-drainer')
logger.setLevel(logging.INFO)
if not logger.handlers:
os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
handler = logging.handlers.RotatingFileHandler(
LOG_FILE, maxBytes=5 * 1024 * 1024, backupCount=3,
)
handler.setFormatter(logging.Formatter(
'%(asctime)s | %(levelname)s | %(message)s',
))
logger.addHandler(handler)
return logger
def _load_push_cfg():
"""Read the global:push pillar subtree via salt-call. Returns a dict."""
caller = salt.client.Caller()
cfg = caller.cmd('pillar.get', 'global:push', {})
return cfg if isinstance(cfg, dict) else {}
def _read_intent(path, log):
try:
with open(path, 'r') as f:
return json.load(f)
except (IOError, ValueError) as exc:
log.warning('cannot read intent %s: %s', path, exc)
return None
except Exception:
log.exception('unexpected error reading %s', path)
return None
def _dedupe_actions(actions):
seen = set()
deduped = []
for action in actions:
if not isinstance(action, dict):
continue
state_key = HIGHSTATE_SENTINEL if action.get('highstate') else action.get('state')
tgt = action.get('tgt')
tgt_type = action.get('tgt_type', 'compound')
if not state_key or not tgt:
continue
key = (state_key, tgt, tgt_type)
if key in seen:
continue
seen.add(key)
deduped.append(action)
return deduped
def _dispatch(actions, log):
pillar_arg = json.dumps({'actions': actions})
cmd = [
'salt-run',
'state.orchestrate',
'orch.push_batch',
'pillar={}'.format(pillar_arg),
'--async',
]
log.info('dispatching: %s', ' '.join(cmd[:3]) + ' pillar=<{} actions>'.format(len(actions)))
try:
result = subprocess.run(
cmd, check=True, capture_output=True, text=True, timeout=60,
)
except subprocess.CalledProcessError as exc:
log.error('dispatch failed (rc=%s): stdout=%s stderr=%s',
exc.returncode, exc.stdout, exc.stderr)
return False
except subprocess.TimeoutExpired:
log.error('dispatch timed out after 60s')
return False
except Exception:
log.exception('dispatch raised')
return False
log.info('dispatch accepted: %s', (result.stdout or '').strip())
return True
def main():
log = _make_logger()
if not os.path.isdir(PENDING_DIR):
# Nothing to do; reactors create the dir on first use.
return 0
try:
push = _load_push_cfg()
except Exception:
log.exception('failed to read global:push pillar; aborting drain pass')
return 1
if not push.get('enabled', True):
log.debug('push disabled; exiting')
return 0
debounce_seconds = int(push.get('debounce_seconds', 30))
os.makedirs(PENDING_DIR, exist_ok=True)
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent_files = [
p for p in sorted(glob.glob(os.path.join(PENDING_DIR, '*.json')))
if os.path.basename(p) != '.lock'
]
if not intent_files:
return 0
now = time.time()
ready = []
skipped = 0
broken = []
for path in intent_files:
intent = _read_intent(path, log)
if not isinstance(intent, dict):
broken.append(path)
continue
last_touch = intent.get('last_touch', 0)
if now - last_touch < debounce_seconds:
skipped += 1
continue
ready.append((path, intent))
for path in broken:
try:
os.unlink(path)
except OSError:
pass
if not ready:
if skipped:
log.debug('no ready intents (%d still in debounce window)', skipped)
return 0
combined_actions = []
oldest_first_touch = now
all_paths = []
for path, intent in ready:
combined_actions.extend(intent.get('actions', []) or [])
first = intent.get('first_touch', now)
if first < oldest_first_touch:
oldest_first_touch = first
all_paths.extend(intent.get('paths', []) or [])
deduped = _dedupe_actions(combined_actions)
if not deduped:
log.warning('%d intent(s) had no usable actions; clearing', len(ready))
for path, _ in ready:
try:
os.unlink(path)
except OSError:
pass
return 0
debounce_duration = now - oldest_first_touch
log.info(
'draining %d intent(s): %d action(s) after dedupe (raw=%d), '
'debounce_duration=%.1fs, paths=%s',
len(ready), len(deduped), len(combined_actions),
debounce_duration, all_paths[:20],
)
if not _dispatch(deduped, log):
log.warning('dispatch failed; leaving intent files in place for retry')
return 1
for path, _ in ready:
try:
os.unlink(path)
except OSError:
log.exception('failed to remove drained intent %s', path)
return 0
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
if __name__ == '__main__':
sys.exit(main())
+5 -4
View File
@@ -343,10 +343,11 @@ highstate() {
masterlock() { masterlock() {
echo "Locking Salt Master" echo "Locking Salt Master"
mv -v $TOPFILE $BACKUPTOPFILE mv -v $TOPFILE $BACKUPTOPFILE
echo "base:" > $TOPFILE # Render the real top file only for the host running soup; every other
echo " $MINIONID:" >> $TOPFILE # minion gets an empty top (no states) while the master is upgrading.
echo " - ca" >> $TOPFILE echo "{% if grains['id'] == '$MINIONID' %}" > $TOPFILE
echo " - elasticsearch" >> $TOPFILE cat $BACKUPTOPFILE >> $TOPFILE
echo "{% endif %}" >> $TOPFILE
} }
masterunlock() { masterunlock() {
-1
View File
@@ -34,7 +34,6 @@ make-rule-dir-nginx:
so-nginx: so-nginx:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-nginx:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-nginx:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-nginx - hostname: so-nginx
- networks: - networks:
- sobridge: - sobridge:
-37
View File
@@ -1,37 +0,0 @@
{% from 'global/map.jinja' import GLOBALMERGED %}
{% set actions = salt['pillar.get']('actions', []) %}
{% set BATCH = GLOBALMERGED.push.batch %}
{% set BATCH_WAIT = GLOBALMERGED.push.batch_wait %}
{% for action in actions %}
{% if action.get('highstate') %}
apply_highstate_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- highstate: True
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
{% else %}
refresh_pillar_{{ loop.index }}:
salt.function:
- name: saltutil.refresh_pillar
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
apply_{{ action.state | replace('.', '_') }}_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- sls:
- {{ action.state }}
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
- require:
- salt: refresh_pillar_{{ loop.index }}
{% endif %}
{% endfor %}
-240
View File
@@ -1,240 +0,0 @@
# One pillar directory can map to multiple (state, tgt) actions.
# tgt is a raw salt compound expression. tgt_type is always "compound".
# Per-action `batch` / `batch_wait` override the orch defaults (25% / 15s).
# An action with `highstate: True` triggers state.highstate instead of
# state.apply -- see salt/orch/push_batch.sls.
#
# Notes:
# - `bpf` is a pillar-only dir (no state of its own) consumed by both
# zeek and suricata via macros, so a bpf pillar change re-applies both.
# - suricata/strelka/zeek/elasticsearch/redis/kafka/logstash etc. have
# their own pillar dirs AND their own state, so they map 1:1 (or 1:2
# in strelka's case, because of the split init.sls / manager.sls).
#
# Intentional omissions (these will log a "not in pillar_push_map.yaml"
# warning in push_pillar.sls and wait for the next scheduled highstate):
# - `data` and `node_data`: pillar-only data consumed by many states;
# handling them generically would amount to a fleetwide highstate.
# - `host`: soc_host describes mainint/mainip; a change is a re-IP and
# needs a coordinated procedure, not an immediate state push.
# - `hypervisor`: state changes touch libvirt and are disruptive; leave
# to the next scheduled highstate.
# - `sensor`: every field in soc_sensor.yaml is `readonly: True` or
# per-minion (`node: True`). Per-minion edits are persisted under
# pillar/minions/<id>.sls and are handled by Branch A of push_pillar.sls
# (per-minion highstate intent), not by this app-pillar map.
#
# The role sets here were verified line-by-line against salt/top.sls. If
# salt/top.sls changes how an app is targeted, update the corresponding
# compound here.
# firewall: the one pillar everyone touches. Applied everywhere intentionally
# because every host's iptables needs to know about every other host in the
# grid. Salt's firewall state is idempotent (file.managed + iptables-restore
# onchanges in salt/firewall/init.sls), so hosts whose rendered firewall is
# unchanged do a file comparison and no-op without touching iptables -- actual
# reload happens only on the hosts whose rules actually changed. Fleetwide
# blast radius is intentional and matches the pre-plan behavior via highstate.
# Adding N sensors in a burst coalesces into one dispatch via the drainer.
firewall:
- state: firewall
tgt: '*'
# backup: backup.config_backup runs on eval, standalone, manager, managerhype,
# managersearch (NOT import -- the backup pillar is included on import per
# pillar/top.sls but the backup state is not run there per salt/top.sls).
backup:
- state: backup.config_backup
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# bpf is pillar-only (no state); consumed by both zeek and suricata as macros.
# Both states run on sensor_roles + so-import per salt/top.sls.
bpf:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# ca is applied universally.
ca:
- state: ca
tgt: '*'
# docker: universal. The docker state is in both the all-non-managers and
# all-managers branches of salt/top.sls.
docker:
- state: docker
tgt: '*'
# elastalert: eval, standalone, manager, managerhype, managersearch (NOT import).
elastalert:
- state: elastalert
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elastic-fleet-package-registry: manager_roles exactly.
elastic-fleet-package-registry:
- state: elastic-fleet-package-registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elasticsearch: 8 roles.
elasticsearch:
- state: elasticsearch
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-standalone'
# elasticagent: so-heavynode only.
elasticagent:
- state: elasticagent
tgt: 'G@role:so-heavynode'
# elasticfleet: base state only on pillar change. elasticfleet.install_agent_grid
# is a deploy/enrollment step, not a config reload; leave it to the next highstate.
elasticfleet:
- state: elasticfleet
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# global: fanout to a fleetwide highstate. The global pillar (soc_global.sls)
# carries cross-cutting settings (pipeline, url_base, imagerepo, mdengine, ...)
# that are consumed by virtually every state, so a targeted re-apply isn't
# meaningful. The drainer's batch/batch_wait throttling controls blast radius.
global:
- highstate: True
tgt: '*'
# healthcheck: eval, sensor, standalone only.
healthcheck:
- state: healthcheck
tgt: 'G@role:so-eval or G@role:so-sensor or G@role:so-standalone'
# hydra: manager_roles exactly.
hydra:
- state: hydra
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# idh: so-idh only.
idh:
- state: idh
tgt: 'G@role:so-idh'
# influxdb: manager_roles exactly.
influxdb:
- state: influxdb
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kafka: standalone, manager, managerhype, managersearch, searchnode, receiver.
kafka:
- state: kafka
tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# kibana: manager_roles exactly.
kibana:
- state: kibana
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kratos: manager_roles exactly.
kratos:
- state: kratos
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# logrotate: universal (top-of-file '*' branch in salt/top.sls).
logrotate:
- state: logrotate
tgt: '*'
# logstash: 8 roles, no eval/import.
logstash:
- state: logstash
tgt: 'G@role:so-fleet or G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# manager: manager_roles exactly. The manager state is also referenced under
# *_sensor / *_heavynode top.sls blocks via `sensor`, but the standalone
# `manager` state itself runs only on manager_roles.
manager:
- state: manager
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# nginx: 10 specific roles. NOT receiver, idh, hypervisor, desktop.
nginx:
- state: nginx
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# ntp: universal (top-of-file '*' branch in salt/top.sls).
ntp:
- state: ntp
tgt: '*'
# patch: universal. soc_patch carries the OS update schedule, applied via
# patch.os.schedule on every node (it's in both the all-non-managers and
# all-managers branches of salt/top.sls).
patch:
- state: patch.os.schedule
tgt: '*'
# postgres: manager_roles exactly.
postgres:
- state: postgres
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# redis: 6 roles. standalone, manager, managerhype, managersearch, heavynode, receiver.
# (NOT eval, NOT import, NOT searchnode.)
redis:
- state: redis
tgt: 'G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-standalone'
# registry: manager_roles exactly.
registry:
- state: registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# sensoroni: universal.
sensoroni:
- state: sensoroni
tgt: '*'
# soc: manager_roles exactly.
soc:
- state: soc
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# stig: broad. Runs on standalone, manager, managerhype, managersearch,
# searchnode, sensor, receiver, fleet, hypervisor, desktop.
# NOT eval, NOT import, NOT heavynode, NOT idh (the *_idh block in
# salt/top.sls intentionally omits stig).
stig:
- state: stig
tgt: 'G@role:so-desktop or G@role:so-fleet or G@role:so-hypervisor or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# strelka: sensor-side only on pillar change (sensor_roles). strelka.manager is
# intentionally NOT fired on pillar changes -- YARA rule and strelka config
# pillar changes are consumed by the sensor-side strelka backend, and re-running
# strelka.manager on managers is both unnecessary and disruptive. strelka.manager
# is left to the 2-hour highstate.
strelka:
- state: strelka
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-sensor or G@role:so-standalone'
# suricata: sensor_roles + so-import (5 roles).
suricata:
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# telegraf: universal.
telegraf:
- state: telegraf
tgt: '*'
# versionlock: universal (top-of-file '*' branch in salt/top.sls).
versionlock:
- state: versionlock
tgt: '*'
# vm: libvirt-driver hypervisors only. Matched by the salt-cloud:driver:libvirt
# grain (compound supports nested grain matching via G@<key>:<subkey>:<value>).
# pillar/vm/soc_vm.sls write path is referenced at salt/_runners/setup_hypervisor.py:856.
vm:
- state: vm
tgt: 'G@salt-cloud:driver:libvirt'
# zeek: sensor_roles + so-import (5 roles).
zeek:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
-176
View File
@@ -1,176 +0,0 @@
#!py
# Reactor invoked by the pillar_db beacon when SOC records settings changes in
# the so_soc.audit_settings table (see salt/_beacons/pillar_db.py). The beacon
# emits one event per new row carrying setting_id and node_id.
#
# Two branches, keyed on node_id:
# A) node_id populated -> the change is scoped to that one minion. Look up the
# app in pillar_push_map.yaml and write an intent that runs the app's mapped
# state(s) targeted to just that node.
# B) node_id empty -> grid-wide app change. Look up the app in
# pillar_push_map.yaml and write an intent with the entry's actions as-is.
#
# The app name is the first dotted segment of setting_id (e.g. "telegraf.output"
# -> "telegraf"), which matches the pillar_push_map.yaml keys 1:1.
#
# Reactors never dispatch directly. The so-push-drainer schedule picks up
# ready intents, dedupes across pending files, and dispatches orch.push_batch.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
import yaml
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# The pillar_push_map.yaml is shipped via salt:// but the reactor runs on the
# master, which mounts the default saltstack tree at this path.
PUSH_MAP_PATH = '/opt/so/saltstack/default/salt/reactor/pillar_push_map.yaml'
_PUSH_MAP_CACHE = {'mtime': 0, 'data': None}
def _load_push_map():
try:
st = os.stat(PUSH_MAP_PATH)
except OSError:
LOG.warning('push_pillar: %s not found', PUSH_MAP_PATH)
return {}
if _PUSH_MAP_CACHE['mtime'] != st.st_mtime:
try:
with open(PUSH_MAP_PATH, 'r') as f:
_PUSH_MAP_CACHE['data'] = yaml.safe_load(f) or {}
except Exception:
LOG.exception('push_pillar: failed to load %s', PUSH_MAP_PATH)
_PUSH_MAP_CACHE['data'] = {}
_PUSH_MAP_CACHE['mtime'] = st.st_mtime
return _PUSH_MAP_CACHE['data'] or {}
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_pillar: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_pillar: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_pillar: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def _app_from_setting(setting_id):
# setting_id is e.g. 'telegraf.output' -> 'telegraf', 'ntp.config.servers' -> 'ntp'
if not setting_id:
return None
return setting_id.split('.', 1)[0] or None
def _node_actions(entry, node_id):
# Copy the app's mapped actions but retarget each one to the single node.
# Preserves the state/highstate selection and any batch/batch_wait overrides.
actions = []
for action in entry:
if not isinstance(action, dict):
continue
node_action = dict(action)
node_action['tgt'] = node_id
node_action['tgt_type'] = 'glob'
actions.append(node_action)
return actions
def run():
if not _push_enabled():
LOG.info('push_pillar: push disabled, skipping')
return {}
# The pillar_db beacon nests its payload under data['data']; fall back to the
# top level so the reactor is robust to either shape.
event = data.get('data', data) # noqa: F821 -- data provided by reactor
setting_id = event.get('setting_id', '')
node_id = (event.get('node_id') or '').strip()
app = _app_from_setting(setting_id)
if not app:
LOG.debug('push_pillar: ignoring event with no app segment: setting_id=%s', setting_id)
return {}
push_map = _load_push_map()
entry = push_map.get(app)
if not entry:
LOG.warning(
'push_pillar: app "%s" is not in pillar_push_map.yaml; change will be '
'picked up at the next scheduled highstate (setting_id=%s)',
app, setting_id,
)
return {}
# Branch A: per-node change -> retarget the app's states to just that node.
if node_id:
actions = _node_actions(entry, node_id)
if not actions:
LOG.warning('push_pillar: no usable actions for app "%s" (setting_id=%s)', app, setting_id)
return {}
_write_intent(
'node_{}_{}'.format(node_id, app), actions,
'audit:{}@{}'.format(setting_id, node_id),
)
LOG.info('push_pillar: per-node intent updated for %s on %s (setting_id=%s)',
app, node_id, setting_id)
return {}
# Branch B: grid-wide app change -> use the map entry's actions as-is.
actions = list(entry) # copy to avoid mutating the cache
_write_intent('pillar_{}'.format(app), actions, 'audit:{}'.format(setting_id))
LOG.info('push_pillar: app intent updated for %s (setting_id=%s)', app, setting_id)
return {}
-96
View File
@@ -1,96 +0,0 @@
#!py
# Reactor invoked by the inotify beacon on rule file changes under
# /opt/so/saltstack/local/salt/strelka/rules/compiled/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_strelka.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Sensor-side
# strelka runs on exactly these four roles; so-import gets strelka.manager
# instead, which is not fired on pillar changes.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES)
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_strelka: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_strelka: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_strelka: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_strelka: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'strelka', 'tgt': _sensor_compound()}]
_write_intent('rules_strelka', actions, path)
LOG.info('push_strelka: intent updated for path=%s', path)
return {}
-95
View File
@@ -1,95 +0,0 @@
#!py
# Reactor invoked by the inotify beacon on rule file changes under
# /opt/so/saltstack/local/salt/suricata/rules/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_suricata.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Suricata also
# runs on so-import per salt/top.sls, so that role is appended below.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound_plus_import():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES) + ' or G@role:so-import'
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_suricata: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_suricata: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_suricata: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_suricata: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'suricata', 'tgt': _sensor_compound_plus_import()}]
_write_intent('rules_suricata', actions, path)
LOG.info('push_suricata: intent updated for path=%s', path)
return {}
-1
View File
@@ -17,7 +17,6 @@ include:
so-redis: so-redis:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-redis - hostname: so-redis
- user: socore - user: socore
- networks: - networks:
-3
View File
@@ -21,9 +21,6 @@ so-dockerregistry:
- networks: - networks:
- sobridge: - sobridge:
- ipv4_address: {{ DOCKERMERGED.containers['so-dockerregistry'].ip }} - ipv4_address: {{ DOCKERMERGED.containers['so-dockerregistry'].ip }}
# Intentionally `always` (not unless-stopped) -- registry is critical infra
# and must come back up even if it was manually stopped. Do not homogenize
# to unless-stopped; see the container auto-restart section of the plan.
- restart_policy: always - restart_policy: always
- port_bindings: - port_bindings:
{% for BINDING in DOCKERMERGED.containers['so-dockerregistry'].port_bindings %} {% for BINDING in DOCKERMERGED.containers['so-dockerregistry'].port_bindings %}
+1 -2
View File
@@ -3,7 +3,7 @@
{% set SCHEDULE = salt['pillar.get']('healthcheck:schedule', 30) %} {% set SCHEDULE = salt['pillar.get']('healthcheck:schedule', 30) %}
include: include:
- salt.minion - salt
{% if CHECKS and ENABLED %} {% if CHECKS and ENABLED %}
salt_beacons: salt_beacons:
@@ -23,4 +23,3 @@ salt_beacons:
- watch_in: - watch_in:
- service: salt_minion_service - service: salt_minion_service
{% endif %} {% endif %}
-11
View File
@@ -1,11 +0,0 @@
reactor:
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/suricata/rules':
- salt://reactor/push_suricata.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/suricata/rules/*':
- salt://reactor/push_suricata.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/strelka/rules/compiled':
- salt://reactor/push_strelka.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/strelka/rules/compiled/*':
- salt://reactor/push_strelka.sls
- 'salt/beacon/*/pillar_db/audit_settings':
- salt://reactor/push_pillar.sls
-8
View File
@@ -5,11 +5,3 @@ salt_bootstrap:
- source: salt://salt/scripts/bootstrap-salt.sh - source: salt://salt/scripts/bootstrap-salt.sh
- mode: 755 - mode: 755
- show_changes: False - show_changes: False
salt_sbin:
file.recurse:
- name: /usr/sbin
- source: salt://salt/tools/sbin
- user: 939
- group: 939
- file_mode: 755
+1 -1
View File
@@ -1,4 +1,4 @@
lasthighstate: lasthighstate:
file.touch: file.touch:
- name: /opt/so/log/salt/lasthighstate - name: /opt/so/log/salt/lasthighstate
- order: 9001 - order: last
+1 -18
View File
@@ -10,12 +10,10 @@
# software that is protected by the license key." # software that is protected by the license key."
{% from 'allowed_states.map.jinja' import allowed_states %} {% from 'allowed_states.map.jinja' import allowed_states %}
{% from 'global/map.jinja' import GLOBALMERGED %}
{% if sls in allowed_states %} {% if sls in allowed_states %}
include: include:
- salt.minion - salt.minion
- salt.master.pyinotify
- salt.master.boot_mine_update - salt.master.boot_mine_update
{% if 'vrt' in salt['pillar.get']('features', []) %} {% if 'vrt' in salt['pillar.get']('features', []) %}
- salt.cloud - salt.cloud
@@ -65,21 +63,6 @@ engines_config:
- name: /etc/salt/master.d/engines.conf - name: /etc/salt/master.d/engines.conf
- source: salt://salt/files/engines.conf - source: salt://salt/files/engines.conf
{% if GLOBALMERGED.push.enabled %}
reactor_pushstate_config:
file.managed:
- name: /etc/salt/master.d/reactor_pushstate.conf
- source: salt://salt/files/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% else %}
reactor_pushstate_config:
file.absent:
- name: /etc/salt/master.d/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% endif %}
# update the bootstrap script when used for salt-cloud # update the bootstrap script when used for salt-cloud
salt_bootstrap_cloud: salt_bootstrap_cloud:
file.managed: file.managed:
@@ -95,7 +78,7 @@ salt_master_service:
- file: checkmine_engine - file: checkmine_engine
- file: pillarWatch_engine - file: pillarWatch_engine
- file: engines_config - file: engines_config
- order: 9002 - order: last
{% else %} {% else %}
-20
View File
@@ -1,20 +0,0 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
pyinotify_module_package:
file.recurse:
- name: /opt/so/conf/salt/module_packages/pyinotify
- source: salt://salt/module_packages/pyinotify
- clean: True
- makedirs: True
pyinotify_python_module_install:
cmd.run:
- name: /opt/saltstack/salt/bin/python3.10 -m pip install pyinotify --no-index --find-links=/opt/so/conf/salt/module_packages/pyinotify/ --upgrade
- onchanges:
- file: pyinotify_module_package
- failhard: True
- watch_in:
- service: salt_minion_service
+1
View File
@@ -2,3 +2,4 @@
salt: salt:
minion: minion:
version: '3006.19' version: '3006.19'
check_threshold: 3600 # in seconds, threshold used for so-salt-minion-check. any value less than 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
+2 -20
View File
@@ -111,17 +111,13 @@ mark_setup_complete_for_upgrades:
{% endif %} {% endif %}
# this has to be outside the if statement above since there are <requisite>_in calls to this state. # this has to be outside the if statement above since there are <requisite>_in calls to this state
# uses watch (not listen) so the restart fires in-state and its result lands on this state's
# running entry; that is what lets wait_for_salt_minion_ready below detect any restart
# uniformly via onchanges, regardless of whether the trigger came from these files or from
# external watch_in's (e.g. beacons, master/pyinotify).
salt_minion_service: salt_minion_service:
service.running: service.running:
- name: salt-minion - name: salt-minion
- enable: True - enable: True
- onlyif: test "{{INSTALLEDSALTVERSION}}" == "{{SALTVERSION}}" - onlyif: test "{{INSTALLEDSALTVERSION}}" == "{{SALTVERSION}}"
- watch: - listen:
- file: mine_functions - file: mine_functions
{% if INSTALLEDSALTVERSION|string == SALTVERSION|string %} {% if INSTALLEDSALTVERSION|string == SALTVERSION|string %}
- file: set_log_levels - file: set_log_levels
@@ -130,17 +126,3 @@ salt_minion_service:
- file: signing_policy - file: signing_policy
{% endif %} {% endif %}
- order: last - order: last
# block until the just-restarted salt-minion is back and can execute modules locally, so
# follow-on jobs and the next highstate iteration do not race the restart. onchanges +
# require on salt_minion_service catches every restart trigger uniformly because watch
# mod_watch results replace the service state's running entry. wait logic lives in
# /usr/sbin/so-salt-minion-wait (deployed by common_sbin from common/tools/sbin/).
wait_for_salt_minion_ready:
cmd.run:
- name: /usr/sbin/so-salt-minion-wait
- onchanges:
- service: salt_minion_service
- require:
- service: salt_minion_service
- order: last
-35
View File
@@ -1,35 +0,0 @@
#!/bin/bash
#
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Block until the local salt-minion service is back up and can execute modules locally.
# Invoked from the wait_for_salt_minion_ready state in salt/minion/init.sls after
# salt_minion_service fires its watch-driven mod_watch (a non-blocking systemctl restart),
# so follow-on jobs and the next highstate iteration do not race the in-flight restart.
. /usr/sbin/so-common
# Initial sleep gives the systemctl restart (--no-block by default for salt-minion on
# >=3006.15) time to begin tearing down the old process before we probe for readiness.
INITIAL_SLEEP=3
TIMEOUT=120
PING_TIMEOUT=5
sleep "$INITIAL_SLEEP"
elapsed="$INITIAL_SLEEP"
while [ "$elapsed" -lt "$TIMEOUT" ]; do
if systemctl is-active --quiet salt-minion \
&& salt-call --local --timeout="$PING_TIMEOUT" --out=quiet test.ping >/dev/null 2>&1; then
echo "salt-minion ready after ${elapsed}s"
exit 0
fi
sleep 1
elapsed=$((elapsed + 1))
done
echo "salt-minion did not become ready within ${TIMEOUT}s" >&2
exit 1
+3 -19
View File
@@ -1,26 +1,10 @@
{% from 'vars/globals.map.jinja' import GLOBALS %} {% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'global/map.jinja' import GLOBALMERGED %}
highstate_schedule: highstate_schedule:
schedule.present: schedule.present:
- function: state.highstate - function: state.highstate
- hours: {{ GLOBALMERGED.push.highstate_interval_hours }} - minutes: 15
- maxrunning: 1 - maxrunning: 1
{% if not GLOBALS.is_manager %} {% if not GLOBALS.is_manager %}
- splay: 1800 - splay: 120
{% endif %}
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
push_drain_schedule:
schedule.present:
- function: cmd.run
- job_args:
- /usr/sbin/so-push-drainer
- seconds: {{ GLOBALMERGED.push.drain_interval }}
- maxrunning: 1
- return_job: False
{% elif GLOBALS.is_manager %}
push_drain_schedule:
schedule.absent:
- name: push_drain_schedule
{% endif %} {% endif %}
-1
View File
@@ -14,7 +14,6 @@ include:
so-sensoroni: so-sensoroni:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- network_mode: host - network_mode: host
- binds: - binds:
- /nsm/import:/nsm/import:rw - /nsm/import:/nsm/import:rw
-1
View File
@@ -18,7 +18,6 @@ include:
so-soc: so-soc:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: soc - hostname: soc
- name: so-soc - name: so-soc
- networks: - networks:
-4
View File
@@ -47,10 +47,6 @@ strelka_backend:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }} - {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %} {% endfor %}
{% endif %} {% endif %}
# Intentionally `on-failure` (not unless-stopped) -- strelka backend shuts
# down cleanly during rule reloads and we do not want those clean exits to
# trigger an auto-restart. Do not homogenize; see the container
# auto-restart section of the plan.
- restart_policy: on-failure - restart_policy: on-failure
- watch: - watch:
- file: strelkasensorcompiledrules - file: strelkasensorcompiledrules
-1
View File
@@ -15,7 +15,6 @@ include:
strelka_coordinator: strelka_coordinator:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-strelka-coordinator - name: so-strelka-coordinator
- networks: - networks:
- sobridge: - sobridge:
-1
View File
@@ -15,7 +15,6 @@ include:
strelka_filestream: strelka_filestream:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds: - binds:
- /opt/so/conf/strelka/filestream/:/etc/strelka/:ro - /opt/so/conf/strelka/filestream/:/etc/strelka/:ro
- /nsm/strelka:/nsm/strelka - /nsm/strelka:/nsm/strelka
-1
View File
@@ -15,7 +15,6 @@ include:
strelka_frontend: strelka_frontend:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds: - binds:
- /opt/so/conf/strelka/frontend/:/etc/strelka/:ro - /opt/so/conf/strelka/frontend/:/etc/strelka/:ro
- /nsm/strelka/log/:/var/log/strelka/:rw - /nsm/strelka/log/:/var/log/strelka/:rw
-1
View File
@@ -15,7 +15,6 @@ include:
strelka_gatekeeper: strelka_gatekeeper:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-strelka-gatekeeper - name: so-strelka-gatekeeper
- networks: - networks:
- sobridge: - sobridge:
-1
View File
@@ -15,7 +15,6 @@ include:
strelka_manager: strelka_manager:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds: - binds:
- /opt/so/conf/strelka/manager/:/etc/strelka/:ro - /opt/so/conf/strelka/manager/:/etc/strelka/:ro
{% if DOCKERMERGED.containers['so-strelka-manager'].custom_bind_mounts %} {% if DOCKERMERGED.containers['so-strelka-manager'].custom_bind_mounts %}
-1
View File
@@ -18,7 +18,6 @@ so-suricata:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-suricata:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-suricata:{{ GLOBALS.so_version }}
- privileged: True - privileged: True
- restart_policy: unless-stopped
- environment: - environment:
- INTERFACE={{ GLOBALS.sensor.interface }} - INTERFACE={{ GLOBALS.sensor.interface }}
{% if DOCKERMERGED.containers['so-suricata'].extra_env %} {% if DOCKERMERGED.containers['so-suricata'].extra_env %}
-1
View File
@@ -7,7 +7,6 @@ so-tcpreplay:
docker_container.running: docker_container.running:
- network_mode: "host" - network_mode: "host"
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-tcpreplay:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-tcpreplay:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-tcpreplay - name: so-tcpreplay
- user: root - user: root
- interactive: True - interactive: True
-1
View File
@@ -18,7 +18,6 @@ include:
so-telegraf: so-telegraf:
docker_container.running: docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-telegraf:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-telegraf:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- user: 939 - user: 939
- group_add: 939,920 - group_add: 939,920
- environment: - environment:
-1
View File
@@ -18,7 +18,6 @@ so-zeek:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-zeek:{{ GLOBALS.so_version }} - image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-zeek:{{ GLOBALS.so_version }}
- start: True - start: True
- privileged: True - privileged: True
- restart_policy: unless-stopped
{% if DOCKERMERGED.containers['so-zeek'].ulimits %} {% if DOCKERMERGED.containers['so-zeek'].ulimits %}
- ulimits: - ulimits:
{% for ULIMIT in DOCKERMERGED.containers['so-zeek'].ulimits %} {% for ULIMIT in DOCKERMERGED.containers['so-zeek'].ulimits %}
+2
View File
@@ -223,6 +223,8 @@ if [ -n "$test_profile" ]; then
WEBPASSWD1=0n10nus3r WEBPASSWD1=0n10nus3r
WEBPASSWD2=0n10nus3r WEBPASSWD2=0n10nus3r
NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}" NODE_DESCRIPTION="${HOSTNAME} - ${install_type} - ${MSRVIP_OFFSET}"
# opt out of telemetry for automated testing
telemetry=1
update_sudoers_for_testing update_sudoers_for_testing
fi fi