Compare commits

...

62 Commits

Author SHA1 Message Date
Josh Patterson 3310e19ee4 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-07-02 10:27:54 -04:00
Jason Ertel 07d6b2cfdd Merge pull request #16033 from Security-Onion-Solutions/jertel/wip
avoid setup failure reason ambiguity
2026-07-02 09:20:48 -04:00
Jason Ertel 89afea876a Merge branch '3/dev' into jertel/wip 2026-07-02 09:04:57 -04:00
Jason Ertel 1243a25bd3 avoid setup failure reason ambiguity 2026-07-02 08:59:52 -04:00
Josh Patterson 76f6947f36 Merge pull request #16029 from Security-Onion-Solutions/surirulereload
only reload suricata rules if all-rulesets.rules exists
2026-07-01 16:54:02 -04:00
Jorge Reyes 92a55386c6 Merge pull request #16028 from Security-Onion-Solutions/reyesj2-patch-6
duplicate repo name in so-repo-sync
2026-07-01 15:50:54 -05:00
reyesj2 e7352eb841 duplicate repo name in so-repo-sync 2026-07-01 15:17:55 -05:00
Josh Patterson 795aa898a3 suricata: only reload rules once the ruleset file exists
On a fresh install the surirulesync file.recurse creates .gitkeep before
SOC has generated all-rulesets.rules. That change satisfied the
surirulereload onchanges requisite, so the reload ran with no ruleset
present, failed to stat the file, and reported the state (and install)
as failed.

Add an onlyif guard so the reload only runs when all-rulesets.rules
exists. A .gitkeep-only sync now leaves the state a clean success
(onlyif condition false); once SOC writes the ruleset, the reload fires
normally.
2026-07-01 15:12:54 -04:00
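The interaction between the two requisites can be modelled as a small decision table. This is a hypothetical Python sketch, not Salt's requisite engine. The names `reload_outcome` and `exists` and the outcome strings are illustrative. It encodes only what the commit message states: with no change the reload never runs, and with a change but no ruleset, the `onlyif` guard turns the state into a clean success instead of a failed stat.

```python
import os


def reload_outcome(watched_changed, ruleset_path, run_reload, exists=os.path.exists):
    """Model a reload state guarded by onchanges and onlyif (hypothetical)."""
    # onchanges: nothing in the watched state changed, so the reload is not run.
    if not watched_changed:
        return "not run (no changes)"
    # onlyif: the guard is false (SOC has not written the ruleset yet), so the
    # state succeeds cleanly without attempting the reload.
    if not exists(ruleset_path):
        return "success (onlyif false)"
    # Both requisites pass: the reload runs and its result decides the state.
    return "success" if run_reload() else "failed"
```

A .gitkeep-only sync corresponds to `watched_changed=True` while the ruleset file does not yet exist.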
Josh Patterson 69d77382f1 suricata: timestamp each line of reload log output
Route the reload/verify output (ours plus so-common's retry/fail lines)
through a synchronous timestamping pipeline so every line in reload.log
is prefixed with a date/time, and preserve the real exit code via
PIPESTATUS.
2026-07-01 15:12:53 -04:00
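The commit implements this in bash with a timestamping pipeline and `PIPESTATUS`. The following is a rough Python analogue, not the script's code; `run_with_timestamps` and `emit` are hypothetical names. It merges stderr into stdout, stamps each line as it arrives, and returns the child's own exit status rather than the status of the stamping stage.

```python
import subprocess
import sys
from datetime import datetime


def run_with_timestamps(cmd, emit=print):
    """Run cmd, emit each output line with a timestamp prefix, return cmd's exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # stamp error/retry lines too
        text=True,
    )
    # Iterating the pipe yields lines as the child flushes them.
    for line in proc.stdout:
        text = line.rstrip("\n")
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        emit(f"{stamp} {text}")
    proc.stdout.close()
    # The child's status (like ${PIPESTATUS[0]}), not the last pipeline stage's.
    return proc.wait()
```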
Jorge Reyes dc9b4f3ce5 Merge pull request #16027 from Security-Onion-Solutions/reyesj2-patch-6
increase wait_for_so-kibana timeout to 10m
2026-07-01 13:48:10 -05:00
reyesj2 87b9276c79 increase wait_for_so-kibana timeout to 10m 2026-07-01 13:19:47 -05:00
Jorge Reyes 99118f9bed Merge pull request #16023 from Security-Onion-Solutions/reyesj2/uekairgap
update airgap soup to sync uek repo from iso and retain latest packag…
2026-07-01 13:14:55 -05:00
reyesj2 24b75b4a2b typo 2026-07-01 12:50:23 -05:00
Jorge Reyes 395bd627f1 Merge pull request #16024 from Security-Onion-Solutions/reyesj2/fixsearch
remove outdated eval script and associated salt utility state
2026-07-01 11:59:00 -05:00
reyesj2 c33db9d00f remove outdated eval script and associated salt utility state 2026-07-01 11:12:39 -05:00
reyesj2 e88eb65a44 keep old packages for rollback ability 2026-07-01 10:29:05 -05:00
reyesj2 dc8c80633b update airgap soup to sync uek repo from iso and retain latest packages only 2026-07-01 10:23:04 -05:00
Josh Patterson f441d98e71 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-07-01 10:34:56 -04:00
Josh Patterson 895aa18486 Merge pull request #16021 from Security-Onion-Solutions/surirulereload
suricata: verify reloaded ruleset is newer than the rules file
2026-07-01 10:33:14 -04:00
Josh Patterson ee36f5f84c suricata: verify reloaded ruleset is newer than the rules file
Treating an in-progress reload as instant success could report success
while Suricata was still running a stale ruleset (the in-flight reload
may have started before the new all-rulesets.rules was written).

Make success conditional on Suricata actually having loaded the current
ruleset: capture the rules-file mtime up front, trigger a blocking
reload-rules, then query ruleset-reload-time and only succeed when
last_reload >= mtime. An in-progress reload now retries (waits for it to
clear so our own fresh reload runs) instead of short-circuiting, and a
ruleset that never catches up within the retry window fails via fail().

Also drop the redundant ruleset-reload-nonblocking call (the verified
blocking reload is authoritative and the async call was what left a
reload running) and log human-readable timestamps.
2026-07-01 09:00:36 -04:00
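The verification loop can be sketched as follows. This is a hypothetical Python outline, not the actual so-suricata-reload-rules script. `trigger_reload` and `last_reload_time` stand in for the blocking `reload-rules` and `ruleset-reload-time` queries. The defaults of 60 attempts at 3 s are an assumption chosen to match the roughly 3-minute retry window mentioned in 52574e21c6.

```python
import time


def reload_and_verify(rules_mtime, trigger_reload, last_reload_time,
                      attempts=60, delay=3.0, sleep=time.sleep):
    """Succeed only once the loaded ruleset is at least as new as the rules file.

    rules_mtime: mtime of all-rulesets.rules, captured before reloading.
    trigger_reload: hypothetical callable issuing a blocking reload; returns
        False if the reload was refused (for example, already in progress).
    last_reload_time: hypothetical callable returning the reload timestamp
        Suricata reports, or None if it cannot be queried yet.
    """
    for _ in range(attempts):
        if trigger_reload():
            last = last_reload_time()
            if last is not None and last >= rules_mtime:
                return True
        # In progress or still stale: wait, then run our own reload again.
        sleep(delay)
    # Never caught up within the retry window: the caller should fail.
    return False
```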
Jorge Reyes a3f586cf88 Merge pull request #16018 from Security-Onion-Solutions/reyesj2/kf 2026-06-30 14:46:22 -05:00
Josh Patterson 52574e21c6 suricata: treat in-progress rule reload as success
so-suricata-reload-rules failed the surirulereload state when a rule
reload was already running: suricatasc returns
{"message":"Reload already in progress","return":"NOK"}, which never
matched the expected output, so retry looped all 60 attempts (~3 min)
and called fail.

Wrap the suricatasc calls so an in-progress reload is treated as
success (the in-flight reload picks up the new rules) while genuine
container-not-ready conditions still retry and ultimately fail.
2026-06-30 09:40:23 -04:00
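The original failure came from matching only the expected success output. A hypothetical classifier can tell the cases apart before deciding whether to retry. The `classify_reply` name, the category strings, and the shape of the success reply are assumptions; only the in-progress reply is quoted from the commit.

```python
import json


def classify_reply(raw):
    """Classify a suricatasc JSON reply (hypothetical helper)."""
    try:
        reply = json.loads(raw)
    except (TypeError, ValueError):
        # Empty or non-JSON output: treat as container/socket not ready yet.
        return "not-ready"
    if not isinstance(reply, dict):
        return "not-ready"
    if reply.get("return") == "OK":
        return "ok"
    if reply.get("message") == "Reload already in progress":
        return "in-progress"
    return "error"
```

How each category is handled is up to the caller. This commit treated `in-progress` as success, while ee36f5f84c later retries it instead.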
Josh Patterson a330bea25e Rename push-detection beacons to clearer names
Rename the two custom push-detection beacons for clarity:
- pillar_db -> postgres_pillar_beacon
- rules_db  -> rules_beacon

Salt resolves a beacon by its config-key name to a _beacons/ module of the
same filename and tags its events salt/beacon/<minion>/<name>/<tag>, so each
rename touches the module file, the beacon config key in
beacons_pushstate.conf.jinja, and the reactor tag patterns in
reactor_pushstate.conf together. Watermark filenames and log prefixes are
updated to match; reactor run() logic is unchanged.
2026-06-29 14:29:07 -04:00
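Because the event tag embeds the beacon name, a reactor pattern written for the old name silently stops matching after a rename. That is why the module, the config key, and the reactor patterns have to change together. The following sketch uses glob-style matching as an illustration. The pattern table and the `push_suricata.sls` filename are hypothetical stand-ins for reactor_pushstate.conf.

```python
from fnmatch import fnmatch


def beacon_tag(minion, beacon_name, event_tag):
    # Tag layout from the commit message: salt/beacon/<minion>/<name>/<tag>
    return f"salt/beacon/{minion}/{beacon_name}/{event_tag}"


# Illustrative reactor patterns keyed to the new beacon names.
PATTERNS = {
    "salt/beacon/*/postgres_pillar_beacon/audit_settings": "push_pillar.sls",
    "salt/beacon/*/rules_beacon/suricata": "push_suricata.sls",
}


def reactors_for(tag):
    """Return the reactor files whose glob pattern matches the event tag."""
    return [sls for pattern, sls in PATTERNS.items() if fnmatch(tag, pattern)]
```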
Josh Patterson 33c24cd136 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-26 15:42:56 -04:00
Josh Patterson 12f4447875 Replace inotify rule-watch beacon with poll-based rules_db beacon
Salt's stock inotify beacon leaks one kernel inotify instance every time
the minion rebuilds the beacon loader's __context__ (the orphaned
pyinotify.Notifier is never stopped), accumulating against
fs.inotify.max_user_instances=128 until inotify_init() fails with EMFILE
and rule-change push detection silently stops. This is independent of
disable_during_state_run.

Add a custom poll-based beacon (salt/_beacons/rules_db.py) modeled on
pillar_db.py: it fingerprints the suricata/strelka rule dirs each interval
(relpath + mtime_ns + size, temp files excluded) against a per-dir
watermark, emitting an event only on change. It holds zero inotify
instances, so the leak is impossible, and it keeps firing during state
runs. Swap the inotify beacon config and reactor tag mappings accordingly;
the push_suricata/push_strelka reactors are unchanged (they read only
data['path']).
2026-06-26 15:40:32 -04:00
Jorge Reyes 576c7bfedd Merge pull request #16013 from Security-Onion-Solutions/reyesj2/so-start
update so-stop | so-start | so-restart scripts
2026-06-26 13:47:09 -05:00
reyesj2 b3b7ecdded update so-stop | so-start | so-restart scripts 2026-06-26 13:19:18 -05:00
Josh Patterson da94788255 Move highstate_interval_hours to salt.schedule and split schedule.sls
highstate_interval_hours describes the per-minion highstate schedule, not the
active-push pipeline, so relocate it from salt.auto_apply to a new salt.schedule
settings subtree. Repoint so-salt-minion-check at the new pillar path (it had
been left on the stale global:push path) so its restart grace period tracks the
schedule again.

- Add salt.schedule.highstate_interval_hours to defaults.yaml/soc_salt.yaml and a
  side-effect-free salt/salt/schedule.map.jinja (SCHEDULEMERGED), matching the
  *MERGED map convention. Consumers read SCHEDULEMERGED.highstate_interval_hours.
- Split salt/schedule.sls into salt/salt/highstate_schedule.sls (every minion) and
  salt/salt/push_drain_schedule.sls (managers); update top.sls to apply the
  highstate schedule via '*' and the drainer schedule via the configured-manager
  block. Remove the now-empty schedule.sls aggregator.
- pillar_push_map.yaml and so-push-drainer: comment/doc updates only.
2026-06-26 10:51:57 -04:00
Josh Patterson fa2ae1b87f Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-25 11:45:03 -04:00
Josh Patterson 5bf9751adf do not disable during state run 2026-06-25 11:44:38 -04:00
Josh Patterson 3effdbc91e do not disable during state run 2026-06-25 11:36:52 -04:00
Josh Patterson 8836529496 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-25 08:13:32 -04:00
Josh Patterson b09c3776b7 Point pillar_db beacon at securityonion database
The SOC postgres database was renamed so_soc -> securityonion (see
POSTGRES_DB in salt/postgres/enabled.sls and the SOC postgres config in
salt/soc/defaults.yaml). The pillar_db beacon still hardcoded so_soc, so
every poll failed with 'database "so_soc" does not exist' (rc=2),
silently disabling active-push detection of audit_settings changes.

Update DATABASE to 'securityonion' and refresh the now-stale so_soc
references in the beacon and push_pillar reactor comments.
2026-06-24 16:51:32 -04:00
Josh Patterson dfdb1fbaeb Move global.push config to salt.auto_apply
The active-push tunables (enabled, highstate_interval_hours, debounce_seconds,
drain_interval, batch, batch_wait) described how Salt auto-applies changes, not
general grid config, so relocate them from the global namespace to a new
salt.auto_apply settings module.

- Add salt/salt/{defaults.yaml,auto_apply.map.jinja,soc_salt.yaml,adv_salt.yaml}.
  auto_apply.map.jinja is a dedicated, side-effect-free merge map (the existing
  salt/salt/map.jinja dereferences pillar.host.mainint at import time).
- Remove the push blocks from salt/global/{defaults,soc_global}.yaml.
- Register salt.soc_salt/salt.adv_salt in pillar/top.sls; seed the local pillar
  stubs for fresh installs (make_some_dirs) and upgrades (ensure_salt_local_pillar
  in soup, wired into up_to_3.2.0).
- Repoint all consumers: GLOBALMERGED.push.* -> AUTOAPPLY.* (schedule, salt
  master, manager beacons, beacons_pushstate, orch.push_batch) and
  pillar.get('global:push...') -> 'salt:auto_apply...' (push reactors,
  so-push-drainer).
- Add a salt: fleetwide-highstate entry to pillar_push_map.yaml so edits keep
  applying immediately, matching the prior global-namespace behavior.
2026-06-24 15:17:48 -04:00
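Consumers address these settings with colon-delimited pillar paths, which is why every `pillar.get('global:push...')` call had to be repointed. The following helper is hypothetical: `pillar_get` is not Salt's implementation, and the sample pillar values are invented. It shows how a lookup on the old path silently falls back to its default once the keys move. This is the same failure mode that da94788255 later fixed in so-salt-minion-check.

```python
def pillar_get(data, path, default=None, delimiter=":"):
    """Walk a nested dict by a colon-delimited key path (hypothetical helper)."""
    node = data
    for key in path.split(delimiter):
        if not isinstance(node, dict) or key not in node:
            # Any missing segment means the whole path is absent.
            return default
        node = node[key]
    return node
```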
Josh Patterson 61aa963a2d Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-24 08:10:27 -04:00
Josh Patterson d71e80cf66 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-23 10:32:32 -04:00
Josh Patterson 33a116357d Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-06-10 08:56:17 -04:00
Josh Patterson 8c17ae0f66 move so-salt-minion-wait 2026-06-01 14:48:54 -04:00
Josh Patterson f54939b444 Replace inotify pillar watch with postgres audit_settings beacon
The active-push feature detected pillar/settings changes via an inotify
beacon on the manager watching /opt/so/saltstack/local/pillar. Replace
that pillar watch with a custom salt beacon (pillar_db) that polls the
SOC so_soc.audit_settings table on a monotonic id watermark, so changes
made through SOC drive immediate pushes from the database instead of the
files. The suricata/strelka rule inotify watches (and pyinotify) are kept
unchanged, since rule-file edits are not recorded in audit_settings.

- salt/_beacons/pillar_db.py: new beacon. Polls audit_settings via
  `docker exec so-postgres psql` (unix-socket trust auth), tracks the last
  processed id in /opt/so/state/pillar_db_watch.id, seeds to MAX(id) on
  first run (no history replay), and emits one event per new row.
- salt/reactor/push_pillar.sls: consume setting_id/node_id from the beacon
  event instead of a file path. App = first dotted segment of setting_id,
  looked up in pillar_push_map.yaml. Empty node_id -> grid-wide actions as
  is; populated node_id -> the app's state(s) retargeted to that one node.
- salt/manager/files/beacons_pushstate.conf.jinja: drop the pillar inotify
  block, add the pillar_db beacon (interval = push.drain_interval); keep
  the suricata/strelka inotify watches.
- salt/salt/files/reactor_pushstate.conf: map salt/beacon/*/pillar_db/
  audit_settings to push_pillar.sls; remove the pillar inotify reactor
  lines; keep suricata/strelka.

The intent -> so-push-drainer -> orch.push_batch pipeline is unchanged.
Verified end-to-end on a standalone: a grid-wide telegraf.output change
re-applied telegraf fleetwide (container replaced), and a per-host
ntp.config.servers change applied ntp to only that node.
2026-05-29 14:55:13 -04:00
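The reactor's routing rule can be sketched in a few lines. `route_change` is a hypothetical name, and the dict stands in for pillar_push_map.yaml. In this sketch, `target: None` means "keep the map's own grid-wide targeting", which simplifies how the real reactor expresses it. The test cases mirror the two verified scenarios from the commit message.

```python
def route_change(setting_id, node_id, push_map):
    """Turn one audit_settings event into a push action (hypothetical sketch).

    push_map stands in for pillar_push_map.yaml as {app: [states]}.
    """
    app = setting_id.split(".", 1)[0]  # app = first dotted segment
    states = push_map.get(app)
    if states is None:
        return None  # app not allowlisted: no immediate push
    # Empty node_id: grid-wide change, keep the map's own targeting (None).
    # Populated node_id: retarget the app's states to that one node.
    return {"app": app, "states": states, "target": node_id or None}
```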
Josh Patterson d48a22e37e Merge pull request #15944 from Security-Onion-Solutions/jertel/wip
Jertel/wip
2026-05-28 14:01:42 -04:00
Josh Patterson 6393d08e86 merge 2026-05-27 08:59:28 -04:00
Josh Patterson 730c828bec Merge remote-tracking branch 'origin/jertel/wip' into saltthangs 2026-05-19 10:23:45 -04:00
Josh Patterson b4e5171415 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-14 08:03:45 -04:00
Josh Patterson 84decc1db6 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-13 14:09:15 -04:00
Josh Patterson 7d4d6a0756 prune images if so-docker-prune exists 2026-05-08 10:13:15 -04:00
Josh Patterson 66c0a662fc convert wait to script 2026-05-08 09:26:42 -04:00
Josh Patterson 778cc055ea wait for salt-minion service to be ready before finishing state run 2026-05-07 17:01:20 -04:00
Josh Patterson 932deab751 update the push map 2026-05-07 10:51:53 -04:00
Josh Patterson 1281f0ee37 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-06 09:46:12 -04:00
Josh Patterson f774334b6c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-06 08:16:41 -04:00
Josh Patterson 7fcace34c4 add sensoroni to push map 2026-04-30 16:09:08 -04:00
Josh Patterson 9541024eb7 fix broken things 2026-04-30 15:35:24 -04:00
Josh Patterson 0d166ef732 remove trailing slashes 2026-04-30 09:53:00 -04:00
Josh Patterson f7d2994f8b filter temp files 2026-04-30 09:16:22 -04:00
Josh Patterson 8f0757606d include salt..minion 2026-04-29 16:42:19 -04:00
Josh Patterson 0a8f2e01a0 install pyinotify 2026-04-29 16:41:56 -04:00
Josh Patterson 4546d7bc52 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-29 14:28:19 -04:00
Josh Patterson 17849d8758 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:49:22 -04:00
Josh Patterson d3d30a587c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:30:31 -04:00
Josh Patterson 034711d148 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 10:47:29 -04:00
Mike Reeves a0cf0489d6 reduce highstate frequency with active push for rules and pillars
- schedule highstate every 2 hours (was 15 minutes); interval lives in
  global:push:highstate_interval_hours so the SOC admin UI can tune it and
  so-salt-minion-check derives its threshold as (interval + 1) * 3600
- add inotify beacon on the manager + master reactor + orch.push_batch that
  writes per-app intent files, with a so-push-drainer schedule on the manager
  that debounces, dedupes, and dispatches a single orchestration
- pillar_push_map.yaml allowlists the apps whose pillar changes trigger an
  immediate targeted state.apply (targets verified against salt/top.sls);
  edits under pillar/minions/ trigger a state.highstate on that one minion
- host-batch every push orchestration (batch: 25%, batch_wait: 15) so rule
  changes don't thundering-herd large fleets
- new global:push:enabled kill-switch tears down the beacon, reactor config,
  and drainer schedule on the next highstate for operators who want to keep
  highstate-only behavior
- set restart_policy: unless-stopped on 23 container states so docker
  recovers crashes without waiting for the next highstate; leave registry
  (always), strelka/backend (on-failure), kratos, and hydra alone with
  inline comments explaining why
2026-04-10 15:43:16 -04:00
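The drainer's debounce-and-dedupe step can be illustrated with a minimal sketch. It assumes each intent is an `(app, written_at)` pair; the real intent files and so-push-drainer logic may differ, and `drain` is a hypothetical name. Repeated events for one app collapse to a single entry, and an app is dispatched only after it has been quiet for `debounce_seconds`.

```python
def drain(intents, now, debounce_seconds):
    """Debounce and dedupe push intents (hypothetical sketch of a drainer pass).

    intents: list of (app, written_at) tuples, one per change event.
    Returns (apps_to_dispatch, intents_kept_for_later).
    """
    latest = {}
    for app, written_at in intents:
        # Dedupe: many events for one app collapse to its most recent write.
        latest[app] = max(written_at, latest.get(app, written_at))
    ready, pending = [], []
    for app, written_at in sorted(latest.items()):
        # Debounce: wait until the app has been quiet for debounce_seconds.
        if now - written_at >= debounce_seconds:
            ready.append(app)
        else:
            pending.append((app, written_at))
    return ready, pending
```

The `ready` list would then feed a single orchestration, rather than one orchestration per event.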
Jason Ertel 613d31c8a6 merge 2026-03-05 11:52:09 -05:00
76 changed files with 1678 additions and 241 deletions
-59
@@ -1,59 +0,0 @@
#!/usr/bin/env bash
# This script adds sensors/nodes/etc to the nodes tab
default_salt_dir=/opt/so/saltstack/default
local_salt_dir=/opt/so/saltstack/local
TYPE=$1
NAME=$2
IPADDRESS=$3
CPUS=$4
GUID=$5
MANINT=$6
ROOTFS=$7
NSM=$8
MONINT=$9
#NODETYPE=$10
#HOTNAME=$11
echo "Seeing if this host is already in here. If so delete it"
if grep -q $NAME "$local_salt_dir/pillar/data/$TYPE.sls"; then
echo "Node Already Present - Let's re-add it"
awk -v blah=" $NAME:" 'BEGIN{ print_flag=1 }
{
if( $0 ~ blah )
{
print_flag=0;
next
}
if( $0 ~ /^ [a-zA-Z0-9]+:$/ )
{
print_flag=1;
}
if ( print_flag == 1 )
print $0
} ' $local_salt_dir/pillar/data/$TYPE.sls > $local_salt_dir/pillar/data/tmp.$TYPE.sls
mv $local_salt_dir/pillar/data/tmp.$TYPE.sls $local_salt_dir/pillar/data/$TYPE.sls
echo "Deleted $NAME from the tab. Now adding it in again with updated info"
fi
echo " $NAME:" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " ip: $IPADDRESS" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " manint: $MANINT" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " totalcpus: $CPUS" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " guid: $GUID" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " rootfs: $ROOTFS" >> $local_salt_dir/pillar/data/$TYPE.sls
echo " nsmfs: $NSM" >> $local_salt_dir/pillar/data/$TYPE.sls
if [ $TYPE == 'sensorstab' ]; then
echo " monint: bond0" >> $local_salt_dir/pillar/data/$TYPE.sls
fi
if [ $TYPE == 'evaltab' ] || [ $TYPE == 'standalonetab' ]; then
echo " monint: bond0" >> $local_salt_dir/pillar/data/$TYPE.sls
if [ ! $10 ]; then
salt-call state.apply utility queue=True
fi
fi
if [ $TYPE == 'nodestab' ]; then
salt-call state.apply elasticsearch queue=True
# echo " nodetype: $NODETYPE" >> $local_salt_dir/pillar/data/$TYPE.sls
# echo " hotname: $HOTNAME" >> $local_salt_dir/pillar/data/$TYPE.sls
fi
+2
@@ -3,6 +3,8 @@ base:
- ca
- global.soc_global
- global.adv_global
- salt.soc_salt
- salt.adv_salt
- docker.soc_docker
- docker.adv_docker
- influxdb.token
+142
@@ -0,0 +1,142 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.

# Custom salt beacon that watches the SOC audit_settings table in postgres for
# new settings changes and emits a beacon event per new row. This replaces the
# inotify watch on /opt/so/saltstack/local/pillar -- instead of monitoring pillar
# files on disk, we monitor the securityonion.audit_settings table that SOC writes to.
#
# Detection is poll-based with a monotonic `id` watermark persisted to
# WATERMARK_FILE: each pass selects rows with id greater than the last id seen,
# which makes it self-healing (a missed poll simply catches up on the next one).
#
# Each emitted event carries setting_id and node_id; the push_pillar reactor maps
# setting_id -> app via pillar_push_map.yaml and writes a push intent, after which
# the existing so-push-drainer / orch.push_batch pipeline takes over unchanged.

import logging
import os
import subprocess

log = logging.getLogger(__name__)

WATERMARK_FILE = '/opt/so/state/postgres_pillar_beacon_watch.id'
CONTAINER = 'so-postgres'
DATABASE = 'securityonion'
# Unaligned, tuples-only psql output with a field separator that cannot appear in
# an id/setting_id/node_id, so we can split each row reliably.
FIELD_SEP = '\x1f'


def __virtual__():
    return True


def validate(config):
    return True, 'valid'


def _read_watermark():
    # Returns the last processed id, or None if the watermark has not been seeded.
    try:
        with open(WATERMARK_FILE, 'r') as f:
            return int((f.read() or '').strip())
    except (IOError, ValueError):
        return None


def _write_watermark(value):
    try:
        os.makedirs(os.path.dirname(WATERMARK_FILE), exist_ok=True)
        tmp = WATERMARK_FILE + '.tmp'
        with open(tmp, 'w') as f:
            f.write(str(int(value)))
        os.rename(tmp, WATERMARK_FILE)
    except OSError:
        log.exception('postgres_pillar_beacon: failed to persist watermark to %s', WATERMARK_FILE)


def _query(sql):
    # Run a query against securityonion inside the so-postgres container over the unix
    # socket (trust auth, no password). Returns stdout on success, or None on any
    # failure so the caller can no-op and retry on the next interval.
    cmd = [
        'docker', 'exec', CONTAINER,
        'psql', '-U', 'postgres', '-d', DATABASE,
        '-tA', '-F', FIELD_SEP, '-c', sql,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        log.warning('postgres_pillar_beacon: psql timed out')
        return None
    except Exception:
        log.exception('postgres_pillar_beacon: failed to exec psql')
        return None
    if result.returncode != 0:
        log.warning('postgres_pillar_beacon: psql failed (rc=%s): %s',
                    result.returncode, (result.stderr or '').strip())
        return None
    return result.stdout


def beacon(config):
    retval = []
    watermark = _read_watermark()

    # First run / missing watermark: seed to the current MAX(id) and emit nothing
    # so we never replay the entire settings history into a fleetwide push.
    if watermark is None:
        seed = _query('SELECT COALESCE(MAX(id), 0) FROM audit_settings;')
        if seed is None:
            return retval  # postgres not ready yet; retry next interval
        try:
            _write_watermark(int((seed or '0').strip() or 0))
        except ValueError:
            log.warning('postgres_pillar_beacon: could not parse MAX(id) seed: %r', seed)
        return retval

    rows = _query(
        "SELECT id, setting_id, COALESCE(node_id, '') FROM audit_settings "
        "WHERE id > %d ORDER BY id;" % watermark
    )
    if rows is None:
        return retval

    max_id = watermark
    for line in rows.splitlines():
        # Do NOT str.strip() the whole line: Python treats the \x1f field
        # separator (and \x1c-\x1e) as whitespace, so stripping would eat an
        # empty trailing node_id field and make the row look malformed.
        if not line.strip():
            continue
        parts = line.split(FIELD_SEP)
        if len(parts) < 3:
            log.warning('postgres_pillar_beacon: skipping malformed row: %r', line)
            continue
        try:
            row_id = int(parts[0])
        except ValueError:
            log.warning('postgres_pillar_beacon: skipping row with non-int id: %r', line)
            continue
        setting_id = parts[1]
        node_id = parts[2]
        retval.append({
            'tag': 'audit_settings',
            'id': row_id,
            'setting_id': setting_id,
            'node_id': node_id,
        })
        if row_id > max_id:
            max_id = row_id

    if max_id > watermark:
        _write_watermark(max_id)
        log.info('postgres_pillar_beacon: emitted %d change(s), watermark %d -> %d',
                 len(retval), watermark, max_id)
    return retval
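The comment about not stripping the whole line is easy to check in isolation. The following standalone sketch reuses the beacon's separator; `parse_rows` is a hypothetical name, and the sample rows are invented. `"\x1f"` counts as whitespace for `str.strip()` but is not a `splitlines()` boundary. As a result, splitting the unstripped line keeps the empty trailing `node_id`, while stripping first drops it.

```python
FIELD_SEP = "\x1f"


def parse_rows(stdout):
    """Parse psql -tA output (FIELD_SEP-separated) into (id, setting_id, node_id)."""
    rows = []
    for line in stdout.splitlines():  # \x1f is not a line boundary here
        if not line.strip():
            continue
        parts = line.split(FIELD_SEP)  # split the unstripped line
        if len(parts) < 3:
            continue
        try:
            rows.append((int(parts[0]), parts[1], parts[2]))
        except ValueError:
            continue  # non-integer id: skip, as the beacon does
    return rows
```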
+139
@@ -0,0 +1,139 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.

# Custom salt beacon that watches the suricata/strelka rule directories for changes
# and emits a beacon event per changed directory. This replaces the stock salt
# `inotify` beacon, which leaks a kernel inotify instance every time the minion
# rebuilds the beacon loader's __context__ (orphaning the old pyinotify.Notifier
# without closing it) until fs.inotify.max_user_instances is exhausted and the
# beacon dies with EMFILE. Polling holds zero inotify instances, so the leak is
# impossible, and it keeps firing during state runs (no blackout).
#
# Detection is poll-based with a per-directory fingerprint persisted to
# WATERMARK_DIR: each pass walks the directory and hashes every file's
# (relpath, st_mtime_ns, st_size), which catches content writes, additions,
# moves, and deletions. A change in the digest emits one event; an unchanged
# digest emits nothing. This makes it self-healing (a missed poll simply catches
# up on the next one).
#
# Each emitted event carries the watched directory path under the configured tag
# (e.g. salt/beacon/<minion>/rules_beacon/suricata); the push_suricata / push_strelka
# reactors write a push intent, after which the existing so-push-drainer /
# orch.push_batch pipeline takes over unchanged.

import hashlib
import logging
import os
import re

log = logging.getLogger(__name__)

WATERMARK_DIR = '/opt/so/state'

# Temp/editor files that should not trigger a push. Mirrors the exclude regexes
# the inotify beacon used. Matched against the full pathname.
EXCLUDES = [
    re.compile(r'\.sw[a-z]$'),
    re.compile(r'~$'),
    re.compile(r'/4913$'),
    re.compile(r'/\.#'),
]


def __virtual__():
    return True


def validate(config):
    return True, 'valid'


def _paths_from_config(config):
    # The beacon config arrives as a list of single-key dicts (salt beacon style).
    # Merge it and return the {dir: tag} mapping under the 'paths' key.
    merged = {}
    if isinstance(config, list):
        for item in config:
            if isinstance(item, dict):
                merged.update(item)
    elif isinstance(config, dict):
        merged = config
    paths = merged.get('paths', {})
    return paths if isinstance(paths, dict) else {}


def _excluded(pathname):
    for pattern in EXCLUDES:
        if pattern.search(pathname):
            return True
    return False


def _fingerprint(directory):
    # Stat-only walk; hash each file's (relpath, mtime_ns, size). Returns a hex
    # digest, or the digest of an empty tree if the directory does not exist.
    h = hashlib.sha1()
    if os.path.isdir(directory):
        entries = []
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full = os.path.join(root, name)
                if _excluded(full):
                    continue
                try:
                    st = os.stat(full)
                except OSError:
                    continue
                rel = os.path.relpath(full, directory)
                entries.append('%s\0%d\0%d' % (rel, st.st_mtime_ns, st.st_size))
        for line in sorted(entries):
            h.update(line.encode('utf-8', 'surrogateescape'))
            h.update(b'\n')
    return h.hexdigest()


def _watermark_file(tag):
    return os.path.join(WATERMARK_DIR, 'rules_beacon_%s.hash' % tag)


def _read_watermark(tag):
    try:
        with open(_watermark_file(tag), 'r') as f:
            return (f.read() or '').strip() or None
    except IOError:
        return None


def _write_watermark(tag, digest):
    path = _watermark_file(tag)
    try:
        os.makedirs(WATERMARK_DIR, exist_ok=True)
        tmp = path + '.tmp'
        with open(tmp, 'w') as f:
            f.write(digest)
        os.rename(tmp, path)
    except OSError:
        log.exception('rules_beacon: failed to persist watermark to %s', path)


def beacon(config):
    retval = []
    for directory, tag in _paths_from_config(config).items():
        digest = _fingerprint(directory)
        previous = _read_watermark(tag)
        # First run / missing watermark: seed the digest and emit nothing so a
        # fresh host does not fire a spurious fleetwide push.
        if previous is None:
            _write_watermark(tag, digest)
            continue
        if digest != previous:
            _write_watermark(tag, digest)
            retval.append({'tag': tag, 'path': directory})
            log.info('rules_beacon: change detected in %s, emitting %s', directory, tag)
    return retval
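The seed-then-compare behaviour can be exercised with a trimmed-down, standalone sketch. It is not the module above: an in-memory dict replaces the watermark files, the exclude list is omitted, and `fingerprint`/`poll` are hypothetical names. The first pass only seeds. After that, an added or deleted file changes the digest and produces exactly one event.

```python
import hashlib
import os
import tempfile


def fingerprint(directory):
    # Hash (relpath, mtime_ns, size) for every file, in sorted order.
    entries = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            rel = os.path.relpath(full, directory)
            entries.append(f"{rel}\0{st.st_mtime_ns}\0{st.st_size}")
    h = hashlib.sha1()
    for entry in sorted(entries):
        h.update(entry.encode("utf-8") + b"\n")
    return h.hexdigest()


def poll(directory, state):
    """One beacon pass: seed silently on first run, else report whether it changed."""
    digest = fingerprint(directory)
    previous = state.get(directory)
    state[directory] = digest
    return previous is not None and digest != previous
```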
+1 -2
@@ -37,8 +37,7 @@
'elasticfleet',
'elasticfleet.manager',
'elasticsearch.cluster',
'elastic-fleet-package-registry',
'utility'
'elastic-fleet-package-registry'
] %}
{% set sensor_states = [
+14
@@ -291,6 +291,20 @@ download_and_verify() {
fi
}
# check if container with name is running and optionally stop it
docker_check_running() {
  # show running containers, only names
  if docker ps --format '{{.Names}}' | grep -q "^so-${1}$"; then
    if [[ "$2" == "--stop" ]]; then
      docker stop "so-${1}"
    fi
    return 0
  else
    return 1
  fi
}
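The anchored, names-only match matters because a plain substring search over `docker ps` output can match a longer container name. In addition, the old so-start `*)` branch quoted its pattern as `'^so-$1$'`, so `$1` was never expanded. The following sketch compares the two match styles on a list of container names. The function names are hypothetical, and `so-elastic-fleet-package-registry` is assumed to be the container name for the elastic-fleet-package-registry state.

```python
def is_running_substring(names, component):
    # Old style: `docker ps | grep -q so-<component>` matches anywhere in a line.
    return any(f"so-{component}" in name for name in names)


def is_running_exact(names, component):
    # New style: `grep -q "^so-<component>$"` over names only, i.e. whole-name equality
    # (ignoring regex metacharacters in the component name).
    return f"so-{component}" in names
```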
elastic_license() {
read -r -d '' message <<- EOM
+34 -20
@@ -5,27 +5,41 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Usage: so-restart kibana | playbook
. /usr/sbin/so-common
if [ $# -ge 1 ]; then
usage() {
echo "Usage: $0 <component> [args]"
echo ""
echo "Supported args:"
echo " --force | -f Force stop all Salt jobs before starting component."
echo ""
echo "Examples:"
echo " $0 kibana Restart Kibana"
echo " $0 kibana --force Force stop all Salt jobs before restarting Kibana"
exit 1
}
echo $banner
printf "Restarting $1...\n\nThis could take a while if another Salt job is running. \nRun this command with --force to stop all Salt jobs before proceeding.\n"
echo $banner
if [ "$2" = "--force" ]; then
printf "\nForce-stopping all Salt jobs before proceeding\n\n"
salt-call saltutil.kill_all_jobs
fi
case $1 in
"elastic-fleet") docker stop so-elastic-fleet && docker rm so-elastic-fleet && salt-call state.apply elasticfleet queue=True;;
*) docker stop so-$1 ; docker rm so-$1 ; salt-call state.apply $1 queue=True;;
esac
else
echo -e "\nPlease provide an argument by running like so-restart $component, or by using the component-specific script.\nEx. so-restart logstash, or so-logstash-restart\n"
if [[ $# -lt 1 ]]; then
usage
fi
#shellcheck disable=SC2154
echo "$banner"
printf "Restarting %s...\n\nThis could take a while if another Salt job is running. \nRun this command with --force to stop all Salt jobs before proceeding.\n" "$1"
echo "$banner"
if [[ "$2" = "--force" ]] || [[ "$2" = "-f" ]]; then
printf "\nForce-stopping all Salt jobs before proceeding\n\n"
salt-call saltutil.kill_all_jobs
fi
case $1 in
"elastic-fleet"|"elasticfleet")
docker_check_running "elastic-fleet" "--stop"
docker rm "so-elastic-fleet" 2> /dev/null
salt-call state.apply elasticfleet queue=True
;;
*)
docker_check_running "$1" "--stop"
docker rm "so-${1}" 2> /dev/null
salt-call state.apply "$1" queue=True
;;
esac
+47 -20
@@ -5,27 +5,54 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Usage: so-start all | kibana | playbook
# shellcheck disable=SC1091
. /usr/sbin/so-common
if [ $# -ge 1 ]; then
echo $banner
printf "Starting $1...\n\nThis could take a while if another Salt job is running. \nRun this command with --force to stop all Salt jobs before proceeding.\n"
echo $banner
usage() {
echo "Usage: $0 <component> [args]"
echo ""
echo "Supported args:"
echo " --force | -f Force stop all Salt jobs before starting component."
echo ""
echo "Examples:"
echo " $0 kibana Start Kibana"
echo " $0 kibana --force Force stop all Salt jobs before starting Kibana"
exit 1
}
if [ "$2" = "--force" ]; then
printf "\nForce-stopping all Salt jobs before proceeding\n\n"
salt-call saltutil.kill_all_jobs
fi
case $1 in
"all") salt-call state.highstate queue=True;;
"elastic-fleet") if docker ps | grep -q so-$1; then printf "\n$1 is already running!\n\n"; else docker rm so-$1 >/dev/null 2>&1 ; salt-call state.apply elasticfleet queue=True; fi ;;
*) if docker ps | grep -E -q '^so-$1$'; then printf "\n$1 is already running\n\n"; else docker rm so-$1 >/dev/null 2>&1 ; salt-call state.apply $1 queue=True; fi ;;
esac
else
echo -e "\nPlease provide an argument by running like so-start $component, or by using the component-specific script.\nEx. so-start logstash, or so-logstash-start\n"
if [[ $# -lt 1 ]]; then
usage
fi
#shellcheck disable=SC2154
echo "$banner"
printf "Starting %s...\n\nThis could take a while if another Salt job is running. \nRun this command with --force to stop all Salt jobs before proceeding.\n" "$1"
echo "$banner"
if [[ "$2" = "--force" ]] || [[ "$2" == "-f" ]]; then
printf "\nForce-stopping all Salt jobs before proceeding\n\n"
salt-call saltutil.kill_all_jobs
fi
case "$1" in
"all")
salt-call state.highstate queue=True
;;
"elastic-fleet"|"elasticfleet")
if docker_check_running "elastic-fleet"; then
printf "\nso-%s is already running!\n\n" "elastic-fleet"
/usr/sbin/so-status
else
docker rm "so-elastic-fleet" 2> /dev/null
salt-call state.apply elasticfleet queue=True
fi
;;
*)
if docker_check_running "$1"; then
printf "\nso-%s is already running\n\n" "$1"
/usr/sbin/so-status
else
docker rm "so-${1}" 2> /dev/null
salt-call state.apply "$1" queue=True
fi
;;
esac
+25 -13
@@ -5,21 +5,33 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Usage: so-stop kibana | playbook | thehive
# shellcheck disable=SC1091
. /usr/sbin/so-common
if [ $# -ge 1 ]; then
echo $banner
printf "Stopping $1...\n"
echo $banner
usage() {
echo "Usage: $0 <component>"
echo ""
echo "Examples:"
echo " $0 kibana Stop Kibana"
exit 1
}
case $1 in
*) docker stop so-$1 ; docker rm so-$1 ;;
esac
else
echo -e "\nPlease provide an argument by running like so-stop $component, or by using the component-specific script.\nEx. so-stop logstash, or so-logstash-stop\n"
if [[ $# -lt 1 ]]; then
usage
fi
#shellcheck disable=SC2154
echo "$banner"
printf "Stopping %s...\n" "$1"
echo "$banner"
case $1 in
"elasticfleet"|"elastic-fleet")
docker_check_running "elastic-fleet" "--stop"
docker rm "so-elastic-fleet" 2> /dev/null
;;
*)
docker_check_running "$1" "--stop"
docker rm "so-${1}" 2> /dev/null
;;
esac
@@ -1,5 +1,3 @@
{% import_yaml 'salt/minion.defaults.yaml' as SALT_MINION_DEFAULTS -%}
#!/bin/bash
#
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
@@ -7,7 +5,7 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% from 'salt/schedule.map.jinja' import SCHEDULEMERGED %}
# this script checks the time the file /opt/so/log/salt/state-apply-test was last modified and restarts the salt-minion service if it is outside a threshold date/time
# the file is modified via file.touch using a scheduled job healthcheck.salt-minion.state-apply-test that runs a state.apply.
@@ -25,7 +23,8 @@ SYSTEM_START_TIME=$(date -d "$(</proc/uptime awk '{print $1}') seconds ago" +%s)
LAST_HIGHSTATE_END=$([ -e "/opt/so/log/salt/lasthighstate" ] && date -r /opt/so/log/salt/lasthighstate +%s || echo 0)
LAST_HEALTHCHECK_STATE_APPLY=$([ -e "/opt/so/log/salt/state-apply-test" ] && date -r /opt/so/log/salt/state-apply-test +%s || echo 0)
# SETTING THRESHOLD TO ANYTHING UNDER 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
THRESHOLD={{SALT_MINION_DEFAULTS.salt.minion.check_threshold}} #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
# THRESHOLD is derived from the salt schedule highstate interval + 1 hour, so the minion-check grace period tracks the schedule automatically.
THRESHOLD=$(( ({{ SCHEDULEMERGED.highstate_interval_hours }} + 1) * 3600 )) #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
THRESHOLD_DATE=$((LAST_HEALTHCHECK_STATE_APPLY+THRESHOLD))
logCmd() {
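The new THRESHOLD line ties the minion-check grace period to the highstate schedule. The grace period is the highstate interval in hours plus one hour, expressed in seconds. Below is a minimal Python sketch of that arithmetic and of the comparison against the marker file's mtime. The function names are hypothetical. The script's other inputs (SYSTEM_START_TIME, LAST_HIGHSTATE_END) are left out on purpose, because this hunk does not show how they are used.

```python
import os
import time
from typing import Optional


def check_threshold_seconds(highstate_interval_hours: int) -> int:
    # Mirrors THRESHOLD=$(( (hours + 1) * 3600 )): the interval plus a one-hour grace period.
    return (highstate_interval_hours + 1) * 3600


def last_touch_epoch(marker_path: str) -> float:
    # Mirrors `[ -e file ] && date -r file +%s || echo 0`: the mtime, or 0 if the file is missing.
    try:
        return os.path.getmtime(marker_path)
    except FileNotFoundError:
        return 0.0


def marker_is_stale(marker_path: str, highstate_interval_hours: int,
                    now: Optional[float] = None) -> bool:
    # Hypothetical helper: returns True once `now` is past THRESHOLD_DATE
    # (last touch + threshold). That is the case in which the script considers
    # restarting salt-minion.
    now = time.time() if now is None else now
    threshold_date = last_touch_epoch(marker_path) + check_threshold_seconds(highstate_interval_hours)
    return now > threshold_date
```

With a 2-hour highstate interval, the threshold works out to (2 + 1) * 3600 = 10800 seconds. A missing marker file counts as touched at epoch 0, so it is treated as stale almost immediately.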
@@ -9,7 +9,8 @@
prune_images:
cmd.run:
- name: so-docker-prune
- order: last
- onlyif: command -v /usr/sbin/so-docker-prune >/dev/null 2>&1
- order: 9000
{% else %}
@@ -19,6 +19,7 @@ wait_for_elasticsearch:
so-elastalert:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastalert:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: elastalert
- name: so-elastalert
- user: so-elastalert
@@ -15,6 +15,7 @@ include:
so-elastic-fleet-package-registry:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-fleet-package-registry:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet-package-registry
- hostname: Fleet-package-reg-{{ GLOBALS.hostname }}
- detach: True
@@ -16,6 +16,7 @@ include:
so-elastic-agent:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-agent
- hostname: {{ GLOBALS.hostname }}
- detach: True
@@ -42,6 +42,7 @@ elasticagent_syncartifacts:
so-elastic-fleet:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet
- hostname: FleetServer-{{ GLOBALS.hostname }}
- detach: True
@@ -24,6 +24,7 @@ include:
so-elasticsearch:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elasticsearch:{{ ELASTICSEARCHMERGED.version }}
- restart_policy: unless-stopped
- hostname: elasticsearch
- name: so-elasticsearch
- user: elasticsearch
@@ -1,3 +1,3 @@
global:
pcapengine: SURICATA
pipeline: REDIS
pipeline: REDIS
@@ -58,6 +58,7 @@ so-hydra:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped
- watch:
- file: hydraconfig
@@ -15,6 +15,7 @@ include:
so-idh:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-idh:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-idh
- detach: True
- network_mode: host
@@ -18,6 +18,7 @@ include:
so-influxdb:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-influxdb:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: influxdb
- networks:
- sobridge:
@@ -27,6 +27,7 @@ include:
so-kafka:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kafka:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-kafka
- name: so-kafka
- networks:
@@ -17,6 +17,7 @@ include:
so-kibana:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kibana:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: kibana
- user: "932:0"
- networks:
@@ -69,7 +70,7 @@ wait_for_so-kibana:
- ssl: True
- verify_ssl: False
- status: 200
- wait_for: 300
- wait_for: 600
- request_interval: 15
- require:
- docker_container: so-kibana
@@ -51,6 +51,7 @@ so-kratos:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped
- watch:
- file: kratosschema
@@ -28,6 +28,7 @@ include:
so-logstash:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-logstash:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-logstash
- name: so-logstash
- networks:
@@ -0,0 +1,21 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'salt/auto_apply.map.jinja' import AUTOAPPLY %}
include:
- salt.minion
{% if GLOBALS.is_manager and AUTOAPPLY.enabled %}
salt_beacons_pushstate:
file.managed:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- source: salt://manager/files/beacons_pushstate.conf.jinja
- template: jinja
- watch_in:
- service: salt_minion_service
{% else %}
salt_beacons_pushstate:
file.absent:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- watch_in:
- service: salt_minion_service
{% endif %}
@@ -0,0 +1,11 @@
{% from 'salt/auto_apply.map.jinja' import AUTOAPPLY %}
beacons:
postgres_pillar_beacon:
- interval: {{ AUTOAPPLY.drain_interval }}
- disable_during_state_run: False
rules_beacon:
- interval: {{ AUTOAPPLY.drain_interval }}
- disable_during_state_run: False
- paths:
/opt/so/saltstack/local/salt/suricata/rules: suricata
/opt/so/saltstack/local/salt/strelka/rules/compiled: strelka
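The rules_beacon `paths` table maps each watched directory to the state that should be pushed when something under that directory changes. The beacon module itself is not part of this diff, so what follows is only a hypothetical sketch. It assumes a changed file is attributed to the configured directory that contains it, and that the longest (most specific) match wins.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Copied from the rendered beacons_pushstate.conf above.
RULE_PATHS: Dict[str, str] = {
    "/opt/so/saltstack/local/salt/suricata/rules": "suricata",
    "/opt/so/saltstack/local/salt/strelka/rules/compiled": "strelka",
}


def state_for_changed_file(changed: str,
                           paths: Dict[str, str] = RULE_PATHS) -> Optional[str]:
    """Hypothetical lookup: return the state whose watched directory contains `changed`."""
    target = PurePosixPath(changed)
    best: Optional[str] = None
    best_depth = -1
    for directory, state in paths.items():
        watched = PurePosixPath(directory)
        # Match the directory itself or any file/directory beneath it.
        if target == watched or watched in target.parents:
            # Prefer the deepest matching directory.
            if len(watched.parts) > best_depth:
                best, best_depth = state, len(watched.parts)
    return best
```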
@@ -11,8 +11,8 @@ name=Security Onion Repo repo
mirrorlist=file:///opt/so/conf/reposync/mirror.txt
enabled=1
gpgcheck=1
[securityonionkernel]
name=Security Onion Repo repo
[securityonionkernelsync]
name=Security Onion Kernel Repo repo
mirrorlist=file:///opt/so/conf/reposync/mirror-kernel.txt
enabled=1
gpgcheck=1
@@ -15,6 +15,7 @@ include:
- manager.elasticsearch
- manager.kibana
- manager.managed_soc_annotations
- manager.beacons
repo_log_dir:
file.directory:
@@ -260,6 +261,7 @@ surifiltersrules:
- user: 939
- group: 939
{% else %}
{{sls}}_state_not_allowed:
@@ -0,0 +1,232 @@
#!/opt/saltstack/salt/bin/python3
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
"""
so-push-drainer
===============
Scheduled drainer for the active-push feature. Runs on the manager every
drain_interval seconds (default 15) via a salt schedule in salt/salt/push_drain_schedule.sls.
For each intent file under /opt/so/state/push_pending/*.json whose last_touch
is older than debounce_seconds, this script:
* concatenates the actions lists from every ready intent
* dedupes by (state or __highstate__, tgt, tgt_type)
* dispatches a single `salt-run state.orchestrate orch.push_batch --async`
with the deduped actions list passed as pillar kwargs
* deletes the contributed intent files on successful dispatch
Reactor sls files (push_suricata, push_strelka, push_pillar) write intents
but never dispatch directly -- see plan
/home/mreeves/.claude/plans/goofy-marinating-hummingbird.md for the full design.
"""
import fcntl
import glob
import json
import logging
import logging.handlers
import os
import subprocess
import sys
import time
import salt.client
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
LOG_FILE = '/opt/so/log/salt/so-push-drainer.log'
HIGHSTATE_SENTINEL = '__highstate__'
def _make_logger():
logger = logging.getLogger('so-push-drainer')
logger.setLevel(logging.INFO)
if not logger.handlers:
os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
handler = logging.handlers.RotatingFileHandler(
LOG_FILE, maxBytes=5 * 1024 * 1024, backupCount=3,
)
handler.setFormatter(logging.Formatter(
'%(asctime)s | %(levelname)s | %(message)s',
))
logger.addHandler(handler)
return logger
def _load_push_cfg():
"""Read the salt:auto_apply pillar subtree via salt-call. Returns a dict."""
caller = salt.client.Caller()
cfg = caller.cmd('pillar.get', 'salt:auto_apply', {})
return cfg if isinstance(cfg, dict) else {}
def _read_intent(path, log):
try:
with open(path, 'r') as f:
return json.load(f)
except (IOError, ValueError) as exc:
log.warning('cannot read intent %s: %s', path, exc)
return None
except Exception:
log.exception('unexpected error reading %s', path)
return None
def _dedupe_actions(actions):
seen = set()
deduped = []
for action in actions:
if not isinstance(action, dict):
continue
state_key = HIGHSTATE_SENTINEL if action.get('highstate') else action.get('state')
tgt = action.get('tgt')
tgt_type = action.get('tgt_type', 'compound')
if not state_key or not tgt:
continue
key = (state_key, tgt, tgt_type)
if key in seen:
continue
seen.add(key)
deduped.append(action)
return deduped
def _dispatch(actions, log):
pillar_arg = json.dumps({'actions': actions})
cmd = [
'salt-run',
'state.orchestrate',
'orch.push_batch',
'pillar={}'.format(pillar_arg),
'--async',
]
log.info('dispatching: %s', ' '.join(cmd[:3]) + ' pillar=<{} actions>'.format(len(actions)))
try:
result = subprocess.run(
cmd, check=True, capture_output=True, text=True, timeout=60,
)
except subprocess.CalledProcessError as exc:
log.error('dispatch failed (rc=%s): stdout=%s stderr=%s',
exc.returncode, exc.stdout, exc.stderr)
return False
except subprocess.TimeoutExpired:
log.error('dispatch timed out after 60s')
return False
except Exception:
log.exception('dispatch raised')
return False
log.info('dispatch accepted: %s', (result.stdout or '').strip())
return True
def main():
log = _make_logger()
if not os.path.isdir(PENDING_DIR):
# Nothing to do; reactors create the dir on first use.
return 0
try:
push = _load_push_cfg()
except Exception:
log.exception('failed to read salt:auto_apply pillar; aborting drain pass')
return 1
if not push.get('enabled', True):
log.debug('push disabled; exiting')
return 0
debounce_seconds = int(push.get('debounce_seconds', 30))
os.makedirs(PENDING_DIR, exist_ok=True)
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent_files = [
p for p in sorted(glob.glob(os.path.join(PENDING_DIR, '*.json')))
if os.path.basename(p) != '.lock'
]
if not intent_files:
return 0
now = time.time()
ready = []
skipped = 0
broken = []
for path in intent_files:
intent = _read_intent(path, log)
if not isinstance(intent, dict):
broken.append(path)
continue
last_touch = intent.get('last_touch', 0)
if now - last_touch < debounce_seconds:
skipped += 1
continue
ready.append((path, intent))
for path in broken:
try:
os.unlink(path)
except OSError:
pass
if not ready:
if skipped:
log.debug('no ready intents (%d still in debounce window)', skipped)
return 0
combined_actions = []
oldest_first_touch = now
all_paths = []
for path, intent in ready:
combined_actions.extend(intent.get('actions', []) or [])
first = intent.get('first_touch', now)
if first < oldest_first_touch:
oldest_first_touch = first
all_paths.extend(intent.get('paths', []) or [])
deduped = _dedupe_actions(combined_actions)
if not deduped:
log.warning('%d intent(s) had no usable actions; clearing', len(ready))
for path, _ in ready:
try:
os.unlink(path)
except OSError:
pass
return 0
debounce_duration = now - oldest_first_touch
log.info(
'draining %d intent(s): %d action(s) after dedupe (raw=%d), '
'debounce_duration=%.1fs, paths=%s',
len(ready), len(deduped), len(combined_actions),
debounce_duration, all_paths[:20],
)
if not _dispatch(deduped, log):
log.warning('dispatch failed; leaving intent files in place for retry')
return 1
for path, _ in ready:
try:
os.unlink(path)
except OSError:
log.exception('failed to remove drained intent %s', path)
return 0
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
if __name__ == '__main__':
sys.exit(main())
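The drain steps in the docstring can be separated from the file locking and the `salt-run` dispatch for unit testing. Those steps are: skip intents still inside the debounce window, concatenate the ready intents' actions, and dedupe by (state or `__highstate__`, tgt, tgt_type). Here is a minimal sketch under that assumption. `plan_drain` is a hypothetical name, and the dedupe rules are copied from `_dedupe_actions` above.

```python
from typing import Any, Dict, List, Tuple

HIGHSTATE_SENTINEL = "__highstate__"


def plan_drain(intents: Dict[str, Dict[str, Any]], now: float,
               debounce_seconds: int = 30) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Return (intent paths ready to drain, deduped actions) without touching disk."""
    # An intent is ready once its last_touch is at least debounce_seconds old.
    ready = sorted(path for path, intent in intents.items()
                   if now - intent.get("last_touch", 0) >= debounce_seconds)
    seen = set()
    actions: List[Dict[str, Any]] = []
    for path in ready:
        for action in intents[path].get("actions") or []:
            if not isinstance(action, dict):
                continue
            state_key = HIGHSTATE_SENTINEL if action.get("highstate") else action.get("state")
            tgt = action.get("tgt")
            if not state_key or not tgt:
                continue
            key = (state_key, tgt, action.get("tgt_type", "compound"))
            if key not in seen:  # first occurrence wins
                seen.add(key)
                actions.append(action)
    return ready, actions
```

The "first occurrence wins" rule has a consequence that is easy to miss. If two intents carry the same (state, tgt, tgt_type) key with different `batch` overrides, only the first one's override survives. The same is true of `_dedupe_actions` in the drainer.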
@@ -17,9 +17,9 @@ createrepo /nsm/repo
# The kernel repo section is deployed to repodownload.conf by the manager highstate, which
# runs AFTER this script during soup. On the first upgrade to a kernel-aware version the
# on-disk config still predates the section, so guard on its presence to avoid dnf's
# "Unknown repo: 'securityonionkernel'" aborting the sync (set -e). The next sync after the
# "Unknown repo: 'securityonionkernelsync'" aborting the sync (set -e). The next sync after the
# highstate deploys the section will pick it up.
if grep -q '^\[securityonionkernel\]' /opt/so/conf/reposync/repodownload.conf; then
dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionkernel --download-metadata -p /nsm/kernelrepo/
if grep -q '^\[securityonionkernelsync\]' /opt/so/conf/reposync/repodownload.conf; then
dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionkernelsync --download-metadata -p /nsm/kernelrepo/
createrepo /nsm/kernelrepo
fi
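The guard above greps for the `[securityonionkernelsync]` section header before calling `dnf reposync`. This way, a repodownload.conf that predates the section does not abort the sync under `set -e`. The same check can be written in Python with the standard-library `configparser`, assuming the file parses as INI (dnf repo files normally do). `has_repo_section` is a hypothetical helper, not part of the script.

```python
import configparser


def has_repo_section(conf_path: str, repoid: str = "securityonionkernelsync") -> bool:
    # interpolation=None: repo files may contain '%' characters that the
    # default interpolation would try to expand when values are read.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so a missing config means "no section".
    parser.read(conf_path)
    return parser.has_section(repoid)
```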
@@ -245,6 +245,7 @@ check_airgap() {
UPDATE_DIR=/tmp/soagupdate/SecurityOnion
AGDOCKER=/tmp/soagupdate/docker
AGREPO=/tmp/soagupdate/minimal/Packages
AGUEKREPO=/tmp/soagupdate/uek/Packages
else
is_airgap=1
fi
@@ -690,6 +691,21 @@ ensure_postgres_local_pillar() {
chown -R socore:socore "$dir"
}
ensure_salt_local_pillar() {
# The salt.auto_apply settings (moved from global.push) are a new SOC settings
# module, so the new pillar/top.sls references salt.soc_salt / salt.adv_salt
# unconditionally. Managers upgrading from before this change have no
# /opt/so/saltstack/local/pillar/salt/ (make_some_dirs only runs at install
# time), so the stubs must be created here before salt-master restarts against
# the new top.sls.
echo "Ensuring salt local pillar stubs exist."
local dir=/opt/so/saltstack/local/pillar/salt
mkdir -p "$dir"
[[ -f "$dir/soc_salt.sls" ]] || touch "$dir/soc_salt.sls"
[[ -f "$dir/adv_salt.sls" ]] || touch "$dir/adv_salt.sls"
chown -R socore:socore "$dir"
}
ensure_postgres_secret() {
# On a fresh install, generate_passwords + secrets_pillar seed
# secrets:postgres_pass in /opt/so/saltstack/local/pillar/secrets.sls. That
@@ -873,6 +889,8 @@ update_kafka_metadata() {
}
up_to_3.2.0() {
ensure_salt_local_pillar
fix_logstash_0013_lumberjack_pipeline_name
pin_elasticsearch_data_retention_method
@@ -1004,13 +1022,19 @@ update_airgap_rules() {
rsync -a $UPDATE_DIR/agrules/securityonion-resources/* /nsm/securityonion-resources/
}
update_airgap_repo() {
update_airgap_repos() {
# Update the files in the repo
echo "Syncing new updates to /nsm/repo"
rsync -a $AGREPO/* /nsm/repo/
echo "Creating repo"
echo "Syncing new updates to /nsm/repo & /nsm/kernelrepo"
# Airgap soup copies new files into the local repos but does not remove old packages, which retains the ability to roll back package updates
rsync -a "$AGREPO"/ /nsm/repo/
rsync -a "$AGUEKREPO"/ /nsm/kernelrepo/
dnf -y install yum-utils createrepo_c
echo "Running createrepo for /nsm/repo"
createrepo /nsm/repo
echo "Running createrepo for /nsm/kernelrepo"
createrepo /nsm/kernelrepo
}
update_salt_mine() {
@@ -1766,7 +1790,7 @@ main() {
set -e
if [[ $is_airgap -eq 0 ]]; then
update_airgap_repo
update_airgap_repos
dnf clean all
check_os_updates
elif [[ $OS == 'oracle' ]]; then
@@ -34,6 +34,7 @@ make-rule-dir-nginx:
so-nginx:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-nginx:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-nginx
- networks:
- sobridge:
@@ -0,0 +1,37 @@
{% from 'salt/auto_apply.map.jinja' import AUTOAPPLY %}
{% set actions = salt['pillar.get']('actions', []) %}
{% set BATCH = AUTOAPPLY.batch %}
{% set BATCH_WAIT = AUTOAPPLY.batch_wait %}
{% for action in actions %}
{% if action.get('highstate') %}
apply_highstate_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- highstate: True
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
{% else %}
refresh_pillar_{{ loop.index }}:
salt.function:
- name: saltutil.refresh_pillar
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
apply_{{ action.state | replace('.', '_') }}_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- sls:
- {{ action.state }}
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
- require:
- salt: refresh_pillar_{{ loop.index }}
{% endif %}
{% endfor %}
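The orchestration turns each action in the pillar into one of two shapes:

- a single highstate step, or
- a `saltutil.refresh_pillar` step followed by a state.apply step that requires it.

State IDs are suffixed with Jinja's 1-based `loop.index`. The Python sketch below reproduces that expansion as plain dicts, so the generated IDs can be checked without a master. `render_push_batch` is a hypothetical name. The 25% / 15s defaults come from the pillar_push_map.yaml header comment and stand in for `AUTOAPPLY.batch` / `AUTOAPPLY.batch_wait`.

```python
from typing import Any, Dict, List


def render_push_batch(actions: List[Dict[str, Any]], batch: Any = "25%",
                      batch_wait: int = 15) -> Dict[str, Dict[str, List[Dict[str, Any]]]]:
    """Expand actions into orchestration state IDs, mirroring the Jinja loop."""
    rendered: Dict[str, Dict[str, List[Dict[str, Any]]]] = {}
    for index, action in enumerate(actions, start=1):  # loop.index is 1-based
        target = [{"tgt": action["tgt"]},
                  {"tgt_type": action.get("tgt_type", "compound")}]
        throttle = [{"batch": action.get("batch", batch)},
                    {"batch_wait": action.get("batch_wait", batch_wait)},
                    {"kwarg": {"queue": 2}}]
        if action.get("highstate"):
            rendered[f"apply_highstate_{index}"] = {
                "salt.state": target + [{"highstate": True}] + throttle,
            }
            continue
        refresh_id = f"refresh_pillar_{index}"
        rendered[refresh_id] = {
            "salt.function": [{"name": "saltutil.refresh_pillar"}] + target,
        }
        apply_id = "apply_{}_{}".format(action["state"].replace(".", "_"), index)
        rendered[apply_id] = {
            "salt.state": target + [{"sls": [action["state"]]}] + throttle
            + [{"require": [{"salt": refresh_id}]}],
        }
    return rendered
```

Because every ID embeds the loop index, two actions for the same state in one batch still get distinct IDs (for example `apply_suricata_1` and `apply_suricata_3`).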
@@ -0,0 +1,251 @@
# One pillar directory can map to multiple (state, tgt) actions.
# tgt is a raw salt compound expression. tgt_type is always "compound".
# Per-action `batch` / `batch_wait` override the orch defaults (25% / 15s).
# An action with `highstate: True` triggers state.highstate instead of
# state.apply -- see salt/orch/push_batch.sls.
#
# Notes:
# - `bpf` is a pillar-only dir (no state of its own) consumed by both
# zeek and suricata via macros, so a bpf pillar change re-applies both.
# - suricata/strelka/zeek/elasticsearch/redis/kafka/logstash etc. have
# their own pillar dirs AND their own state, so they map 1:1 (strelka also
# has a split init.sls / manager.sls, but only the sensor-side state is pushed).
#
# Intentional omissions (these will log a "not in pillar_push_map.yaml"
# warning in push_pillar.sls and wait for the next scheduled highstate):
# - `data` and `node_data`: pillar-only data consumed by many states;
# handling them generically would amount to a fleetwide highstate.
# - `host`: soc_host describes mainint/mainip; a change is a re-IP and
# needs a coordinated procedure, not an immediate state push.
# - `hypervisor`: state changes touch libvirt and are disruptive; leave
# to the next scheduled highstate.
# - `sensor`: every field in soc_sensor.yaml is `readonly: True` or
# per-minion (`node: True`). Per-minion edits are persisted under
# pillar/minions/<id>.sls and are handled by Branch A of push_pillar.sls
# (per-minion highstate intent), not by this app-pillar map.
#
# The role sets here were verified line-by-line against salt/top.sls. If
# salt/top.sls changes how an app is targeted, update the corresponding
# compound here.
# firewall: the one pillar everyone touches. Applied everywhere intentionally
# because every host's iptables needs to know about every other host in the
# grid. Salt's firewall state is idempotent (file.managed + iptables-restore
# onchanges in salt/firewall/init.sls), so hosts whose rendered firewall is
# unchanged do a file comparison and no-op without touching iptables -- actual
# reload happens only on the hosts whose rules actually changed. Fleetwide
# blast radius is intentional and matches the pre-plan behavior via highstate.
# Adding N sensors in a burst coalesces into one dispatch via the drainer.
firewall:
- state: firewall
tgt: '*'
# backup: backup.config_backup runs on eval, standalone, manager, managerhype,
# managersearch (NOT import -- the backup pillar is included on import per
# pillar/top.sls but the backup state is not run there per salt/top.sls).
backup:
- state: backup.config_backup
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# bpf is pillar-only (no state); consumed by both zeek and suricata as macros.
# Both states run on sensor_roles + so-import per salt/top.sls.
bpf:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# ca is applied universally.
ca:
- state: ca
tgt: '*'
# docker: universal. The docker state is in both the all-non-managers and
# all-managers branches of salt/top.sls.
docker:
- state: docker
tgt: '*'
# elastalert: eval, standalone, manager, managerhype, managersearch (NOT import).
elastalert:
- state: elastalert
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elastic-fleet-package-registry: manager_roles exactly.
elastic-fleet-package-registry:
- state: elastic-fleet-package-registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elasticsearch: 8 roles.
elasticsearch:
- state: elasticsearch
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-standalone'
# elasticagent: so-heavynode only.
elasticagent:
- state: elasticagent
tgt: 'G@role:so-heavynode'
# elasticfleet: base state only on pillar change. elasticfleet.install_agent_grid
# is a deploy/enrollment step, not a config reload; leave it to the next highstate.
elasticfleet:
- state: elasticfleet
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# global: fanout to a fleetwide highstate. The global pillar (soc_global.sls)
# carries cross-cutting settings (pipeline, url_base, imagerepo, mdengine, ...)
# that are consumed by virtually every state, so a targeted re-apply isn't
# meaningful. The drainer's batch/batch_wait throttling controls blast radius.
global:
- highstate: True
tgt: '*'
# healthcheck: eval, sensor, standalone only.
healthcheck:
- state: healthcheck
tgt: 'G@role:so-eval or G@role:so-sensor or G@role:so-standalone'
# hydra: manager_roles exactly.
hydra:
- state: hydra
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# idh: so-idh only.
idh:
- state: idh
tgt: 'G@role:so-idh'
# influxdb: manager_roles exactly.
influxdb:
- state: influxdb
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kafka: standalone, manager, managerhype, managersearch, searchnode, receiver.
kafka:
- state: kafka
tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# kibana: manager_roles exactly.
kibana:
- state: kibana
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kratos: manager_roles exactly.
kratos:
- state: kratos
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# logrotate: universal (top-of-file '*' branch in salt/top.sls).
logrotate:
- state: logrotate
tgt: '*'
# logstash: 8 roles, no eval/import.
logstash:
- state: logstash
tgt: 'G@role:so-fleet or G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# manager: manager_roles exactly. The manager state is also referenced under
# *_sensor / *_heavynode top.sls blocks via `sensor`, but the standalone
# `manager` state itself runs only on manager_roles.
manager:
- state: manager
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# nginx: 10 specific roles. NOT receiver, idh, hypervisor, desktop.
nginx:
- state: nginx
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# ntp: universal (top-of-file '*' branch in salt/top.sls).
ntp:
- state: ntp
tgt: '*'
# patch: universal. soc_patch carries the OS update schedule, applied via
# patch.os.schedule on every node (it's in both the all-non-managers and
# all-managers branches of salt/top.sls).
patch:
- state: patch.os.schedule
tgt: '*'
# postgres: manager_roles exactly.
postgres:
- state: postgres
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# redis: 6 roles. standalone, manager, managerhype, managersearch, heavynode, receiver.
# (NOT eval, NOT import, NOT searchnode.)
redis:
- state: redis
tgt: 'G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-standalone'
# registry: manager_roles exactly.
registry:
- state: registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# salt: fanout to a fleetwide highstate. The salt.auto_apply settings tune the
# push pipeline itself (enabled, debounce/drain intervals, batch sizing) and
# salt.schedule sets the per-minion highstate interval; they are consumed by the
# manager's schedule, beacons, and master reactor config as well as every
# minion's highstate schedule, so a targeted re-apply isn't meaningful. A salt
# audit row only fires for SOC-driven salt.auto_apply / salt.schedule edits --
# salt version bumps go through soup, not SOC, so they never reach this map.
salt:
- highstate: True
tgt: '*'
# sensoroni: universal.
sensoroni:
- state: sensoroni
tgt: '*'
# soc: manager_roles exactly.
soc:
- state: soc
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# stig: broad. Runs on standalone, manager, managerhype, managersearch,
# searchnode, sensor, receiver, fleet, hypervisor, desktop.
# NOT eval, NOT import, NOT heavynode, NOT idh (the *_idh block in
# salt/top.sls intentionally omits stig).
stig:
- state: stig
tgt: 'G@role:so-desktop or G@role:so-fleet or G@role:so-hypervisor or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# strelka: sensor-side only on pillar change (sensor_roles). strelka.manager is
# intentionally NOT fired on pillar changes -- YARA rule and strelka config
# pillar changes are consumed by the sensor-side strelka backend, and re-running
# strelka.manager on managers is both unnecessary and disruptive. strelka.manager
# is left to the 2-hour highstate.
strelka:
- state: strelka
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-sensor or G@role:so-standalone'
# suricata: sensor_roles + so-import (5 roles).
suricata:
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# telegraf: universal.
telegraf:
- state: telegraf
tgt: '*'
# versionlock: universal (top-of-file '*' branch in salt/top.sls).
versionlock:
- state: versionlock
tgt: '*'
# vm: libvirt-driver hypervisors only. Matched by the salt-cloud:driver:libvirt
# grain (compound supports nested grain matching via G@<key>:<subkey>:<value>).
# pillar/vm/soc_vm.sls write path is referenced at salt/_runners/setup_hypervisor.py:856.
vm:
- state: vm
tgt: 'G@salt-cloud:driver:libvirt'
# zeek: sensor_roles + so-import (5 roles).
zeek:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
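Most `tgt` values in the map are role-grain compounds that the comments describe as role sets ("manager_roles exactly", "8 roles", ...). The roles appear to be listed in alphabetical order. A small helper like the one below could do two jobs: generate a compound from a role set, and recover the role list from an existing compound. That would make it easier to keep the comments and the `tgt` strings in sync when salt/top.sls changes. The helper is hypothetical and not part of the diff.

```python
from typing import Iterable, List

ROLE_PREFIX = "G@role:"


def roles_compound(roles: Iterable[str]) -> str:
    """Build 'G@role:a or G@role:b ...' from role names, sorted and deduplicated."""
    return " or ".join(ROLE_PREFIX + role for role in sorted(set(roles)))


def compound_roles(tgt: str) -> List[str]:
    """Inverse for the plain 'or'-only role compounds used in the map."""
    terms = [term.strip() for term in tgt.split(" or ")]
    if not all(term.startswith(ROLE_PREFIX) for term in terms):
        # e.g. '*' or 'G@salt-cloud:driver:libvirt' are not role compounds.
        raise ValueError(f"not a plain role compound: {tgt!r}")
    return [term[len(ROLE_PREFIX):] for term in terms]


# The set the comments call "manager_roles", as spelled out in the map's tgt strings.
MANAGER_ROLES = {"so-eval", "so-import", "so-manager", "so-managerhype",
                 "so-managersearch", "so-standalone"}
```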
@@ -0,0 +1,176 @@
#!py
# Reactor invoked by the postgres_pillar_beacon when SOC records settings changes in
# the securityonion.audit_settings table (see salt/_beacons/postgres_pillar_beacon.py). The beacon
# emits one event per new row carrying setting_id and node_id.
#
# Two branches, keyed on node_id:
# A) node_id populated -> the change is scoped to that one minion. Look up the
# app in pillar_push_map.yaml and write an intent that runs the app's mapped
# state(s) targeted to just that node.
# B) node_id empty -> grid-wide app change. Look up the app in
# pillar_push_map.yaml and write an intent with the entry's actions as-is.
#
# The app name is the first dotted segment of setting_id (e.g. "telegraf.output"
# -> "telegraf"), which matches the pillar_push_map.yaml keys 1:1.
#
# Reactors never dispatch directly. The so-push-drainer schedule picks up
# ready intents, dedupes across pending files, and dispatches orch.push_batch.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
import yaml
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# The pillar_push_map.yaml is shipped via salt:// but the reactor runs on the
# master, which mounts the default saltstack tree at this path.
PUSH_MAP_PATH = '/opt/so/saltstack/default/salt/reactor/pillar_push_map.yaml'
_PUSH_MAP_CACHE = {'mtime': 0, 'data': None}
def _load_push_map():
try:
st = os.stat(PUSH_MAP_PATH)
except OSError:
LOG.warning('push_pillar: %s not found', PUSH_MAP_PATH)
return {}
if _PUSH_MAP_CACHE['mtime'] != st.st_mtime:
try:
with open(PUSH_MAP_PATH, 'r') as f:
_PUSH_MAP_CACHE['data'] = yaml.safe_load(f) or {}
except Exception:
LOG.exception('push_pillar: failed to load %s', PUSH_MAP_PATH)
_PUSH_MAP_CACHE['data'] = {}
_PUSH_MAP_CACHE['mtime'] = st.st_mtime
return _PUSH_MAP_CACHE['data'] or {}
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'salt:auto_apply:enabled', True))
except Exception:
LOG.exception('push_pillar: pillar.get salt:auto_apply:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_pillar: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_pillar: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def _app_from_setting(setting_id):
# setting_id is e.g. 'telegraf.output' -> 'telegraf', 'ntp.config.servers' -> 'ntp'
if not setting_id:
return None
return setting_id.split('.', 1)[0] or None
def _node_actions(entry, node_id):
# Copy the app's mapped actions but retarget each one to the single node.
# Preserves the state/highstate selection and any batch/batch_wait overrides.
actions = []
for action in entry:
if not isinstance(action, dict):
continue
node_action = dict(action)
node_action['tgt'] = node_id
node_action['tgt_type'] = 'glob'
actions.append(node_action)
return actions
def run():
if not _push_enabled():
LOG.info('push_pillar: push disabled, skipping')
return {}
# The postgres_pillar_beacon nests its payload under data['data']; fall back to the
# top level so the reactor is robust to either shape.
event = data.get('data', data) # noqa: F821 -- data provided by reactor
setting_id = event.get('setting_id', '')
node_id = (event.get('node_id') or '').strip()
app = _app_from_setting(setting_id)
if not app:
LOG.debug('push_pillar: ignoring event with no app segment: setting_id=%s', setting_id)
return {}
push_map = _load_push_map()
entry = push_map.get(app)
if not entry:
LOG.warning(
'push_pillar: app "%s" is not in pillar_push_map.yaml; change will be '
'picked up at the next scheduled highstate (setting_id=%s)',
app, setting_id,
)
return {}
# Branch A: per-node change -> retarget the app's states to just that node.
if node_id:
actions = _node_actions(entry, node_id)
if not actions:
LOG.warning('push_pillar: no usable actions for app "%s" (setting_id=%s)', app, setting_id)
return {}
_write_intent(
'node_{}_{}'.format(node_id, app), actions,
'audit:{}@{}'.format(setting_id, node_id),
)
LOG.info('push_pillar: per-node intent updated for %s on %s (setting_id=%s)',
app, node_id, setting_id)
return {}
# Branch B: grid-wide app change -> use the map entry's actions as-is.
actions = list(entry) # copy to avoid mutating the cache
_write_intent('pillar_{}'.format(app), actions, 'audit:{}'.format(setting_id))
LOG.info('push_pillar: app intent updated for %s (setting_id=%s)', app, setting_id)
return {}
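All three reactors share the same read-merge-write step inside `_write_intent`. Below is a minimal sketch of just the merge rules, pulled out as a pure function so it can be exercised without the lock file or `/opt/so/state/push_pending`. `merge_intent` is a hypothetical name and is not part of the reactors:

```python
from typing import List, Optional

MAX_PATHS = 20  # same cap the reactors use


def merge_intent(intent: dict, actions: List[dict], path: Optional[str], now: float) -> dict:
    """Apply _write_intent's merge rules to a copy of an existing intent."""
    merged = dict(intent)
    merged.setdefault('first_touch', now)  # set on the first write only
    merged['last_touch'] = now              # bumped on every write
    merged['actions'] = actions             # the latest actions replace older ones
    paths = list(merged.get('paths', []))
    if path and path not in paths:
        paths.append(path)
    merged['paths'] = paths[-MAX_PATHS:]    # keep only the newest MAX_PATHS entries
    return merged


first = merge_intent({}, [{'state': 'suricata'}], '/rules/a.rules', now=100.0)
second = merge_intent(first, [{'state': 'suricata'}], '/rules/b.rules', now=130.0)
print(second['first_touch'], second['last_touch'], second['paths'])
# 100.0 130.0 ['/rules/a.rules', '/rules/b.rules']
```

Because `first_touch` is written once while `last_touch` moves on every event, a burst of edits keeps extending a single intent, and only the newest `MAX_PATHS` audit paths are retained.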
+96
@@ -0,0 +1,96 @@
#!py
# Reactor invoked by the rules_beacon poll beacon (salt/_beacons/rules_beacon.py) on rule
# file changes under /opt/so/saltstack/local/salt/strelka/rules/compiled/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_strelka.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Sensor-side
# strelka runs on exactly these four roles; so-import gets strelka.manager
# instead, which is not fired on pillar changes.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES)
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'salt:auto_apply:enabled', True))
except Exception:
LOG.exception('push_strelka: pillar.get salt:auto_apply:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_strelka: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_strelka: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_strelka: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'strelka', 'tgt': _sensor_compound()}]
_write_intent('rules_strelka', actions, path)
LOG.info('push_strelka: intent updated for path=%s', path)
return {}
+95
@@ -0,0 +1,95 @@
#!py
# Reactor invoked by the rules_beacon poll beacon (salt/_beacons/rules_beacon.py) on rule
# file changes under /opt/so/saltstack/local/salt/suricata/rules/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_suricata.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Suricata also
# runs on so-import per salt/top.sls, so that role is appended below.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound_plus_import():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES) + ' or G@role:so-import'
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'salt:auto_apply:enabled', True))
except Exception:
LOG.exception('push_suricata: pillar.get salt:auto_apply:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_suricata: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_suricata: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_suricata: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'suricata', 'tgt': _sensor_compound_plus_import()}]
_write_intent('rules_suricata', actions, path)
LOG.info('push_suricata: intent updated for path=%s', path)
return {}
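The two rules reactors differ only in the role list they OR together. This short sketch uses a hypothetical shared helper, `role_compound`, to show the exact compound target strings that `_sensor_compound()` and `_sensor_compound_plus_import()` produce:

```python
from typing import List

SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']


def role_compound(roles: List[str]) -> str:
    """OR together one grain match (G@role:<role>) per role."""
    return ' or '.join('G@role:{}'.format(r) for r in roles)


strelka_tgt = role_compound(SENSOR_ROLES)
suricata_tgt = role_compound(SENSOR_ROLES + ['so-import'])
print(strelka_tgt)
# G@role:so-eval or G@role:so-heavynode or G@role:so-sensor or G@role:so-standalone
print(suricata_tgt.endswith(' or G@role:so-import'))
# True
```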
+1
@@ -17,6 +17,7 @@ include:
so-redis:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-redis
- user: socore
- networks:
+3
@@ -21,6 +21,9 @@ so-dockerregistry:
- networks:
- sobridge:
- ipv4_address: {{ DOCKERMERGED.containers['so-dockerregistry'].ip }}
# Intentionally `always` (not unless-stopped) -- registry is critical infra
# and must come back up even if it was manually stopped. Do not homogenize
# to unless-stopped; see the container auto-restart section of the plan.
- restart_policy: always
- port_bindings:
{% for BINDING in DOCKERMERGED.containers['so-dockerregistry'].port_bindings %}
+2
@@ -0,0 +1,2 @@
{% import_yaml 'salt/defaults.yaml' as SALT_DEFAULTS %}
{% set AUTOAPPLY = salt['pillar.get']('salt:auto_apply', SALT_DEFAULTS.salt.auto_apply, merge=True) %}
+4 -3
@@ -3,7 +3,7 @@
{% set SCHEDULE = salt['pillar.get']('healthcheck:schedule', 30) %}
include:
- salt
- salt.minion
{% if CHECKS and ENABLED %}
salt_beacons:
@@ -14,12 +14,13 @@ salt_beacons:
- defaults:
CHECKS: {{ CHECKS }}
SCHEDULE: {{ SCHEDULE }}
- watch_in:
- watch_in:
- service: salt_minion_service
{% else %}
salt_beacons:
file.absent:
- name: /etc/salt/minion.d/beacons.conf
- watch_in:
- watch_in:
- service: salt_minion_service
{% endif %}
+9
@@ -0,0 +1,9 @@
salt:
auto_apply:
enabled: true
debounce_seconds: 30
drain_interval: 15
batch: '25%'
batch_wait: 15
schedule:
highstate_interval_hours: 2
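`debounce_seconds` and `drain_interval` govern when so-push-drainer acts on an intent. The drainer is not part of this diff, so the following is only a sketch of trailing-edge readiness. It assumes the drainer compares the `last_touch` field that the reactors write against `debounce_seconds`. `is_ready` and `ready_intents` are hypothetical names, and this sketch does not take the `.lock` the reactors use:

```python
import json
import os
import time
from typing import List, Optional

PENDING_DIR = '/opt/so/state/push_pending'  # where the reactors write intents


def is_ready(intent: dict, debounce_seconds: int, now: float) -> bool:
    """Trailing-edge debounce: ready once no write has landed for debounce_seconds."""
    last = intent.get('last_touch')
    return last is not None and now - last >= debounce_seconds


def ready_intents(pending_dir: str = PENDING_DIR, debounce_seconds: int = 30,
                  now: Optional[float] = None) -> List[str]:
    """Return the names of intent files that have been quiet long enough."""
    now = time.time() if now is None else now
    try:
        names = sorted(os.listdir(pending_dir))
    except FileNotFoundError:
        return []
    ready = []
    for name in names:
        if not name.endswith('.json'):  # skips .lock and in-flight .json.tmp files
            continue
        try:
            with open(os.path.join(pending_dir, name)) as f:
                intent = json.load(f)
        except (OSError, ValueError):
            continue  # unreadable or malformed; look again on the next pass
        if isinstance(intent, dict) and is_ready(intent, debounce_seconds, now):
            ready.append(name)
    return ready
```

Under this assumption and the defaults above (`debounce_seconds: 30`, `drain_interval: 15`), an intent last touched at time t would be picked up on the first drainer pass at or after t+30.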
+7
@@ -0,0 +1,7 @@
reactor:
- 'salt/beacon/*/rules_beacon/suricata':
- salt://reactor/push_suricata.sls
- 'salt/beacon/*/rules_beacon/strelka':
- salt://reactor/push_strelka.sls
- 'salt/beacon/*/postgres_pillar_beacon/audit_settings':
- salt://reactor/push_pillar.sls
+11
@@ -0,0 +1,11 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'salt/schedule.map.jinja' import SCHEDULEMERGED %}
highstate_schedule:
schedule.present:
- function: state.highstate
- hours: {{ SCHEDULEMERGED.highstate_interval_hours }}
- maxrunning: 1
{% if not GLOBALS.is_manager %}
- splay: 1800
{% endif %}
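The `highstate_interval_hours` annotation states that the salt-minion health check restarts a minion whose last highstate is older than the interval plus one hour. The sketch below shows that staleness test. It assumes the age is taken from the mtime of the `/opt/so/log/salt/lasthighstate` marker touched by the `lasthighstate` state. The function names and the choice to treat a missing marker as stale are assumptions and do not reflect the actual so-salt-minion-check logic:

```python
import os
import time
from typing import Optional

LASTHIGHSTATE = '/opt/so/log/salt/lasthighstate'  # touched by the lasthighstate state


def highstate_threshold_seconds(interval_hours: int) -> int:
    """Interval plus one hour, in seconds."""
    return (interval_hours + 1) * 3600


def highstate_is_stale(interval_hours: int, path: str = LASTHIGHSTATE,
                       now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # assumption: a missing marker counts as stale
    return now - mtime > highstate_threshold_seconds(interval_hours)


print(highstate_threshold_seconds(2))  # 10800 with the default of 2 hours
```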
+8
@@ -5,3 +5,11 @@ salt_bootstrap:
- source: salt://salt/scripts/bootstrap-salt.sh
- mode: 755
- show_changes: False
salt_sbin:
file.recurse:
- name: /usr/sbin
- source: salt://salt/tools/sbin
- user: 939
- group: 939
- file_mode: 755
+1 -1
@@ -1,4 +1,4 @@
lasthighstate:
file.touch:
- name: /opt/so/log/salt/lasthighstate
- order: last
- order: 9001
+18 -1
@@ -10,10 +10,12 @@
# software that is protected by the license key."
{% from 'allowed_states.map.jinja' import allowed_states %}
{% from 'salt/auto_apply.map.jinja' import AUTOAPPLY %}
{% if sls in allowed_states %}
include:
- salt.minion
- salt.master.pyinotify
- salt.master.boot_mine_update
{% if 'vrt' in salt['pillar.get']('features', []) %}
- salt.cloud
@@ -63,6 +65,21 @@ engines_config:
- name: /etc/salt/master.d/engines.conf
- source: salt://salt/files/engines.conf
{% if AUTOAPPLY.enabled %}
reactor_pushstate_config:
file.managed:
- name: /etc/salt/master.d/reactor_pushstate.conf
- source: salt://salt/files/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% else %}
reactor_pushstate_config:
file.absent:
- name: /etc/salt/master.d/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% endif %}
# update the bootstrap script when used for salt-cloud
salt_bootstrap_cloud:
file.managed:
@@ -78,7 +95,7 @@ salt_master_service:
- file: checkmine_engine
- file: pillarWatch_engine
- file: engines_config
- order: last
- order: 9002
{% else %}
+20
@@ -0,0 +1,20 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
pyinotify_module_package:
file.recurse:
- name: /opt/so/conf/salt/module_packages/pyinotify
- source: salt://salt/module_packages/pyinotify
- clean: True
- makedirs: True
pyinotify_python_module_install:
cmd.run:
- name: /opt/saltstack/salt/bin/python3.10 -m pip install pyinotify --no-index --find-links=/opt/so/conf/salt/module_packages/pyinotify/ --upgrade
- onchanges:
- file: pyinotify_module_package
- failhard: True
- watch_in:
- service: salt_minion_service
-1
@@ -2,4 +2,3 @@
salt:
minion:
version: '3006.19'
check_threshold: 3600 # In seconds. Threshold used by so-salt-minion-check. Any value below 600 seconds may cause frequent salt-minion restarts, since the job that touches the file runs every 5-8 minutes by default.
+20 -2
@@ -111,13 +111,17 @@ mark_setup_complete_for_upgrades:
{% endif %}
# this has to be outside the if statement above since there are <requisite>_in calls to this state
# this has to be outside the if statement above since there are <requisite>_in calls to this state.
# uses watch (not listen) so the restart fires in-state and its result lands on this state's
# running entry; that is what lets wait_for_salt_minion_ready below detect any restart
# uniformly via onchanges, regardless of whether the trigger came from these files or from
# external watch_in's (e.g. beacons, master/pyinotify).
salt_minion_service:
service.running:
- name: salt-minion
- enable: True
- onlyif: test "{{INSTALLEDSALTVERSION}}" == "{{SALTVERSION}}"
- listen:
- watch:
- file: mine_functions
{% if INSTALLEDSALTVERSION|string == SALTVERSION|string %}
- file: set_log_levels
@@ -126,3 +130,17 @@ salt_minion_service:
- file: signing_policy
{% endif %}
- order: last
# Block until the just-restarted salt-minion is back and can execute modules locally, so
# follow-on jobs and the next highstate iteration do not race the restart. onchanges +
# require on salt_minion_service catches every restart trigger uniformly, because the
# watch-triggered mod_watch result replaces the service state's running entry. The wait
# logic lives in /usr/sbin/so-salt-minion-wait (deployed by common_sbin from common/tools/sbin/).
wait_for_salt_minion_ready:
cmd.run:
- name: /usr/sbin/so-salt-minion-wait
- onchanges:
- service: salt_minion_service
- require:
- service: salt_minion_service
- order: last
+17
@@ -0,0 +1,17 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'salt/auto_apply.map.jinja' import AUTOAPPLY %}
{% if GLOBALS.is_manager and AUTOAPPLY.enabled %}
push_drain_schedule:
schedule.present:
- function: cmd.run
- job_args:
- /usr/sbin/so-push-drainer
- seconds: {{ AUTOAPPLY.drain_interval }}
- maxrunning: 1
- return_job: False
{% elif GLOBALS.is_manager %}
push_drain_schedule:
schedule.absent:
- name: push_drain_schedule
{% endif %}
+2
@@ -0,0 +1,2 @@
{% import_yaml 'salt/defaults.yaml' as SALT_DEFAULTS %}
{% set SCHEDULEMERGED = salt['pillar.get']('salt:schedule', SALT_DEFAULTS.salt.schedule, merge=True) %}
+39
@@ -0,0 +1,39 @@
salt:
auto_apply:
enabled:
description: Master kill-switch for the active push feature. When disabled, rule and pillar changes are picked up at the next scheduled highstate instead of being pushed immediately.
forcedType: bool
helpLink: push
global: True
debounce_seconds:
description: Trailing-edge debounce window in seconds. A push intent must be quiet for this long before the drainer dispatches. Rapid bursts of edits within this window coalesce into one dispatch.
forcedType: int
helpLink: push
global: True
advanced: True
drain_interval:
description: How often the push drainer checks for ready intents, in seconds. Small values lower dispatch latency at the cost of more background work on the manager.
forcedType: int
helpLink: push
global: True
advanced: True
batch:
description: "Host batch size for push orchestrations. A number (e.g. '10') or a percentage (e.g. '25%'). Limits how many minions run the push state at once to avoid a thundering herd on large fleets."
helpLink: push
global: True
advanced: True
regex: '^([0-9]+%?)$'
regexFailureMessage: Enter a whole number or a whole-number percentage (e.g. 10 or 25%).
batch_wait:
description: Seconds to wait between host batches in a push orchestration. Gives the fleet time to breathe between waves.
forcedType: int
helpLink: push
global: True
advanced: True
schedule:
highstate_interval_hours:
description: How often every minion in the grid runs a scheduled state.highstate, in hours. Lower values keep minions closer in sync at the cost of more load; higher values reduce load but increase worst-case latency for non-pushed changes. The salt-minion health check restarts a minion if its last highstate is older than this value plus one hour.
forcedType: int
helpLink: push
global: True
advanced: True
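The `batch` field accepts either a host count or a percentage. The following sketch applies the same validation in Python using the annotation's own regex, to show which inputs pass. `is_valid_batch` is a hypothetical helper, and how Salt converts a percentage into a host count is left to Salt:

```python
import re

# Pattern copied from the `batch` annotation's `regex` field.
BATCH_PATTERN = re.compile(r'^([0-9]+%?)$')


def is_valid_batch(value: str) -> bool:
    # fullmatch (rather than match) so a trailing newline is rejected too;
    # Python's `$` would otherwise also match just before a final '\n'.
    return BATCH_PATTERN.fullmatch(value) is not None


for candidate in ['10', '25%', '25.5%', '%25', '']:
    print(repr(candidate), is_valid_batch(candidate))
```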
+35
@@ -0,0 +1,35 @@
#!/bin/bash
#
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Block until the local salt-minion service is back up and can execute modules locally.
# Invoked from the wait_for_salt_minion_ready state in salt/minion/init.sls after
# salt_minion_service fires its watch-driven mod_watch (a non-blocking systemctl restart),
# so follow-on jobs and the next highstate iteration do not race the in-flight restart.
. /usr/sbin/so-common
# Initial sleep gives the systemctl restart (--no-block by default for salt-minion on
# >=3006.15) time to begin tearing down the old process before we probe for readiness.
INITIAL_SLEEP=3
TIMEOUT=120
PING_TIMEOUT=5
sleep "$INITIAL_SLEEP"
elapsed="$INITIAL_SLEEP"
while [ "$elapsed" -lt "$TIMEOUT" ]; do
if systemctl is-active --quiet salt-minion \
&& salt-call --local --timeout="$PING_TIMEOUT" --out=quiet test.ping >/dev/null 2>&1; then
echo "salt-minion ready after ${elapsed}s"
exit 0
fi
sleep 1
elapsed=$((elapsed + 1))
done
echo "salt-minion did not become ready within ${TIMEOUT}s" >&2
exit 1
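The polling loop in so-salt-minion-wait can be sketched in Python as follows. The sketch mirrors the script's constants and, like the Bash version, counts elapsed time in loop iterations rather than wall-clock time, so time spent inside `salt-call` is not counted. `wait_until_ready` and `minion_ready` are hypothetical names. The readiness check is injectable, so the loop can be tested without systemd:

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], initial_sleep: int = 3, timeout: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the counted seconds at readiness, or raise TimeoutError."""
    sleep(initial_sleep)  # let the non-blocking restart begin tearing down the old process
    elapsed = initial_sleep
    while elapsed < timeout:
        if check():
            return elapsed
        sleep(1)
        elapsed += 1
    raise TimeoutError('salt-minion did not become ready within {}s'.format(timeout))


def minion_ready(ping_timeout: int = 5) -> bool:
    """Same two probes as the script: service active, then a local test.ping."""
    try:
        if subprocess.run(['systemctl', 'is-active', '--quiet', 'salt-minion']).returncode != 0:
            return False
        ping = subprocess.run(
            ['salt-call', '--local', '--timeout={}'.format(ping_timeout), '--out=quiet', 'test.ping'],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # systemctl or salt-call not installed
        return False
    return ping.returncode == 0
```

On a real minion this would be called as `wait_until_ready(minion_ready)`.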
-10
@@ -1,10 +0,0 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
highstate_schedule:
schedule.present:
- function: state.highstate
- minutes: 15
- maxrunning: 1
{% if not GLOBALS.is_manager %}
- splay: 120
{% endif %}
+1
@@ -14,6 +14,7 @@ include:
so-sensoroni:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- network_mode: host
- binds:
- /nsm/import:/nsm/import:rw
+1
@@ -18,6 +18,7 @@ include:
so-soc:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: soc
- name: so-soc
- networks:
+4
@@ -47,6 +47,10 @@ strelka_backend:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally `on-failure` (not unless-stopped) -- strelka backend shuts
# down cleanly during rule reloads and we do not want those clean exits to
# trigger an auto-restart. Do not homogenize; see the container
# auto-restart section of the plan.
- restart_policy: on-failure
- watch:
- file: strelkasensorcompiledrules
+1
@@ -15,6 +15,7 @@ include:
strelka_coordinator:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-strelka-coordinator
- networks:
- sobridge:
+1
@@ -15,6 +15,7 @@ include:
strelka_filestream:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds:
- /opt/so/conf/strelka/filestream/:/etc/strelka/:ro
- /nsm/strelka:/nsm/strelka
+1
@@ -15,6 +15,7 @@ include:
strelka_frontend:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds:
- /opt/so/conf/strelka/frontend/:/etc/strelka/:ro
- /nsm/strelka/log/:/var/log/strelka/:rw
+1
@@ -15,6 +15,7 @@ include:
strelka_gatekeeper:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-strelka-gatekeeper
- networks:
- sobridge:
+1
@@ -15,6 +15,7 @@ include:
strelka_manager:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-strelka-manager:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- binds:
- /opt/so/conf/strelka/manager/:/etc/strelka/:ro
{% if DOCKERMERGED.containers['so-strelka-manager'].custom_bind_mounts %}
+4 -2
@@ -18,6 +18,7 @@ so-suricata:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-suricata:{{ GLOBALS.so_version }}
- privileged: True
- restart_policy: unless-stopped
- environment:
- INTERFACE={{ GLOBALS.sensor.interface }}
{% if DOCKERMERGED.containers['so-suricata'].extra_env %}
@@ -65,10 +66,11 @@ so-suricata:
- file: suriclassifications
surirulereload:
cmd.run:
cmd.run:
- name: /usr/sbin/so-suricata-reload-rules >> /opt/so/log/suricata/reload.log 2>&1
- onchanges:
- onchanges:
- file: surirulesync
- onlyif: test -f /opt/so/rules/suricata/all-rulesets.rules
- require:
- docker_container: so-suricata
@@ -7,5 +7,59 @@
. /usr/sbin/so-common
retry 60 3 'docker exec so-suricata /opt/suricata/bin/suricatasc -c reload-rules /var/run/suricata/suricata-command.socket' '{"message":"done","return":"OK"}' || fail "The Suricata container was not ready in time."
retry 60 3 'docker exec so-suricata /opt/suricata/bin/suricatasc -c ruleset-reload-nonblocking /var/run/suricata/suricata-command.socket' '{"message":"done","return":"OK"}' || fail "The Suricata container was not ready in time."
RULES_FILE="/opt/so/rules/suricata/all-rulesets.rules"
SOCKET="/var/run/suricata/suricata-command.socket"
SURICATASC="docker exec so-suricata /opt/suricata/bin/suricatasc"
# Format an epoch as a human-readable local timestamp for log messages.
fmt_time() { date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z' 2>/dev/null; }
# Prefix each input line with the current timestamp.
timestamp_lines() { while IFS= read -r line; do printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S %Z')" "$line"; done; }
# Epoch of Suricata's last *completed* ruleset reload; non-zero return on failure.
suricata_reload_epoch() {
local out ts
out=$($SURICATASC -c ruleset-reload-time "$SOCKET" 2>/dev/null)
ts=$(echo "$out" | jq -r '.message[0].last_reload // empty' 2>/dev/null)
[ -n "$ts" ] || return 1
date -d "$ts" +%s 2>/dev/null
}
# Trigger a fresh reload and confirm Suricata is running a ruleset at least as new
# as the rules file. Returns 0 only when both hold, so retry keeps going until an
# in-progress reload clears and our own reload completes.
reload_and_verify() {
local out reload_epoch
out=$($SURICATASC -c reload-rules "$SOCKET")
echo "reload-rules: $out"
if [[ "$out" =~ "Reload already in progress" ]]; then
echo "A reload is already in progress; waiting for it to clear so a fresh reload can load the current ruleset."
return 1
fi
if [[ ! "$out" =~ '{"message":"done","return":"OK"}' ]]; then
echo "Suricata not ready or unexpected reload output; will retry."
return 1
fi
reload_epoch=$(suricata_reload_epoch) || { echo "Could not read ruleset-reload-time; will retry."; return 1; }
if [ "$reload_epoch" -ge "$target_mtime" ]; then
echo "Loaded ruleset is current: last reload ($(fmt_time "$reload_epoch")) is not older than rules file ($(fmt_time "$target_mtime"))."
return 0
fi
echo "Loaded ruleset is stale: last reload ($(fmt_time "$reload_epoch")) is older than rules file ($(fmt_time "$target_mtime")); retrying."
return 1
}
# Run the reload/verify, timestamping every line of output (ours and the
# retry/fail helpers') so reload.log shows when each step ran. The pipeline is
# synchronous, so the log is fully flushed and ordered before we exit; the
# script's real exit code is preserved via PIPESTATUS.
{
# Epoch mtime of the ruleset we need Suricata to have loaded. Captured once so
# a file update mid-reload does not move the goalpost.
target_mtime=$(stat -c %Y "$RULES_FILE") || fail "Could not stat the Suricata rules file: $RULES_FILE"
retry 60 3 'reload_and_verify' || fail "Suricata did not load the current ruleset in time."
} 2>&1 | timestamp_lines
exit "${PIPESTATUS[0]}"
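The decision logic in `reload_and_verify` can get lost among the `retry` and logging plumbing. The sketch below expresses the same three checks as pure Python functions with hypothetical names, assuming the socket replies are passed in as strings. The sample `ruleset-reload-time` reply is hypothetical, shaped only to match the script's jq path `.message[0].last_reload`. The script's `date -d` timestamp conversion is omitted because the timestamp format is not shown:

```python
import json
from typing import Optional

DONE_OK = '{"message":"done","return":"OK"}'


def reload_status(out: str) -> str:
    """Classify a reload-rules reply the way reload_and_verify does."""
    if 'Reload already in progress' in out:
        return 'in_progress'   # retry until the running reload clears
    if DONE_OK not in out:
        return 'not_ready'     # container not ready or unexpected output; retry
    return 'done'


def last_reload_field(out: str) -> Optional[str]:
    """Rough equivalent of jq -r '.message[0].last_reload // empty'; None if absent or empty."""
    try:
        value = json.loads(out)['message'][0].get('last_reload')
    except (ValueError, KeyError, IndexError, TypeError, AttributeError):
        return None
    return value or None


def is_current(reload_epoch: int, target_mtime: int) -> bool:
    """Current if Suricata reloaded no earlier than the rules file last changed."""
    return reload_epoch >= target_mtime


# Hypothetical reply shaped to match the jq path used by the script.
sample = '{"message": [{"id": 0, "last_reload": "2026-07-01T15:12:53"}], "return": "OK"}'
print(last_reload_field(sample))  # 2026-07-01T15:12:53
```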
+1
@@ -7,6 +7,7 @@ so-tcpreplay:
docker_container.running:
- network_mode: "host"
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-tcpreplay:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-tcpreplay
- user: root
- interactive: True
+1
@@ -18,6 +18,7 @@ include:
so-telegraf:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-telegraf:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- user: 939
- group_add: 939,920
- environment:
+2 -8
@@ -19,7 +19,7 @@ base:
- repo.client
- versionlock
- ntp
- schedule
- salt.highstate_schedule
- logrotate
# manager node on proper salt version with empty node_data pillar
@@ -55,6 +55,7 @@ base:
- motd
- salt.minion-check
- salt.lasthighstate
- salt.push_drain_schedule
- common
- docker
- docker_clean
@@ -83,7 +84,6 @@ base:
- zeek
- strelka
- elastalert
- utility
- elasticfleet
- pcap.cleanup
@@ -113,7 +113,6 @@ base:
- zeek
- strelka
- elastalert
- utility
- elasticfleet
- stig
- kafka
@@ -141,7 +140,6 @@ base:
- elastic-fleet-package-registry
- kibana
- elastalert
- utility
- elasticfleet
- stig
- kafka
@@ -168,7 +166,6 @@ base:
- elastic-fleet-package-registry
- kibana
- elastalert
- utility
- elasticfleet
- kafka
@@ -198,7 +195,6 @@ base:
- elastic-fleet-package-registry
- kibana
- elastalert
- utility
- elasticfleet
- stig
- kafka
@@ -222,7 +218,6 @@ base:
- elasticsearch
- elastic-fleet-package-registry
- kibana
- utility
- suricata
- zeek
- elasticfleet
@@ -300,7 +295,6 @@ base:
- nginx
- elasticfleet
- elasticfleet.install_agent_grid
- schedule
- stig
'*_hypervisor and I@features:vrt and G@saltversion:{{saltversion}}':
-29
@@ -1,29 +0,0 @@
#!/bin/bash
# Wait for ElasticSearch to come up so that we can query for version information
echo -n "Waiting for ElasticSearch..."
COUNT=0
ELASTICSEARCH_CONNECTED="no"
while [[ "$COUNT" -le 30 ]]; do
curl -K /opt/so/conf/elasticsearch/curl.config -k --output /dev/null --silent --head --fail -L https://{{ GLOBALS.manager_ip }}:9200
if [ $? -eq 0 ]; then
ELASTICSEARCH_CONNECTED="yes"
echo "connected!"
break
else
((COUNT+=1))
sleep 1
echo -n "."
fi
done
if [ "$ELASTICSEARCH_CONNECTED" == "no" ]; then
echo
echo -e "Connection attempt timed out. Unable to connect to ElasticSearch. \nPlease try: \n -checking log(s) in /var/log/elasticsearch/\n -running 'docker ps' \n -running 'sudo so-elastic-restart'"
echo
exit
fi
echo "Applying cross cluster search config..."
curl -K /opt/so/conf/elasticsearch/curl.config -s -k -XPUT -L https://{{ GLOBALS.manager_ip }}:9200/_cluster/settings \
-H 'Content-Type: application/json' \
-d "{\"persistent\": {\"search\": {\"remote\": {\"{{ grains.host }}\": {\"seeds\": [\"127.0.0.1:9300\"]}}}}}"
-22
@@ -1,22 +0,0 @@
{% from 'allowed_states.map.jinja' import allowed_states %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% if sls in allowed_states %}
{% if grains['role'] in ['so-eval', 'so-import'] %}
fixsearch:
cmd.script:
- shell: /bin/bash
- cwd: /opt/so
- source: salt://utility/bin/eval
- template: jinja
- defaults:
GLOBALS: {{ GLOBALS }}
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
+1
@@ -18,6 +18,7 @@ so-zeek:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-zeek:{{ GLOBALS.so_version }}
- start: True
- privileged: True
- restart_policy: unless-stopped
{% if DOCKERMERGED.containers['so-zeek'].ulimits %}
- ulimits:
{% for ULIMIT in DOCKERMERGED.containers['so-zeek'].ulimits %}
+18 -17
@@ -29,8 +29,12 @@ title() {
}
fail_setup() {
local err_msg=$1
if [[ -n "$err_msg" ]]; then
error "$err_msg"
fi
error "Setup encountered an unrecoverable failure, exiting"
touch /root/failure
echo "setup incomplete: $err_msg" > /root/failure
exit 1
}
@@ -697,7 +701,7 @@ compare_main_nic_ip() {
EOM
[[ -n $TESTING ]] || whiptail --title "$whiptail_title" --msgbox "$message" 11 75
kill -SIGINT "$(ps --pid $$ -oppid=)"; fail_setup
kill -SIGINT "$(ps --pid $$ -oppid=)"; fail_setup "Main IP mismatch"
fi
else
# Setup uses MAINIP, but since we ignore the equality condition when using a VPN
@@ -755,8 +759,7 @@ configure_management_bond() {
info "Setting up $bond_name management interface with mode $bond_mode"
if [[ ${#MBNICS[@]} -eq 0 ]]; then
error "[ERROR] No management bond NICs were selected."
fail_setup
fail_setup "No management bond NICs selected"
fi
nmcli -t -f NAME con show | grep -Fxq "$bond_name"
@@ -914,8 +917,7 @@ detect_os() {
is_rpm=true
is_supported=true
else
info "This OS is not supported. Security Onion requires Oracle Linux 9."
fail_setup
fail_setup "This OS is not supported. Security Onion requires Oracle Linux 9."
fi
info "Found OS: $OS $OSVER"
@@ -923,7 +925,7 @@ detect_os() {
download_elastic_agent_artifacts() {
if ! update_elastic_agent 2>&1 | tee -a "$setup_log"; then
fail_setup
fail_setup "Failed to update Elastic Agent"
fi
}
@@ -1433,7 +1435,7 @@ make_some_dirs() {
mkdir -p $local_salt_dir/salt/firewall/portgroups
mkdir -p $local_salt_dir/salt/firewall/ports
for THEDIR in bpf elasticsearch ntp firewall redis backup influxdb postgres strelka sensoroni soc docker zeek suricata nginx telegraf logstash soc manager kratos hydra idh elastalert stig global kafka versionlock hypervisor vm; do
for THEDIR in bpf elasticsearch ntp firewall redis backup influxdb postgres strelka sensoroni soc docker zeek suricata nginx telegraf logstash soc manager kratos hydra idh elastalert stig global salt kafka versionlock hypervisor vm; do
mkdir -p $local_salt_dir/pillar/$THEDIR
touch $local_salt_dir/pillar/$THEDIR/adv_$THEDIR.sls
touch $local_salt_dir/pillar/$THEDIR/soc_$THEDIR.sls
@@ -1567,7 +1569,7 @@ proxy_validate() {
error "Received error: $proxy_test_err"
if [[ -n $TESTING ]]; then
error "Exiting setup"
kill -SIGINT "$(ps --pid $$ -oppid=)"; fail_setup
kill -SIGINT "$(ps --pid $$ -oppid=)"; fail_setup "Proxy validation failed"
fi
fi
return $ret
@@ -1774,8 +1776,7 @@ ensure_pyyaml() {
local result=$?
set +o pipefail
if [[ $result -ne 0 ]] || ! rpm -q python3-pyyaml >/dev/null 2>&1; then
error "Failed to install python3-pyyaml (exit=$result)"
fail_setup
fail_setup "Failed to install python3-pyyaml (exit=$result)"
fi
info "python3-pyyaml installed successfully"
}
@@ -1910,8 +1911,8 @@ repo_sync_local() {
if [[ ! $is_airgap ]]; then
curl --retry 5 --retry-delay 60 -A "netinstall/$SOVERSION/$OS/$(uname -r)/1" https://sigs.securityonion.net/checkup --output /tmp/install
retry 5 60 "dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionsync --download-metadata -p /nsm/repo/" >> "$setup_log" 2>&1 || fail_setup
retry 5 60 "dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionkernel --download-metadata -p /nsm/kernelrepo/" >> "$setup_log" 2>&1 || fail_setup
retry 5 60 "dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionsync --download-metadata -p /nsm/repo/" >> "$setup_log" 2>&1 || fail_setup "Failed to sync repos"
retry 5 60 "dnf reposync --norepopath -g --delete -m -c /opt/so/conf/reposync/repodownload.conf --repoid=securityonionkernel --download-metadata -p /nsm/kernelrepo/" >> "$setup_log" 2>&1 || fail_setup "Failed to sync kernel repos"
# After the download is complete run createrepo
create_repo
fi
@@ -1924,10 +1925,10 @@ saltify() {
if [[ $waitforstate ]]; then
# install all for a manager
retry 30 10 "bash ../salt/salt/scripts/bootstrap-salt.sh -r -M -X stable $SALTVERSION" || fail_setup
retry 30 10 "bash ../salt/salt/scripts/bootstrap-salt.sh -r -M -X stable $SALTVERSION" || fail_setup "Failed to install salt master"
else
# just a minion
retry 30 10 "bash ../salt/salt/scripts/bootstrap-salt.sh -r -X stable $SALTVERSION" || fail_setup
retry 30 10 "bash ../salt/salt/scripts/bootstrap-salt.sh -r -X stable $SALTVERSION" || fail_setup "Failed to install salt minion"
fi
salt_install_module_deps
@@ -1999,7 +2000,7 @@ set_main_ip() {
info "MAINIP=$MAINIP"
info "MNIC_IP=$MNIC_IP"
whiptail_error_message "The management IP could not be determined. Please check the log at /root/sosetup.log and verify the network configuration. Select OK to exit."
fail_setup
fail_setup "Could not determine MAINIP or MNIC_IP"
fi
sleep 1
done
@@ -2203,7 +2204,7 @@ set_initial_firewall_access() {
set_management_interface() {
title "Setting up the main interface"
if [[ $MNIC == "bond1" ]]; then
configure_management_bond || fail_setup
configure_management_bond || fail_setup "Failed to configure management bond"
fi
if [ "$address_type" = 'DHCP' ]; then
+5 -9
@@ -90,8 +90,7 @@ if [[ "$setup_type" == 'iso' ]]; then
if [[ $is_rpm ]]; then
is_iso=true
else
echo "Only use 'so-setup iso' for an ISO install on Security Onion ISO images. Please run 'so-setup network' instead."
fail_setup
fail_setup "Only use 'so-setup iso' for an ISO install on Security Onion ISO images. Please run 'so-setup network' instead."
fi
fi
@@ -130,7 +129,7 @@ catch() {
info "Fatal error occurred at $1 in so-setup, failing setup."
grep --color=never "ERROR" "$setup_log" > "$error_log"
whiptail_setup_failed
fail_setup
fail_setup "Fatal error occurred at $1 in so-setup"
}
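`catch()` receives a location in `$1` and now forwards it to `fail_setup`, so an unexpected error is reported with where it happened. How so-setup registers `catch` is not shown here; the sketch below assumes an ERR trap of the form `trap 'catch $LINENO' ERR`, and `run_setup` is a hypothetical function used only to trigger it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an ERR trap feeding a line number into a catch
# handler, in the spirit of so-setup's catch(). How so-setup actually
# registers its trap is not shown in this diff.
failure_file="${FAILURE_FILE:-/tmp/so-setup-failure}"

fail_setup() {
  echo "${1:-Setup failed}" > "$failure_file"
  exit 1
}

catch() {
  echo "Fatal error occurred at $1 in so-setup, failing setup." >&2
  fail_setup "Fatal error occurred at $1 in so-setup"
}

run_setup() {
  set -o errtrace            # let called functions inherit the ERR trap
  trap 'catch $LINENO' ERR   # single quotes: LINENO expands when the trap fires
  false                      # simulated failing setup step
  echo "not reached"
}
```

Note that bash does not fire the ERR trap for commands tested by `if`, `while`, `&&` or `||`, which is why the explicit `|| fail_setup "<reason>"` calls elsewhere in the diff are still needed.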
# Add the progress function for manager node type installs
@@ -238,8 +237,7 @@ case "$setup_type" in
info "Beginning Security Onion $setup_type install"
;;
*)
error "Invalid install type, must be 'iso', 'network' or 'desktop'."
fail_setup
fail_setup "Invalid install type, must be 'iso', 'network' or 'desktop'."
;;
esac
@@ -773,8 +771,7 @@ if ! [[ -f $install_opt_file ]]; then
logCmd "salt-call state.apply -l info registry"
title "Seeding the docker registry"
if ! docker_seed_registry; then
error "Failed to seed the docker registry"
fail_setup
fail_setup "Failed to seed the docker registry"
fi
title "Applying the manager state"
logCmd "salt-call state.apply -l info manager"
@@ -797,8 +794,7 @@ if ! [[ -f $install_opt_file ]]; then
title "Setting up Elastic Fleet"
logCmd "salt-call state.apply elasticfleet.config"
if ! logCmd so-elastic-fleet-setup; then
error "Failed to run so-elastic-fleet-setup"
fail_setup
fail_setup "Failed to run so-elastic-fleet-setup"
fi
mark_setup_complete
set_initial_firewall_access
+3 -3
@@ -143,15 +143,15 @@ main() {
cat $error_log
echo "--------------------------"
exit_code=1
touch /root/failure
echo "Found setup errors. Check $error_log for details" > /root/failure
elif using_iso && cron_error_in_mail_spool; then
echo "WARNING: Unexpected cron job output in mail spool"
exit_code=1
touch /root/failure
echo "Unexpected cron job output found in /var/spool/mail/" > /root/failure
elif is_manager_node && status_failed; then
echo "WARNING: Containers are not in a healthy state"
exit_code=1
touch /root/failure
echo "Containers are not in a healthy state. Check so-status for details" > /root/failure
else
echo "Successfully completed setup!"
touch /root/success
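In `main()` the empty `touch /root/failure` marker is replaced by a marker whose content states why setup failed. The sketch below shows the same convention from both sides, writing and reading the marker. `MARKER_DIR`, `record_result` and `read_result` are hypothetical names chosen for the example; the diff itself writes directly to `/root/failure` and `/root/success`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the marker-file convention shown above: instead of
# an empty touch, the failure marker carries a one-line reason that a later
# check (or an operator) can read back. MARKER_DIR is an assumption.
MARKER_DIR="${MARKER_DIR:-/tmp/so-markers}"

record_result() {
  # $1 = "success" or "failure"; $2 = optional reason (failure only)
  mkdir -p "$MARKER_DIR"
  if [[ $1 == failure ]]; then
    echo "${2:-Unknown failure}" > "$MARKER_DIR/failure"
  else
    touch "$MARKER_DIR/success"
  fi
}

read_result() {
  # -s: the failure marker exists and is non-empty, i.e. carries a reason.
  if [[ -s $MARKER_DIR/failure ]]; then
    echo "FAILED: $(< "$MARKER_DIR/failure")"
    return 1
  elif [[ -f $MARKER_DIR/success ]]; then
    echo "SUCCESS"
  else
    echo "NO RESULT"
    return 2
  fi
}
```

With this convention a test harness can tell a container health failure apart from a mail-spool or error-log failure by reading one file, rather than only learning that setup failed.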