Compare commits

...

192 Commits

Author SHA1 Message Date
Josh Patterson 8c17ae0f66 move so-salt-minion-wait 2026-06-01 14:48:54 -04:00
Josh Patterson f54939b444 Replace inotify pillar watch with postgres audit_settings beacon
The active-push feature detected pillar/settings changes via an inotify
beacon on the manager watching /opt/so/saltstack/local/pillar. Replace
that pillar watch with a custom salt beacon (pillar_db) that polls the
SOC so_soc.audit_settings table on a monotonic id watermark, so changes
made through SOC drive immediate pushes from the database instead of the
files. The suricata/strelka rule inotify watches (and pyinotify) are kept
unchanged, since rule-file edits are not recorded in audit_settings.

- salt/_beacons/pillar_db.py: new beacon. Polls audit_settings via
  `docker exec so-postgres psql` (unix-socket trust auth), tracks the last
  processed id in /opt/so/state/pillar_db_watch.id, seeds to MAX(id) on
  first run (no history replay), and emits one event per new row.
- salt/reactor/push_pillar.sls: consume setting_id/node_id from the beacon
  event instead of a file path. App = first dotted segment of setting_id,
  looked up in pillar_push_map.yaml. Empty node_id -> grid-wide actions as
  is; populated node_id -> the app's state(s) retargeted to that one node.
- salt/manager/files/beacons_pushstate.conf.jinja: drop the pillar inotify
  block, add the pillar_db beacon (interval = push.drain_interval); keep
  the suricata/strelka inotify watches.
- salt/salt/files/reactor_pushstate.conf: map salt/beacon/*/pillar_db/
  audit_settings to push_pillar.sls; remove the pillar inotify reactor
  lines; keep suricata/strelka.

The intent -> so-push-drainer -> orch.push_batch pipeline is unchanged.
Verified end-to-end on a standalone: a grid-wide telegraf.output change
re-applied telegraf fleetwide (container replaced), and a per-host
ntp.config.servers change applied ntp to only that node.
2026-05-29 14:55:13 -04:00
Josh Patterson d48a22e37e Merge pull request #15944 from Security-Onion-Solutions/jertel/wip
Jertel/wip
2026-05-28 14:01:42 -04:00
Josh Patterson 9a70a06b3b Merge remote-tracking branch 'origin/3/dev' into jertel/wip 2026-05-28 13:55:12 -04:00
Mike Reeves 526d739b3b Merge pull request #15940 from Security-Onion-Solutions/TOoSmOotH-patch-4
Remove outdated HOTFIX version number
2026-05-28 10:25:28 -04:00
Mike Reeves 68d783e760 Remove outdated HOTFIX version number 2026-05-28 10:24:47 -04:00
Mike Reeves 1e9b6b0975 Merge pull request #15939 from Security-Onion-Solutions/3/main
main to dev for hotfix
2026-05-28 10:24:21 -04:00
Mike Reeves 2131e7d450 Merge pull request #15937 from Security-Onion-Solutions/hotfix/3.1.0
Hotfix/3.1.0
2026-05-28 10:20:53 -04:00
Mike Reeves 2a2d853ac4 Merge pull request #15936 from Security-Onion-Solutions/hotfix310
3.1.0 hotfix
2026-05-28 09:53:00 -04:00
Mike Reeves 5abd6de4b5 3.1.0 hotfix 2026-05-28 09:34:17 -04:00
Josh Patterson bb8ae91d91 fix so-soc postgres bootstrap 2026-05-27 16:39:52 -04:00
Josh Patterson 93ffce98d7 add onionconfig and postgres modules to soc config 2026-05-27 15:07:25 -04:00
Jorge Reyes 5599cce22c Merge pull request #15934 from Security-Onion-Solutions/reyesj2-patch-1
keep logstash lumberjack pipeline name update unified
2026-05-27 13:37:41 -05:00
reyesj2 b2a82fec29 fix_logstash_0013_lumberjack_pipeline_name
Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting
2026-05-27 13:24:23 -05:00
reyesj2 613eca52fc update hotfix date 2026-05-27 13:24:10 -05:00
Josh Patterson 79987f3659 bootstrap so-soc db in postgres during soup 2026-05-27 13:55:30 -04:00
reyesj2 bf609a112e LF 2026-05-27 12:21:44 -05:00
reyesj2 0b4a4de609 always run logstash pipeline rename 2026-05-27 12:21:22 -05:00
Jorge Reyes ad376d2a43 Merge pull request #15930 from Security-Onion-Solutions/reyesj2-patch-1
check for stale logstash pipeline name in local pillar
2026-05-27 10:16:39 -05:00
reyesj2 0834998cca usuable for next soup 2026-05-27 09:52:29 -05:00
reyesj2 473f93f0ee check for stale logstash pipeline name in pillars 2026-05-27 09:33:15 -05:00
Josh Patterson 16055c4d88 Merge remote-tracking branch 'origin/3/dev' into jertel/wip 2026-05-27 09:18:33 -04:00
Josh Patterson 6393d08e86 merge 2026-05-27 08:59:28 -04:00
Jorge Reyes 7cc2e045fb Merge pull request #15925 from Security-Onion-Solutions/reyesj2/soup-heavynode
use multiple or combined input
2026-05-26 08:34:33 -05:00
Mike Reeves 6955ee73bf Merge pull request #15924 from Security-Onion-Solutions/TOoSmOotH-patch-3
Add version number to HOTFIX file
2026-05-26 09:28:41 -04:00
Mike Reeves c0272ddb81 Add version number to HOTFIX file 2026-05-26 09:24:10 -04:00
reyesj2 d72219c586 use multiple or combined input 2026-05-22 20:04:21 -05:00
Mike Reeves ffd34d4e0e Merge pull request #15919 from Security-Onion-Solutions/TOoSmOotH-patch-2
Add 3.2.0 option to discussion template
2026-05-21 15:58:28 -04:00
Mike Reeves aa78978740 Add 3.2.0 option to discussion template 2026-05-21 15:57:57 -04:00
Mike Reeves 75d4f5e496 Merge pull request #15918 from Security-Onion-Solutions/TOoSmOotH-patch-1
Bump version from 3.1.0 to 3.2.0
2026-05-21 15:49:08 -04:00
Mike Reeves 89a28d2cfe Bump version from 3.1.0 to 3.2.0 2026-05-21 15:45:58 -04:00
Mike Reeves c1d187599b Merge pull request #15912 from Security-Onion-Solutions/3/dev
3.1.0
2026-05-21 15:41:50 -04:00
Mike Reeves d87313db27 Merge pull request #15911 from Security-Onion-Solutions/3.1.0
3.1.0
2026-05-21 13:50:23 -04:00
Mike Reeves 141a61f5b5 3.1.0 2026-05-21 13:47:03 -04:00
Jorge Reyes 901cbf03e4 Merge pull request #15907 from Security-Onion-Solutions/reyesj2/es-verify-compat
Verify compatibility for all ES nodes in the cluster
2026-05-20 14:16:41 -05:00
reyesj2 b485be4602 separate salt-key command from main es version compatiblity loop 2026-05-20 14:12:58 -05:00
reyesj2 7d13007aa9 block soup if all ES nodes are not online and reporting their ES version for compatibility check 2026-05-20 10:03:37 -05:00
reyesj2 d7a1b67095 use pipefail on heavynode versino command to pass through error 2026-05-20 09:16:57 -05:00
reyesj2 6c8997b28a verify all heavynodes and all searchnodes are at compatible ES version before attempting an elasticsearch upgrade 2026-05-19 22:27:31 -05:00
Jorge Reyes 58f1d08ebe Merge pull request #15902 from Security-Onion-Solutions/reyesj2/ea-fleet-sync
sync elastic agent packages to fleet nodes
2026-05-19 11:08:48 -05:00
reyesj2 d0aa33a255 sync elastic agent packages to fleet nodes 2026-05-19 10:50:17 -05:00
Josh Patterson 730c828bec Merge remote-tracking branch 'origin/jertel/wip' into saltthangs 2026-05-19 10:23:45 -04:00
Jorge Reyes 74b50f6009 Merge pull request #15899 from Security-Onion-Solutions/revert-15895-reyesj2/agentinstall
Revert "use -verify flag during grid agent install to ensure agent health"
2026-05-16 10:01:58 -05:00
Jorge Reyes e89c820b65 Revert "use -verify flag during grid agent install to ensure agent health" 2026-05-16 09:59:14 -05:00
Jorge Reyes 9ac05a6ad1 Merge pull request #15895 from Security-Onion-Solutions/reyesj2/agentinstall
use -verify flag during grid agent install to ensure agent health
2026-05-15 12:58:09 -05:00
Jason Ertel 24ee3318bc Merge pull request #15898 from Security-Onion-Solutions/jertel/logcheck
exclude fps
2026-05-15 11:38:20 -04:00
Jason Ertel ce566ba174 exclude fps 2026-05-15 11:36:46 -04:00
Mike Reeves 2635a60a8c Merge pull request #15896 from Security-Onion-Solutions/quickfixes2
Make so-postgres-backup fail-safe against silent corruption
2026-05-15 09:32:15 -04:00
Mike Reeves 244a73b7a2 Make so-postgres-backup fail-safe against silent corruption
The dump pipeline returned gzip's exit status, so a pg_dumpall that
died mid-stream still produced a valid .gz holding a truncated dump,
written straight to the final filename. The idempotency check then
blocked retries for the day and the corrupt file counted toward
retention, evicting a good backup each day until none remained.

- set -o pipefail so a failed pg_dumpall fails the pipeline
- dump to a .tmp file and atomically rename only after success, so
  the final filename appears only for a complete backup
- gzip -t integrity check before publishing
- trap-based cleanup of the temp file; sweep stale temps at startup
- run retention only after a successful backup, with a glob
  restricted to finished backups
- log timestamped OK/ERROR outcomes to /opt/so/log/postgres/backup.log
2026-05-15 08:48:54 -04:00
Jason Ertel e45ad45d73 Merge branch '3/dev' into jertel/wip 2026-05-14 18:33:40 -04:00
Mike Reeves 1189621ec5 Merge pull request #15893 from Security-Onion-Solutions/quickfixes2 2026-05-14 18:21:30 -04:00
reyesj2 d2524a593f use -verify flag during grid agent install to ensure agent health 2026-05-14 17:12:02 -05:00
Josh Brower f2ab2354fd Merge pull request #15894 from Security-Onion-Solutions/3/nginx-fix
Tweak for nginx upgrade
2026-05-14 23:20:57 +02:00
Mike Reeves 64731c73ba Fix psql :var substitution in telegraf role and retention SQL
psql does not substitute :var references inside dollar-quoted strings,
so the DO blocks in the user and retention subcommands were receiving
literal colons and failing (silently for user, via hide_output: True).
Rewrite the conditional CREATE/ALTER ROLE with SELECT format(...) \\gexec
and guard the retention UPDATE with \\gset + \\if.
2026-05-14 17:17:49 -04:00
Josh Brower 024fece607 Tweak for nginx upgrade 2026-05-14 17:08:57 -04:00
Mike Reeves 249b126312 Quote telegraf role env vars to survive YAML-special chars in passwords 2026-05-14 17:08:51 -04:00
Mike Reeves 8e38bff0c3 Rename telegraf_postgres.sh to so-telegraf-postgres 2026-05-14 16:55:53 -04:00
Mike Reeves b9f2d56932 Consolidate telegraf postgres SQL into multi-mode script
Replace inline psql heredocs in telegraf_users.sls with subcommand
dispatcher telegraf_postgres.sh: create_db, group_role, user, retention.
2026-05-14 16:37:08 -04:00
Mike Reeves 03fa01a705 Move telegraf_role.sh to postgres tools/sbin 2026-05-14 16:18:01 -04:00
Mike Reeves 450eacca41 Move telegraf role provisioning to external script with env vars 2026-05-14 16:15:54 -04:00
Mike Reeves b7a13899f7 Suppress output logging for postgres telegraf role provisioning 2026-05-14 15:56:04 -04:00
Mike Reeves 6f273d7d97 Rename init-users.sh to init-db.sh and update all references 2026-05-14 15:53:00 -04:00
Jason Ertel 907f699721 state rename 2026-05-14 11:03:08 -04:00
Jason Ertel e7a7047f71 Merge branch '3/dev' into jertel/wip 2026-05-14 11:01:36 -04:00
Josh Patterson b4e5171415 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-14 08:03:45 -04:00
Josh Brower b328820c01 Merge pull request #15792 from Security-Onion-Solutions/3/strelkalnk
Fix module name
2026-05-14 13:06:26 +02:00
Jason Ertel 936295f1c4 Merge branch '3/dev' into jertel/wip 2026-05-13 17:28:25 -04:00
Jason Ertel 61ca60a94c prep for soc db config 2026-05-13 17:28:07 -04:00
Jorge Reyes 638aca97c8 Merge pull request #15877 from Security-Onion-Solutions/reyesj2-patch-1
update redis index template
2026-05-13 13:44:04 -05:00
Jorge Reyes 74a5c895e8 Merge pull request #15889 from Security-Onion-Solutions/reyesj2/zeek-ja4d
add zeek.ja4d ingest pipeline
2026-05-13 13:43:56 -05:00
Josh Patterson 84decc1db6 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-13 14:09:15 -04:00
reyesj2 d56bf01823 add zeek.ja4d ingest pipeline 2026-05-13 12:32:54 -05:00
Mike Reeves d29267d9c2 Merge pull request #15888 from Security-Onion-Solutions/TOoSmOotH-patch-1
Change Telegraf output from BOTH to INFLUXDB
2026-05-13 12:47:55 -04:00
Mike Reeves 72327285b2 Change Telegraf output from BOTH to INFLUXDB 2026-05-13 11:58:21 -04:00
Josh Patterson cc7a237457 Merge pull request #15887 from Security-Onion-Solutions/m0duspwnens-patch-1
remove stig from hypervisor and managerhype
2026-05-13 10:57:58 -04:00
Josh Patterson b068ad2b35 remove stig from hypervisor and managerhype 2026-05-13 10:53:11 -04:00
Jorge Reyes b103f412b5 Merge pull request #15884 from Security-Onion-Solutions/reyesj2/strelkalnk
rename strelka ScanLNK - ScanLnk
2026-05-13 09:46:52 -05:00
reyesj2 ef79c63858 Merge branch '3/dev' of github.com:Security-Onion-Solutions/securityonion into reyesj2/strelkalnk 2026-05-12 15:20:09 -05:00
reyesj2 01fb1aa156 check pillars for ScanLNK and rename to ScanLnk 2026-05-12 15:19:44 -05:00
Doug Burks f19bdd7aae Merge pull request #15883 from Security-Onion-Solutions/reyesj2/transformhealth
use temp files to prevent jq arg too long
2026-05-12 15:36:12 -04:00
reyesj2 f637dc62d1 use temp files to prevent jq arg too long 2026-05-12 13:29:32 -05:00
Jorge Reyes 081f6fa1fb Merge pull request #15878 from Security-Onion-Solutions/reyesj2/es-ingest-lag
add ingest latency metrics
2026-05-12 10:21:04 -05:00
Josh Brower d6d90d84cd Merge pull request #15880 from Security-Onion-Solutions/feature/import-overrides
Initial commit
2026-05-12 17:00:44 +02:00
Josh Brower 125610ed42 Additional test coverage 2026-05-12 10:11:22 -04:00
Josh Brower 306b0af4d0 Initial commit 2026-05-12 09:55:06 -04:00
reyesj2 492ae80da7 add ingest latency metrics 2026-05-11 16:51:38 -05:00
Jorge Reyes 4a2177c827 update redis index template
missing redis integration component templates
2026-05-11 16:15:56 -05:00
Josh Brower 006ac31109 Merge pull request #15579 from marcopedrinazzi/3/dev
New Sigma rules pipeline mapping for M365 and Fortigate
2026-05-11 21:03:53 +02:00
Josh Patterson 7d4d6a0756 prune images if so-docker-prune exists 2026-05-08 10:13:15 -04:00
Josh Patterson 66c0a662fc convert wait to script 2026-05-08 09:26:42 -04:00
Josh Brower 49a643fff4 Merge pull request #15875 from Security-Onion-Solutions/3/sigma-fp-os
proc_creation per OS type
2026-05-08 15:13:14 +02:00
Josh Brower e1d830da76 proc_creation per OS type 2026-05-08 09:11:24 -04:00
Josh Patterson 778cc055ea wait for salt-minion service to be ready before finishing state run 2026-05-07 17:01:20 -04:00
Josh Brower e847c46129 Merge pull request #15872 from Security-Onion-Solutions/3/soc-logs
cleanup status code
2026-05-07 19:01:24 +02:00
Josh Brower 499f7102bd cleanup status code 2026-05-07 11:27:49 -04:00
Josh Patterson 932deab751 update the push map 2026-05-07 10:51:53 -04:00
Josh Patterson 1281f0ee37 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-06 09:46:12 -04:00
Josh Patterson 4bc19f91ce Merge pull request #15867 from Security-Onion-Solutions/fixhype
sanitize minion ids for hypervisor reactors / orchestration
2026-05-06 09:46:01 -04:00
Josh Patterson f774334b6c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-05-06 08:16:41 -04:00
Mike Reeves 4990d0ddea Merge pull request #15866 from Security-Onion-Solutions/management-bond1
Management bond1
2026-05-05 17:17:58 -04:00
Mike Reeves 3e49322220 Allow preconfigured management bond in requirements 2026-05-05 15:35:12 -04:00
Mike Reeves ecb92d43fc Limit management bond setup to ISO installs 2026-05-05 15:30:09 -04:00
Mike Reeves 3b714db0bf Show management bond option consistently 2026-05-05 15:22:40 -04:00
Mike Reeves f17da4e68b Add management bond setup option 2026-05-05 15:13:24 -04:00
Jorge Reyes 04cfc22e3f Merge pull request #15864 from Security-Onion-Solutions/reyesj2/patch-2
update grok type conversion to convert processor
2026-05-05 13:58:39 -05:00
reyesj2 dceed421ae update grok type conversion to convert processor 2026-05-05 13:41:00 -05:00
Josh Patterson 652ac5d61f fix regex 2026-05-05 14:26:04 -04:00
Josh Patterson f888a2ba6b Merge remote-tracking branch 'origin/3/dev' into fixhype 2026-05-05 10:28:49 -04:00
Mike Reeves 8a1ee02335 Merge pull request #15846 from Security-Onion-Solutions/feature/ensure-pyyaml
Ensure python3-pyyaml is installed before continuing setup
2026-05-05 10:24:25 -04:00
Josh Patterson 192f6cfe13 Merge remote-tracking branch 'origin/3/dev' into fixhype 2026-05-05 08:18:26 -04:00
Mike Reeves 5bca81d833 Merge pull request #15858 from Security-Onion-Solutions/security-fix
Fix unsafe PyYAML load in filecheck
2026-05-04 16:16:40 -04:00
Josh Patterson 1c6574c694 ensure minion ids 2026-05-04 14:03:14 -04:00
Mike Reeves b701664e04 Fix unsafe PyYAML load in filecheck 2026-05-04 12:09:35 -04:00
Jorge Reyes bc64f1431d Merge pull request #15857 from Security-Onion-Solutions/reyesj2/package-registry-health
fleet package registry health check
2026-05-04 11:05:23 -05:00
reyesj2 2203037ce7 fleet package registry health check 2026-05-04 10:52:37 -05:00
Jorge Reyes 77a4ad877e Merge pull request #15851 from Security-Onion-Solutions/reyesj2/integration-transforms 2026-05-01 14:11:12 -05:00
reyesj2 702b3585cc excluding additional integration transform job failures 2026-05-01 12:57:59 -05:00
reyesj2 86966d2778 reauthorize unhealthy transform jobs using kibana 9.3.3 auth flow 2026-05-01 12:44:08 -05:00
Josh Patterson 7fcace34c4 add sensoroni to push map 2026-04-30 16:09:08 -04:00
Josh Patterson 9541024eb7 fix broken things 2026-04-30 15:35:24 -04:00
Jorge Reyes ce3ad3a895 Merge pull request #15844 from Security-Onion-Solutions/reyesj2/elastic-agent-warning
update default elastic agent logging level to warning
2026-04-30 09:46:28 -05:00
Mike Reeves 3a4b7b50de ensure python3-pyyaml is installed before continuing setup 2026-04-30 10:15:09 -04:00
Josh Patterson 0d166ef732 remove trailing slashes 2026-04-30 09:53:00 -04:00
Josh Patterson f7d2994f8b filter temp files 2026-04-30 09:16:22 -04:00
reyesj2 39d0947102 update default elastic agent logging level to warning 2026-04-29 17:38:40 -05:00
Josh Patterson 8f0757606d include salt..minion 2026-04-29 16:42:19 -04:00
Josh Patterson 0a8f2e01a0 install pyinotify 2026-04-29 16:41:56 -04:00
Josh Patterson 4546d7bc52 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-29 14:28:19 -04:00
Jorge Reyes 0085d9a353 Merge pull request #15842 from Security-Onion-Solutions/reyesj2-patch-1
so-elastic-fleet-outputs-update now checks for cert drift. Remove run…
2026-04-29 12:37:04 -05:00
Jorge Reyes 2f01ce3b23 so-elastic-fleet-outputs-update now checks for cert drift. Remove running --cert arg on cert change to prevent highstate from running outputs-update 2x 2026-04-29 12:33:28 -05:00
Mike Reeves 71b19c1b5f Merge pull request #15840 from Security-Onion-Solutions/fix/import-postgres-firewall
Open postgres in DOCKER-USER firewall everywhere influxdb is open
2026-04-29 09:20:03 -04:00
Mike Reeves 82e55ae87f Open postgres on every hostgroup that opens influxdb
The static defaults only listed postgres on each role's self-hostgroup,
leaving sensor/searchnode/heavynode/receiver/fleet/idh/desktop/hypervisor
hostgroups unable to reach the manager's so-postgres in distributed
grids. A dynamic block in firewall/map.jinja added postgres to those
hostgroups only when telegraf.output was switched to POSTGRES/BOTH,
which left postgres unreachable by default.

Mirror influxdb statically across manager/managerhype/managersearch/
standalone for every hostgroup that already lists influxdb, and drop
the now-redundant telegraf-gated dynamic block from firewall/map.jinja.
2026-04-29 09:09:50 -04:00
Mike Reeves 3e02001544 Open postgres port for import role in DOCKER-USER firewall
When so-postgres was wired in (868cd1187), the import role's firewall
defaults were missed while every other manager-class role (manager,
managerhype, managersearch, standalone, eval) had postgres added to
their DOCKER-USER manager-hostgroup portgroups. As a result, on a
fresh import install the so-postgres container starts but tcp/5432 is
dropped at DOCKER-USER, so soc/kratos/telegraf can't reach it.

Add postgres alongside the existing influxdb entry so import nodes
match the other roles.
2026-04-29 08:48:45 -04:00
Josh Patterson 17849d8758 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:49:22 -04:00
Mike Reeves 82f70bb53a Merge pull request #15839 from Security-Onion-Solutions/fix/drop-postgres-soc-module-injection
drop postgres module from soc defaults injection
2026-04-28 15:48:49 -04:00
Mike Reeves 2dcded6cca drop postgres module from soc defaults injection
The soc binary on 3/dev does not register a postgres module, so injecting
postgres into soc.config.server.modules makes soc abort at launch with
'Module does not exist: postgres'. The soc-side module is staged on
feature/postgres but is not landing this release. Drop the injection
until the module ships; salt/postgres state and pillars are unchanged.
2026-04-28 15:46:56 -04:00
Josh Patterson d3d30a587c Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 15:30:31 -04:00
Mike Reeves 8ca59e6f0c Merge pull request #15838 from Security-Onion-Solutions/fix/docker-refresh-multiarch-pull
Fix/docker refresh multiarch pull
2026-04-28 15:14:27 -04:00
Mike Reeves 82dac82d15 drop platform/digest pull resolution
The digest-pull logic was added to make `docker push` work for multi-arch
upstream tags. Now that the push step is `docker buildx imagetools create`
pinned to the gpg-verified RepoDigest, the registry-to-registry copy
handles single- and multi-arch sources without help. Reverts the pull
back to the original line and removes the unused PLATFORM_OS/_ARCH
detection.
2026-04-28 14:54:25 -04:00
Mike Reeves 288a823edf push images via buildx imagetools create
Replaces `docker push` with a registry-to-registry copy. On Docker 29.x
with the containerd image store, `docker push` of a freshly-pulled image
hits a path that wraps single-platform manifests in a synthetic index
and then can't push the layers it claims to reference, producing
`NotFound: content digest ...` even when the image is fully present.

Keep the local `docker tag` so so-image-pull's `docker images | grep :5000`
existence check continues to work.
2026-04-28 14:49:02 -04:00
Jorge Reyes f9e3d30a71 Merge pull request #15837 from Security-Onion-Solutions/reyesj2/elastic-fleet-cert-check
check current fleet policy cert against cert on disk
2026-04-28 13:47:55 -05:00
reyesj2 9cec79b299 check current fleet policy cert against cert on disk
Co-authored-by: Copilot <copilot@github.com>
2026-04-28 13:34:39 -05:00
Mike Reeves c86399327b fix so-docker-refresh push for multi-arch source images
docker pull of a multi-arch tag on Docker 29.x leaves the local tag
pointing at the image index rather than the platform-specific manifest.
The subsequent docker push then tries to push every sub-manifest the
index references and fails on layers we never fetched.

Resolve the local-platform manifest digest from the upstream index via
docker buildx imagetools inspect, pull by that digest, and re-tag locally
to the canonical tag. The signing flow and the existing tag/push to the
embedded registry are unchanged.
2026-04-28 14:27:59 -04:00
Josh Patterson 034711d148 Merge remote-tracking branch 'origin/3/dev' into saltthangs 2026-04-28 10:47:29 -04:00
Mike Reeves fa8162de02 Merge pull request #15749 from Security-Onion-Solutions/feature/postgres
Add so-postgres Salt states and infrastructure
2026-04-28 10:15:47 -04:00
Josh Patterson 33abc429d1 Merge pull request #15835 from Security-Onion-Solutions/fix/reactor/sominon_setup
fix sominion_setup reactor
2026-04-28 08:55:58 -04:00
Jorge Reyes b22585ca90 Merge pull request #15833 from Security-Onion-Solutions/reyesj2-es933
exclude more transform job errors
2026-04-27 15:05:11 -05:00
reyesj2 9f2ca7012f exclude more transform job errors 2026-04-27 15:02:13 -05:00
Josh Patterson 21aeb68188 fix sominion_setup reactor 2026-04-27 14:30:41 -04:00
Josh Patterson 81e60ec5bf Merge pull request #15829 from Security-Onion-Solutions/fix/reinstall2
fix reinstall
2026-04-24 16:20:53 -04:00
Josh Patterson 199c2746f1 stop salt-minion and salt-master regardless of install type. display reinstall on console and save to logfile 2026-04-24 15:24:11 -04:00
Josh Patterson 8eca465ef6 uninstall elastic-agent before stopping dockers on reinstall 2026-04-24 14:35:11 -04:00
Jorge Reyes a45e59239f Merge pull request #15826 from Security-Onion-Solutions/reyesj2-es933
heavynode should run es cluster state
2026-04-24 13:07:48 -05:00
Josh Patterson 2ad0bcab7c Merge pull request #15828 from Security-Onion-Solutions/fix/annotations
readonly soc and kratos enabled
2026-04-24 14:00:02 -04:00
Josh Patterson 070d150420 readonly soc and kratos enabled 2026-04-24 13:56:35 -04:00
reyesj2 90ecbe90d8 allow heavynodes to run elasticsearch/cluster state 2026-04-24 12:56:27 -05:00
Josh Patterson 813fa03dc3 Merge pull request #15824 from Security-Onion-Solutions/fix/reinstall2
fix reinstall issue with salt
2026-04-24 12:22:54 -04:00
Josh Patterson 02381fbbe9 stop salt-cloud , belt-and-suspenders against a broken/incomplete salt RPM 2026-04-24 11:33:21 -04:00
Josh Patterson 0722b681b1 redo service stop on reinstall 2026-04-24 11:04:46 -04:00
Josh Patterson 564815e836 redo how services are stopped during reinstall 2026-04-24 10:46:29 -04:00
Jorge Reyes 88b30adf7f Merge pull request #15823 from Security-Onion-Solutions/reyesj2-es933
typo
2026-04-24 09:27:08 -05:00
reyesj2 b6acf3b522 typo 2026-04-24 09:24:58 -05:00
Jason Ertel ba55468da8 Merge pull request #15822 from Security-Onion-Solutions/jertel/wip
numeric test description
2026-04-24 08:26:55 -04:00
Jason Ertel cdd217283d numeric test description 2026-04-24 08:13:36 -04:00
Jorge Reyes 810a582717 Merge pull request #15813 from Security-Onion-Solutions/reyesj2-es933
split up Elastic Fleet state
2026-04-23 14:51:32 -05:00
Mike Reeves a6948e8dcb Remove helpLink for influxdb in soc_global.yaml
Removed helpLink for influxdb from endgamehost configuration.
2026-04-23 13:56:41 -04:00
Mike Reeves 5f35554fdc Merge pull request #15712 from Security-Onion-Solutions/soupfix
Fix soup
2026-04-23 12:39:50 -04:00
reyesj2 fdfca469cc prevent non-manager nodes from running elasticsearch.cluster state manually 2026-04-23 09:53:07 -05:00
reyesj2 5f2ec76ba8 prevent fleetnode from being able to run elasticfleet.manager state manually 2026-04-23 09:50:45 -05:00
reyesj2 b015c8ff14 remove docker import 2026-04-23 09:31:30 -05:00
reyesj2 7e70870a9e remove globals import 2026-04-23 09:25:36 -05:00
reyesj2 22b32a16dd include elasticfleet.config 2026-04-23 08:30:47 -05:00
reyesj2 22f869734e add check for files before attempting to use file pattern to load templates 2026-04-22 23:11:31 -05:00
reyesj2 398bc9e4ed update kibana discardCorruptObjects version 2026-04-22 20:38:13 -05:00
reyesj2 72dbb69a1c fix searchnodes running elasticsearch/cluster state 2026-04-22 20:37:48 -05:00
reyesj2 339959d1c0 split up elasticfleet/enabled state 2026-04-22 20:30:40 -05:00
Josh Patterson cd6707a566 Merge pull request #15800 from Security-Onion-Solutions/feature/vm-raid-status
monitor raid for vms
2026-04-22 09:42:44 -04:00
Josh Patterson edd207a9d5 soup update socloud.conf 2026-04-22 09:20:53 -04:00
Jorge Reyes 01bd3b6e06 Merge pull request #15807 from Security-Onion-Solutions/reyesj2-es933
urlencode elasticsearch version
2026-04-21 14:11:04 -05:00
reyesj2 06a555fafb urlencode elasticsearch version 2026-04-21 14:01:31 -05:00
Jason Ertel 7411031e11 Merge pull request #15803 from Security-Onion-Solutions/jertel/wip
more error handling during image updates
2026-04-21 10:21:56 -04:00
Jason Ertel 247091766c more error handling during image updates 2026-04-21 10:18:05 -04:00
Josh Patterson 7f93110d68 Merge remote-tracking branch 'origin/3/dev' into feature/vm-raid-status 2026-04-21 10:10:38 -04:00
Jason Ertel 33ef138866 Merge pull request #15797 from Security-Onion-Solutions/jertel/wip
fix template annotation
2026-04-20 17:14:53 -04:00
Jason Ertel 71da27dc8e fix template annotation 2026-04-20 17:02:25 -04:00
Josh Patterson ee437265fc monitor raid for vms 2026-04-20 12:00:02 -04:00
Josh Brower affede7f0a Rename 'ScanLNK' to 'ScanLnk' in YAML config 2026-04-20 10:01:10 -04:00
Josh Brower 97366c0496 Rename 'ScanLNK' to 'ScanLnk' in defaults.yaml 2026-04-20 10:00:29 -04:00
Mike Reeves a0cf0489d6 reduce highstate frequency with active push for rules and pillars
- schedule highstate every 2 hours (was 15 minutes); interval lives in
  global:push:highstate_interval_hours so the SOC admin UI can tune it and
  so-salt-minion-check derives its threshold as (interval + 1) * 3600
- add inotify beacon on the manager + master reactor + orch.push_batch that
  writes per-app intent files, with a so-push-drainer schedule on the manager
  that debounces, dedupes, and dispatches a single orchestration
- pillar_push_map.yaml allowlists the apps whose pillar changes trigger an
  immediate targeted state.apply (targets verified against salt/top.sls);
  edits under pillar/minions/ trigger a state.highstate on that one minion
- host-batch every push orchestration (batch: 25%, batch_wait: 15) so rule
  changes don't thundering-herd large fleets
- new global:push:enabled kill-switch tears down the beacon, reactor config,
  and drainer schedule on the next highstate for operators who want to keep
  highstate-only behavior
- set restart_policy: unless-stopped on 23 container states so docker
  recovers crashes without waiting for the next highstate; leave registry
  (always), strelka/backend (on-failure), kratos, and hydra alone with
  inline comments explaining why
2026-04-10 15:43:16 -04:00
Mike Reeves 664f3fd18a Fix soup 2026-04-01 14:47:05 -04:00
Marco Pedrinazzi d7e971a0fc m365 and fortigate mappings sigma 2026-03-11 14:49:40 +01:00
Jason Ertel 613d31c8a6 merge 2026-03-05 11:52:09 -05:00
117 changed files with 3978 additions and 442 deletions
+1
View File
@@ -11,6 +11,7 @@ body:
-
- 3.0.0
- 3.1.0
- 3.2.0
- Other (please provide detail below)
validations:
required: true
+11 -11
View File
@@ -1,17 +1,17 @@
### 3.0.0-20260331 ISO image released on 2026/03/31
### 3.1.0-20260528 ISO image released on 2026/05/28
### Download and Verify
3.0.0-20260331 ISO image:
https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
3.1.0-20260528 ISO image:
https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
MD5: ECD318A1662A6FDE0EF213F5A9BD4B07
SHA1: E55BE314440CCF3392DC0B06BC5E270B43176D9C
SHA256: 7FC47405E335CBE5C2B6C51FE7AC60248F35CBE504907B8B5A33822B23F8F4D5
MD5: 9D6FF58DEEE24089D722C73169765B3E
SHA1: 2B8B816B6CEC3B7F96B3C5E040EBF502DD2C412F
SHA256: 62FAB57E247C843D6A04F0796D8162C732B65D82FC3E4A59D087135B9FD32912
Signature for ISO image:
https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig
Signing key:
https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/main/KEYS
@@ -25,22 +25,22 @@ wget https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/
Download the signature file for the ISO:
```
wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig
```
Download the ISO image:
```
wget https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
wget https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
```
Verify the downloaded ISO image using the signature file:
```
gpg --verify securityonion-3.0.0-20260331.iso.sig securityonion-3.0.0-20260331.iso
gpg --verify securityonion-3.1.0-20260528.iso.sig securityonion-3.1.0-20260528.iso
```
The output should show "Good signature" and the Primary key fingerprint should match what's shown below:
```
gpg: Signature made Mon 30 Mar 2026 06:22:14 PM EDT using RSA key ID FE507013
gpg: Signature made Wed 27 May 2026 03:03:59 PM EDT using RSA key ID FE507013
gpg: Good signature from "Security Onion Solutions, LLC <info@securityonionsolutions.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
+1
View File
@@ -0,0 +1 @@
+1 -1
View File
@@ -1 +1 @@
3.1.0
3.2.0
+142
View File
@@ -0,0 +1,142 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Custom salt beacon that watches the SOC audit_settings table in postgres for
# new settings changes and emits a beacon event per new row. This replaces the
# inotify watch on /opt/so/saltstack/local/pillar -- instead of monitoring pillar
# files on disk, we monitor the so_soc.audit_settings table that SOC writes to.
#
# Detection is poll-based with a monotonic `id` watermark persisted to
# WATERMARK_FILE: each pass selects rows with id greater than the last id seen,
# which makes it self-healing (a missed poll simply catches up on the next one).
#
# Each emitted event carries setting_id and node_id; the push_pillar reactor maps
# setting_id -> app via pillar_push_map.yaml and writes a push intent, after which
# the existing so-push-drainer / orch.push_batch pipeline takes over unchanged.
import logging
import os
import subprocess
log = logging.getLogger(__name__)
WATERMARK_FILE = '/opt/so/state/pillar_db_watch.id'
CONTAINER = 'so-postgres'
DATABASE = 'so_soc'
# Unaligned, tuples-only psql output with a field separator that cannot appear in
# an id/setting_id/node_id, so we can split each row reliably.
FIELD_SEP = '\x1f'
def __virtual__():
return True
def validate(config):
return True, 'valid'
def _read_watermark():
# Returns the last processed id, or None if the watermark has not been seeded.
try:
with open(WATERMARK_FILE, 'r') as f:
return int((f.read() or '').strip())
except (IOError, ValueError):
return None
def _write_watermark(value):
try:
os.makedirs(os.path.dirname(WATERMARK_FILE), exist_ok=True)
tmp = WATERMARK_FILE + '.tmp'
with open(tmp, 'w') as f:
f.write(str(int(value)))
os.rename(tmp, WATERMARK_FILE)
except OSError:
log.exception('pillar_db beacon: failed to persist watermark to %s', WATERMARK_FILE)
def _query(sql):
# Run a query against so_soc inside the so-postgres container over the unix
# socket (trust auth, no password). Returns stdout on success, or None on any
# failure so the caller can no-op and retry on the next interval.
cmd = [
'docker', 'exec', CONTAINER,
'psql', '-U', 'postgres', '-d', DATABASE,
'-tA', '-F', FIELD_SEP, '-c', sql,
]
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
except subprocess.TimeoutExpired:
log.warning('pillar_db beacon: psql timed out')
return None
except Exception:
log.exception('pillar_db beacon: failed to exec psql')
return None
if result.returncode != 0:
log.warning('pillar_db beacon: psql failed (rc=%s): %s',
result.returncode, (result.stderr or '').strip())
return None
return result.stdout
def beacon(config):
retval = []
watermark = _read_watermark()
# First run / missing watermark: seed to the current MAX(id) and emit nothing
# so we never replay the entire settings history into a fleetwide push.
if watermark is None:
seed = _query('SELECT COALESCE(MAX(id), 0) FROM audit_settings;')
if seed is None:
return retval # postgres not ready yet; retry next interval
try:
_write_watermark(int((seed or '0').strip() or 0))
except ValueError:
log.warning('pillar_db beacon: could not parse MAX(id) seed: %r', seed)
return retval
rows = _query(
"SELECT id, setting_id, COALESCE(node_id, '') FROM audit_settings "
"WHERE id > %d ORDER BY id;" % watermark
)
if rows is None:
return retval
max_id = watermark
for line in rows.splitlines():
# Do NOT str.strip() the whole line: Python treats the \x1f field
# separator (and \x1c-\x1e) as whitespace, so stripping would eat an
# empty trailing node_id field and make the row look malformed.
if not line.strip():
continue
parts = line.split(FIELD_SEP)
if len(parts) < 3:
log.warning('pillar_db beacon: skipping malformed row: %r', line)
continue
try:
row_id = int(parts[0])
except ValueError:
log.warning('pillar_db beacon: skipping row with non-int id: %r', line)
continue
setting_id = parts[1]
node_id = parts[2]
retval.append({
'tag': 'audit_settings',
'id': row_id,
'setting_id': setting_id,
'node_id': node_id,
})
if row_id > max_id:
max_id = row_id
if max_id > watermark:
_write_watermark(max_id)
log.info('pillar_db beacon: emitted %d change(s), watermark %d -> %d',
len(retval), watermark, max_id)
return retval
+3 -1
View File
@@ -35,6 +35,8 @@
'kratos',
'hydra',
'elasticfleet',
'elasticfleet.manager',
'elasticsearch.cluster',
'elastic-fleet-package-registry',
'utility'
] %}
@@ -79,7 +81,7 @@
),
'so-heavynode': (
sensor_states +
['elasticagent', 'elasticsearch', 'logstash', 'redis', 'nginx']
['elasticagent', 'elasticsearch', 'elasticsearch.cluster', 'logstash', 'redis', 'nginx']
),
'so-idh': (
['idh']
+23 -4
View File
@@ -164,8 +164,8 @@ update_docker_containers() {
# Pull down the trusted docker image
run_check_net_err \
"docker pull $CONTAINER_REGISTRY/$IMAGEREPO/$image" \
"Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1
"Could not pull $image, please ensure connectivity to $CONTAINER_REGISTRY" >> "$LOG_FILE" 2>&1
# Get signature
run_check_net_err \
"curl --retry 5 --retry-delay 60 -A '$CURLTYPE/$CURRENTVERSION/$OS/$(uname -r)' $sig_url --output $SIGNPATH/$image.sig" \
@@ -188,8 +188,27 @@ update_docker_containers() {
if [ -z "$HOSTNAME" ]; then
HOSTNAME=$(hostname)
fi
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker push $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1
docker tag $CONTAINER_REGISTRY/$IMAGEREPO/$image $HOSTNAME:5000/$IMAGEREPO/$image >> "$LOG_FILE" 2>&1 || {
echo "Unable to tag $image" >> "$LOG_FILE" 2>&1
exit 1
}
# Push to the embedded registry via a registry-to-registry copy. Avoids
# `docker push`, which on Docker 29.x with the containerd image store
# represents freshly-pulled images as an index whose layer content
# isn't reachable through the push path. The local `docker tag` above
# is preserved so so-image-pull's `:5000` existence check still works.
# Pin to the digest already gpg-verified above so we copy exactly the
# bytes we approved.
local VERIFIED_REF
VERIFIED_REF=$(echo "$DOCKERINSPECT" | jq -r ".[0].RepoDigests[] | select(. | contains(\"$CONTAINER_REGISTRY\"))" | head -n 1)
if [ -z "$VERIFIED_REF" ] || [ "$VERIFIED_REF" = "null" ]; then
echo "Unable to determine verified digest for $image" >> "$LOG_FILE" 2>&1
exit 1
fi
docker buildx imagetools create --tag $HOSTNAME:5000/$IMAGEREPO/$image "$VERIFIED_REF" >> "$LOG_FILE" 2>&1 || {
echo "Unable to copy $image to embedded registry" >> "$LOG_FILE" 2>&1
exit 1
}
fi
else
echo "There is a problem downloading the $image image. Details: " >> "$LOG_FILE" 2>&1
+3 -1
View File
@@ -165,6 +165,8 @@ if [[ $EXCLUDE_FALSE_POSITIVE_ERRORS == 'Y' ]]; then
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading component template" # false positive (elasticsearch index or template names contain 'error')
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading composable template" # false positive (elasticsearch composable template names contain 'error')
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|Error while parsing document for index \[.ds-logs-kratos-so-.*object mapping for \[file\]" # false positive (mapping error occuring BEFORE kratos index has rolled over in 2.4.210)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|No such container" # false positive (telegraf trying to run stats on an old container)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|passwords do not match" # false positive (automated hydra test)
fi
if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
@@ -227,7 +229,7 @@ if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|from NIC checksum offloading" # zeek reporter.log
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|marked for removal" # docker container getting recycled
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|tcp 127.0.0.1:6791: bind: address already in use" # so-elastic-fleet agent restarting. Seen starting w/ 8.18.8 https://github.com/elastic/kibana/issues/201459
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint).*user so_kibana lacks the required permissions \[logs-\1" # Known issue with 3 integrations using kibana_system role vs creating unique api creds with proper permissions.
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|TransformTask\] \[logs-(tychon|aws_billing|microsoft_defender_endpoint|armis|o365_metrics|microsoft_sentinel|snyk|cyera|island_browser).*user so_kibana lacks the required permissions \[(logs|metrics)-\1" # Known issue with integrations starting transform jobs that are explicitly not allowed to start as a system user. This error should not be seen on fresh ES 9.3.3 installs or after SO 3.1.0 with soups addition of check_transform_health_and_reauthorize()
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|manifest unknown" # appears in so-dockerregistry log for so-tcpreplay following docker upgrade to 29.2.1-1
fi
+9 -2
View File
@@ -9,7 +9,7 @@
. /usr/sbin/so-common
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02")
software_raid=("SOSMN" "SOSMN-DE02" "SOSSNNV" "SOSSNNV-DE02" "SOS10k-DE02" "SOS10KNV" "SOS10KNV-DE02" "SOS10KNV-DE02" "SOS2000-DE02" "SOS-GOFAST-LT-DE02" "SOS-GOFAST-MD-DE02" "SOS-GOFAST-HV-DE02" "HVGUEST")
hardware_raid=("SOS1000" "SOS1000F" "SOSSN7200" "SOS5000" "SOS4000")
{%- if salt['grains.get']('sosmodel', '') %}
@@ -87,6 +87,11 @@ check_boss_raid() {
}
check_software_raid() {
if [[ ! -f /proc/mdstat ]]; then
SWRAID=0
return
fi
SWRC=$(grep "_" /proc/mdstat)
if [[ -n $SWRC ]]; then
# RAID is failed in some way
@@ -107,7 +112,9 @@ if [[ "$is_hwraid" == "true" ]]; then
fi
if [[ "$is_softwareraid" == "true" ]]; then
check_software_raid
check_boss_raid
if [ "$model" != "HVGUEST" ]; then
check_boss_raid
fi
fi
sum=$(($SWRAID + $BOSSRAID + $HWRAID))
@@ -1,5 +1,3 @@
{% import_yaml 'salt/minion.defaults.yaml' as SALT_MINION_DEFAULTS -%}
#!/bin/bash
#
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
@@ -25,7 +23,8 @@ SYSTEM_START_TIME=$(date -d "$(</proc/uptime awk '{print $1}') seconds ago" +%s)
LAST_HIGHSTATE_END=$([ -e "/opt/so/log/salt/lasthighstate" ] && date -r /opt/so/log/salt/lasthighstate +%s || echo 0)
LAST_HEALTHCHECK_STATE_APPLY=$([ -e "/opt/so/log/salt/state-apply-test" ] && date -r /opt/so/log/salt/state-apply-test +%s || echo 0)
# SETTING THRESHOLD TO ANYTHING UNDER 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
THRESHOLD={{SALT_MINION_DEFAULTS.salt.minion.check_threshold}} #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
# THRESHOLD is derived from the global push highstate interval + 1 hour, so the minion-check grace period tracks the schedule automatically.
THRESHOLD=$(( ({{ salt['pillar.get']('global:push:highstate_interval_hours', 2) }} + 1) * 3600 )) #within how many seconds the file /opt/so/log/salt/state-apply-test must have been touched/modified before the salt minion is restarted
THRESHOLD_DATE=$((LAST_HEALTHCHECK_STATE_APPLY+THRESHOLD))
logCmd() {
+2 -1
View File
@@ -9,7 +9,8 @@
prune_images:
cmd.run:
- name: so-docker-prune
- order: last
- onlyif: command -v /usr/sbin/so-docker-prune >/dev/null 2>&1
- order: 9000
{% else %}
+1
View File
@@ -19,6 +19,7 @@ wait_for_elasticsearch:
so-elastalert:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastalert:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: elastalert
- name: so-elastalert
- user: so-elastalert
@@ -15,6 +15,7 @@ include:
so-elastic-fleet-package-registry:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-fleet-package-registry:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet-package-registry
- hostname: Fleet-package-reg-{{ GLOBALS.hostname }}
- detach: True
@@ -51,6 +52,16 @@ so-elastic-fleet-package-registry:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
wait_for_so-elastic-fleet-package-registry:
http.wait_for_successful_query:
- name: "http://localhost:8080/health"
- status: 200
- wait_for: 300
- request_interval: 15
- require:
- docker_container: so-elastic-fleet-package-registry
delete_so-elastic-fleet-package-registry_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
+1
View File
@@ -16,6 +16,7 @@ include:
so-elastic-agent:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-agent
- hostname: {{ GLOBALS.hostname }}
- detach: True
+6 -102
View File
@@ -17,65 +17,19 @@ include:
- logstash.ssl
- elasticfleet.config
- elasticfleet.sostatus
{%- if GLOBALS.role != "so-fleet" %}
- elasticfleet.manager
{%- endif %}
{% if grains.role not in ['so-fleet'] %}
{% if GLOBALS.role != "so-fleet" %}
# Wait for Elasticsearch to be ready - no reason to try running Elastic Fleet server if ES is not ready
wait_for_elasticsearch_elasticfleet:
cmd.run:
- name: so-elasticsearch-wait
{% endif %}
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval', 'so-fleet'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{# Separate from above in order to catch elasticfleet-logstash.crt changes and force update to fleet output policy #}
so-elastic-fleet-auto-configure-logstash-outputs-force:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update --certs
- retry:
attempts: 4
interval: 30
- onchanges:
- x509: etc_elasticfleet_logstash_crt
- x509: elasticfleet_kafka_crt
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
{% endif %}
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
{% if grains.role not in ['so-fleet'] %}
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
{% endif %}
{% if GLOBALS.role == "so-fleet" %}
# Sync Elastic Agent artifacts to Fleet Node
{% if grains.role in ['so-fleet'] %}
elasticagent_syncartifacts:
file.recurse:
- name: /nsm/elastic-fleet/artifacts/beats
@@ -88,6 +42,7 @@ elasticagent_syncartifacts:
so-elastic-fleet:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elastic-agent:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-elastic-fleet
- hostname: FleetServer-{{ GLOBALS.hostname }}
- detach: True
@@ -149,57 +104,6 @@ so-elastic-fleet:
- x509: etc_elasticfleet_crt
{% endif %}
{% if GLOBALS.role != "so-fleet" %}
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% endif %}
delete_so-elastic-fleet_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
+101
View File
@@ -0,0 +1,101 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls in allowed_states %}
{% from 'elasticfleet/map.jinja' import ELASTICFLEETMERGED %}
include:
- elasticfleet.config
# If enabled, automatically update Fleet Logstash Outputs
{% if ELASTICFLEETMERGED.config.server.enable_auto_configuration and grains.role not in ['so-import', 'so-eval'] %}
so-elastic-fleet-auto-configure-logstash-outputs:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-outputs-update
- retry:
attempts: 4
interval: 30
{% endif %}
# If enabled, automatically update Fleet Server URLs & ES Connection
so-elastic-fleet-auto-configure-server-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-urls-update
- retry:
attempts: 4
interval: 30
# Automatically update Fleet Server Elasticsearch URLs & Agent Artifact URLs
so-elastic-fleet-auto-configure-elasticsearch-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-es-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-auto-configure-artifact-urls:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-artifacts-url-update
- retry:
attempts: 4
interval: 30
so-elastic-fleet-package-statefile:
file.managed:
- name: /opt/so/state/elastic_fleet_packages.txt
- contents: {{ELASTICFLEETMERGED.packages}}
so-elastic-fleet-package-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-package-upgrade
- retry:
attempts: 3
interval: 10
- onchanges:
- file: /opt/so/state/elastic_fleet_packages.txt
so-elastic-fleet-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-policy-load
- retry:
attempts: 3
interval: 10
so-elastic-agent-grid-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-agent-grid-upgrade
- retry:
attempts: 12
interval: 5
so-elastic-fleet-integration-upgrade:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-integration-upgrade
- retry:
attempts: 3
interval: 10
{# Optional integrations script doesn't need the retries like so-elastic-fleet-integration-upgrade which loads the default integrations #}
so-elastic-fleet-addon-integrations:
cmd.run:
- name: /usr/sbin/so-elastic-fleet-optional-integrations-load
{% if ELASTICFLEETMERGED.config.defend_filters.enable_auto_configuration %}
so-elastic-defend-manage-filters-file-watch:
cmd.run:
- name: python3 /sbin/so-elastic-defend-manage-filters.py -c /opt/so/conf/elasticsearch/curl.config -d /opt/so/conf/elastic-fleet/defend-exclusions/disabled-filters.yaml -i /nsm/securityonion-resources/event_filters/ -i /opt/so/conf/elastic-fleet/defend-exclusions/rulesets/custom-filters/ &>> /opt/so/log/elasticfleet/elastic-defend-manage-filters.log
- onchanges:
- file: elasticdefendcustom
- file: elasticdefenddisabled
{% endif %}
{% else %}
{{sls}}_state_not_allowed:
test.fail_without_changes:
- name: {{sls}}_state_not_allowed
{% endif %}
@@ -240,7 +240,7 @@ elastic_fleet_policy_create() {
--arg DESC "$DESC" \
--arg TIMEOUT $TIMEOUT \
--arg FLEETSERVER "$FLEETSERVER" \
'{"name": $NAME,"id":$NAME,"description":$DESC,"namespace":"default","monitoring_enabled":["logs"],"inactivity_timeout":$TIMEOUT,"has_fleet_server":$FLEETSERVER}'
'{"name": $NAME,"id":$NAME,"description":$DESC,"namespace":"default","monitoring_enabled":["logs"],"inactivity_timeout":$TIMEOUT,"has_fleet_server":$FLEETSERVER,"advanced_settings":{"agent_logging_level": "warning"}}'
)
# Create Fleet Policy
if ! fleet_api "agent_policies" -XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"; then
@@ -5,11 +5,12 @@
# this file except in compliance with the Elastic License 2.0.
. /usr/sbin/so-common
. /usr/sbin/so-elastic-fleet-common
{%- import_yaml 'elasticsearch/defaults.yaml' as ELASTICSEARCHDEFAULTS %}
{%- import_yaml 'elasticfleet/defaults.yaml' as ELASTICFLEETDEFAULTS %}
{# Optionally override Elasticsearch version for Elastic Agent patch releases #}
{%- if ELASTICFLEETDEFAULTS.elasticfleet.patch_version is defined %}
{%- do ELASTICSEARCHDEFAULTS.update({'elasticsearch': {'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}}) %}
{%- do ELASTICSEARCHDEFAULTS.elasticsearch.update({'version': ELASTICFLEETDEFAULTS.elasticfleet.patch_version}) %}
{%- endif %}
# Only run on Managers
@@ -19,13 +20,10 @@ if ! is_manager_node; then
fi
# Get current list of Grid Node Agents that need to be upgraded
RAW_JSON=$(curl -K /opt/so/conf/elasticsearch/curl.config -L "http://localhost:5601/api/fleet/agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" --retry 3 --retry-delay 30 --fail 2>/dev/null)
if ! RAW_JSON=$(fleet_api "agents?perPage=20&page=1&kuery=NOT%20agent.version%3A%20{{ELASTICSEARCHDEFAULTS.elasticsearch.version | urlencode }}%20AND%20policy_id%3A%20so-grid-nodes_%2A&showInactive=false&getStatusSummary=true" -H 'kbn-xsrf: true' -H 'Content-Type: application/json'); then
# Check to make sure that the server responded with good data - else, bail from script
CHECKSUM=$(jq -r '.page' <<< "$RAW_JSON")
if [ "$CHECKSUM" -ne 1 ]; then
printf "Failed to query for current Grid Agents...\n"
exit 1
printf "Failed to query for current Grid Agents...\n"
exit 1
fi
# Generate list of Node Agents that need updates
@@ -36,10 +34,12 @@ if [ "$OUTDATED_LIST" != '[]' ]; then
printf "Initiating upgrades for $AGENTNUMBERS Agents to Elastic {{ELASTICSEARCHDEFAULTS.elasticsearch.version}}...\n\n"
# Generate updated JSON payload
JSON_STRING=$(jq -n --arg ELASTICVERSION {{ELASTICSEARCHDEFAULTS.elasticsearch.version}} --arg UPDATELIST $OUTDATED_LIST '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
JSON_STRING=$(jq -n --arg ELASTICVERSION "{{ELASTICSEARCHDEFAULTS.elasticsearch.version}}" --argjson UPDATELIST "$OUTDATED_LIST" '{"version": $ELASTICVERSION,"agents": $UPDATELIST }')
# Update Node Agents
curl -K /opt/so/conf/elasticsearch/curl.config -L -X POST "http://localhost:5601/api/fleet/agents/bulk_upgrade" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"
if ! fleet_api "agents/bulk_upgrade" -XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$JSON_STRING"; then
printf "Failed to initiate Agent upgrades...\n"
fi
else
printf "No Agents need updates... Exiting\n\n"
exit 0
@@ -235,6 +235,16 @@ function update_kafka_outputs() {
{% endif %}
# Compare the current Elastic Fleet certificate against what is on disk
POLICY_CERT_SHA=$(jq -r '.item.ssl.certificate' <<< $RAW_JSON | openssl x509 -noout -sha256 -fingerprint)
DISK_CERT_SHA=$(openssl x509 -in /etc/pki/elasticfleet-logstash.crt -noout -sha256 -fingerprint)
if [[ "$POLICY_CERT_SHA" != "$DISK_CERT_SHA" ]]; then
printf "Certificate on disk doesn't match certificate in policy - forcing update\n"
UPDATE_CERTS=true
FORCE_UPDATE=true
fi
# Sort & hash the new list of Logstash Outputs
NEW_LIST_JSON=$(jq --compact-output --null-input '$ARGS.positional' --args -- "${NEW_LIST[@]}")
NEW_HASH=$(sha256sum <<< "$NEW_LIST_JSON" | awk '{print $1}')
+1 -1
View File
@@ -4,7 +4,7 @@
# Elastic License 2.0.
{% from 'allowed_states.map.jinja' import allowed_states %}
{% if sls.split('.')[0] in allowed_states %}
{% if sls in allowed_states %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'elasticsearch/config.map.jinja' import ELASTICSEARCHMERGED %}
{% from 'elasticsearch/template.map.jinja' import ES_INDEX_SETTINGS, SO_MANAGED_INDICES %}
+4 -1
View File
@@ -3958,10 +3958,13 @@ elasticsearch:
- vulnerability-mappings
- common-settings
- common-dynamic-mappings
- logs-redis.log@package
- logs-redis.log@custom
data_stream:
allow_custom_routing: false
hidden: false
ignore_missing_component_templates: []
ignore_missing_component_templates:
- logs-redis.log@custom
index_patterns:
- logs-redis.log*
priority: 501
+7 -7
View File
@@ -17,13 +17,14 @@ include:
- elasticsearch.ssl
- elasticsearch.config
- elasticsearch.sostatus
{%- if GLOBALS.role != 'so-searchode' %}
{%- if GLOBALS.role != "so-searchnode" %}
- elasticsearch.cluster
{%- endif%}
so-elasticsearch:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-elasticsearch:{{ ELASTICSEARCHMERGED.version }}
- restart_policy: unless-stopped
- hostname: elasticsearch
- name: so-elasticsearch
- user: elasticsearch
@@ -102,11 +103,6 @@ so-elasticsearch:
- cmd: auth_users_roles_inode
- cmd: auth_users_inode
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
wait_for_so-elasticsearch:
http.wait_for_successful_query:
- name: "https://localhost:9200/"
@@ -117,10 +113,14 @@ wait_for_so-elasticsearch:
- status: 200
- wait_for: 300
- request_interval: 15
- backend: requests
- require:
- docker_container: so-elasticsearch
delete_so-elasticsearch_so-status.disabled:
file.uncomment:
- name: /opt/so/conf/so-status/so-status.conf
- regex: ^so-elasticsearch$
{% else %}
{{sls}}_state_not_allowed:
+2 -1
View File
@@ -63,7 +63,8 @@
{ "set": { "if": "ctx.event?.dataset != null && !ctx.event.dataset.contains('.')", "field": "event.dataset", "value": "{{event.module}}.{{event.dataset}}" } },
{ "split": { "if": "ctx.event?.dataset != null && ctx.event.dataset.contains('.')", "field": "event.dataset", "separator": "\\.", "target_field": "dataset_tag_temp" } },
{ "append": { "if": "ctx.dataset_tag_temp != null", "field": "tags", "value": "{{dataset_tag_temp.1}}" } },
{ "grok": { "if": "ctx.http?.response?.status_code != null", "field": "http.response.status_code", "patterns": ["%{NUMBER:http.response.status_code:long} %{GREEDYDATA}"]} },
{ "grok": { "if": "ctx.http?.response?.status_code instanceof String", "field": "http.response.status_code", "patterns": ["%{NUMBER:http.response.status_code:long}(?:\\s+%{GREEDYDATA})?"], "ignore_failure": true } },
{ "convert": { "if": "ctx.http?.response?.status_code != null && !(ctx.http.response.status_code instanceof Number)", "field": "http.response.status_code", "type": "long", "ignore_failure": true } },
{ "set": { "if": "ctx?.metadata?.kafka != null" , "field": "kafka.id", "value": "{{metadata.kafka.partition}}{{metadata.kafka.offset}}{{metadata.kafka.timestamp}}", "ignore_failure": true } },
{ "remove": { "field": [ "message2", "type", "fields", "category", "module", "dataset", "dataset_tag_temp", "event.dataset_temp" ], "ignore_missing": true, "ignore_failure": true } },
{ "pipeline": { "name": "global@custom", "ignore_missing_pipeline": true, "description": "[Fleet] Global pipeline for all data streams" } }
+75 -2
View File
@@ -177,12 +177,84 @@
"description": "Extract IPs from Elastic Agent events (host.ip) and adds them to related.ip"
}
},
{
"script": {
"description": "Snapshot event.ingested into _tmp.event_ingested_pre_fleet before .fleet_final_pipeline-1 overwrites it with ES ingest time",
"lang": "painless",
"if": "ctx.event?.ingested != null && ctx.event?.created == null",
"ignore_failure": true,
"source": "ctx.putIfAbsent('_tmp', [:]); ctx._tmp.event_ingested_pre_fleet = ctx.event.ingested;"
}
},
{
"pipeline": {
"name": ".fleet_final_pipeline-1",
"ignore_missing_pipeline": true
}
},
{
"script": {
"description": "Calculate time from Elastic Agent to Logstash.",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_agent != null",
"ignore_failure": true,
"source": "ZonedDateTime start = ctx._tmp.event_ingested_pre_fleet != null ? ZonedDateTime.parse(ctx._tmp.event_ingested_pre_fleet) : ZonedDateTime.parse(ctx['@timestamp']); ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_elasticagent_to_logstash = ChronoUnit.SECONDS.between(start, ZonedDateTime.parse(ctx._tmp.logstash_from_agent));"
}
},
{
"script": {
"description": "Calculate time from Logstash to Redis",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_agent != null && ctx._tmp?.logstash_to_redis != null",
"ignore_failure": true,
"source": "ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_logstash_to_redis = ChronoUnit.SECONDS.between(ZonedDateTime.parse(ctx._tmp.logstash_from_agent), ZonedDateTime.parse(ctx._tmp.logstash_to_redis));"
}
},
{
"script": {
"description": "Calculate time message spends in redis queue (logstash delay in pulling event).",
"lang": "painless",
"if": "ctx._tmp?.logstash_to_redis != null && ctx._tmp?.logstash_from_redis != null",
"ignore_failure": true,
"source": "ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_redis_to_logstash = ChronoUnit.SECONDS.between(ZonedDateTime.parse(ctx._tmp.logstash_to_redis), ZonedDateTime.parse(ctx._tmp.logstash_from_redis));"
}
},
{
"script": {
"description": "Calculate time from Logstash to Elasticsearch (after read from Redis).",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_redis != null",
"ignore_failure": true,
"source": "ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_logstash_to_elasticsearch = ChronoUnit.SECONDS.between(ZonedDateTime.parse(ctx._tmp.logstash_from_redis), metadata().now);"
}
},
{
"script": {
"description": "Calculate time from Elastic Agent to Kafka.",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_kafka != null && ctx._tmp?.logstash_from_agent == null",
"ignore_failure": true,
"source": "ZonedDateTime start = ctx._tmp.event_ingested_pre_fleet != null ? ZonedDateTime.parse(ctx._tmp.event_ingested_pre_fleet) : ZonedDateTime.parse(ctx['@timestamp']); ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_elasticagent_to_kafka = ChronoUnit.SECONDS.between(start, ZonedDateTime.parse(ctx._tmp.logstash_from_kafka));"
}
},
{
"script": {
"description": "Calculate time message spends in Kafka queue (logstash delay in pulling event).",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_kafka != null && ctx.metadata?.kafka?.timestamp != null && ctx._tmp?.logstash_from_agent == null",
"ignore_failure": true,
"source": "ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_kafka_queue = ChronoUnit.SECONDS.between(ZonedDateTime.ofInstant(Instant.ofEpochMilli(Long.parseLong(ctx.metadata.kafka.timestamp.toString())), ZoneId.of('UTC')), ZonedDateTime.parse(ctx._tmp.logstash_from_kafka));"
}
},
{
"script": {
"description": "Calculate time from Logstash to Elasticsearch (after read from Kafka).",
"lang": "painless",
"if": "ctx._tmp?.logstash_from_kafka != null && ctx._tmp?.logstash_from_agent == null",
"ignore_failure": true,
"source": "ctx.event.putIfAbsent('ingestion', [:]); ctx.event.ingestion.latency_kafka_to_elasticsearch = ChronoUnit.SECONDS.between(ZonedDateTime.parse(ctx._tmp.logstash_from_kafka), metadata().now);"
}
},
{
"remove": {
"field": "event.agent_id_status",
@@ -202,11 +274,12 @@
"event.dataset_temp",
"dataset_tag_temp",
"module_temp",
"datastream_dataset_temp"
"datastream_dataset_temp",
"_tmp"
],
"ignore_missing": true,
"ignore_failure": true
}
}
]
}
}
+71
View File
@@ -0,0 +1,71 @@
{
"description": "zeek.ja4d",
"processors": [
{
"set": {
"field": "event.dataset",
"value": "ja4d"
}
},
{
"remove": {
"field": [
"host"
],
"ignore_failure": true
}
},
{
"json": {
"field": "message",
"target_field": "message2",
"ignore_failure": true
}
},
{
"rename": {
"field": "message2.ja4d",
"target_field": "hash.ja4d",
"ignore_missing": true,
"if": "ctx?.message2?.ja4d != null && ctx.message2.ja4d.length() > 0"
}
},
{
"rename": {
"field": "message2.client_mac",
"target_field": "host.mac",
"ignore_missing": true,
"if": "ctx?.message2?.client_mac != null && ctx.message2.client_mac.length() > 0"
}
},
{
"rename": {
"field": "message2.hostname",
"target_field": "host.hostname",
"ignore_missing": true,
"if": "ctx?.message2?.hostname != null && ctx.message2.hostname.length() > 0"
}
},
{
"rename": {
"field": "message2.requested_ip",
"target_field": "dhcp.requested_address",
"ignore_missing": true,
"if": "ctx?.message2?.requested_ip != null && ctx.message2.requested_ip.length() > 0"
}
},
{
"rename": {
"field": "message2.vendor_class_id",
"target_field": "zeek.ja4d.vendor_class_id",
"ignore_missing": true,
"if": "ctx?.message2?.vendor_class_id != null && ctx.message2.vendor_class_id.length() > 0"
}
},
{
"pipeline": {
"name": "zeek.common"
}
}
]
}
@@ -103,11 +103,13 @@ load_component_templates() {
local pattern="${ELASTICSEARCH_TEMPLATES_DIR}/component/$2"
local append_mappings="${3:-"false"}"
# current state of nullglob shell option
shopt -q nullglob && nullglob_set=1 || nullglob_set=0
shopt -s nullglob
echo -e "\nLoading $printed_name component templates...\n"
if ! compgen -G "${pattern}/*.json" > /dev/null; then
echo "No $printed_name component templates found in ${pattern}, skipping."
return
fi
for component in "$pattern"/*.json; do
tmpl_name=$(basename "${component%.json}")
@@ -121,11 +123,6 @@ load_component_templates() {
SO_LOAD_FAILURES_NAMES+=("$component")
fi
done
# restore nullglob shell option if needed
if [[ $nullglob_set -eq 1 ]]; then
shopt -u nullglob
fi
}
check_elasticsearch_responsive() {
@@ -136,7 +133,32 @@ check_elasticsearch_responsive() {
fail "Elasticsearch is not responding. Please review Elasticsearch logs /opt/so/log/elasticsearch/securityonion.log for more details. Additionally, consider running so-elasticsearch-troubleshoot."
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
index_templates_exist() {
local templates_dir="$1"
if [[ ! -d "$templates_dir" ]]; then
return 1
fi
compgen -G "${templates_dir}/*.json" > /dev/null
}
should_load_addon_templates() {
if [[ "$IS_HEAVYNODE" == "true" ]]; then
return 1
fi
# Skip statefile checks when forcing template load
if [[ "$FORCE" != "true" ]]; then
if [[ ! -f "$SO_STATEFILE_SUCCESS" || -f "$ADDON_STATEFILE_SUCCESS" ]]; then
return 1
fi
fi
index_templates_exist "$ADDON_TEMPLATES_DIR"
}
if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]] && index_templates_exist "$SO_TEMPLATES_DIR"; then
check_elasticsearch_responsive
if [[ "$IS_HEAVYNODE" == "false" ]]; then
@@ -201,13 +223,14 @@ if [[ "$FORCE" == "true" || ! -f "$SO_STATEFILE_SUCCESS" ]]; then
fail "Failed to load all Security Onion core templates successfully."
fi
fi
else
elif ! index_templates_exist "$SO_TEMPLATES_DIR"; then
echo "No Security Onion core index templates found in ${SO_TEMPLATES_DIR}, skipping."
elif [[ -f "$SO_STATEFILE_SUCCESS" ]]; then
echo "Security Onion core templates already loaded"
fi
# Start loading addon templates
if [[ (-d "$ADDON_TEMPLATES_DIR" && -f "$SO_STATEFILE_SUCCESS" && "$IS_HEAVYNODE" == "false" && ! -f "$ADDON_STATEFILE_SUCCESS") || (-d "$ADDON_TEMPLATES_DIR" && "$IS_HEAVYNODE" == "false" && "$FORCE" == "true") ]]; then
if should_load_addon_templates; then
check_elasticsearch_responsive
+33
View File
@@ -398,6 +398,7 @@ firewall:
- elasticsearch_rest
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -410,6 +411,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -427,6 +429,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- sensoroni
searchnode:
portgroups:
@@ -437,6 +440,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -450,6 +454,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -459,6 +464,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -492,6 +498,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- elastic_agent_control
@@ -502,6 +509,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -610,6 +618,7 @@ firewall:
- elasticsearch_rest
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -622,6 +631,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -639,6 +649,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- sensoroni
searchnode:
portgroups:
@@ -649,6 +660,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -662,6 +674,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -671,6 +684,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -702,6 +716,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- elastic_agent_control
@@ -712,6 +727,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -820,6 +836,7 @@ firewall:
- elasticsearch_rest
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -832,6 +849,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -849,6 +867,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- sensoroni
searchnode:
portgroups:
@@ -858,6 +877,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -870,6 +890,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -879,6 +900,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -912,6 +934,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- elastic_agent_control
@@ -922,6 +945,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -1040,6 +1064,7 @@ firewall:
- elasticsearch_rest
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -1052,6 +1077,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -1063,6 +1089,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- beats_5044
@@ -1074,6 +1101,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- redis
@@ -1083,6 +1111,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- redis
@@ -1093,6 +1122,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -1129,6 +1159,7 @@ firewall:
portgroups:
- docker_registry
- influxdb
- postgres
- sensoroni
- yum
- elastic_agent_control
@@ -1139,6 +1170,7 @@ firewall:
- yum
- docker_registry
- influxdb
- postgres
- elastic_agent_control
- elastic_agent_data
- elastic_agent_update
@@ -1482,6 +1514,7 @@ firewall:
- kibana
- redis
- influxdb
- postgres
- elasticsearch_rest
- elasticsearch_node
- elastic_agent_control
-13
View File
@@ -1,6 +1,5 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'docker/docker.map.jinja' import DOCKERMERGED %}
{% from 'telegraf/map.jinja' import TELEGRAFMERGED %}
{% import_yaml 'firewall/defaults.yaml' as FIREWALL_DEFAULT %}
{# add our ip to self #}
@@ -56,16 +55,4 @@
{% endif %}
{# Open Postgres (5432) to minion hostgroups when Telegraf is configured to write to Postgres #}
{% set TG_OUT = TELEGRAFMERGED.output | upper %}
{% if TG_OUT in ['POSTGRES', 'BOTH'] %}
{% if role.startswith('manager') or role == 'standalone' or role == 'eval' %}
{% for r in ['sensor', 'searchnode', 'heavynode', 'receiver', 'fleet', 'idh', 'desktop', 'import'] %}
{% if FIREWALL_DEFAULT.firewall.role[role].chain["DOCKER-USER"].hostgroups[r] is defined %}
{% do FIREWALL_DEFAULT.firewall.role[role].chain["DOCKER-USER"].hostgroups[r].portgroups.append('postgres') %}
{% endif %}
{% endfor %}
{% endif %}
{% endif %}
{% set FIREWALL_MERGED = salt['pillar.get']('firewall', FIREWALL_DEFAULT.firewall, merge=True) %}
+8 -1
View File
@@ -1,3 +1,10 @@
global:
pcapengine: SURICATA
pipeline: REDIS
pipeline: REDIS
push:
enabled: true
highstate_interval_hours: 2
debounce_seconds: 30
drain_interval: 15
batch: '25%'
batch_wait: 15
+37 -1
View File
@@ -59,5 +59,41 @@ global:
description: Allows use of Endgame with Security Onion. This feature requires a license from Endgame.
global: True
advanced: True
helpLink: influxdb
push:
enabled:
description: Master kill-switch for the active push feature. When disabled, rule and pillar changes are picked up at the next scheduled highstate instead of being pushed immediately.
forcedType: bool
helpLink: push
global: True
highstate_interval_hours:
description: How often every minion in the grid runs a scheduled state.highstate, in hours. Lower values keep minions closer in sync at the cost of more load; higher values reduce load but increase worst-case latency for non-pushed changes. The salt-minion health check restarts a minion if its last highstate is older than this value plus one hour.
forcedType: int
helpLink: push
global: True
advanced: True
debounce_seconds:
description: Trailing-edge debounce window in seconds. A push intent must be quiet for this long before the drainer dispatches. Rapid bursts of edits within this window coalesce into one dispatch.
forcedType: int
helpLink: push
global: True
advanced: True
drain_interval:
description: How often the push drainer checks for ready intents, in seconds. Small values lower dispatch latency at the cost of more background work on the manager.
forcedType: int
helpLink: push
global: True
advanced: True
batch:
description: "Host batch size for push orchestrations. A number (e.g. '10') or a percentage (e.g. '25%'). Limits how many minions run the push state at once so large fleets don't thundering-herd."
helpLink: push
global: True
advanced: True
regex: '^([0-9]+%?)$'
regexFailureMessage: Enter a whole number or a whole-number percentage (e.g. 10 or 25%).
batch_wait:
description: Seconds to wait between host batches in a push orchestration. Gives the fleet time to breathe between waves.
forcedType: int
helpLink: push
global: True
advanced: True
+1
View File
@@ -58,6 +58,7 @@ so-hydra:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped
- watch:
- file: hydraconfig
+1
View File
@@ -15,6 +15,7 @@ include:
so-idh:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-idh:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-idh
- detach: True
- network_mode: host
+1
View File
@@ -18,6 +18,7 @@ include:
so-influxdb:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-influxdb:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: influxdb
- networks:
- sobridge:
+1
View File
@@ -27,6 +27,7 @@ include:
so-kafka:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kafka:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-kafka
- name: so-kafka
- networks:
+1 -1
View File
@@ -22,7 +22,7 @@ kibana:
- default
- file
migrations:
discardCorruptObjects: "8.18.8"
discardCorruptObjects: "9.3.3"
telemetry:
enabled: False
xpack:
+1
View File
@@ -16,6 +16,7 @@ include:
so-kibana:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-kibana:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: kibana
- user: kibana
- networks:
+1
View File
@@ -51,6 +51,7 @@ so-kratos:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally unless-stopped -- matches the fleet default.
- restart_policy: unless-stopped
- watch:
- file: kratosschema
+2 -2
View File
@@ -3,8 +3,8 @@ kratos:
description: Enables or disables the Kratos authentication system. WARNING - Disabling this process will cause the grid to malfunction. Re-enabling this setting will require manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
helpLink: kratos
oidc:
enabled:
description: Set to True to enable OIDC / Single Sign-On (SSO) to SOC. Requires a valid Security Onion license key.
@@ -103,7 +103,7 @@ kratos:
config:
session:
lifespan:
description: Defines the length of a login session.
description: Defines the length of a login session before it will timeout, and require a new login.
global: True
helpLink: kratos
whoami:
+3 -2
View File
@@ -26,12 +26,12 @@ logstash:
manager:
- so/0011_input_endgame.conf
- so/0012_input_elastic_agent.conf.jinja
- so/0013_input_lumberjack_fleet.conf
- so/0013_input_lumberjack_fleet.conf.jinja
- so/9999_output_redis.conf.jinja
receiver:
- so/0011_input_endgame.conf
- so/0012_input_elastic_agent.conf.jinja
- so/0013_input_lumberjack_fleet.conf
- so/0013_input_lumberjack_fleet.conf.jinja
- so/9999_output_redis.conf.jinja
search:
- so/0900_input_redis.conf.jinja
@@ -69,4 +69,5 @@ logstash:
pipeline_x_batch_x_size: 125
pipeline_x_ecs_compatibility: disabled
dmz_nodes: []
latency_metrics: False
+1
View File
@@ -28,6 +28,7 @@ include:
so-logstash:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-logstash:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-logstash
- name: so-logstash
- networks:
@@ -1,3 +1,4 @@
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
input {
elastic_agent {
port => 5055
@@ -11,10 +12,15 @@ input {
}
}
filter {
if ![metadata] {
mutate {
rename => {"@metadata" => "metadata"}
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
ruby {
code => "event.set('[_tmp][logstash_from_agent]', Time.now().utc.iso8601(3));"
}
{% endif %}
if ![metadata] {
mutate {
rename => {"@metadata" => "metadata"}
}
}
}
}
@@ -1,23 +0,0 @@
input {
elastic_agent {
port => 5056
tags => [ "elastic-agent", "fleet-lumberjack-input" ]
ssl_enabled => true
ssl_certificate => "/usr/share/logstash/elasticfleet-lumberjack.crt"
ssl_key => "/usr/share/logstash/elasticfleet-lumberjack.key"
ecs_compatibility => v8
id => "fleet-lumberjack-in"
codec => "json"
}
}
filter {
if ![metadata] {
mutate {
rename => {"@metadata" => "metadata"}
}
}
}
@@ -0,0 +1,26 @@
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
input {
elastic_agent {
port => 5056
tags => [ "elastic-agent", "fleet-lumberjack-input" ]
ssl_enabled => true
ssl_certificate => "/usr/share/logstash/elasticfleet-lumberjack.crt"
ssl_key => "/usr/share/logstash/elasticfleet-lumberjack.key"
ecs_compatibility => v8
id => "fleet-lumberjack-in"
codec => "json"
}
}
filter {
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
ruby {
code => "event.set('[_tmp][logstash_from_fleet]', Time.now().utc.iso8601(3));"
}
{% endif %}
if ![metadata] {
mutate {
rename => {"@metadata" => "metadata"}
}
}
}
@@ -1,3 +1,4 @@
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
{%- set kafka_password = salt['pillar.get']('kafka:config:password') %}
{%- set kafka_trustpass = salt['pillar.get']('kafka:config:trustpass') %}
{%- set kafka_brokers = salt['pillar.get']('kafka:nodes', {}) %}
@@ -30,6 +31,11 @@ input {
}
}
filter {
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
ruby {
code => "event.set('[_tmp][logstash_from_kafka]', Time.now().utc.iso8601(3));"
}
{% endif %}
if ![metadata] {
mutate {
rename => { "@metadata" => "metadata" }
@@ -1,4 +1,4 @@
{%- from 'logstash/map.jinja' import LOGSTASH_REDIS_NODES with context %}
{%- from 'logstash/map.jinja' import LOGSTASH_REDIS_NODES, LOGSTASH_MERGED %}
{%- set REDIS_PASS = salt['pillar.get']('redis:config:requirepass') %}
{%- for index in range(LOGSTASH_REDIS_NODES|length) %}
@@ -18,3 +18,10 @@ input {
}
{% endfor %}
{% endfor -%}
filter {
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
ruby {
code => "event.set('[_tmp][logstash_from_redis]', Time.now().utc.iso8601(3));"
}
{% endif %}
}
@@ -1,3 +1,11 @@
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
filter {
ruby {
code => "event.set('[_tmp][logstash_to_elasticsearch]', Time.now().utc.iso8601(3));"
}
}
{% endif %}
output {
if "elastic-agent" in [tags] and "so-ip-mappings" in [tags] {
elasticsearch {
@@ -13,13 +13,20 @@ filter {
add_tag => "fleet-lumberjack-{{ GLOBALS.hostname }}"
}
}
output {
lumberjack {
codec => json
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
filter {
ruby {
code => "event.set('[_tmp][fleet_to_logstash]', Time.now().utc.iso8601(3));"
}
}
{% endif %}
output {
lumberjack {
codec => json
hosts => {{ FAILOVER_LOGSTASH_NODES }}
ssl_certificate => "/usr/share/filebeat/ca.crt"
port => 5056
port => 5056
id => "fleet-lumberjack-{{ GLOBALS.hostname }}"
}
}
}
@@ -1,10 +1,17 @@
{%- from 'logstash/map.jinja' import LOGSTASH_MERGED %}
{%- if grains.role in ['so-heavynode', 'so-receiver'] %}
{%- set HOST = GLOBALS.hostname %}
{%- else %}
{%- set HOST = GLOBALS.manager %}
{%- endif %}
{%- set REDIS_PASS = salt['pillar.get']('redis:config:requirepass') %}
{% if LOGSTASH_MERGED.get('latency_metrics', False) %}
filter {
ruby {
code => "event.set('[_tmp][logstash_to_redis]', Time.now().utc.iso8601(3));"
}
}
{% endif %}
output {
redis {
host => '{{ HOST }}'
+5
View File
@@ -86,3 +86,8 @@ logstash:
multiline: True
advanced: True
forcedType: "[]string"
latency_metrics:
description: Enable latency metrics within events processed by logstash. Useful for pinpointing log ingest delay.
forcedType: bool
global: False
advanced: True
+21
View File
@@ -0,0 +1,21 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'global/map.jinja' import GLOBALMERGED %}
include:
- salt.minion
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
salt_beacons_pushstate:
file.managed:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- source: salt://manager/files/beacons_pushstate.conf.jinja
- template: jinja
- watch_in:
- service: salt_minion_service
{% else %}
salt_beacons_pushstate:
file.absent:
- name: /etc/salt/minion.d/beacons_pushstate.conf
- watch_in:
- service: salt_minion_service
{% endif %}
@@ -0,0 +1,41 @@
{% from 'global/map.jinja' import GLOBALMERGED %}
beacons:
pillar_db:
- interval: {{ GLOBALMERGED.push.drain_interval }}
- disable_during_state_run: True
inotify:
- disable_during_state_run: True
- coalesce: True
- files:
/opt/so/saltstack/local/salt/suricata/rules:
mask:
- close_write
- moved_to
- delete
recurse: True
auto_add: True
exclude:
- '\.sw[a-z]$':
regex: True
- '~$':
regex: True
- '/4913$':
regex: True
- '/\.#':
regex: True
/opt/so/saltstack/local/salt/strelka/rules/compiled:
mask:
- close_write
- moved_to
- delete
recurse: True
auto_add: True
exclude:
- '\.sw[a-z]$':
regex: True
- '~$':
regex: True
- '/4913$':
regex: True
- '/\.#':
regex: True
+2
View File
@@ -15,6 +15,7 @@ include:
- manager.elasticsearch
- manager.kibana
- manager.managed_soc_annotations
- manager.beacons
repo_log_dir:
file.directory:
@@ -231,6 +232,7 @@ surifiltersrules:
- user: 939
- group: 939
{% else %}
{{sls}}_state_not_allowed:
+381
View File
@@ -0,0 +1,381 @@
#!/usr/bin/env python3
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Imports detection overrides (e.g. from so-detections-backup) into the so-detection
# index. Reads <publicId>.<ext> files (NDJSON, one override per line) from a source
# directory, looks up the matching detection by publicId+engine, validates each
# override against the same rules SOC enforces, dedupes against existing overrides
# (operational fields only), and appends new ones.
import argparse
import ipaddress
import json
import os
import re
import sys
from datetime import datetime
import requests
from requests.auth import HTTPBasicAuth
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
DEFAULT_INDEX = "so-detection"
AUTH_FILE = "/opt/so/conf/elasticsearch/curl.config"
ES_URL = "https://localhost:9200"
# Engines we know how to handle and the file extension the backup script writes.
ENGINES = {
"suricata": "txt",
}
# Standard Suricata variables that ship with Security Onion. Anything else
# referenced in an override is "custom" and the user needs to make sure it
# exists in SOC Config before the override will function.
BUILTIN_SURICATA_VARS = {
"$HOME_NET", "$EXTERNAL_NET",
"$HTTP_SERVERS", "$DNS_SERVERS", "$SQL_SERVERS", "$SMTP_SERVERS",
"$TELNET_SERVERS", "$AIM_SERVERS", "$DC_SERVERS", "$MODBUS_SERVER",
"$MODBUS_CLIENT", "$ENIP_CLIENT", "$ENIP_SERVER",
"$HTTP_PORTS", "$SHELLCODE_PORTS", "$ORACLE_PORTS", "$SSH_PORTS",
"$FTP_PORTS", "$FILE_DATA_PORTS",
}
VAR_PATTERN = re.compile(r"\$[A-Z_][A-Z0-9_]*")
# Canonical valid values, per securityonion-soc/model/detection.go.
SURICATA_OVERRIDE_TYPES = {"suppress", "threshold", "modify"}
SUPPRESS_TRACKS = {"by_src", "by_dst", "by_either"}
THRESHOLD_TRACKS = {"by_src", "by_dst", "by_both"}
THRESHOLD_TYPES = {"limit", "threshold", "both"}
STALE_WARNING = """\
WARNING: so-detections-backup does not remove backup files when overrides are
deleted via the Security Onion web UI. As a result, files in the source
directory may represent overrides that were intentionally deleted and should
NOT be re-imported.
Before continuing, verify that the source directory reflects the overrides you
actually want imported. Remove any files corresponding to overrides you previously deleted.
"""
def make_session(auth_file):
with open(auth_file, "r") as f:
for line in f:
if line.startswith("user ="):
creds = line.split("=", 1)[1].strip().replace('"', "")
user, _, password = creds.partition(":")
session = requests.Session()
session.auth = HTTPBasicAuth(user, password)
session.headers.update({"Content-Type": "application/json"})
session.verify = False
return session
raise RuntimeError(f"Could not find 'user =' line in {auth_file}")
def find_detection(session, index, public_id, engine):
query = {
"query": {"bool": {"must": [
{"term": {"so_detection.publicId": public_id}},
{"term": {"so_detection.engine": engine}},
]}},
"size": 2,
}
r = session.get(f"{ES_URL}/{index}/_search", json=query)
r.raise_for_status()
hits = r.json().get("hits", {}).get("hits", [])
if not hits:
return None, None, None
if len(hits) > 1:
# Shouldn't happen — publicId is unique per engine — but flag it.
print(f" WARN: {len(hits)} detections matched publicId={public_id} engine={engine}; using first")
hit = hits[0]
existing = hit["_source"].get("so_detection", {}).get("overrides") or []
return hit["_id"], hit["_index"], existing
def update_overrides(session, doc_index, doc_id, overrides):
body = {"doc": {"so_detection": {"overrides": overrides}}}
r = session.post(f"{ES_URL}/{doc_index}/_update/{doc_id}", json=body)
r.raise_for_status()
return r.json()
def dedupe_key(override):
"""Operational fields only, per Override.Equal() in detection.go.
Excludes timestamps and isEnabled so re-imports don't appear unique."""
t = override.get("type")
if t == "suppress":
return (t, override.get("track"), override.get("ip"))
if t == "threshold":
return (t, override.get("thresholdType"), override.get("track"),
override.get("count"), override.get("seconds"))
if t == "modify":
return (t, override.get("regex"), override.get("value"))
def _validate_suricata_ip(ip):
if not ip:
return "ip cannot be empty"
if ip.startswith("$"):
return None
if ip.startswith("[") and ip.endswith("]"):
for part in ip[1:-1].split(","):
err = _validate_single_ip(part.strip())
if err:
return f"invalid IP in list: {err}"
return None
return _validate_single_ip(ip)
def _validate_single_ip(ip):
try:
if "/" in ip:
ipaddress.ip_network(ip, strict=False)
else:
ipaddress.ip_address(ip)
except ValueError:
return f"invalid IP/CIDR {ip!r}"
return None
def validate_override(override, engine):
"""Mirror Override.Validate() from securityonion-soc/model/detection.go.
Returns None on success, an error string otherwise."""
t = override.get("type")
if not t:
return "override type is required"
if t not in SURICATA_OVERRIDE_TYPES:
return f"invalid type {t!r}: must be one of {sorted(SURICATA_OVERRIDE_TYPES)}"
has = {k: override.get(k) is not None for k in
("regex", "value", "thresholdType", "track", "ip", "count", "seconds", "customFilter")}
if t == "suppress":
if not has["ip"] or not has["track"]:
return "suppress requires 'ip' and 'track'"
if any(has[k] for k in ("regex", "value", "thresholdType", "count", "seconds", "customFilter")):
return "suppress has unnecessary fields"
if override["track"] not in SUPPRESS_TRACKS:
return f"invalid track {override['track']!r}: must be one of {sorted(SUPPRESS_TRACKS)}"
return _validate_suricata_ip(override["ip"])
if t == "threshold":
if not all(has[k] for k in ("thresholdType", "track", "count", "seconds")):
return "threshold requires 'thresholdType', 'track', 'count', 'seconds'"
if any(has[k] for k in ("regex", "value", "customFilter")):
return "threshold has unnecessary fields"
if override["thresholdType"] not in THRESHOLD_TYPES:
return f"invalid thresholdType {override['thresholdType']!r}: must be one of {sorted(THRESHOLD_TYPES)}"
if override["track"] not in THRESHOLD_TRACKS:
return f"invalid track {override['track']!r}: must be one of {sorted(THRESHOLD_TRACKS)}"
if not isinstance(override["count"], int) or override["count"] <= 0:
return f"count must be a positive integer, got {override['count']!r}"
if not isinstance(override["seconds"], int) or override["seconds"] <= 0:
return f"seconds must be a positive integer, got {override['seconds']!r}"
return None
if t == "modify":
if not has["regex"] or not has["value"]:
return "modify requires 'regex' and 'value'"
if any(has[k] for k in ("thresholdType", "track", "count", "seconds", "customFilter")):
return "modify has unnecessary fields"
try:
re.compile(override["regex"])
except re.error as e:
return f"invalid regex: {e}"
return None
def parse_overrides_file(path):
"""Parse a file written by so-detections-backup.py: NDJSON, one override
per line. Returns a list of (override_dict, line_number)."""
overrides = []
with open(path, "r") as f:
for i, line in enumerate(f, start=1):
line = line.strip()
if not line:
continue
overrides.append((json.loads(line), i))
return overrides
def describe(override):
"""Human-readable summary of the operational fields for a given override type."""
t = override.get("type")
if t == "suppress":
return f"type=suppress track={override.get('track')} ip={override.get('ip')}"
if t == "threshold":
return (f"type=threshold track={override.get('track')} "
f"thresholdType={override.get('thresholdType')} "
f"count={override.get('count')} seconds={override.get('seconds')}")
if t == "modify":
return f"type=modify regex={override.get('regex')!r}"
def collect_custom_vars(override):
found = set()
for value in override.values():
if isinstance(value, str):
for match in VAR_PATTERN.findall(value):
if match not in BUILTIN_SURICATA_VARS:
found.add(match)
return found
def parse_args():
p = argparse.ArgumentParser(
description="Import detection overrides into the so-detection index.",
)
p.add_argument("--source", "-s", required=True,
help="Source directory containing <publicId>.<ext> override files.")
p.add_argument("--engine", "-e", default="suricata", choices=list(ENGINES.keys()),
help="Detection engine (default: suricata).")
p.add_argument("--dry-run", "-n", action="store_true",
help="Print what would happen without writing to Elasticsearch.")
p.add_argument("--no-import-note", action="store_true",
help="Do not prepend '[Imported YYYY-MM-DD] ' to the override note.")
p.add_argument("--index", "-i", default=DEFAULT_INDEX,
help=f"Elasticsearch index to update (default: {DEFAULT_INDEX}).")
return p.parse_args()
def confirm_proceed(args):
"""Show the stale-backup warning. Dry-run prints it and continues. Real
runs require the user typing 'yes' at the prompt."""
print(STALE_WARNING)
if args.dry_run:
print("(dry-run: no acknowledgement required)\n")
return True
answer = input("Type 'yes' to acknowledge and continue: ").strip().lower()
print()
return answer == "yes"
def main():
args = parse_args()
if not os.path.isdir(args.source):
print(f"ERROR: source directory not found: {args.source}", file=sys.stderr)
sys.exit(1)
extension = ENGINES[args.engine]
files = sorted(f for f in os.listdir(args.source) if f.endswith(f".{extension}"))
if not files:
print(f"No *.{extension} files found in {args.source}")
sys.exit(0)
if not confirm_proceed(args):
print("Aborted.")
sys.exit(1)
session = make_session(AUTH_FILE)
today = datetime.now().strftime("%Y-%m-%d")
note_prefix = "" if args.no_import_note else f"[Imported {today}] "
counts = {"added": 0, "skipped_dedupe": 0, "skipped_not_found": 0, "invalid": 0, "error": 0}
custom_vars = set()
mode = "DRY-RUN" if args.dry_run else "IMPORT"
print(f"[{mode}] engine={args.engine} source={args.source} index={args.index}\n")
for filename in files:
public_id = os.path.splitext(filename)[0]
path = os.path.join(args.source, filename)
print(f"{public_id}:")
try:
new_overrides = parse_overrides_file(path)
except (json.JSONDecodeError, OSError) as e:
print(f" ERROR: could not parse {filename}: {e}")
counts["error"] += 1
continue
if not new_overrides:
print(" SKIP: empty file")
continue
try:
doc_id, doc_index, existing = find_detection(session, args.index, public_id, args.engine)
except requests.HTTPError as e:
print(f" ERROR: search failed: {e}")
counts["error"] += 1
continue
if doc_id is None:
print(f" WARN: no detection found for publicId={public_id} engine={args.engine}; skipping")
counts["skipped_not_found"] += len(new_overrides)
continue
existing_keys = {dedupe_key(o) for o in existing}
merged = list(existing)
added_this_file = 0
for override, line_no in new_overrides:
err = validate_override(override, args.engine)
if err:
print(f" INVALID (line {line_no}): {err}")
counts["invalid"] += 1
continue
custom_vars.update(collect_custom_vars(override))
key = dedupe_key(override)
if key in existing_keys:
print(f" SKIP (line {line_no}): duplicate of existing override [{describe(override)}]")
counts["skipped_dedupe"] += 1
continue
if note_prefix:
override = dict(override)
override["note"] = note_prefix + (override.get("note") or "")
merged.append(override)
existing_keys.add(key)
added_this_file += 1
print(f" ADD (line {line_no}): {describe(override)}")
if added_this_file == 0:
continue
if args.dry_run:
print(f" DRY-RUN: would update {doc_index}/{doc_id} "
f"({len(existing)} existing → {len(merged)} total)")
counts["added"] += added_this_file
continue
try:
update_overrides(session, doc_index, doc_id, merged)
print(f" UPDATED {doc_index}/{doc_id} ({len(existing)} → {len(merged)})")
counts["added"] += added_this_file
except requests.HTTPError as e:
print(f" ERROR: update failed: {e}")
counts["error"] += 1
print()
print("=" * 60)
print(f"Summary ({mode}):")
print(f" Overrides added: {counts['added']}")
print(f" Skipped (already present): {counts['skipped_dedupe']}")
print(f" Skipped (no detection): {counts['skipped_not_found']}")
print(f" Invalid (failed checks): {counts['invalid']}")
print(f" Errors: {counts['error']}")
if custom_vars:
print()
print("WARNING: detected custom Suricata variables in imported overrides:")
for v in sorted(custom_vars):
print(f" {v}")
print("If any of these are not already defined in SOC Config (Suricata variables),")
print("you must add them manually before the rules will function correctly.")
sys.exit(0 if counts["error"] == 0 and counts["invalid"] == 0 else 1)
if __name__ == "__main__":
main()
@@ -0,0 +1,588 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
import importlib.util
import json
import os
import shutil
import sys
import tempfile
import unittest
from importlib.machinery import SourceFileLoader
from io import StringIO
from unittest.mock import MagicMock, patch
import requests
# The script has no .py extension; spec_from_file_location can't auto-detect a
# loader, so we hand it a SourceFileLoader explicitly. (load_module() is
# deprecated in 3.14 and slated for removal in 3.15.)
HERE = os.path.dirname(os.path.abspath(__file__))
SCRIPT = os.path.join(HERE, "so-detections-overrides-import")
_loader = SourceFileLoader("so_overrides_import", SCRIPT)
_spec = importlib.util.spec_from_loader("so_overrides_import", _loader)
soi = importlib.util.module_from_spec(_spec)
_loader.exec_module(soi)
class TestValidateSuppress(unittest.TestCase):
def test_valid(self):
self.assertIsNone(soi.validate_override(
{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}, "suricata"))
def test_valid_var(self):
self.assertIsNone(soi.validate_override(
{"type": "suppress", "track": "by_either", "ip": "$HOME_NET"}, "suricata"))
def test_valid_cidr(self):
self.assertIsNone(soi.validate_override(
{"type": "suppress", "track": "by_dst", "ip": "10.0.0.0/8"}, "suricata"))
def test_valid_bracket_list(self):
self.assertIsNone(soi.validate_override(
{"type": "suppress", "track": "by_src", "ip": "[1.2.3.4,10.0.0.0/8]"}, "suricata"))
def test_missing_ip(self):
err = soi.validate_override({"type": "suppress", "track": "by_src"}, "suricata")
self.assertIn("requires", err)
def test_missing_track(self):
err = soi.validate_override({"type": "suppress", "ip": "1.2.3.4"}, "suricata")
self.assertIn("requires", err)
def test_invalid_track(self):
err = soi.validate_override(
{"type": "suppress", "track": "by_both", "ip": "1.2.3.4"}, "suricata")
self.assertIn("invalid track", err)
def test_invalid_ip(self):
err = soi.validate_override(
{"type": "suppress", "track": "by_src", "ip": "not-an-ip"}, "suricata")
self.assertIn("invalid IP", err)
def test_unnecessary_field(self):
err = soi.validate_override(
{"type": "suppress", "track": "by_src", "ip": "1.2.3.4", "count": 5}, "suricata")
self.assertIn("unnecessary fields", err)
class TestValidateThreshold(unittest.TestCase):
def test_valid(self):
self.assertIsNone(soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 10, "seconds": 60,
}, "suricata"))
def test_valid_by_both(self):
self.assertIsNone(soi.validate_override({
"type": "threshold", "track": "by_both",
"thresholdType": "both", "count": 1, "seconds": 1,
}, "suricata"))
def test_track_by_either_invalid(self):
err = soi.validate_override({
"type": "threshold", "track": "by_either",
"thresholdType": "limit", "count": 10, "seconds": 60,
}, "suricata")
self.assertIn("invalid track", err)
def test_invalid_threshold_type(self):
err = soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "bogus", "count": 10, "seconds": 60,
}, "suricata")
self.assertIn("invalid thresholdType", err)
def test_zero_count(self):
err = soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 0, "seconds": 60,
}, "suricata")
self.assertIn("count", err)
def test_negative_seconds(self):
err = soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 10, "seconds": -1,
}, "suricata")
self.assertIn("seconds", err)
def test_missing_field(self):
err = soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 10, # missing seconds
}, "suricata")
self.assertIn("requires", err)
def test_unnecessary_field(self):
err = soi.validate_override({
"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 10, "seconds": 60,
"regex": "foo",
}, "suricata")
self.assertIn("unnecessary fields", err)
class TestValidateModify(unittest.TestCase):
def test_valid(self):
self.assertIsNone(soi.validate_override(
{"type": "modify", "regex": r"content:\"foo\"", "value": "content:bar"}, "suricata"))
def test_invalid_regex(self):
err = soi.validate_override(
{"type": "modify", "regex": "(unbalanced", "value": "x"}, "suricata")
self.assertIn("invalid regex", err)
def test_missing_value(self):
err = soi.validate_override({"type": "modify", "regex": "x"}, "suricata")
self.assertIn("requires", err)
def test_unnecessary_field(self):
err = soi.validate_override(
{"type": "modify", "regex": "x", "value": "y", "track": "by_src"}, "suricata")
self.assertIn("unnecessary fields", err)
class TestValidateMisc(unittest.TestCase):
def test_unknown_type(self):
err = soi.validate_override({"type": "suppresss", "track": "by_src", "ip": "1.2.3.4"}, "suricata")
self.assertIn("invalid type", err)
def test_missing_type(self):
err = soi.validate_override({"track": "by_src"}, "suricata")
self.assertIn("type is required", err)
class TestValidateIP(unittest.TestCase):
def test_plain_ipv4(self):
self.assertIsNone(soi._validate_suricata_ip("1.2.3.4"))
def test_plain_ipv6(self):
self.assertIsNone(soi._validate_suricata_ip("::1"))
def test_cidr(self):
self.assertIsNone(soi._validate_suricata_ip("10.0.0.0/8"))
def test_var(self):
self.assertIsNone(soi._validate_suricata_ip("$CONCOURSEWORKERS"))
def test_bracket_list(self):
self.assertIsNone(soi._validate_suricata_ip("[1.2.3.4, 10.0.0.0/8]"))
def test_bracket_list_bad_member(self):
err = soi._validate_suricata_ip("[1.2.3.4,nope]")
self.assertIn("invalid IP in list", err)
def test_empty(self):
self.assertIn("empty", soi._validate_suricata_ip(""))
def test_invalid(self):
self.assertIn("invalid", soi._validate_suricata_ip("999.999.999.999"))
class TestDedupeKey(unittest.TestCase):
def test_suppress(self):
a = {"type": "suppress", "track": "by_src", "ip": "1.2.3.4", "count": 99}
b = {"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}
# count is irrelevant for suppress dedupe
self.assertEqual(soi.dedupe_key(a), soi.dedupe_key(b))
def test_suppress_differs_on_ip(self):
a = {"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}
b = {"type": "suppress", "track": "by_src", "ip": "5.6.7.8"}
self.assertNotEqual(soi.dedupe_key(a), soi.dedupe_key(b))
def test_threshold(self):
a = {"type": "threshold", "track": "by_src", "thresholdType": "limit",
"count": 10, "seconds": 60, "ip": "ignored"}
b = {"type": "threshold", "track": "by_src", "thresholdType": "limit",
"count": 10, "seconds": 60}
self.assertEqual(soi.dedupe_key(a), soi.dedupe_key(b))
def test_threshold_differs_on_count(self):
a = {"type": "threshold", "track": "by_src", "thresholdType": "limit",
"count": 10, "seconds": 60}
b = {"type": "threshold", "track": "by_src", "thresholdType": "limit",
"count": 20, "seconds": 60}
self.assertNotEqual(soi.dedupe_key(a), soi.dedupe_key(b))
def test_modify(self):
a = {"type": "modify", "regex": "x", "value": "y"}
b = {"type": "modify", "regex": "x", "value": "y"}
self.assertEqual(soi.dedupe_key(a), soi.dedupe_key(b))
class TestDescribe(unittest.TestCase):
def test_suppress(self):
s = soi.describe({"type": "suppress", "track": "by_src", "ip": "1.2.3.4"})
self.assertIn("suppress", s)
self.assertIn("by_src", s)
self.assertIn("1.2.3.4", s)
def test_threshold_includes_count(self):
s = soi.describe({"type": "threshold", "track": "by_src",
"thresholdType": "limit", "count": 10, "seconds": 60})
self.assertIn("count=10", s)
self.assertIn("seconds=60", s)
def test_modify(self):
s = soi.describe({"type": "modify", "regex": "foo"})
self.assertIn("modify", s)
self.assertIn("foo", s)
class TestParseOverridesFile(unittest.TestCase):
def _write(self, content):
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as f:
f.write(content)
self.addCleanup(os.unlink, path)
return path
def test_single_line(self):
path = self._write('{"type":"suppress","track":"by_src","ip":"1.2.3.4"}')
result = soi.parse_overrides_file(path)
self.assertEqual(len(result), 1)
self.assertEqual(result[0][0]["type"], "suppress")
self.assertEqual(result[0][1], 1)
def test_ndjson(self):
path = self._write(
'{"type":"suppress","track":"by_src","ip":"1.2.3.4"}\n'
'{"type":"suppress","track":"by_dst","ip":"5.6.7.8"}\n'
)
result = soi.parse_overrides_file(path)
self.assertEqual(len(result), 2)
self.assertEqual(result[1][1], 2)
def test_empty(self):
path = self._write("")
self.assertEqual(soi.parse_overrides_file(path), [])
def test_blank_lines_skipped(self):
path = self._write('\n{"type":"suppress","track":"by_src","ip":"1.2.3.4"}\n\n')
result = soi.parse_overrides_file(path)
self.assertEqual(len(result), 1)
self.assertEqual(result[0][1], 2) # line number reflects original position
def test_invalid_raises(self):
path = self._write("not json")
with self.assertRaises(json.JSONDecodeError):
soi.parse_overrides_file(path)
class TestCollectCustomVars(unittest.TestCase):
def test_finds_custom(self):
v = soi.collect_custom_vars({"ip": "$CONCOURSEWORKERS"})
self.assertEqual(v, {"$CONCOURSEWORKERS"})
def test_filters_builtins(self):
v = soi.collect_custom_vars({"ip": "$HOME_NET"})
self.assertEqual(v, set())
def test_mixed(self):
v = soi.collect_custom_vars({"ip": "[$HOME_NET,$MYNET]"})
self.assertEqual(v, {"$MYNET"})
def test_non_string_fields_ignored(self):
v = soi.collect_custom_vars({"count": 10, "isEnabled": True})
self.assertEqual(v, set())
class TestMakeSession(unittest.TestCase):
def _write(self, content):
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
f.write(content)
self.addCleanup(os.unlink, path)
return path
def test_valid_auth_file(self):
path = self._write('user = "admin:secret"\n')
session = soi.make_session(path)
self.assertEqual(session.auth.username, "admin")
self.assertEqual(session.auth.password, "secret")
self.assertFalse(session.verify)
def test_missing_user_line(self):
path = self._write("# no user line here\n")
with self.assertRaises(RuntimeError):
soi.make_session(path)
class TestFindDetection(unittest.TestCase):
def _session_with_response(self, payload):
session = MagicMock()
response = MagicMock()
response.json.return_value = payload
response.raise_for_status.return_value = None
session.get.return_value = response
return session
def test_found(self):
session = self._session_with_response({"hits": {"hits": [{
"_id": "abc", "_index": "so-detection",
"_source": {"so_detection": {"overrides": [{"type": "suppress"}]}},
}]}})
doc_id, idx, existing = soi.find_detection(session, "so-detection", "2049201", "suricata")
self.assertEqual(doc_id, "abc")
self.assertEqual(idx, "so-detection")
self.assertEqual(len(existing), 1)
def test_not_found(self):
session = self._session_with_response({"hits": {"hits": []}})
doc_id, idx, existing = soi.find_detection(session, "so-detection", "x", "suricata")
self.assertIsNone(doc_id)
self.assertIsNone(idx)
self.assertIsNone(existing)
def test_no_overrides_field(self):
session = self._session_with_response({"hits": {"hits": [{
"_id": "abc", "_index": "so-detection",
"_source": {"so_detection": {}},
}]}})
_, _, existing = soi.find_detection(session, "so-detection", "x", "suricata")
self.assertEqual(existing, [])
def test_multiple_hits_warns(self):
session = self._session_with_response({"hits": {"hits": [
{"_id": "a", "_index": "i", "_source": {"so_detection": {"overrides": []}}},
{"_id": "b", "_index": "i", "_source": {"so_detection": {"overrides": []}}},
]}})
with patch("sys.stdout", new=StringIO()) as out:
doc_id, _, _ = soi.find_detection(session, "i", "x", "suricata")
self.assertEqual(doc_id, "a")
self.assertIn("WARN", out.getvalue())
class TestUpdateOverrides(unittest.TestCase):
def test_posts_to_update_endpoint(self):
session = MagicMock()
response = MagicMock()
response.raise_for_status.return_value = None
response.json.return_value = {"result": "updated"}
session.post.return_value = response
result = soi.update_overrides(session, "so-detection", "abc", [{"type": "suppress"}])
self.assertEqual(result, {"result": "updated"})
url = session.post.call_args[0][0]
self.assertIn("/_update/abc", url)
body = session.post.call_args[1]["json"]
self.assertEqual(body["doc"]["so_detection"]["overrides"], [{"type": "suppress"}])
class TestConfirmProceed(unittest.TestCase):
def test_dry_run_skips_prompt(self):
args = MagicMock(dry_run=True)
with patch("sys.stdout", new=StringIO()):
self.assertTrue(soi.confirm_proceed(args))
def test_yes_input(self):
args = MagicMock(dry_run=False)
with patch("sys.stdout", new=StringIO()):
with patch("builtins.input", return_value="yes"):
self.assertTrue(soi.confirm_proceed(args))
def test_yes_input_case_insensitive(self):
args = MagicMock(dry_run=False)
with patch("sys.stdout", new=StringIO()):
with patch("builtins.input", return_value="YES"):
self.assertTrue(soi.confirm_proceed(args))
def test_no_input_aborts(self):
args = MagicMock(dry_run=False)
with patch("sys.stdout", new=StringIO()):
with patch("builtins.input", return_value="no"):
self.assertFalse(soi.confirm_proceed(args))
def test_empty_input_aborts(self):
args = MagicMock(dry_run=False)
with patch("sys.stdout", new=StringIO()):
with patch("builtins.input", return_value=""):
self.assertFalse(soi.confirm_proceed(args))
class TestParseArgs(unittest.TestCase):
def test_defaults(self):
with patch.object(sys, "argv", ["cmd", "--source", "/some/path"]):
args = soi.parse_args()
self.assertEqual(args.source, "/some/path")
self.assertEqual(args.engine, "suricata")
self.assertFalse(args.dry_run)
self.assertFalse(args.no_import_note)
self.assertEqual(args.index, soi.DEFAULT_INDEX)
def test_all_options(self):
argv = ["cmd", "-s", "/x", "-e", "suricata", "-n",
"--no-import-note", "-i", "alt-index"]
with patch.object(sys, "argv", argv):
args = soi.parse_args()
self.assertEqual(args.source, "/x")
self.assertTrue(args.dry_run)
self.assertTrue(args.no_import_note)
self.assertEqual(args.index, "alt-index")
class TestMain(unittest.TestCase):
def setUp(self):
self.tmpdir = tempfile.mkdtemp()
self.addCleanup(shutil.rmtree, self.tmpdir, ignore_errors=True)
# Stub make_session so tests don't need /opt/so/conf/elasticsearch/curl.config.
p = patch.object(soi, "make_session", return_value=MagicMock())
p.start()
self.addCleanup(p.stop)
def _write_file(self, public_id, overrides, ext="txt"):
"""Write an NDJSON override file. Entries may be dicts or raw strings (for malformed input)."""
path = os.path.join(self.tmpdir, f"{public_id}.{ext}")
with open(path, "w") as f:
for o in overrides:
f.write(o if isinstance(o, str) else json.dumps(o))
f.write("\n")
return path
def _run_main(self, *extra_argv, input_response="yes"):
"""Run main() with stdout/stderr captured and input mocked. Returns (stdout, stderr, exit_code)."""
argv = ["cmd", "--source", self.tmpdir, *extra_argv]
out, err = StringIO(), StringIO()
with patch.object(sys, "argv", argv), \
patch("sys.stdout", new=out), \
patch("sys.stderr", new=err), \
patch("builtins.input", return_value=input_response):
with self.assertRaises(SystemExit) as cm:
soi.main()
return out.getvalue(), err.getvalue(), cm.exception.code
def test_source_dir_missing(self):
argv = ["cmd", "--source", "/no/such/path/here"]
err = StringIO()
with patch.object(sys, "argv", argv), patch("sys.stderr", new=err):
with self.assertRaises(SystemExit) as cm:
soi.main()
self.assertEqual(cm.exception.code, 1)
self.assertIn("source directory not found", err.getvalue())
def test_no_files_found(self):
out, _, code = self._run_main()
self.assertEqual(code, 0)
self.assertIn("No *.txt files found", out)
def test_user_aborts(self):
self._write_file("1001", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
out, _, code = self._run_main(input_response="no")
self.assertEqual(code, 1)
self.assertIn("Aborted", out)
def test_parse_error_increments_error(self):
# Malformed JSON line — parse_overrides_file raises JSONDecodeError.
self._write_file("1002", ["not json"])
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 1) # invalid+error → non-zero
self.assertIn("could not parse", out)
self.assertIn("Errors: 1", out)
def test_empty_file_skipped(self):
# Blank lines only — parse_overrides_file returns []; main reports "empty file" and continues.
path = os.path.join(self.tmpdir, "1003.txt")
with open(path, "w") as f:
f.write("\n\n")
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 0)
self.assertIn("empty file", out)
@patch.object(soi, "find_detection")
def test_search_http_error(self, mock_find):
mock_find.side_effect = requests.HTTPError("boom")
self._write_file("1004", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 1)
self.assertIn("search failed", out)
@patch.object(soi, "find_detection")
def test_no_detection_found(self, mock_find):
mock_find.return_value = (None, None, None)
self._write_file("1005", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 0)
self.assertIn("no detection found", out)
self.assertIn("Skipped (no detection): 1", out)
@patch.object(soi, "find_detection")
def test_all_duplicates_no_update(self, mock_find):
existing = [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}]
mock_find.return_value = ("doc1", "so-detection", existing)
self._write_file("1006", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 0)
self.assertIn("SKIP", out)
self.assertNotIn("DRY-RUN: would update", out) # added_this_file == 0 branch
@patch.object(soi, "update_overrides")
@patch.object(soi, "find_detection")
def test_happy_path_full(self, mock_find, mock_update):
# Exercises: ADD, dedupe SKIP, INVALID, note prefix, UPDATE, custom-vars warning, exit=1 (invalid present)
existing = [{"type": "suppress", "track": "by_src", "ip": "9.9.9.9"}]
mock_find.return_value = ("doc1", "so-detection", existing)
mock_update.return_value = {"result": "updated"}
self._write_file("1007", [
{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}, # ADD
{"type": "suppress", "track": "by_src", "ip": "9.9.9.9"}, # SKIP (dupe of existing)
{"type": "suppress", "track": "bogus", "ip": "1.2.3.4"}, # INVALID
{"type": "suppress", "track": "by_src", "ip": "$CONCOURSEWORKERS"}, # ADD + custom var
])
out, _, code = self._run_main()
self.assertEqual(code, 1) # one invalid -> non-zero
mock_update.assert_called_once()
merged = mock_update.call_args[0][3]
self.assertEqual(len(merged), 3) # 1 existing + 2 new
new_notes = [o.get("note", "") for o in merged if o.get("ip") in ("1.2.3.4", "$CONCOURSEWORKERS")]
self.assertTrue(all(n.startswith("[Imported ") for n in new_notes))
self.assertIn("ADD", out)
self.assertIn("SKIP", out)
self.assertIn("INVALID", out)
self.assertIn("UPDATED", out)
self.assertIn("$CONCOURSEWORKERS", out)
@patch.object(soi, "update_overrides")
@patch.object(soi, "find_detection")
def test_no_import_note_preserves_note(self, mock_find, mock_update):
mock_find.return_value = ("doc1", "so-detection", [])
mock_update.return_value = {"result": "updated"}
self._write_file("1008", [
{"type": "suppress", "track": "by_src", "ip": "1.2.3.4", "note": "original"},
])
_, _, code = self._run_main("--no-import-note")
self.assertEqual(code, 0)
merged = mock_update.call_args[0][3]
self.assertEqual(merged[0]["note"], "original") # no prefix applied
@patch.object(soi, "find_detection")
def test_dry_run_skips_update(self, mock_find):
mock_find.return_value = ("doc1", "so-detection", [])
self._write_file("1009", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
with patch.object(soi, "update_overrides") as mock_update:
out, _, code = self._run_main("--dry-run")
self.assertEqual(code, 0)
mock_update.assert_not_called()
self.assertIn("DRY-RUN: would update", out)
@patch.object(soi, "update_overrides")
@patch.object(soi, "find_detection")
def test_update_http_error(self, mock_find, mock_update):
mock_find.return_value = ("doc1", "so-detection", [])
mock_update.side_effect = requests.HTTPError("nope")
self._write_file("1010", [{"type": "suppress", "track": "by_src", "ip": "1.2.3.4"}])
out, _, code = self._run_main()
self.assertEqual(code, 1)
self.assertIn("update failed", out)
if __name__ == "__main__":
unittest.main()
+232
View File
@@ -0,0 +1,232 @@
#!/opt/saltstack/salt/bin/python3
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
"""
so-push-drainer
===============
Scheduled drainer for the active-push feature. Runs on the manager every
drain_interval seconds (default 15) via a salt schedule in salt/schedule.sls.
For each intent file under /opt/so/state/push_pending/*.json whose last_touch
is older than debounce_seconds, this script:
* concatenates the actions lists from every ready intent
* dedupes by (state or __highstate__, tgt, tgt_type)
* dispatches a single `salt-run state.orchestrate orch.push_batch --async`
with the deduped actions list passed as pillar kwargs
* deletes the contributed intent files on successful dispatch
Reactor sls files (push_suricata, push_strelka, push_pillar) write intents
but never dispatch directly -- see plan
/home/mreeves/.claude/plans/goofy-marinating-hummingbird.md for the full design.
"""
import fcntl
import glob
import json
import logging
import logging.handlers
import os
import subprocess
import sys
import time
import salt.client
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
LOG_FILE = '/opt/so/log/salt/so-push-drainer.log'
HIGHSTATE_SENTINEL = '__highstate__'
def _make_logger():
logger = logging.getLogger('so-push-drainer')
logger.setLevel(logging.INFO)
if not logger.handlers:
os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
handler = logging.handlers.RotatingFileHandler(
LOG_FILE, maxBytes=5 * 1024 * 1024, backupCount=3,
)
handler.setFormatter(logging.Formatter(
'%(asctime)s | %(levelname)s | %(message)s',
))
logger.addHandler(handler)
return logger
def _load_push_cfg():
"""Read the global:push pillar subtree via salt-call. Returns a dict."""
caller = salt.client.Caller()
cfg = caller.cmd('pillar.get', 'global:push', {})
return cfg if isinstance(cfg, dict) else {}
def _read_intent(path, log):
try:
with open(path, 'r') as f:
return json.load(f)
except (IOError, ValueError) as exc:
log.warning('cannot read intent %s: %s', path, exc)
return None
except Exception:
log.exception('unexpected error reading %s', path)
return None
def _dedupe_actions(actions):
seen = set()
deduped = []
for action in actions:
if not isinstance(action, dict):
continue
state_key = HIGHSTATE_SENTINEL if action.get('highstate') else action.get('state')
tgt = action.get('tgt')
tgt_type = action.get('tgt_type', 'compound')
if not state_key or not tgt:
continue
key = (state_key, tgt, tgt_type)
if key in seen:
continue
seen.add(key)
deduped.append(action)
return deduped
def _dispatch(actions, log):
pillar_arg = json.dumps({'actions': actions})
cmd = [
'salt-run',
'state.orchestrate',
'orch.push_batch',
'pillar={}'.format(pillar_arg),
'--async',
]
log.info('dispatching: %s', ' '.join(cmd[:3]) + ' pillar=<{} actions>'.format(len(actions)))
try:
result = subprocess.run(
cmd, check=True, capture_output=True, text=True, timeout=60,
)
except subprocess.CalledProcessError as exc:
log.error('dispatch failed (rc=%s): stdout=%s stderr=%s',
exc.returncode, exc.stdout, exc.stderr)
return False
except subprocess.TimeoutExpired:
log.error('dispatch timed out after 60s')
return False
except Exception:
log.exception('dispatch raised')
return False
log.info('dispatch accepted: %s', (result.stdout or '').strip())
return True
def main():
log = _make_logger()
if not os.path.isdir(PENDING_DIR):
# Nothing to do; reactors create the dir on first use.
return 0
try:
push = _load_push_cfg()
except Exception:
log.exception('failed to read global:push pillar; aborting drain pass')
return 1
if not push.get('enabled', True):
log.debug('push disabled; exiting')
return 0
debounce_seconds = int(push.get('debounce_seconds', 30))
os.makedirs(PENDING_DIR, exist_ok=True)
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent_files = [
p for p in sorted(glob.glob(os.path.join(PENDING_DIR, '*.json')))
if os.path.basename(p) != '.lock'
]
if not intent_files:
return 0
now = time.time()
ready = []
skipped = 0
broken = []
for path in intent_files:
intent = _read_intent(path, log)
if not isinstance(intent, dict):
broken.append(path)
continue
last_touch = intent.get('last_touch', 0)
if now - last_touch < debounce_seconds:
skipped += 1
continue
ready.append((path, intent))
for path in broken:
try:
os.unlink(path)
except OSError:
pass
if not ready:
if skipped:
log.debug('no ready intents (%d still in debounce window)', skipped)
return 0
combined_actions = []
oldest_first_touch = now
all_paths = []
for path, intent in ready:
combined_actions.extend(intent.get('actions', []) or [])
first = intent.get('first_touch', now)
if first < oldest_first_touch:
oldest_first_touch = first
all_paths.extend(intent.get('paths', []) or [])
deduped = _dedupe_actions(combined_actions)
if not deduped:
log.warning('%d intent(s) had no usable actions; clearing', len(ready))
for path, _ in ready:
try:
os.unlink(path)
except OSError:
pass
return 0
debounce_duration = now - oldest_first_touch
log.info(
'draining %d intent(s): %d action(s) after dedupe (raw=%d), '
'debounce_duration=%.1fs, paths=%s',
len(ready), len(deduped), len(combined_actions),
debounce_duration, all_paths[:20],
)
if not _dispatch(deduped, log):
log.warning('dispatch failed; leaving intent files in place for retry')
return 1
for path, _ in ready:
try:
os.unlink(path)
except OSError:
log.exception('failed to remove drained intent %s', path)
return 0
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
if __name__ == '__main__':
sys.exit(main())
+432 -17
View File
@@ -24,6 +24,14 @@ BACKUPTOPFILE=/opt/so/saltstack/default/salt/top.sls.backup
SALTUPGRADED=false
SALT_CLOUD_INSTALLED=false
SALT_CLOUD_CONFIGURED=false
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
# used to display messages to the user at the end of soup
declare -a FINAL_MESSAGE_QUEUE=()
@@ -362,8 +370,9 @@ preupgrade_changes() {
# This function is to add any new pillar items if needed.
echo "Checking to see if changes are needed."
[[ "$INSTALLEDVERSION" =~ ^2\.4\.21[0-9]+$ ]] && up_to_3.0.0
[[ "$INSTALLEDVERSION" =~ ^2\.4\.21[0-9]+$ ]] && up_to_3.0.0
[[ "$INSTALLEDVERSION" == "3.0.0" ]] && up_to_3.1.0
[[ "$INSTALLEDVERSION" == "3.1.0" ]] && up_to_3.2.0
true
}
@@ -373,6 +382,7 @@ postupgrade_changes() {
[[ "$POSTVERSION" =~ ^2\.4\.21[0-9]+$ ]] && post_to_3.0.0
[[ "$POSTVERSION" == "3.0.0" ]] && post_to_3.1.0
[[ "$POSTVERSION" == "3.1.0" ]] && post_to_3.2.0
true
}
@@ -477,6 +487,158 @@ elasticsearch_backup_index_templates() {
tar -czf /nsm/backup/3.0.0_elasticsearch_index_templates.tar.gz -C /opt/so/conf/elasticsearch/templates/index/ .
}
elasticfleet_set_agent_logging_level_warn() {
. /usr/sbin/so-elastic-fleet-common
local current_agent_policies
if ! current_agent_policies=$(fleet_api "agent_policies?perPage=1000"); then
echo "Warning: unable to retrieve Fleet agent policies"
return 0
fi
# Only updating policies that are within Security Onion defaults and do not already have any user configured advanced_settings.
local policies_to_update
policies_to_update=$(jq -c '
.items[]
| select(has("advanced_settings") | not)
| select(
.id == "so-grid-nodes_general"
or .id == "so-grid-nodes_heavy"
or .id == "endpoints-initial"
or (.id | startswith("FleetServer_"))
)
' <<< "$current_agent_policies")
if [[ -z "$policies_to_update" ]]; then
return 0
fi
while IFS= read -r policy; do
[[ -z "$policy" ]] && continue
local policy_id policy_name policy_namespace
policy_id=$(jq -r '.id' <<< "$policy")
policy_name=$(jq -r '.name' <<< "$policy")
policy_namespace=$(jq -r '.namespace' <<< "$policy")
local update_logging
update_logging=$(jq -n \
--arg name "$policy_name" \
--arg namespace "$policy_namespace" \
'{name: $name, namespace: $namespace, advanced_settings: {agent_logging_level: "warning"}}'
)
echo "Setting elastic agent_logging_level to warning on policy '$policy_name' ($policy_id)."
if ! fleet_api "agent_policies/$policy_id" -XPUT -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$update_logging" >/dev/null; then
echo " warning: failed to update agent policy '$policy_name' ($policy_id)" >&2
fi
done <<< "$policies_to_update"
}
update_logstash_pipeline_name() {
local original_pipeline_name="$1"
local new_pipeline_name="$2"
echo "Checking for conflicting logstash defined_pipelines pillar value."
local LOGSTASH_FILE=/opt/so/saltstack/local/pillar/logstash/soc_logstash.sls
local MINIONDIR=/opt/so/saltstack/local/pillar/minions
for pillar_file in "$LOGSTASH_FILE" "$MINIONDIR"/*.sls; do
[[ -f "$pillar_file" ]] || continue
if grep -q "$original_pipeline_name$" "$pillar_file"; then
echo "Found conflicting defined_pipeline pillar value in $pillar_file. Updating to use the new logstash pipeline name."
sed -i "s#$original_pipeline_name\$#$new_pipeline_name#g" "$pillar_file"
chown socore:socore "$pillar_file"
fi
done
}
check_transform_health_and_reauthorize() {
. /usr/sbin/so-elastic-fleet-common
echo "Checking integration transform jobs for unhealthy / unauthorized status..."
local transforms_doc stats_doc installed_doc
if ! transforms_doc=$(so-elasticsearch-query "_transform/_all?size=1000" --fail --retry 3 --retry-delay 5 2>/dev/null); then
echo "Unable to query for transform jobs, skipping reauthorization."
return 0
fi
if ! stats_doc=$(so-elasticsearch-query "_transform/_all/_stats?size=1000" --fail --retry 3 --retry-delay 5 2>/dev/null); then
echo "Unable to query for transform job stats, skipping reauthorization."
return 0
fi
if ! installed_doc=$(fleet_api "epm/packages/installed?perPage=500"); then
echo "Unable to list installed Fleet packages, skipping reauthorization."
return 0
fi
# Get all transforms that meet the following
# - unhealthy (any non-green health status)
# - metadata has run_as_kibana_system: false (this fix is specific to transforms started prior to Kibana 9.3.3)
# - are not orphaned (integration is not somehow missing/corrupt/uninstalled)
local tmp_transforms tmp_stats tmp_installed
tmp_transforms=$(mktemp)
tmp_stats=$(mktemp)
tmp_installed=$(mktemp)
echo "$transforms_doc" > "$tmp_transforms"
echo "$stats_doc" > "$tmp_stats"
echo "$installed_doc" > "$tmp_installed"
local unhealthy_transforms
unhealthy_transforms=$(jq -c -n \
--slurpfile t "$tmp_transforms" \
--slurpfile s "$tmp_stats" \
--slurpfile i "$tmp_installed" '
($i[0].items | map({key: .name, value: .version}) | from_entries) as $pkg_ver
| ($s[0].transforms | map({key: .id, value: .health.status}) | from_entries) as $health
| [ $t[0].transforms[]
| select(._meta.run_as_kibana_system == false)
| select(($health[.id] // "unknown") != "green")
| {id, pkg: ._meta.package.name, ver: ($pkg_ver[._meta.package.name])}
]
| if length == 0 then empty else . end
| (map(select(.ver == null)) | map({orphan: .id})[]),
(map(select(.ver != null))
| group_by(.pkg)
| map({pkg: .[0].pkg, ver: .[0].ver, transformIds: map(.id)})[])
')
if [[ -z "$unhealthy_transforms" ]]; then
return 0
fi
local unhealthy_count
unhealthy_count=$(jq -s '[.[].transformIds? // empty | .[]] | length' <<< "$unhealthy_transforms")
echo "Found $unhealthy_count transform(s) needing reauthorization."
local total_failures=0
while IFS= read -r transform; do
[[ -z "$transform" ]] && continue
if jq -e 'has("orphan")' <<< "$transform" >/dev/null 2>&1; then
echo "Skipping transform not owned by any installed Fleet package: $(jq -r '.orphan' <<< "$transform")"
continue
fi
local pkg ver body resp
pkg=$(jq -r '.pkg' <<< "$transform")
ver=$(jq -r '.ver' <<< "$transform")
body=$(jq -c '{transforms: (.transformIds | map({transformId: .}))}' <<< "$transform")
echo "Reauthorizing transform(s) for ${pkg}-${ver}..."
resp=$(fleet_api "epm/packages/${pkg}/${ver}/transforms/authorize" \
-XPOST -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
-d "$body") || { echo "Could not reauthorize transform(s) for ${pkg}-${ver}"; continue; }
(( total_failures += $(jq 'map(select(.success != true)) | length' <<< "$resp" 2>/dev/null) ))
done <<< "$unhealthy_transforms"
rm -f "$tmp_transforms" "$tmp_stats" "$tmp_installed"
if [[ "$total_failures" -gt 0 ]]; then
echo "Some transform(s) failed to reauthorize."
fi
}
ensure_postgres_local_pillar() {
# Postgres was added as a service after 3.0.0, so the new pillar/top.sls
# references postgres.soc_postgres / postgres.adv_postgres unconditionally.
@@ -512,6 +674,31 @@ ensure_postgres_secret() {
chown socore:socore "$secrets_file"
}
rename_strelka_scan_lnk() {
echo "Renaming strelka pillar ScanLNK to ScanLnk."
local STRELKA_FILE=/opt/so/saltstack/local/pillar/strelka/soc_strelka.sls
local MINIONDIR=/opt/so/saltstack/local/pillar/minions
local OLD_KEY=strelka.backend.config.backend.scanners.ScanLNK
local NEW_KEY=strelka.backend.config.backend.scanners.ScanLnk
local TMP_VALUE_FILE
TMP_VALUE_FILE=$(mktemp)
for pillar_file in "$STRELKA_FILE" "$MINIONDIR"/*.sls; do
[[ -f "$pillar_file" ]] || continue
# Skip if ScanLNK doesn't exist
so-yaml.py get "$pillar_file" "$OLD_KEY" > "$TMP_VALUE_FILE" 2>/dev/null || continue
echo "Found 'ScanLNK' key in $pillar_file. Renaming to 'ScanLnk'."
so-yaml.py add "$pillar_file" "$NEW_KEY" "file:$TMP_VALUE_FILE"
so-yaml.py remove "$pillar_file" "$OLD_KEY"
done
rm -f "$TMP_VALUE_FILE"
}
fix_logstash_0013_lumberjack_pipeline_name() {
update_logstash_pipeline_name "so/0013_input_lumberjack_fleet.conf" "so/0013_input_lumberjack_fleet.conf.jinja"
}
up_to_3.1.0() {
ensure_postgres_local_pillar
ensure_postgres_secret
@@ -519,13 +706,18 @@ up_to_3.1.0() {
elasticsearch_backup_index_templates
# Clear existing component template state file.
rm -f /opt/so/state/esfleet_component_templates.json
rename_strelka_scan_lnk
fix_logstash_0013_lumberjack_pipeline_name
INSTALLEDVERSION=3.1.0
}
post_to_3.1.0() {
/usr/sbin/so-kibana-space-defaults
# ensure manager has new version of socloud.conf
if [[ $SALT_CLOUD_CONFIGURED == true ]]; then
salt-call state.apply salt.cloud.config concurrent=True
fi
# Backfill the Telegraf creds pillar for every accepted minion. so-telegraf-cred
# add is idempotent — it no-ops when an entry already exists — so this is safe
@@ -541,11 +733,53 @@ post_to_3.1.0() {
# file_roots of its own and --local would fail with "No matching sls found".
salt-call state.apply postgres.telegraf_users queue=True || true
# Update default agent policies to use logging level warn.
elasticfleet_set_agent_logging_level_warn || true
# Check for unhealthy / unauthorized integration transform jobs and attempt reauthorizations
check_transform_health_and_reauthorize || true
POSTVERSION=3.1.0
}
### 3.1.0 End ###
### 3.2.0 Scripts ###
bootstrap_so_soc_database() {
# init-db.sh is mounted into so-postgres at /docker-entrypoint-initdb.d/init-db.sh
# and runs automatically only on a fresh data directory. Hosts upgrading from
# 3.1.0 already have /nsm/postgres populated, so the so_soc bootstrap block
# added in 3.2 never fires. Re-run the script explicitly; it's idempotent.
echo "Bootstrapping so_soc database via init-db.sh."
# The postgres image has no USER directive, so `docker exec` defaults to
# root, and the container env intentionally omits POSTGRES_USER (the upstream
# entrypoint defaults it transiently during first-init only). Recreate both
# so psql inside init-db.sh resolves the connect user correctly.
local exec_cmd="docker exec -u postgres -e POSTGRES_USER=postgres so-postgres bash /docker-entrypoint-initdb.d/init-db.sh"
if ! /usr/sbin/so-postgres-wait; then
FINAL_MESSAGE_QUEUE+=("WARNING: so-postgres was not ready during the 3.2.0 upgrade; the so_soc database may not have been bootstrapped. Re-run manually: $exec_cmd")
return 0
fi
if ! $exec_cmd; then
FINAL_MESSAGE_QUEUE+=("WARNING: init-db.sh failed inside so-postgres during the 3.2.0 upgrade; the so_soc database may not have been bootstrapped. Re-run manually: $exec_cmd")
return 0
fi
echo "so_soc bootstrap complete."
}
up_to_3.2.0() {
INSTALLEDVERSION=3.2.0
}
post_to_3.2.0() {
bootstrap_so_soc_database
POSTVERSION=3.2.0
}
### 3.2.0 End ###
repo_sync() {
echo "Sync the local repo."
@@ -714,15 +948,6 @@ upgrade_check_salt() {
upgrade_salt() {
echo "Performing upgrade of Salt from $INSTALLEDSALTVERSION to $NEWSALTVERSION."
echo ""
# Check if salt-cloud is installed
if rpm -q salt-cloud &>/dev/null; then
SALT_CLOUD_INSTALLED=true
fi
# Check if salt-cloud is configured
if [[ -f /etc/salt/cloud.profiles.d/socloud.conf ]]; then
SALT_CLOUD_CONFIGURED=true
fi
echo "Removing yum versionlock for Salt."
echo ""
yum versionlock delete "salt"
@@ -806,6 +1031,9 @@ verify_es_version_compatibility() {
local is_active_intermediate_upgrade=1
# supported upgrade paths for SO-ES versions
declare -A es_upgrade_map=(
["8.18.4"]="8.18.6 8.18.8 9.0.8"
["8.18.6"]="8.18.8 9.0.8"
["8.18.8"]="9.0.8"
["9.0.8"]="9.3.3"
)
@@ -829,6 +1057,171 @@ verify_es_version_compatibility() {
exit 160
fi
compatible_es_versions="$target_es_version"
for current_version in "${!es_upgrade_map[@]}"; do
# shellcheck disable=SC2076
if [[ " ${es_upgrade_map[$current_version]} " =~ " $target_es_version " ]]; then
compatible_es_versions+=" $current_version"
fi
done
# Check if the given ES version can directly upgrade to the target ES version. Used to assist with catching lagging nodes during the upgrade process
es_version_can_upgrade_to_target() {
local current_version="$1"
# shellcheck disable=SC2076
if [[ -n "$current_version" && " $compatible_es_versions " =~ " $current_version " ]]; then
return 0
fi
return 1
}
# Gather Elasticsearch cluster version info and verify that each node in the cluster is running a version compatible with the target ES version.
verify_searchnodes_es_target_compatibility() {
local retries=20
local retry_count=0
local delay=180
local expected_es_nodes searchnode_minions attempt
local searchnode_discovery_success=false
SEARCHNODE_ES_VERSIONS=""
for attempt in {1..3}; do
if searchnode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("searchnode"))'); then
searchnode_discovery_success=true
break
fi
echo "Failed to retrieve grid searchnodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
sleep 30
done
if [[ "$searchnode_discovery_success" != "true" ]]; then
echo "Failed to retrieve grid searchnodes via salt-key."
return 1
fi
# Always add node running soup to expected es nodes
expected_es_nodes="${MINIONID%_*}"
while IFS= read -r searchnode_minion; do
[[ -z "$searchnode_minion" ]] && continue
expected_es_nodes+=$'\n'"${searchnode_minion%_searchnode}"
done <<< "$searchnode_minions"
while [[ $retry_count -lt $retries ]]; do
SEARCHNODE_ES_VERSIONS=$(so-elasticsearch-query _nodes/_all/version --retry 5 --retry-delay 10 --fail 2>&1)
local exit_status=$?
if [[ $exit_status -ne 0 ]]; then
echo "Failed to retrieve Elasticsearch versions from searchnodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
continue
fi
local all_searchnodes_compatible=true
while IFS=$'\t' read -r node current_version; do
[[ -z "$node" ]] && continue
if ! es_version_can_upgrade_to_target "$current_version"; then
echo "Searchnode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
all_searchnodes_compatible=false
fi
done < <(echo "$SEARCHNODE_ES_VERSIONS" | jq -r '.nodes | to_entries[] | [.value.name, .value.version] | @tsv')
while IFS= read -r expected_es_node; do
[[ -z "$expected_es_node" ]] && continue
if ! echo "$SEARCHNODE_ES_VERSIONS" | jq -e --arg node "$expected_es_node" '.nodes | to_entries | any(.value.name == $node)' > /dev/null; then
echo "Searchnode $expected_es_node did not report an Elasticsearch version. It may be offline or still upgrading."
all_searchnodes_compatible=false
fi
done <<< "$expected_es_nodes"
if [[ "$all_searchnodes_compatible" == true ]]; then
echo "All Searchnodes are upgradable to Elasticsearch $target_es_version."
return 0
fi
echo "One or more Searchnodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
done
return 1
}
# Gather heavynode version info and verify that each node is running a version compatible with the target ES version.
verify_heavynodes_es_target_compatibility() {
local heavynode_minions attempt
local retries=20
local retry_count=0
local delay=180
local heavynode_discovery_success=false
HEAVYNODE_ES_VERSIONS=""
for attempt in {1..3}; do
if heavynode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("heavynode"))'); then
heavynode_discovery_success=true
break
fi
echo "Failed to retrieve grid heavynodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
sleep 30
done
if [[ "$heavynode_discovery_success" != "true" ]]; then
echo "Failed to retrieve grid heavynodes via salt-key."
return 1
fi
if [[ -z "$heavynode_minions" ]]; then
echo "No heavynodes detected. Skipping heavynode Elasticsearch version compatibility check."
return 0
fi
while [[ $retry_count -lt $retries ]]; do
HEAVYNODE_ES_VERSIONS=$(salt -C 'G@role:so-heavynode' cmd.run 'set -o pipefail; so-elasticsearch-query / --retry 5 --retry-delay 10 | jq -er ".version.number"' shell=/bin/bash --out=json 2> /dev/null)
local exit_status=$?
if [[ $exit_status -ne 0 ]]; then
echo "Failed to retrieve Elasticsearch version from one or more heavynodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
continue
fi
local all_heavynodes_compatible=true
while IFS=$'\t' read -r node current_version; do
[[ -z "$node" ]] && continue
if ! es_version_can_upgrade_to_target "$current_version"; then
echo "Heavynode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
all_heavynodes_compatible=false
fi
done < <(echo "$HEAVYNODE_ES_VERSIONS" | jq -r 'to_entries[] | [.key, .value] | @tsv')
while IFS= read -r heavynode_minion; do
[[ -z "$heavynode_minion" ]] && continue
if ! echo "$HEAVYNODE_ES_VERSIONS" | jq -se --arg minion "$heavynode_minion" 'add | has($minion)' > /dev/null; then
echo "Heavynode $heavynode_minion did not report an Elasticsearch version. It may be offline or still upgrading."
all_heavynodes_compatible=false
fi
done <<< "$heavynode_minions"
if [[ "$all_heavynodes_compatible" == true ]]; then
echo -e "\nAll heavynodes can upgrade to Elasticsearch $target_es_version."
return 0
fi
echo "One or more heavynodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
done
return 1
}
if [[ ! -f "$es_verification_script" ]]; then
create_intermediate_upgrade_verification_script "$es_verification_script"
fi
for statefile in "${es_required_version_statefile_base}"-*; do
[[ -f $statefile ]] || continue
@@ -847,10 +1240,6 @@ verify_es_version_compatibility() {
continue
fi
if [[ ! -f "$es_verification_script" ]]; then
create_intermediate_upgrade_verification_script "$es_verification_script"
fi
echo -e "\n##############################################################################################################################\n"
echo "A previously required intermediate Elasticsearch upgrade was detected. Verifying that all Searchnodes/Heavynodes have successfully upgraded Elasticsearch to $es_required_version_statefile_value before proceeding with soup to avoid potential data loss! This command can take up to an hour to complete."
if ! timeout --foreground 4000 bash "$es_verification_script" "$es_required_version_statefile_value" "$statefile"; then
@@ -872,6 +1261,26 @@ verify_es_version_compatibility() {
# shellcheck disable=SC2076 # Do not want a regex here eg usage " 8.18.8 9.0.8 " =~ " 9.0.8 "
if [[ " ${es_upgrade_map[$es_version]} " =~ " $target_es_version " || "$es_version" == "$target_es_version" ]]; then
if ! verify_searchnodes_es_target_compatibility || ! verify_heavynodes_es_target_compatibility; then
echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
echo "One or more Searchnode(s)/Heavynode(s) cannot upgrade directly to Elasticsearch $target_es_version. This can happen with soups that include Elasticsearch upgrades being run in quick succession. Typically, this will resolve itself as the grid synchronizes. Please allow time for all Searchnodes/Heavynodes to have upgraded Elasticsearch to a compatible version with $target_es_version before running soup again to avoid potential data loss!"
if [[ -n "$HEAVYNODE_ES_VERSIONS" ]]; then
echo "Current heavynode Elasticsearch versions:"
echo "$HEAVYNODE_ES_VERSIONS" | jq '.'
fi
if [[ -n "$SEARCHNODE_ES_VERSIONS" ]]; then
echo "Current searchnode Elasticsearch versions:"
echo "$SEARCHNODE_ES_VERSIONS" | jq '.nodes | to_entries | map({(.value.name): .value.version}) | sort | add'
fi
echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
exit 161
fi
# supported upgrade
return 0
else
@@ -1157,7 +1566,13 @@ EOF
# Keeping this block in case we need to do a hotfix that requires salt update
apply_hotfix() {
echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
if [[ "$INSTALLEDVERSION" == "3.1.0" ]] ; then
# Do not remove this fix_logstash_0013_lumberjack_pipeline_name in future hotfixes without first validating older
# installs referencing "so/0013_input_lumberjack_fleet.conf" via pillar are upgradable
fix_logstash_0013_lumberjack_pipeline_name
else
echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
fi
}
failed_soup_restore_items() {
@@ -1229,7 +1644,7 @@ main() {
echo "Verifying we have the latest soup script."
verify_latest_update_script
echo "Verifying Elasticsearch version compatibility before upgrading."
echo "Verifying Elasticsearch version compatibility across the grid before upgrading."
verify_es_version_compatibility
echo "Let's see if we need to update Security Onion."
+1
View File
@@ -34,6 +34,7 @@ make-rule-dir-nginx:
so-nginx:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-nginx:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-nginx
- networks:
- sobridge:
+2
View File
@@ -225,6 +225,7 @@ http {
limit_req zone=auth_throttle burst={{ NGINXMERGED.config.throttle_login_burst }} nodelay;
limit_req_status 429;
proxy_pass http://{{ GLOBALS.manager }}:4433;
proxy_set_header Connection "Close";
proxy_read_timeout 90;
proxy_connect_timeout 90;
proxy_set_header Host $host;
@@ -237,6 +238,7 @@ http {
location ~ ^/auth/.*?(whoami|logout|settings|errors|webauthn.js) {
rewrite /auth/(.*) /$1 break;
proxy_pass http://{{ GLOBALS.manager }}:4433;
proxy_set_header Connection "Close";
proxy_read_timeout 90;
proxy_connect_timeout 90;
proxy_set_header Host $host;
+10 -1
View File
@@ -3,7 +3,14 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% set hypervisor = pillar.minion_id %}
{% set hypervisor = pillar.get('minion_id', '') %}
{% if not hypervisor|regex_match('^([A-Za-z0-9._-]{1,253})$') %}
{% do salt.log.error('delete_hypervisor_orch: refusing unsafe minion_id=' ~ hypervisor) %}
delete_hypervisor_invalid_minion_id:
test.fail_without_changes:
- name: delete_hypervisor_invalid_minion_id
{% else %}
ensure_hypervisor_mine_deleted:
salt.function:
@@ -20,3 +27,5 @@ update_salt_cloud_profile:
- sls:
- salt.cloud.config
- concurrent: True
{% endif %}
+37
View File
@@ -0,0 +1,37 @@
{% from 'global/map.jinja' import GLOBALMERGED %}
{% set actions = salt['pillar.get']('actions', []) %}
{% set BATCH = GLOBALMERGED.push.batch %}
{% set BATCH_WAIT = GLOBALMERGED.push.batch_wait %}
{% for action in actions %}
{% if action.get('highstate') %}
apply_highstate_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- highstate: True
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
{% else %}
refresh_pillar_{{ loop.index }}:
salt.function:
- name: saltutil.refresh_pillar
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
apply_{{ action.state | replace('.', '_') }}_{{ loop.index }}:
salt.state:
- tgt: '{{ action.tgt }}'
- tgt_type: {{ action.get('tgt_type', 'compound') }}
- sls:
- {{ action.state }}
- batch: {{ action.get('batch', BATCH) }}
- batch_wait: {{ action.get('batch_wait', BATCH_WAIT) }}
- kwarg:
queue: 2
- require:
- salt: refresh_pillar_{{ loop.index }}
{% endif %}
{% endfor %}
+10 -1
View File
@@ -12,7 +12,14 @@
{% if 'vrt' in salt['pillar.get']('features', []) %}
{% do salt.log.debug('vm_pillar_clean_orch: Running') %}
{% set vm_name = pillar.get('vm_name') %}
{% set vm_name = pillar.get('vm_name', '') %}
{% if not vm_name|regex_match('^([A-Za-z0-9._-]{1,253})$') %}
{% do salt.log.error('vm_pillar_clean_orch: refusing unsafe vm_name=' ~ vm_name) %}
vm_pillar_clean_invalid_name:
test.fail_without_changes:
- name: vm_pillar_clean_invalid_name
{% else %}
delete_adv_{{ vm_name }}_pillar:
module.run:
@@ -24,6 +31,8 @@ delete_{{ vm_name }}_pillar:
- file.remove:
- path: /opt/so/saltstack/local/pillar/minions/{{ vm_name }}.sls
{% endif %}
{% else %}
{% do salt.log.error(
+3 -3
View File
@@ -46,10 +46,10 @@ postgresinitdir:
- require:
- file: postgresconfdir
postgresinitusers:
postgresinitdb:
file.managed:
- name: /opt/so/conf/postgres/init/init-users.sh
- source: salt://postgres/files/init-users.sh
- name: /opt/so/conf/postgres/init/init-db.sh
- source: salt://postgres/files/init-db.sh
- user: 939
- group: 939
- mode: 755
+4 -4
View File
@@ -31,7 +31,7 @@ so-postgres:
- POSTGRES_DB=securityonion
# Passwords are delivered via mounted 0600 secret files, not plaintext env vars.
# The upstream postgres image resolves POSTGRES_PASSWORD_FILE; entrypoint.sh and
# init-users.sh resolve SO_POSTGRES_PASS_FILE the same way.
# init-db.sh resolve SO_POSTGRES_PASS_FILE the same way.
- POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password
- SO_POSTGRES_USER={{ SO_POSTGRES_USER }}
- SO_POSTGRES_PASS_FILE=/run/secrets/so_postgres_pass
@@ -46,7 +46,7 @@ so-postgres:
- /opt/so/conf/postgres/postgresql.conf:/conf/postgresql.conf:ro
- /opt/so/conf/postgres/pg_hba.conf:/conf/pg_hba.conf:ro
- /opt/so/conf/postgres/secrets:/run/secrets:ro
- /opt/so/conf/postgres/init/init-users.sh:/docker-entrypoint-initdb.d/init-users.sh:ro
- /opt/so/conf/postgres/init/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh:ro
- /etc/pki/postgres.crt:/conf/postgres.crt:ro
- /etc/pki/postgres.key:/conf/postgres.key:ro
- /etc/pki/tls/certs/intca.crt:/conf/ca.crt:ro
@@ -70,7 +70,7 @@ so-postgres:
- watch:
- file: postgresconf
- file: postgreshba
- file: postgresinitusers
- file: postgresinitdb
- file: postgres_super_secret
- file: postgres_app_secret
- x509: postgres_crt
@@ -78,7 +78,7 @@ so-postgres:
- require:
- file: postgresconf
- file: postgreshba
- file: postgresinitusers
- file: postgresinitdb
- file: postgres_super_secret
- file: postgres_app_secret
- x509: postgres_crt
@@ -32,3 +32,8 @@ EOSQL
if ! psql -U "$POSTGRES_USER" -tAc "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" -c "CREATE DATABASE so_telegraf"
fi
# Bootstrap the SOC database.
if ! psql -U "$POSTGRES_USER" -tAc "SELECT 1 FROM pg_database WHERE datname='so_soc'" | grep -q 1; then
psql -v ON_ERROR_STOP=1 -U "$POSTGRES_USER" -c "CREATE DATABASE so_soc"
fi
+18 -85
View File
@@ -18,38 +18,22 @@ include:
{% set TG_OUT = TELEGRAFMERGED.output | upper %}
{% if TG_OUT in ['POSTGRES', 'BOTH'] %}
# docker_container.running returns as soon as the container starts, but on
# first-init docker-entrypoint.sh starts a temporary postgres with
# `listen_addresses=''` to run /docker-entrypoint-initdb.d scripts, then
# shuts it down before exec'ing the real CMD. A default pg_isready check
# (Unix socket) passes during that ephemeral phase and races the shutdown
# with "the database system is shutting down". Checking TCP readiness on
# 127.0.0.1 only succeeds after the final postgres binds the port.
postgres_wait_ready:
cmd.run:
- name: |
for i in $(seq 1 60); do
if docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q 2>/dev/null; then
exit 0
fi
sleep 2
done
echo "so-postgres did not accept TCP connections within 120s" >&2
exit 1
- name: /usr/sbin/so-postgres-wait
- require:
- docker_container: so-postgres
- file: postgres_sbin
# Ensure the shared Telegraf database exists. init-users.sh only runs on a
# Ensure the shared Telegraf database exists. init-db.sh only runs on a
# fresh data dir, so hosts upgraded onto an existing /nsm/postgres volume
# would otherwise never get so_telegraf.
postgres_create_telegraf_db:
cmd.run:
- name: |
if ! docker exec so-postgres psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres -c "CREATE DATABASE so_telegraf"
fi
- name: /usr/sbin/so-telegraf-postgres create_db
- require:
- cmd: postgres_wait_ready
- file: postgres_sbin
# Provision the shared group role and schema once. Every per-minion role is a
# member of so_telegraf, and each Telegraf connection does SET ROLE so_telegraf
@@ -57,68 +41,26 @@ postgres_create_telegraf_db:
# on first write are owned by the group role and every member can INSERT/SELECT.
postgres_telegraf_group_role:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
CREATE ROLE so_telegraf NOLOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Telegraf (running as so_telegraf) calls partman.create_parent()
-- on first write of each metric, which needs USAGE on the partman
-- schema, EXECUTE on its functions/procedures, and write access to
-- partman.part_config so it can register new partitioned parents.
GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
-- partman creates per-parent template tables (partman.template_*) at
-- runtime; default privileges extend DML/sequence access to them.
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
-- Hourly partman maintenance. cron.schedule is idempotent by jobname.
SELECT cron.schedule(
'telegraf-partman-maintenance',
'17 * * * *',
'CALL partman.run_maintenance_proc()'
);
EOSQL
- name: /usr/sbin/so-telegraf-postgres group_role
- require:
- cmd: postgres_create_telegraf_db
- file: postgres_sbin
{% set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
{% for mid, entry in creds.items() %}
{% if entry.get('user') and entry.get('pass') %}
{% set u = entry.user %}
{% set p = entry.pass | replace("'", "''") %}
{% set p = entry.pass %}
postgres_telegraf_role_{{ u }}:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '{{ u }}') THEN
EXECUTE format('CREATE ROLE %I WITH LOGIN PASSWORD %L', '{{ u }}', '{{ p }}');
ELSE
EXECUTE format('ALTER ROLE %I WITH PASSWORD %L', '{{ u }}', '{{ p }}');
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO "{{ u }}";
GRANT so_telegraf TO "{{ u }}";
EOSQL
- name: /usr/sbin/so-telegraf-postgres user
- env:
- ROLE_USER: {{ u | tojson }}
- ROLE_PASS: {{ p | tojson }}
- hide_output: True
- require:
- file: postgres_sbin
- cmd: postgres_telegraf_group_role
{% endif %}
@@ -130,21 +72,12 @@ postgres_telegraf_role_{{ u }}:
{% set retention = salt['pillar.get']('postgres:telegraf:retention_days', 14) | int %}
postgres_telegraf_retention_reconcile:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman') THEN
UPDATE partman.part_config
SET retention = '{{ retention }} days',
retention_keep_table = false
WHERE parent_table LIKE 'telegraf.%';
END IF;
END
$$;
EOSQL
- name: /usr/sbin/so-telegraf-postgres retention
- env:
- RETENTION_DAYS: {{ retention }}
- require:
- cmd: postgres_telegraf_group_role
- file: postgres_sbin
{% endif %}
+41 -7
View File
@@ -7,15 +7,29 @@
. /usr/sbin/so-common
# Without pipefail, a pipeline's exit status is gzip's. A failed pg_dumpall would
# otherwise be masked by a successful gzip, silently producing a valid .gz that
# holds a truncated dump.
set -o pipefail
# Backups contain role password hashes and full chat data; keep them 0600.
umask 0077
TODAY=$(date '+%Y_%m_%d')
BACKUPDIR=/nsm/backup
BACKUPFILE="$BACKUPDIR/so-postgres-backup-$TODAY.sql.gz"
TMPFILE="$BACKUPFILE.tmp"
MAXBACKUPS=7
LOGFILE=/opt/so/log/postgres/backup.log
mkdir -p $BACKUPDIR
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
}
mkdir -p "$BACKUPDIR"
# Remove any temp files left behind by a previously crashed run
rm -f "$BACKUPDIR"/so-postgres-backup-*.sql.gz.tmp
# Skip if already backed up today
if [ -f "$BACKUPFILE" ]; then
@@ -27,13 +41,33 @@ if ! docker ps --format '{{.Names}}' | grep -q '^so-postgres$'; then
exit 0
fi
# Dump all databases and roles, compress
docker exec so-postgres pg_dumpall -U postgres | gzip > "$BACKUPFILE"
# Always clean up the temp file on exit; the success path clears this trap
# after the atomic rename so the finished backup is not deleted.
trap 'rm -f "$TMPFILE"' EXIT
# Retention cleanup
NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
# Dump all databases and roles, compress. Write to a temp file so the final
# filename only ever appears for a complete, verified backup.
if ! docker exec so-postgres pg_dumpall -U postgres | gzip > "$TMPFILE"; then
log "ERROR: pg_dumpall/gzip failed; backup aborted"
exit 1
fi
# Verify the compressed stream is intact before publishing it
if ! gzip -t "$TMPFILE"; then
log "ERROR: backup failed gzip integrity check; backup aborted"
exit 1
fi
# Atomically publish the verified backup
mv "$TMPFILE" "$BACKUPFILE"
trap - EXIT
log "OK: wrote $BACKUPFILE"
# Retention cleanup (only reached after a successful backup). The glob is
# restricted to finished backups so an in-progress .tmp can never be counted.
NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
while [ "$NUMBACKUPS" -gt "$MAXBACKUPS" ]; do
OLDEST=$(find $BACKUPDIR -type f -name "so-postgres-backup*" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
OLDEST=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
rm -f "$OLDEST"
NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
done
+32
View File
@@ -0,0 +1,32 @@
#!/bin/bash
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Wait for the so-postgres container to accept TCP connections.
#
# docker_container.running returns as soon as the container starts, but on
# first-init docker-entrypoint.sh starts a temporary postgres with
# `listen_addresses=''` to run /docker-entrypoint-initdb.d scripts, then
# shuts it down before exec'ing the real CMD. A default pg_isready check
# (Unix socket) passes during that ephemeral phase and races the shutdown
# with "the database system is shutting down". Checking TCP readiness on
# 127.0.0.1 only succeeds after the final postgres binds the port.
#
# Usage: so-postgres-wait [iterations] [sleep_seconds]
# Default: 60 iterations, 2s sleep (~120s total).
ITERATIONS=${1:-60}
SLEEP_SECONDS=${2:-2}
for i in $(seq 1 "$ITERATIONS"); do
if docker exec so-postgres pg_isready -h 127.0.0.1 -U postgres -q 2>/dev/null; then
exit 0
fi
sleep "$SLEEP_SECONDS"
done
echo "so-postgres did not accept TCP connections within $((ITERATIONS * SLEEP_SECONDS))s" >&2
exit 1
@@ -0,0 +1,110 @@
#!/bin/bash
set -e
# Provision Telegraf state inside the so-postgres container.
# Usage: so-telegraf-postgres <subcommand>
# create_db Ensure the so_telegraf database exists.
# group_role Provision the so_telegraf group role, telegraf/partman schemas,
# pg_partman, pg_cron, and the hourly partman maintenance job.
# user Create or update a per-minion login role granted to so_telegraf.
# Env: ROLE_USER, ROLE_PASS.
# retention Reconcile partman retention on telegraf parents.
# Env: RETENTION_DAYS.
cmd="${1:?subcommand required}"
case "$cmd" in
create_db)
if ! docker exec so-postgres psql -U postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres \
-c "CREATE DATABASE so_telegraf"
fi
;;
group_role)
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
CREATE ROLE so_telegraf NOLOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Telegraf (running as so_telegraf) calls partman.create_parent()
-- on first write of each metric, which needs USAGE on the partman
-- schema, EXECUTE on its functions/procedures, and write access to
-- partman.part_config so it can register new partitioned parents.
GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
-- partman creates per-parent template tables (partman.template_*) at
-- runtime; default privileges extend DML/sequence access to them.
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
-- Hourly partman maintenance. cron.schedule is idempotent by jobname.
SELECT cron.schedule(
'telegraf-partman-maintenance',
'17 * * * *',
'CALL partman.run_maintenance_proc()'
);
EOSQL
;;
user)
: "${ROLE_USER:?ROLE_USER is required}"
: "${ROLE_PASS:?ROLE_PASS is required}"
# psql does not substitute :vars inside dollar-quoted strings, so the
# conditional CREATE/ALTER is built outside any DO block and dispatched
# with \gexec. format() handles identifier/literal quoting.
docker exec -i so-postgres psql \
-v ON_ERROR_STOP=1 \
-v role_user="$ROLE_USER" \
-v role_pass="$ROLE_PASS" \
-U postgres -d so_telegraf <<'EOSQL'
SELECT format(
CASE WHEN EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = :'role_user')
THEN 'ALTER ROLE %I WITH LOGIN PASSWORD %L'
ELSE 'CREATE ROLE %I WITH LOGIN PASSWORD %L'
END,
:'role_user',
:'role_pass'
) \gexec
GRANT CONNECT ON DATABASE so_telegraf TO :"role_user";
GRANT so_telegraf TO :"role_user";
EOSQL
;;
retention)
: "${RETENTION_DAYS:?RETENTION_DAYS is required}"
# \gset + \if guards against a missing pg_partman without using a DO
# block (psql :var substitution doesn't reach into dollar-quoted code).
docker exec -i so-postgres psql \
-v ON_ERROR_STOP=1 \
-v retention_days="$RETENTION_DAYS" \
-U postgres -d so_telegraf <<'EOSQL'
SELECT CASE WHEN EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman')
THEN 'true' ELSE 'false' END AS has_partman \gset
\if :has_partman
UPDATE partman.part_config
SET retention = :'retention_days' || ' days',
retention_keep_table = false
WHERE parent_table LIKE 'telegraf.%';
\endif
EOSQL
;;
*)
echo "Unknown subcommand: $cmd" >&2
exit 1
;;
esac
+6 -4
View File
@@ -3,12 +3,15 @@
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
{% if data['id'].endswith('_hypervisor') and data['result'] == True %}
{% set hid = data['id'] %}
{% if hid|regex_match('^([A-Za-z0-9._-]{1,253})$')
and hid.endswith('_hypervisor')
and data['result'] == True %}
{% if data['act'] == 'accept' %}
check_and_trigger:
runner.setup_hypervisor.setup_environment:
- minion_id: {{ data['id'] }}
- minion_id: {{ hid }}
{% endif %}
{% if data['act'] == 'delete' %}
@@ -17,8 +20,7 @@ delete_hypervisor:
- args:
- mods: orch.delete_hypervisor
- pillar:
minion_id: {{ data['id'] }}
minion_id: {{ hid }}
{% endif %}
{% endif %}
+27 -15
View File
@@ -1,7 +1,7 @@
#!py
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
@@ -9,30 +9,42 @@ import logging
import os
import pwd
import grp
import re
log = logging.getLogger(__name__)
PILLAR_ROOT = '/opt/so/saltstack/local/pillar/minions/'
_VMNAME_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
def run():
vm_name = data['kwargs']['name']
logging.error("createEmptyPillar reactor: vm_name: %s" % vm_name)
pillar_root = '/opt/so/saltstack/local/pillar/minions/'
vm_name = data.get('kwargs', {}).get('name', '')
if not _VMNAME_RE.match(str(vm_name)):
log.error("createEmptyPillar reactor: refusing unsafe vm_name=%r", vm_name)
return {}
log.info("createEmptyPillar reactor: vm_name: %s", vm_name)
pillar_files = ['adv_' + vm_name + '.sls', vm_name + '.sls']
try:
# Get socore user and group IDs
socore_uid = pwd.getpwnam('socore').pw_uid
socore_gid = grp.getgrnam('socore').gr_gid
pillar_root_real = os.path.realpath(PILLAR_ROOT)
for f in pillar_files:
full_path = pillar_root + f
if not os.path.exists(full_path):
# Create empty file
os.mknod(full_path)
# Set ownership to socore:socore
os.chown(full_path, socore_uid, socore_gid)
# Set mode to 644 (rw-r--r--)
os.chmod(full_path, 0o640)
logging.error("createEmptyPillar reactor: created %s with socore:socore ownership and mode 644" % f)
full_path = os.path.join(PILLAR_ROOT, f)
resolved = os.path.realpath(full_path)
if os.path.dirname(resolved) != pillar_root_real:
log.error("createEmptyPillar reactor: refusing path outside pillar root: %s", resolved)
continue
if os.path.exists(resolved):
continue
os.mknod(resolved)
os.chown(resolved, socore_uid, socore_gid)
os.chmod(resolved, 0o640)
log.info("createEmptyPillar reactor: created %s with socore:socore ownership and mode 0640", f)
except (KeyError, OSError) as e:
logging.error("createEmptyPillar reactor: Error setting ownership/permissions: %s" % str(e))
log.error("createEmptyPillar reactor: Error setting ownership/permissions: %s", e)
return {}
+33 -11
View File
@@ -1,18 +1,40 @@
#!py
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
remove_key:
wheel.key.delete:
- args:
- match: {{ data['name'] }}
import logging
import re
{{ data['name'] }}_pillar_clean:
runner.state.orchestrate:
- args:
- mods: orch.vm_pillar_clean
- pillar:
vm_name: {{ data['name'] }}
log = logging.getLogger(__name__)
{% do salt.log.info('deleteKey reactor: deleted minion key: %s' % data['name']) %}
_VMNAME_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
def run():
name = data.get('name', '')
if not _VMNAME_RE.match(str(name)):
log.error("deleteKey reactor: refusing unsafe name=%r", name)
return {}
log.info("deleteKey reactor: deleted minion key: %s", name)
return {
'remove_key': {
'wheel.key.delete': [
{'args': [
{'match': name},
]},
],
},
'%s_pillar_clean' % name: {
'runner.state.orchestrate': [
{'args': [
{'mods': 'orch.vm_pillar_clean'},
{'pillar': {'vm_name': name}},
]},
],
},
}
+240
View File
@@ -0,0 +1,240 @@
# One pillar directory can map to multiple (state, tgt) actions.
# tgt is a raw salt compound expression. tgt_type is always "compound".
# Per-action `batch` / `batch_wait` override the orch defaults (25% / 15s).
# An action with `highstate: True` triggers state.highstate instead of
# state.apply -- see salt/orch/push_batch.sls.
#
# Notes:
# - `bpf` is a pillar-only dir (no state of its own) consumed by both
# zeek and suricata via macros, so a bpf pillar change re-applies both.
# - suricata/strelka/zeek/elasticsearch/redis/kafka/logstash etc. have
# their own pillar dirs AND their own state, so they map 1:1 (or 1:2
# in strelka's case, because of the split init.sls / manager.sls).
#
# Intentional omissions (these will log a "not in pillar_push_map.yaml"
# warning in push_pillar.sls and wait for the next scheduled highstate):
# - `data` and `node_data`: pillar-only data consumed by many states;
# handling them generically would amount to a fleetwide highstate.
# - `host`: soc_host describes mainint/mainip; a change is a re-IP and
# needs a coordinated procedure, not an immediate state push.
# - `hypervisor`: state changes touch libvirt and are disruptive; leave
# to the next scheduled highstate.
# - `sensor`: every field in soc_sensor.yaml is `readonly: True` or
# per-minion (`node: True`). Per-minion edits are persisted under
# pillar/minions/<id>.sls and are handled by Branch A of push_pillar.sls
# (per-minion highstate intent), not by this app-pillar map.
#
# The role sets here were verified line-by-line against salt/top.sls. If
# salt/top.sls changes how an app is targeted, update the corresponding
# compound here.
# firewall: the one pillar everyone touches. Applied everywhere intentionally
# because every host's iptables needs to know about every other host in the
# grid. Salt's firewall state is idempotent (file.managed + iptables-restore
# onchanges in salt/firewall/init.sls), so hosts whose rendered firewall is
# unchanged do a file comparison and no-op without touching iptables -- actual
# reload happens only on the hosts whose rules actually changed. Fleetwide
# blast radius is intentional and matches the pre-plan behavior via highstate.
# Adding N sensors in a burst coalesces into one dispatch via the drainer.
firewall:
- state: firewall
tgt: '*'
# backup: backup.config_backup runs on eval, standalone, manager, managerhype,
# managersearch (NOT import -- the backup pillar is included on import per
# pillar/top.sls but the backup state is not run there per salt/top.sls).
backup:
- state: backup.config_backup
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# bpf is pillar-only (no state); consumed by both zeek and suricata as macros.
# Both states run on sensor_roles + so-import per salt/top.sls.
bpf:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# ca is applied universally.
ca:
- state: ca
tgt: '*'
# docker: universal. The docker state is in both the all-non-managers and
# all-managers branches of salt/top.sls.
docker:
- state: docker
tgt: '*'
# elastalert: eval, standalone, manager, managerhype, managersearch (NOT import).
elastalert:
- state: elastalert
tgt: 'G@role:so-eval or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elastic-fleet-package-registry: manager_roles exactly.
elastic-fleet-package-registry:
- state: elastic-fleet-package-registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# elasticsearch: 8 roles.
elasticsearch:
- state: elasticsearch
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-standalone'
# elasticagent: so-heavynode only.
elasticagent:
- state: elasticagent
tgt: 'G@role:so-heavynode'
# elasticfleet: base state only on pillar change. elasticfleet.install_agent_grid
# is a deploy/enrollment step, not a config reload; leave it to the next highstate.
elasticfleet:
- state: elasticfleet
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# global: fanout to a fleetwide highstate. The global pillar (soc_global.sls)
# carries cross-cutting settings (pipeline, url_base, imagerepo, mdengine, ...)
# that are consumed by virtually every state, so a targeted re-apply isn't
# meaningful. The drainer's batch/batch_wait throttling controls blast radius.
global:
- highstate: True
tgt: '*'
# healthcheck: eval, sensor, standalone only.
healthcheck:
- state: healthcheck
tgt: 'G@role:so-eval or G@role:so-sensor or G@role:so-standalone'
# hydra: manager_roles exactly.
hydra:
- state: hydra
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# idh: so-idh only.
idh:
- state: idh
tgt: 'G@role:so-idh'
# influxdb: manager_roles exactly.
influxdb:
- state: influxdb
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kafka: standalone, manager, managerhype, managersearch, searchnode, receiver.
kafka:
- state: kafka
tgt: 'G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# kibana: manager_roles exactly.
kibana:
- state: kibana
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# kratos: manager_roles exactly.
kratos:
- state: kratos
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# logrotate: universal (top-of-file '*' branch in salt/top.sls).
logrotate:
- state: logrotate
tgt: '*'
# logstash: 8 roles, no eval/import.
logstash:
- state: logstash
tgt: 'G@role:so-fleet or G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-standalone'
# manager: manager_roles exactly. The manager state is also referenced under
# *_sensor / *_heavynode top.sls blocks via `sensor`, but the standalone
# `manager` state itself runs only on manager_roles.
manager:
- state: manager
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# nginx: 10 specific roles. NOT receiver, idh, hypervisor, desktop.
nginx:
- state: nginx
tgt: 'G@role:so-eval or G@role:so-fleet or G@role:so-heavynode or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# ntp: universal (top-of-file '*' branch in salt/top.sls).
ntp:
- state: ntp
tgt: '*'
# patch: universal. soc_patch carries the OS update schedule, applied via
# patch.os.schedule on every node (it's in both the all-non-managers and
# all-managers branches of salt/top.sls).
patch:
- state: patch.os.schedule
tgt: '*'
# postgres: manager_roles exactly.
postgres:
- state: postgres
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# redis: 6 roles. standalone, manager, managerhype, managersearch, heavynode, receiver.
# (NOT eval, NOT import, NOT searchnode.)
redis:
- state: redis
tgt: 'G@role:so-heavynode or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-standalone'
# registry: manager_roles exactly.
registry:
- state: registry
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# sensoroni: universal.
sensoroni:
- state: sensoroni
tgt: '*'
# soc: manager_roles exactly.
soc:
- state: soc
tgt: 'G@role:so-eval or G@role:so-import or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-standalone'
# stig: broad. Runs on standalone, manager, managerhype, managersearch,
# searchnode, sensor, receiver, fleet, hypervisor, desktop.
# NOT eval, NOT import, NOT heavynode, NOT idh (the *_idh block in
# salt/top.sls intentionally omits stig).
stig:
- state: stig
tgt: 'G@role:so-desktop or G@role:so-fleet or G@role:so-hypervisor or G@role:so-manager or G@role:so-managerhype or G@role:so-managersearch or G@role:so-receiver or G@role:so-searchnode or G@role:so-sensor or G@role:so-standalone'
# strelka: sensor-side only on pillar change (sensor_roles). strelka.manager is
# intentionally NOT fired on pillar changes -- YARA rule and strelka config
# pillar changes are consumed by the sensor-side strelka backend, and re-running
# strelka.manager on managers is both unnecessary and disruptive. strelka.manager
# is left to the 2-hour highstate.
strelka:
- state: strelka
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-sensor or G@role:so-standalone'
# suricata: sensor_roles + so-import (5 roles).
suricata:
- state: suricata
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
# telegraf: universal.
telegraf:
- state: telegraf
tgt: '*'
# versionlock: universal (top-of-file '*' branch in salt/top.sls).
versionlock:
- state: versionlock
tgt: '*'
# vm: libvirt-driver hypervisors only. Matched by the salt-cloud:driver:libvirt
# grain (compound supports nested grain matching via G@<key>:<subkey>:<value>).
# pillar/vm/soc_vm.sls write path is referenced at salt/_runners/setup_hypervisor.py:856.
vm:
- state: vm
tgt: 'G@salt-cloud:driver:libvirt'
# zeek: sensor_roles + so-import (5 roles).
zeek:
- state: zeek
tgt: 'G@role:so-eval or G@role:so-heavynode or G@role:so-import or G@role:so-sensor or G@role:so-standalone'
+176
View File
@@ -0,0 +1,176 @@
#!py
# Reactor invoked by the pillar_db beacon when SOC records settings changes in
# the so_soc.audit_settings table (see salt/_beacons/pillar_db.py). The beacon
# emits one event per new row carrying setting_id and node_id.
#
# Two branches, keyed on node_id:
# A) node_id populated -> the change is scoped to that one minion. Look up the
# app in pillar_push_map.yaml and write an intent that runs the app's mapped
# state(s) targeted to just that node.
# B) node_id empty -> grid-wide app change. Look up the app in
# pillar_push_map.yaml and write an intent with the entry's actions as-is.
#
# The app name is the first dotted segment of setting_id (e.g. "telegraf.output"
# -> "telegraf"), which matches the pillar_push_map.yaml keys 1:1.
#
# Reactors never dispatch directly. The so-push-drainer schedule picks up
# ready intents, dedupes across pending files, and dispatches orch.push_batch.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
import yaml
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# The pillar_push_map.yaml is shipped via salt:// but the reactor runs on the
# master, which mounts the default saltstack tree at this path.
PUSH_MAP_PATH = '/opt/so/saltstack/default/salt/reactor/pillar_push_map.yaml'
_PUSH_MAP_CACHE = {'mtime': 0, 'data': None}
def _load_push_map():
try:
st = os.stat(PUSH_MAP_PATH)
except OSError:
LOG.warning('push_pillar: %s not found', PUSH_MAP_PATH)
return {}
if _PUSH_MAP_CACHE['mtime'] != st.st_mtime:
try:
with open(PUSH_MAP_PATH, 'r') as f:
_PUSH_MAP_CACHE['data'] = yaml.safe_load(f) or {}
except Exception:
LOG.exception('push_pillar: failed to load %s', PUSH_MAP_PATH)
_PUSH_MAP_CACHE['data'] = {}
_PUSH_MAP_CACHE['mtime'] = st.st_mtime
return _PUSH_MAP_CACHE['data'] or {}
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_pillar: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_pillar: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_pillar: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def _app_from_setting(setting_id):
# setting_id is e.g. 'telegraf.output' -> 'telegraf', 'ntp.config.servers' -> 'ntp'
if not setting_id:
return None
return setting_id.split('.', 1)[0] or None
def _node_actions(entry, node_id):
# Copy the app's mapped actions but retarget each one to the single node.
# Preserves the state/highstate selection and any batch/batch_wait overrides.
actions = []
for action in entry:
if not isinstance(action, dict):
continue
node_action = dict(action)
node_action['tgt'] = node_id
node_action['tgt_type'] = 'glob'
actions.append(node_action)
return actions
def run():
if not _push_enabled():
LOG.info('push_pillar: push disabled, skipping')
return {}
# The pillar_db beacon nests its payload under data['data']; fall back to the
# top level so the reactor is robust to either shape.
event = data.get('data', data) # noqa: F821 -- data provided by reactor
setting_id = event.get('setting_id', '')
node_id = (event.get('node_id') or '').strip()
app = _app_from_setting(setting_id)
if not app:
LOG.debug('push_pillar: ignoring event with no app segment: setting_id=%s', setting_id)
return {}
push_map = _load_push_map()
entry = push_map.get(app)
if not entry:
LOG.warning(
'push_pillar: app "%s" is not in pillar_push_map.yaml; change will be '
'picked up at the next scheduled highstate (setting_id=%s)',
app, setting_id,
)
return {}
# Branch A: per-node change -> retarget the app's states to just that node.
if node_id:
actions = _node_actions(entry, node_id)
if not actions:
LOG.warning('push_pillar: no usable actions for app "%s" (setting_id=%s)', app, setting_id)
return {}
_write_intent(
'node_{}_{}'.format(node_id, app), actions,
'audit:{}@{}'.format(setting_id, node_id),
)
LOG.info('push_pillar: per-node intent updated for %s on %s (setting_id=%s)',
app, node_id, setting_id)
return {}
# Branch B: grid-wide app change -> use the map entry's actions as-is.
actions = list(entry) # copy to avoid mutating the cache
_write_intent('pillar_{}'.format(app), actions, 'audit:{}'.format(setting_id))
LOG.info('push_pillar: app intent updated for %s (setting_id=%s)', app, setting_id)
return {}
+96
View File
@@ -0,0 +1,96 @@
#!py
# Reactor invoked by the inotify beacon on rule file changes under
# /opt/so/saltstack/local/salt/strelka/rules/compiled/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_strelka.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Sensor-side
# strelka runs on exactly these four roles; so-import gets strelka.manager
# instead, which is not fired on pillar changes.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES)
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_strelka: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_strelka: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_strelka: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_strelka: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'strelka', 'tgt': _sensor_compound()}]
_write_intent('rules_strelka', actions, path)
LOG.info('push_strelka: intent updated for path=%s', path)
return {}
+95
View File
@@ -0,0 +1,95 @@
#!py
# Reactor invoked by the inotify beacon on rule file changes under
# /opt/so/saltstack/local/salt/suricata/rules/.
#
# Writes (or updates) a push intent at /opt/so/state/push_pending/rules_suricata.json
# and returns {}. The so-push-drainer schedule picks up ready intents, dedupes
# across pending files, and dispatches orch.push_batch. Reactors never dispatch
# directly -- see plan /home/mreeves/.claude/plans/goofy-marinating-hummingbird.md.
import fcntl
import json
import logging
import os
import time
from salt.client import Caller
LOG = logging.getLogger(__name__)
PENDING_DIR = '/opt/so/state/push_pending'
LOCK_FILE = os.path.join(PENDING_DIR, '.lock')
MAX_PATHS = 20
# Mirrors GLOBALS.sensor_roles in salt/vars/globals.map.jinja. Suricata also
# runs on so-import per salt/top.sls, so that role is appended below.
SENSOR_ROLES = ['so-eval', 'so-heavynode', 'so-sensor', 'so-standalone']
def _sensor_compound_plus_import():
return ' or '.join('G@role:{}'.format(r) for r in SENSOR_ROLES) + ' or G@role:so-import'
def _push_enabled():
try:
caller = Caller()
return bool(caller.cmd('pillar.get', 'global:push:enabled', True))
except Exception:
LOG.exception('push_suricata: pillar.get global:push:enabled failed, assuming enabled')
return True
def _write_intent(key, actions, path):
now = time.time()
try:
os.makedirs(PENDING_DIR, exist_ok=True)
except OSError:
LOG.exception('push_suricata: cannot create %s', PENDING_DIR)
return
intent_path = os.path.join(PENDING_DIR, '{}.json'.format(key))
lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o644)
try:
fcntl.flock(lock_fd, fcntl.LOCK_EX)
intent = {}
if os.path.exists(intent_path):
try:
with open(intent_path, 'r') as f:
intent = json.load(f)
except (IOError, ValueError):
intent = {}
intent.setdefault('first_touch', now)
intent['last_touch'] = now
intent['actions'] = actions
paths = intent.get('paths', [])
if path and path not in paths:
paths.append(path)
paths = paths[-MAX_PATHS:]
intent['paths'] = paths
tmp_path = intent_path + '.tmp'
with open(tmp_path, 'w') as f:
json.dump(intent, f)
os.rename(tmp_path, intent_path)
except Exception:
LOG.exception('push_suricata: failed to write intent %s', intent_path)
finally:
try:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
finally:
os.close(lock_fd)
def run():
if not _push_enabled():
LOG.info('push_suricata: push disabled, skipping')
return {}
path = data.get('path', '') # noqa: F821 -- data provided by reactor
actions = [{'state': 'suricata', 'tgt': _sensor_compound_plus_import()}]
_write_intent('rules_suricata', actions, path)
LOG.info('push_suricata: intent updated for path=%s', path)
return {}
+53 -18
View File
@@ -6,39 +6,74 @@
# Elastic License 2.0.
import logging
from subprocess import call
import yaml
import os
import re
import shlex
import subprocess
log = logging.getLogger(__name__)
SO_MINION = '/usr/sbin/so-minion'
_NODETYPE_RE = re.compile(r'^[A-Z][A-Z0-9_]{0,31}$')
_MINIONID_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_HOSTPART_RE = re.compile(r'^[A-Za-z0-9._-]{1,253}$')
_IPV4_RE = re.compile(
r'^(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}'
r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)$'
)
_HEAP_RE = re.compile(r'^\d{1,6}[kKmMgG]?$')
def _check(name, value, pattern):
s = str(value)
if not pattern.match(s):
raise ValueError("sominion_setup_reactor: refusing unsafe %s=%r" % (name, value))
return s
def run():
log.info('sominion_setup_reactor: Running')
minionid = data['id']
DATA = data['data']
hv_name = DATA['HYPERVISOR_HOST']
log.info('sominion_setup_reactor: DATA: %s' % DATA)
# Build the base command
cmd = "NODETYPE=" + DATA['NODETYPE'] + " /usr/sbin/so-minion -o=addVM -m=" + minionid + " -n=" + DATA['MNIC'] + " -i=" + DATA['MAINIP'] + " -c=" + str(DATA['CPUCORES']) + " -d='" + DATA['NODE_DESCRIPTION'] + "'"
# Add optional arguments only if they exist in DATA
nodetype = _check('NODETYPE', DATA['NODETYPE'], _NODETYPE_RE)
argv = [
SO_MINION,
'-o=addVM',
'-m=' + _check('minionid', minionid, _MINIONID_RE),
'-n=' + _check('MNIC', DATA['MNIC'], _HOSTPART_RE),
'-i=' + _check('MAINIP', DATA['MAINIP'], _IPV4_RE),
'-c=' + str(int(DATA['CPUCORES'])),
'-d=' + str(DATA['NODE_DESCRIPTION']),
]
if 'CORECOUNT' in DATA:
cmd += " -C=" + str(DATA['CORECOUNT'])
argv.append('-C=' + str(int(DATA['CORECOUNT'])))
if 'INTERFACE' in DATA:
cmd += " -a=" + DATA['INTERFACE']
argv.append('-a=' + _check('INTERFACE', DATA['INTERFACE'], _HOSTPART_RE))
if 'ES_HEAP_SIZE' in DATA:
cmd += " -e=" + DATA['ES_HEAP_SIZE']
argv.append('-e=' + _check('ES_HEAP_SIZE', DATA['ES_HEAP_SIZE'], _HEAP_RE))
if 'LS_HEAP_SIZE' in DATA:
cmd += " -l=" + DATA['LS_HEAP_SIZE']
argv.append('-l=' + _check('LS_HEAP_SIZE', DATA['LS_HEAP_SIZE'], _HEAP_RE))
if 'LSHOSTNAME' in DATA:
cmd += " -L=" + DATA['LSHOSTNAME']
log.info('sominion_setup_reactor: Command: %s' % cmd)
rc = call(cmd, shell=True)
argv.append('-L=' + _check('LSHOSTNAME', DATA['LSHOSTNAME'], _HOSTPART_RE))
env = os.environ.copy()
env['NODETYPE'] = nodetype
log.info(
'sominion_setup_reactor: argv: %s (NODETYPE=%s)',
' '.join(shlex.quote(a) for a in argv),
shlex.quote(nodetype),
)
rc = subprocess.call(argv, shell=False, env=env)
log.info('sominion_setup_reactor: rc: %s' % rc)
+1
View File
@@ -17,6 +17,7 @@ include:
so-redis:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: so-redis
- user: socore
- networks:
+3
View File
@@ -21,6 +21,9 @@ so-dockerregistry:
- networks:
- sobridge:
- ipv4_address: {{ DOCKERMERGED.containers['so-dockerregistry'].ip }}
# Intentionally `always` (not unless-stopped) -- registry is critical infra
# and must come back up even if it was manually stopped. Do not homogenize
# to unless-stopped; see the container auto-restart section of the plan.
- restart_policy: always
- port_bindings:
{% for BINDING in DOCKERMERGED.containers['so-dockerregistry'].port_bindings %}
+4 -3
View File
@@ -3,7 +3,7 @@
{% set SCHEDULE = salt['pillar.get']('healthcheck:schedule', 30) %}
include:
- salt
- salt.minion
{% if CHECKS and ENABLED %}
salt_beacons:
@@ -14,12 +14,13 @@ salt_beacons:
- defaults:
CHECKS: {{ CHECKS }}
SCHEDULE: {{ SCHEDULE }}
- watch_in:
- watch_in:
- service: salt_minion_service
{% else %}
salt_beacons:
file.absent:
- name: /etc/salt/minion.d/beacons.conf
- watch_in:
- watch_in:
- service: salt_minion_service
{% endif %}
@@ -27,6 +27,7 @@ sool9_{{host}}:
log_file: /opt/so/log/salt/minion
grains:
hypervisor_host: {{host ~ "_" ~ role}}
sosmodel: HVGUEST
preflight_cmds:
- |
{%- set hostnames = [MANAGERHOSTNAME] %}
+11
View File
@@ -0,0 +1,11 @@
reactor:
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/suricata/rules':
- salt://reactor/push_suricata.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/suricata/rules/*':
- salt://reactor/push_suricata.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/strelka/rules/compiled':
- salt://reactor/push_strelka.sls
- 'salt/beacon/*/inotify//opt/so/saltstack/local/salt/strelka/rules/compiled/*':
- salt://reactor/push_strelka.sls
- 'salt/beacon/*/pillar_db/audit_settings':
- salt://reactor/push_pillar.sls
+8
View File
@@ -5,3 +5,11 @@ salt_bootstrap:
- source: salt://salt/scripts/bootstrap-salt.sh
- mode: 755
- show_changes: False
salt_sbin:
file.recurse:
- name: /usr/sbin
- source: salt://salt/tools/sbin
- user: 939
- group: 939
- file_mode: 755
+1 -1
View File
@@ -1,4 +1,4 @@
lasthighstate:
file.touch:
- name: /opt/so/log/salt/lasthighstate
- order: last
- order: 9001
+18 -1
View File
@@ -10,10 +10,12 @@
# software that is protected by the license key."
{% from 'allowed_states.map.jinja' import allowed_states %}
{% from 'global/map.jinja' import GLOBALMERGED %}
{% if sls in allowed_states %}
include:
- salt.minion
- salt.master.pyinotify
{% if 'vrt' in salt['pillar.get']('features', []) %}
- salt.cloud
- salt.cloud.reactor_config_hypervisor
@@ -62,6 +64,21 @@ engines_config:
- name: /etc/salt/master.d/engines.conf
- source: salt://salt/files/engines.conf
{% if GLOBALMERGED.push.enabled %}
reactor_pushstate_config:
file.managed:
- name: /etc/salt/master.d/reactor_pushstate.conf
- source: salt://salt/files/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% else %}
reactor_pushstate_config:
file.absent:
- name: /etc/salt/master.d/reactor_pushstate.conf
- watch_in:
- service: salt_master_service
{% endif %}
# update the bootstrap script when used for salt-cloud
salt_bootstrap_cloud:
file.managed:
@@ -77,7 +94,7 @@ salt_master_service:
- file: checkmine_engine
- file: pillarWatch_engine
- file: engines_config
- order: last
- order: 9002
{% else %}
+20
View File
@@ -0,0 +1,20 @@
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
pyinotify_module_package:
file.recurse:
- name: /opt/so/conf/salt/module_packages/pyinotify
- source: salt://salt/module_packages/pyinotify
- clean: True
- makedirs: True
pyinotify_python_module_install:
cmd.run:
- name: /opt/saltstack/salt/bin/python3.10 -m pip install pyinotify --no-index --find-links=/opt/so/conf/salt/module_packages/pyinotify/ --upgrade
- onchanges:
- file: pyinotify_module_package
- failhard: True
- watch_in:
- service: salt_minion_service
-1
View File
@@ -2,4 +2,3 @@
salt:
minion:
version: '3006.19'
check_threshold: 3600 # in seconds, threshold used for so-salt-minion-check. any value less than 600 seconds may cause a lot of salt-minion restarts since the job to touch the file occurs every 5-8 minutes by default
+20 -2
View File
@@ -88,13 +88,17 @@ enable_startup_states:
{% endif %}
# this has to be outside the if statement above since there are <requisite>_in calls to this state
# this has to be outside the if statement above since there are <requisite>_in calls to this state.
# uses watch (not listen) so the restart fires in-state and its result lands on this state's
# running entry; that is what lets wait_for_salt_minion_ready below detect any restart
# uniformly via onchanges, regardless of whether the trigger came from these files or from
# external watch_in's (e.g. beacons, master/pyinotify).
salt_minion_service:
service.running:
- name: salt-minion
- enable: True
- onlyif: test "{{INSTALLEDSALTVERSION}}" == "{{SALTVERSION}}"
- listen:
- watch:
- file: mine_functions
{% if INSTALLEDSALTVERSION|string == SALTVERSION|string %}
- file: set_log_levels
@@ -103,3 +107,17 @@ salt_minion_service:
- file: signing_policy
{% endif %}
- order: last
# block until the just-restarted salt-minion is back and can execute modules locally, so
# follow-on jobs and the next highstate iteration do not race the restart. onchanges +
# require on salt_minion_service catches every restart trigger uniformly because watch
# mod_watch results replace the service state's running entry. wait logic lives in
# /usr/sbin/so-salt-minion-wait (deployed by common_sbin from common/tools/sbin/).
wait_for_salt_minion_ready:
cmd.run:
- name: /usr/sbin/so-salt-minion-wait
- onchanges:
- service: salt_minion_service
- require:
- service: salt_minion_service
- order: last
+35
View File
@@ -0,0 +1,35 @@
#!/bin/bash
#
# Copyright Security Onion Solutions LLC and/or licensed to Security Onion Solutions LLC under one
# or more contributor license agreements. Licensed under the Elastic License 2.0 as shown at
# https://securityonion.net/license; you may not use this file except in compliance with the
# Elastic License 2.0.
# Block until the local salt-minion service is back up and can execute modules locally.
# Invoked from the wait_for_salt_minion_ready state in salt/minion/init.sls after
# salt_minion_service fires its watch-driven mod_watch (a non-blocking systemctl restart),
# so follow-on jobs and the next highstate iteration do not race the in-flight restart.
. /usr/sbin/so-common
# Initial sleep gives the systemctl restart (--no-block by default for salt-minion on
# >=3006.15) time to begin tearing down the old process before we probe for readiness.
INITIAL_SLEEP=3
TIMEOUT=120
PING_TIMEOUT=5
sleep "$INITIAL_SLEEP"
elapsed="$INITIAL_SLEEP"
while [ "$elapsed" -lt "$TIMEOUT" ]; do
if systemctl is-active --quiet salt-minion \
&& salt-call --local --timeout="$PING_TIMEOUT" --out=quiet test.ping >/dev/null 2>&1; then
echo "salt-minion ready after ${elapsed}s"
exit 0
fi
sleep 1
elapsed=$((elapsed + 1))
done
echo "salt-minion did not become ready within ${TIMEOUT}s" >&2
exit 1
+19 -3
View File
@@ -1,10 +1,26 @@
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'vars/globals.map.jinja' import GLOBALS %}
{% from 'global/map.jinja' import GLOBALMERGED %}
highstate_schedule:
schedule.present:
- function: state.highstate
- minutes: 15
- hours: {{ GLOBALMERGED.push.highstate_interval_hours }}
- maxrunning: 1
{% if not GLOBALS.is_manager %}
- splay: 120
- splay: 1800
{% endif %}
{% if GLOBALS.is_manager and GLOBALMERGED.push.enabled %}
push_drain_schedule:
schedule.present:
- function: cmd.run
- job_args:
- /usr/sbin/so-push-drainer
- seconds: {{ GLOBALMERGED.push.drain_interval }}
- maxrunning: 1
- return_job: False
{% elif GLOBALS.is_manager %}
push_drain_schedule:
schedule.absent:
- name: push_drain_schedule
{% endif %}
+1
View File
@@ -14,6 +14,7 @@ include:
so-sensoroni:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- network_mode: host
- binds:
- /nsm/import:/nsm/import:rw
-5
View File
@@ -24,11 +24,6 @@
{% do SOCDEFAULTS.soc.config.server.modules.elastic.update({'username': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'password': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass}) %}
{% if GLOBALS.postgres is defined and GLOBALS.postgres.auth is defined %}
{% set PG_ADMIN_PASS = salt['pillar.get']('secrets:postgres_pass', '') %}
{% do SOCDEFAULTS.soc.config.server.modules.update({'postgres': {'hostUrl': GLOBALS.manager_ip, 'port': 5432, 'username': GLOBALS.postgres.auth.users.so_postgres_user.user, 'password': GLOBALS.postgres.auth.users.so_postgres_user.pass, 'adminUser': 'postgres', 'adminPassword': PG_ADMIN_PASS, 'dbname': 'securityonion', 'sslMode': 'require', 'assistantEnabled': true, 'esHostUrl': 'https://' ~ GLOBALS.manager_ip ~ ':9200', 'esUsername': GLOBALS.elasticsearch.auth.users.so_elastic_user.user, 'esPassword': GLOBALS.elasticsearch.auth.users.so_elastic_user.pass, 'esVerifyCert': false}}) %}
{% endif %}
{% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'hostUrl': 'https://' ~ GLOBALS.influxdb_host ~ ':8086'}) %}
{% do SOCDEFAULTS.soc.config.server.modules.influxdb.update({'token': INFLUXDB_TOKEN}) %}
{% for tool in SOCDEFAULTS.soc.config.server.client.tools %}
+6
View File
@@ -1519,6 +1519,12 @@ soc:
serviceAccountJSON: ""
serviceAccountLocation: ""
healthTimeoutSeconds: 5
onionconfig:
saltstackDir: /opt/so/saltstack
bypassEnabled: false
postgres:
host:
password:
salt:
queueDir: /opt/sensoroni/queue
timeoutMs: 45000
+1
View File
@@ -18,6 +18,7 @@ include:
so-soc:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-soc:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- hostname: soc
- name: so-soc
- networks:
+137 -1
View File
@@ -117,6 +117,121 @@ transformations:
- type: logsource
product: linux
service: auth
# Maps M365 audit rules to Elastic Agent O365 integration logs
- id: m365_audit_field_mappings
type: field_name_mapping
mapping:
Operation: event.action
ResultStatus: event.outcome
ApplicationId: o365.audit.ApplicationId
ObjectId: o365.audit.ObjectId
RequestType: o365.audit.RequestType
rule_conditions:
- type: logsource
product: m365
service: audit
- id: m365_audit_add-fields
type: add_condition
conditions:
event.dataset: 'o365.audit'
event.module: 'o365'
rule_conditions:
- type: logsource
product: m365
service: audit
# Maps M365 exchange rules to Elastic Agent O365 integration logs
- id: m365_exchange_field_mappings
type: field_name_mapping
mapping:
eventSource: event.provider
eventName: event.action
status: event.outcome
rule_conditions:
- type: logsource
product: m365
service: exchange
- id: m365_exchange_add-fields
type: add_condition
conditions:
event.dataset: 'o365.audit'
event.module: 'o365'
rule_conditions:
- type: logsource
product: m365
service: exchange
# Maps M365 threat_management rules to Elastic Agent O365 integration logs
- id: m365_threat_management_field_mappings
type: field_name_mapping
mapping:
eventSource: event.provider
eventName: event.action
status: event.outcome
rule_conditions:
- type: logsource
product: m365
service: threat_management
- id: m365_threat_management_add-fields
type: add_condition
conditions:
event.dataset: 'o365.audit'
event.module: 'o365'
rule_conditions:
- type: logsource
product: m365
service: threat_management
# Maps M365 threat_detection rules to Elastic Agent O365 integration logs
- id: m365_threat_detection_field_mappings
type: field_name_mapping
mapping:
eventSource: event.provider
eventName: event.action
status: event.outcome
rule_conditions:
- type: logsource
product: m365
service: threat_detection
- id: m365_threat_detection_add-fields
type: add_condition
conditions:
event.dataset: 'o365.audit'
event.module: 'o365'
rule_conditions:
- type: logsource
product: m365
service: threat_detection
# Maps FortiGate event rules to Elastic Agent Fortinet integration logs
- id: fortigate_event_field_mappings
type: field_name_mapping
mapping:
action: fortinet.firewall.action
cfgpath: fortinet.firewall.cfgpath
cfgobj: fortinet.firewall.cfgobj
cfgattr: fortinet.firewall.cfgattr
devname: observer.name
devid: observer.serial_number
logid: event.code
type: fortinet.firewall.type
subtype: fortinet.firewall.subtype
level: log.level
vd: fortinet.firewall.vd
logdesc: fortinet.firewall.desc
user: user.name
ui: fortinet.firewall.ui
cfgtid: fortinet.firewall.cfgtid
msg: message
rule_conditions:
- type: logsource
product: fortigate
service: event
- id: fortigate_event_add-fields
type: add_condition
conditions:
event.dataset: 'fortinet_fortigate.log'
event.module: 'fortinet_fortigate'
rule_conditions:
- type: logsource
product: fortigate
service: event
# event.code should always be a string
- id: convert_event_code_to_string
type: convert_type
@@ -126,15 +241,36 @@ transformations:
fields:
- event.code
# Maps process_creation rules to endpoint process creation logs
# This is an OS-agnostic mapping, to account for logs that don't specify source OS
- id: endpoint_process_create_windows_add-fields
type: add_condition
conditions:
event.category: 'process'
event.type: 'start'
host.os.type: 'windows'
rule_conditions:
- type: logsource
category: process_creation
product: windows
- id: endpoint_process_create_macos_add-fields
type: add_condition
conditions:
event.category: 'process'
event.type: 'start'
host.os.type: 'macos'
rule_conditions:
- type: logsource
category: process_creation
product: macos
- id: endpoint_process_create_linux_add-fields
type: add_condition
conditions:
event.category: 'process'
event.type: 'start'
host.os.type: 'linux'
rule_conditions:
- type: logsource
category: process_creation
product: linux
# Maps file_event rules to endpoint file creation logs
# This is an OS-agnostic mapping, to account for logs that don't specify source OS
- id: endpoint_file_create_add-fields
+7
View File
@@ -16,6 +16,13 @@
{% do SOCMERGED.config.server.update({'additionalCA': MANAGERMERGED.additionalCA}) %}
{% do SOCMERGED.config.server.update({'insecureSkipVerify': MANAGERMERGED.insecureSkipVerify}) %}
{% if not SOCMERGED.config.server.modules.postgres.host %}
{% do SOCMERGED.config.server.modules.postgres.update({'host': GLOBALS.manager}) %}
{% endif %}
{% if not SOCMERGED.config.server.modules.postgres.password %}
{% do SOCMERGED.config.server.modules.postgres.update({'password': salt['pillar.get']('secrets:postgres_pass', '')}) %}
{% endif %}
{# if SOCMERGED.config.server.modules.cases == httpcase details come from the soc pillar #}
{% if SOCMERGED.config.server.modules.cases != 'soc' %}
{% do SOCMERGED.config.server.modules.elastic.update({'casesEnabled': false}) %}
+26
View File
@@ -3,6 +3,7 @@ soc:
description: Enables or disables SOC. WARNING - Disabling this setting is unsupported and will cause the grid to malfunction. Re-enabling this setting is a manual effort via SSH.
forcedType: bool
advanced: True
readonly: True
telemetryEnabled:
title: SOC Telemetry
description: When this setting is enabled and the grid is not in airgap mode, SOC will provide feature usage data to the Security Onion development team via Google Analytics. This data helps Security Onion developers determine which product features are being used and can also provide insight into improving the user interface. When changing this setting, wait for the grid to fully synchronize and then perform a hard browser refresh on SOC, to force the browser cache to update and reflect the new setting.
@@ -452,6 +453,26 @@ soc:
description: Duration (in milliseconds) that must elapse after a grid node fails to check-in before the node will be marked offline (fault).
global: True
advanced: True
onionconfig:
saltstackDir:
description: Root directory containing the SaltStack tree that SOC reads and writes configuration from. Should not be changed under normal circumstances.
global: True
advanced: True
bypassEnabled:
description: When enabled, errors encountered while reading the SaltStack pillar tree (missing files, unreadable directories, etc.) are logged but do not prevent SOC from starting or serving settings. Intended for advanced troubleshooting and recovery scenarios when the pillar tree is partially unreadable.
global: True
advanced: True
forcedType: bool
postgres:
host:
description: Hostname or IP address of the PostgreSQL server used by SOC. Defaults to the manager hostname.
global: True
advanced: True
password:
description: Password used by SOC to authenticate to the PostgreSQL server. Defaults to the postgres superuser password seeded in the secrets pillar.
global: True
sensitive: True
advanced: True
salt:
longRelayTimeoutMs:
description: Duration (in milliseconds) to wait for a response from the Salt API when executing tasks known for being long running before giving up and showing an error on the SOC UI.
@@ -817,6 +838,7 @@ soc:
description: List of available external tools visible in the SOC UI. Each tool is defined in JSON object notation, and must include the "name" key and "link" key, where the link is the tool's URL.
global: True
advanced: True
multiline: True
forcedType: "[]{}"
exportNodeId:
description: The node ID on which export jobs will be executed.
@@ -890,12 +912,16 @@ soc:
suricata:
description: The template used when creating a new Suricata detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
strelka:
description: The template used when creating a new Strelka detection.
multiline: True
forcedType: string
elastalert:
description: The template used when creating a new ElastAlert detection. [publicId] will be replaced with an unused Public Id.
multiline: True
forcedType: string
grid:
maxUploadSize:
description: The maximum number of bytes for an uploaded PCAP import file.
+4
View File
@@ -47,6 +47,10 @@ strelka_backend:
- {{ ULIMIT.name }}={{ ULIMIT.soft }}:{{ ULIMIT.hard }}
{% endfor %}
{% endif %}
# Intentionally `on-failure` (not unless-stopped) -- strelka backend shuts
# down cleanly during rule reloads and we do not want those clean exits to
# trigger an auto-restart. Do not homogenize; see the container
# auto-restart section of the plan.
- restart_policy: on-failure
- watch:
- file: strelkasensorcompiledrules
+1
View File
@@ -15,6 +15,7 @@ include:
strelka_coordinator:
docker_container.running:
- image: {{ GLOBALS.registry_host }}:5000/{{ GLOBALS.image_repo }}/so-redis:{{ GLOBALS.so_version }}
- restart_policy: unless-stopped
- name: so-strelka-coordinator
- networks:
- sobridge:
+1 -1
View File
@@ -261,7 +261,7 @@ strelka:
priority: 5
options:
limit: 1000
'ScanLNK':
'ScanLnk':
- positive:
flavors:
- 'lnk_file'

Some files were not shown because too many files have changed in this diff Show More