Compare commits

...

55 Commits

Author SHA1 Message Date
Mike Reeves 2131e7d450 Merge pull request #15937 from Security-Onion-Solutions/hotfix/3.1.0
Hotfix/3.1.0
2026-05-28 10:20:53 -04:00
Mike Reeves 2a2d853ac4 Merge pull request #15936 from Security-Onion-Solutions/hotfix310
3.1.0 hotfix
2026-05-28 09:53:00 -04:00
Mike Reeves 5abd6de4b5 3.1.0 hotfix 2026-05-28 09:34:17 -04:00
Jorge Reyes 5599cce22c Merge pull request #15934 from Security-Onion-Solutions/reyesj2-patch-1
keep logstash lumberjack pipeline name update unified
2026-05-27 13:37:41 -05:00
reyesj2 b2a82fec29 fix_logstash_0013_lumberjack_pipeline_name
Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting
2026-05-27 13:24:23 -05:00
reyesj2 613eca52fc update hotfix date 2026-05-27 13:24:10 -05:00
reyesj2 bf609a112e LF 2026-05-27 12:21:44 -05:00
reyesj2 0b4a4de609 always run logstash pipeline rename 2026-05-27 12:21:22 -05:00
Jorge Reyes ad376d2a43 Merge pull request #15930 from Security-Onion-Solutions/reyesj2-patch-1
check for stale logstash pipeline name in local pillar
2026-05-27 10:16:39 -05:00
reyesj2 0834998cca usuable for next soup 2026-05-27 09:52:29 -05:00
reyesj2 473f93f0ee check for stale logstash pipeline name in pillars 2026-05-27 09:33:15 -05:00
Jorge Reyes 7cc2e045fb Merge pull request #15925 from Security-Onion-Solutions/reyesj2/soup-heavynode
use multiple or combined input
2026-05-26 08:34:33 -05:00
Mike Reeves 6955ee73bf Merge pull request #15924 from Security-Onion-Solutions/TOoSmOotH-patch-3
Add version number to HOTFIX file
2026-05-26 09:28:41 -04:00
Mike Reeves c0272ddb81 Add version number to HOTFIX file 2026-05-26 09:24:10 -04:00
reyesj2 d72219c586 use multiple or combined input 2026-05-22 20:04:21 -05:00
Mike Reeves c1d187599b Merge pull request #15912 from Security-Onion-Solutions/3/dev
3.1.0
2026-05-21 15:41:50 -04:00
Mike Reeves d87313db27 Merge pull request #15911 from Security-Onion-Solutions/3.1.0
3.1.0
2026-05-21 13:50:23 -04:00
Mike Reeves 141a61f5b5 3.1.0 2026-05-21 13:47:03 -04:00
Jorge Reyes 901cbf03e4 Merge pull request #15907 from Security-Onion-Solutions/reyesj2/es-verify-compat
Verify compatibility for all ES nodes in the cluster
2026-05-20 14:16:41 -05:00
reyesj2 b485be4602 separate salt-key command from main es version compatiblity loop 2026-05-20 14:12:58 -05:00
reyesj2 7d13007aa9 block soup if all ES nodes are not online and reporting their ES version for compatibility check 2026-05-20 10:03:37 -05:00
reyesj2 d7a1b67095 use pipefail on heavynode versino command to pass through error 2026-05-20 09:16:57 -05:00
reyesj2 6c8997b28a verify all heavynodes and all searchnodes are at compatible ES version before attempting an elasticsearch upgrade 2026-05-19 22:27:31 -05:00
Jorge Reyes 58f1d08ebe Merge pull request #15902 from Security-Onion-Solutions/reyesj2/ea-fleet-sync
sync elastic agent packages to fleet nodes
2026-05-19 11:08:48 -05:00
reyesj2 d0aa33a255 sync elastic agent packages to fleet nodes 2026-05-19 10:50:17 -05:00
Jorge Reyes 74b50f6009 Merge pull request #15899 from Security-Onion-Solutions/revert-15895-reyesj2/agentinstall
Revert "use -verify flag during grid agent install to ensure agent health"
2026-05-16 10:01:58 -05:00
Jorge Reyes e89c820b65 Revert "use -verify flag during grid agent install to ensure agent health" 2026-05-16 09:59:14 -05:00
Jorge Reyes 9ac05a6ad1 Merge pull request #15895 from Security-Onion-Solutions/reyesj2/agentinstall
use -verify flag during grid agent install to ensure agent health
2026-05-15 12:58:09 -05:00
Jason Ertel 24ee3318bc Merge pull request #15898 from Security-Onion-Solutions/jertel/logcheck
exclude fps
2026-05-15 11:38:20 -04:00
Jason Ertel ce566ba174 exclude fps 2026-05-15 11:36:46 -04:00
Mike Reeves 2635a60a8c Merge pull request #15896 from Security-Onion-Solutions/quickfixes2
Make so-postgres-backup fail-safe against silent corruption
2026-05-15 09:32:15 -04:00
Mike Reeves 244a73b7a2 Make so-postgres-backup fail-safe against silent corruption
The dump pipeline returned gzip's exit status, so a pg_dumpall that
died mid-stream still produced a valid .gz holding a truncated dump,
written straight to the final filename. The idempotency check then
blocked retries for the day and the corrupt file counted toward
retention, evicting a good backup each day until none remained.

- set -o pipefail so a failed pg_dumpall fails the pipeline
- dump to a .tmp file and atomically rename only after success, so
  the final filename appears only for a complete backup
- gzip -t integrity check before publishing
- trap-based cleanup of the temp file; sweep stale temps at startup
- run retention only after a successful backup, with a glob
  restricted to finished backups
- log timestamped OK/ERROR outcomes to /opt/so/log/postgres/backup.log
2026-05-15 08:48:54 -04:00
Mike Reeves 1189621ec5 Merge pull request #15893 from Security-Onion-Solutions/quickfixes2 2026-05-14 18:21:30 -04:00
reyesj2 d2524a593f use -verify flag during grid agent install to ensure agent health 2026-05-14 17:12:02 -05:00
Josh Brower f2ab2354fd Merge pull request #15894 from Security-Onion-Solutions/3/nginx-fix
Tweak for nginx upgrade
2026-05-14 23:20:57 +02:00
Mike Reeves 64731c73ba Fix psql :var substitution in telegraf role and retention SQL
psql does not substitute :var references inside dollar-quoted strings,
so the DO blocks in the user and retention subcommands were receiving
literal colons and failing (silently for user, via hide_output: True).
Rewrite the conditional CREATE/ALTER ROLE with SELECT format(...) \\gexec
and guard the retention UPDATE with \\gset + \\if.
2026-05-14 17:17:49 -04:00
Josh Brower 024fece607 Tweak for nginx upgrade 2026-05-14 17:08:57 -04:00
Mike Reeves 249b126312 Quote telegraf role env vars to survive YAML-special chars in passwords 2026-05-14 17:08:51 -04:00
Mike Reeves 8e38bff0c3 Rename telegraf_postgres.sh to so-telegraf-postgres 2026-05-14 16:55:53 -04:00
Mike Reeves b9f2d56932 Consolidate telegraf postgres SQL into multi-mode script
Replace inline psql heredocs in telegraf_users.sls with subcommand
dispatcher telegraf_postgres.sh: create_db, group_role, user, retention.
2026-05-14 16:37:08 -04:00
Mike Reeves 03fa01a705 Move telegraf_role.sh to postgres tools/sbin 2026-05-14 16:18:01 -04:00
Mike Reeves 450eacca41 Move telegraf role provisioning to external script with env vars 2026-05-14 16:15:54 -04:00
Mike Reeves b7a13899f7 Suppress output logging for postgres telegraf role provisioning 2026-05-14 15:56:04 -04:00
Mike Reeves 6f273d7d97 Rename init-users.sh to init-db.sh and update all references 2026-05-14 15:53:00 -04:00
Josh Brower b328820c01 Merge pull request #15792 from Security-Onion-Solutions/3/strelkalnk
Fix module name
2026-05-14 13:06:26 +02:00
Jorge Reyes 638aca97c8 Merge pull request #15877 from Security-Onion-Solutions/reyesj2-patch-1
update redis index template
2026-05-13 13:44:04 -05:00
Jorge Reyes 74a5c895e8 Merge pull request #15889 from Security-Onion-Solutions/reyesj2/zeek-ja4d
add zeek.ja4d ingest pipeline
2026-05-13 13:43:56 -05:00
reyesj2 d56bf01823 add zeek.ja4d ingest pipeline 2026-05-13 12:32:54 -05:00
Mike Reeves d29267d9c2 Merge pull request #15888 from Security-Onion-Solutions/TOoSmOotH-patch-1
Change Telegraf output from BOTH to INFLUXDB
2026-05-13 12:47:55 -04:00
Mike Reeves 72327285b2 Change Telegraf output from BOTH to INFLUXDB 2026-05-13 11:58:21 -04:00
Josh Patterson cc7a237457 Merge pull request #15887 from Security-Onion-Solutions/m0duspwnens-patch-1
remove stig from hypervisor and managerhype
2026-05-13 10:57:58 -04:00
Josh Patterson b068ad2b35 remove stig from hypervisor and managerhype 2026-05-13 10:53:11 -04:00
Jorge Reyes 4a2177c827 update redis index template
missing redis integration component templates
2026-05-11 16:15:56 -05:00
Josh Brower affede7f0a Rename 'ScanLNK' to 'ScanLnk' in YAML config 2026-04-20 10:01:10 -04:00
Josh Brower 97366c0496 Rename 'ScanLNK' to 'ScanLnk' in defaults.yaml 2026-04-20 10:00:29 -04:00
20 changed files with 515 additions and 106 deletions
+11 -11
View File
@@ -1,17 +1,17 @@
### 3.0.0-20260331 ISO image released on 2026/03/31
### 3.1.0-20260528 ISO image released on 2026/05/28
### Download and Verify
3.0.0-20260331 ISO image:
https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
3.1.0-20260528 ISO image:
https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
MD5: ECD318A1662A6FDE0EF213F5A9BD4B07
SHA1: E55BE314440CCF3392DC0B06BC5E270B43176D9C
SHA256: 7FC47405E335CBE5C2B6C51FE7AC60248F35CBE504907B8B5A33822B23F8F4D5
MD5: 9D6FF58DEEE24089D722C73169765B3E
SHA1: 2B8B816B6CEC3B7F96B3C5E040EBF502DD2C412F
SHA256: 62FAB57E247C843D6A04F0796D8162C732B65D82FC3E4A59D087135B9FD32912
Signature for ISO image:
https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig
Signing key:
https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/main/KEYS
@@ -25,22 +25,22 @@ wget https://raw.githubusercontent.com/Security-Onion-Solutions/securityonion/3/
Download the signature file for the ISO:
```
wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.0.0-20260331.iso.sig
wget https://github.com/Security-Onion-Solutions/securityonion/raw/3/main/sigs/securityonion-3.1.0-20260528.iso.sig
```
Download the ISO image:
```
wget https://download.securityonion.net/file/securityonion/securityonion-3.0.0-20260331.iso
wget https://download.securityonion.net/file/securityonion/securityonion-3.1.0-20260528.iso
```
Verify the downloaded ISO image using the signature file:
```
gpg --verify securityonion-3.0.0-20260331.iso.sig securityonion-3.0.0-20260331.iso
gpg --verify securityonion-3.1.0-20260528.iso.sig securityonion-3.1.0-20260528.iso
```
The output should show "Good signature" and the Primary key fingerprint should match what's shown below:
```
gpg: Signature made Mon 30 Mar 2026 06:22:14 PM EDT using RSA key ID FE507013
gpg: Signature made Wed 27 May 2026 03:03:59 PM EDT using RSA key ID FE507013
gpg: Good signature from "Security Onion Solutions, LLC <info@securityonionsolutions.com>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
+1
View File
@@ -0,0 +1 @@
20260528
+2
View File
@@ -165,6 +165,8 @@ if [[ $EXCLUDE_FALSE_POSITIVE_ERRORS == 'Y' ]]; then
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading component template" # false positive (elasticsearch index or template names contain 'error')
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|upgrading composable template" # false positive (elasticsearch composable template names contain 'error')
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|Error while parsing document for index \[.ds-logs-kratos-so-.*object mapping for \[file\]" # false positive (mapping error occuring BEFORE kratos index has rolled over in 2.4.210)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|No such container" # false positive (telegraf trying to run stats on an old container)
EXCLUDED_ERRORS="$EXCLUDED_ERRORS|passwords do not match" # false positive (automated hydra test)
fi
if [[ $EXCLUDE_KNOWN_ERRORS == 'Y' ]]; then
+2
View File
@@ -26,7 +26,9 @@ include:
wait_for_elasticsearch_elasticfleet:
cmd.run:
- name: so-elasticsearch-wait
{% endif %}
{% if GLOBALS.role == "so-fleet" %}
# Sync Elastic Agent artifacts to Fleet Node
elasticagent_syncartifacts:
file.recurse:
+4 -1
View File
@@ -3958,10 +3958,13 @@ elasticsearch:
- vulnerability-mappings
- common-settings
- common-dynamic-mappings
- logs-redis.log@package
- logs-redis.log@custom
data_stream:
allow_custom_routing: false
hidden: false
ignore_missing_component_templates: []
ignore_missing_component_templates:
- logs-redis.log@custom
index_patterns:
- logs-redis.log*
priority: 501
+71
View File
@@ -0,0 +1,71 @@
{
"description": "zeek.ja4d",
"processors": [
{
"set": {
"field": "event.dataset",
"value": "ja4d"
}
},
{
"remove": {
"field": [
"host"
],
"ignore_failure": true
}
},
{
"json": {
"field": "message",
"target_field": "message2",
"ignore_failure": true
}
},
{
"rename": {
"field": "message2.ja4d",
"target_field": "hash.ja4d",
"ignore_missing": true,
"if": "ctx?.message2?.ja4d != null && ctx.message2.ja4d.length() > 0"
}
},
{
"rename": {
"field": "message2.client_mac",
"target_field": "host.mac",
"ignore_missing": true,
"if": "ctx?.message2?.client_mac != null && ctx.message2.client_mac.length() > 0"
}
},
{
"rename": {
"field": "message2.hostname",
"target_field": "host.hostname",
"ignore_missing": true,
"if": "ctx?.message2?.hostname != null && ctx.message2.hostname.length() > 0"
}
},
{
"rename": {
"field": "message2.requested_ip",
"target_field": "dhcp.requested_address",
"ignore_missing": true,
"if": "ctx?.message2?.requested_ip != null && ctx.message2.requested_ip.length() > 0"
}
},
{
"rename": {
"field": "message2.vendor_class_id",
"target_field": "zeek.ja4d.vendor_class_id",
"ignore_missing": true,
"if": "ctx?.message2?.vendor_class_id != null && ctx.message2.vendor_class_id.length() > 0"
}
},
{
"pipeline": {
"name": "zeek.common"
}
}
]
}
+218 -6
View File
@@ -533,6 +533,23 @@ elasticfleet_set_agent_logging_level_warn() {
done <<< "$policies_to_update"
}
update_logstash_pipeline_name() {
local original_pipeline_name="$1"
local new_pipeline_name="$2"
echo "Checking for conflicting logstash defined_pipelines pillar value."
local LOGSTASH_FILE=/opt/so/saltstack/local/pillar/logstash/soc_logstash.sls
local MINIONDIR=/opt/so/saltstack/local/pillar/minions
for pillar_file in "$LOGSTASH_FILE" "$MINIONDIR"/*.sls; do
[[ -f "$pillar_file" ]] || continue
if grep -q "$original_pipeline_name$" "$pillar_file"; then
echo "Found conflicting defined_pipeline pillar value in $pillar_file. Updating to use the new logstash pipeline name."
sed -i "s#$original_pipeline_name\$#$new_pipeline_name#g" "$pillar_file"
chown socore:socore "$pillar_file"
fi
done
}
check_transform_health_and_reauthorize() {
. /usr/sbin/so-elastic-fleet-common
@@ -676,6 +693,10 @@ rename_strelka_scan_lnk() {
rm -f "$TMP_VALUE_FILE"
}
fix_logstash_0013_lumberjack_pipeline_name() {
update_logstash_pipeline_name "so/0013_input_lumberjack_fleet.conf" "so/0013_input_lumberjack_fleet.conf.jinja"
}
up_to_3.1.0() {
ensure_postgres_local_pillar
ensure_postgres_secret
@@ -684,6 +705,7 @@ up_to_3.1.0() {
# Clear existing component template state file.
rm -f /opt/so/state/esfleet_component_templates.json
rename_strelka_scan_lnk
fix_logstash_0013_lumberjack_pipeline_name
INSTALLEDVERSION=3.1.0
}
@@ -971,6 +993,9 @@ verify_es_version_compatibility() {
local is_active_intermediate_upgrade=1
# supported upgrade paths for SO-ES versions
declare -A es_upgrade_map=(
["8.18.4"]="8.18.6 8.18.8 9.0.8"
["8.18.6"]="8.18.8 9.0.8"
["8.18.8"]="9.0.8"
["9.0.8"]="9.3.3"
)
@@ -994,6 +1019,171 @@ verify_es_version_compatibility() {
exit 160
fi
compatible_es_versions="$target_es_version"
for current_version in "${!es_upgrade_map[@]}"; do
# shellcheck disable=SC2076
if [[ " ${es_upgrade_map[$current_version]} " =~ " $target_es_version " ]]; then
compatible_es_versions+=" $current_version"
fi
done
# Check if the given ES version can directly upgrade to the target ES version. Used to assist with catching lagging nodes during the upgrade process
es_version_can_upgrade_to_target() {
local current_version="$1"
# shellcheck disable=SC2076
if [[ -n "$current_version" && " $compatible_es_versions " =~ " $current_version " ]]; then
return 0
fi
return 1
}
# Gather Elasticsearch cluster version info and verify that each node in the cluster is running a version compatible with the target ES version.
verify_searchnodes_es_target_compatibility() {
local retries=20
local retry_count=0
local delay=180
local expected_es_nodes searchnode_minions attempt
local searchnode_discovery_success=false
SEARCHNODE_ES_VERSIONS=""
for attempt in {1..3}; do
if searchnode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("searchnode"))'); then
searchnode_discovery_success=true
break
fi
echo "Failed to retrieve grid searchnodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
sleep 30
done
if [[ "$searchnode_discovery_success" != "true" ]]; then
echo "Failed to retrieve grid searchnodes via salt-key."
return 1
fi
# Always add node running soup to expected es nodes
expected_es_nodes="${MINIONID%_*}"
while IFS= read -r searchnode_minion; do
[[ -z "$searchnode_minion" ]] && continue
expected_es_nodes+=$'\n'"${searchnode_minion%_searchnode}"
done <<< "$searchnode_minions"
while [[ $retry_count -lt $retries ]]; do
SEARCHNODE_ES_VERSIONS=$(so-elasticsearch-query _nodes/_all/version --retry 5 --retry-delay 10 --fail 2>&1)
local exit_status=$?
if [[ $exit_status -ne 0 ]]; then
echo "Failed to retrieve Elasticsearch versions from searchnodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
continue
fi
local all_searchnodes_compatible=true
while IFS=$'\t' read -r node current_version; do
[[ -z "$node" ]] && continue
if ! es_version_can_upgrade_to_target "$current_version"; then
echo "Searchnode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
all_searchnodes_compatible=false
fi
done < <(echo "$SEARCHNODE_ES_VERSIONS" | jq -r '.nodes | to_entries[] | [.value.name, .value.version] | @tsv')
while IFS= read -r expected_es_node; do
[[ -z "$expected_es_node" ]] && continue
if ! echo "$SEARCHNODE_ES_VERSIONS" | jq -e --arg node "$expected_es_node" '.nodes | to_entries | any(.value.name == $node)' > /dev/null; then
echo "Searchnode $expected_es_node did not report an Elasticsearch version. It may be offline or still upgrading."
all_searchnodes_compatible=false
fi
done <<< "$expected_es_nodes"
if [[ "$all_searchnodes_compatible" == true ]]; then
echo "All Searchnodes are upgradable to Elasticsearch $target_es_version."
return 0
fi
echo "One or more Searchnodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
done
return 1
}
# Gather heavynode version info and verify that each node is running a version compatible with the target ES version.
verify_heavynodes_es_target_compatibility() {
local heavynode_minions attempt
local retries=20
local retry_count=0
local delay=180
local heavynode_discovery_success=false
HEAVYNODE_ES_VERSIONS=""
for attempt in {1..3}; do
if heavynode_minions=$(set -o pipefail; salt-key --out=json --list=accepted 2> /dev/null | jq -r '.minions[]? | select(endswith("heavynode"))'); then
heavynode_discovery_success=true
break
fi
echo "Failed to retrieve grid heavynodes via salt-key... Retrying in 30 seconds. Attempt $attempt of 3."
sleep 30
done
if [[ "$heavynode_discovery_success" != "true" ]]; then
echo "Failed to retrieve grid heavynodes via salt-key."
return 1
fi
if [[ -z "$heavynode_minions" ]]; then
echo "No heavynodes detected. Skipping heavynode Elasticsearch version compatibility check."
return 0
fi
while [[ $retry_count -lt $retries ]]; do
HEAVYNODE_ES_VERSIONS=$(salt -C 'G@role:so-heavynode' cmd.run 'set -o pipefail; so-elasticsearch-query / --retry 5 --retry-delay 10 | jq -er ".version.number"' shell=/bin/bash --out=json 2> /dev/null)
local exit_status=$?
if [[ $exit_status -ne 0 ]]; then
echo "Failed to retrieve Elasticsearch version from one or more heavynodes... Retrying in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
continue
fi
local all_heavynodes_compatible=true
while IFS=$'\t' read -r node current_version; do
[[ -z "$node" ]] && continue
if ! es_version_can_upgrade_to_target "$current_version"; then
echo "Heavynode $node is running Elasticsearch $current_version, which is not directly upgradable to Elasticsearch $target_es_version."
all_heavynodes_compatible=false
fi
done < <(echo "$HEAVYNODE_ES_VERSIONS" | jq -r 'to_entries[] | [.key, .value] | @tsv')
while IFS= read -r heavynode_minion; do
[[ -z "$heavynode_minion" ]] && continue
if ! echo "$HEAVYNODE_ES_VERSIONS" | jq -se --arg minion "$heavynode_minion" 'add | has($minion)' > /dev/null; then
echo "Heavynode $heavynode_minion did not report an Elasticsearch version. It may be offline or still upgrading."
all_heavynodes_compatible=false
fi
done <<< "$heavynode_minions"
if [[ "$all_heavynodes_compatible" == true ]]; then
echo -e "\nAll heavynodes can upgrade to Elasticsearch $target_es_version."
return 0
fi
echo "One or more heavynodes cannot upgrade directly to Elasticsearch $target_es_version. Rechecking in $delay seconds. Attempt $((retry_count + 1)) of $retries."
((retry_count++))
sleep $delay
done
return 1
}
if [[ ! -f "$es_verification_script" ]]; then
create_intermediate_upgrade_verification_script "$es_verification_script"
fi
for statefile in "${es_required_version_statefile_base}"-*; do
[[ -f $statefile ]] || continue
@@ -1012,10 +1202,6 @@ verify_es_version_compatibility() {
continue
fi
if [[ ! -f "$es_verification_script" ]]; then
create_intermediate_upgrade_verification_script "$es_verification_script"
fi
echo -e "\n##############################################################################################################################\n"
echo "A previously required intermediate Elasticsearch upgrade was detected. Verifying that all Searchnodes/Heavynodes have successfully upgraded Elasticsearch to $es_required_version_statefile_value before proceeding with soup to avoid potential data loss! This command can take up to an hour to complete."
if ! timeout --foreground 4000 bash "$es_verification_script" "$es_required_version_statefile_value" "$statefile"; then
@@ -1037,6 +1223,26 @@ verify_es_version_compatibility() {
# shellcheck disable=SC2076 # Do not want a regex here eg usage " 8.18.8 9.0.8 " =~ " 9.0.8 "
if [[ " ${es_upgrade_map[$es_version]} " =~ " $target_es_version " || "$es_version" == "$target_es_version" ]]; then
if ! verify_searchnodes_es_target_compatibility || ! verify_heavynodes_es_target_compatibility; then
echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
echo "One or more Searchnode(s)/Heavynode(s) cannot upgrade directly to Elasticsearch $target_es_version. This can happen with soups that include Elasticsearch upgrades being run in quick succession. Typically, this will resolve itself as the grid synchronizes. Please allow time for all Searchnodes/Heavynodes to have upgraded Elasticsearch to a compatible version with $target_es_version before running soup again to avoid potential data loss!"
if [[ -n "$HEAVYNODE_ES_VERSIONS" ]]; then
echo "Current heavynode Elasticsearch versions:"
echo "$HEAVYNODE_ES_VERSIONS" | jq '.'
fi
if [[ -n "$SEARCHNODE_ES_VERSIONS" ]]; then
echo "Current searchnode Elasticsearch versions:"
echo "$SEARCHNODE_ES_VERSIONS" | jq '.nodes | to_entries | map({(.value.name): .value.version}) | sort | add'
fi
echo -e "\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n"
exit 161
fi
# supported upgrade
return 0
else
@@ -1322,7 +1528,13 @@ EOF
# Keeping this block in case we need to do a hotfix that requires salt update
apply_hotfix() {
echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
if [[ "$INSTALLEDVERSION" == "3.1.0" ]] ; then
# Do not remove this fix_logstash_0013_lumberjack_pipeline_name in future hotfixes without first validating older
# installs referencing "so/0013_input_lumberjack_fleet.conf" via pillar are upgradable
fix_logstash_0013_lumberjack_pipeline_name
else
echo "No actions required. ($INSTALLEDVERSION/$HOTFIXVERSION)"
fi
}
failed_soup_restore_items() {
@@ -1394,7 +1606,7 @@ main() {
echo "Verifying we have the latest soup script."
verify_latest_update_script
echo "Verifying Elasticsearch version compatibility before upgrading."
echo "Verifying Elasticsearch version compatibility across the grid before upgrading."
verify_es_version_compatibility
echo "Let's see if we need to update Security Onion."
+2
View File
@@ -225,6 +225,7 @@ http {
limit_req zone=auth_throttle burst={{ NGINXMERGED.config.throttle_login_burst }} nodelay;
limit_req_status 429;
proxy_pass http://{{ GLOBALS.manager }}:4433;
proxy_set_header Connection "Close";
proxy_read_timeout 90;
proxy_connect_timeout 90;
proxy_set_header Host $host;
@@ -237,6 +238,7 @@ http {
location ~ ^/auth/.*?(whoami|logout|settings|errors|webauthn.js) {
rewrite /auth/(.*) /$1 break;
proxy_pass http://{{ GLOBALS.manager }}:4433;
proxy_set_header Connection "Close";
proxy_read_timeout 90;
proxy_connect_timeout 90;
proxy_set_header Host $host;
+3 -3
View File
@@ -46,10 +46,10 @@ postgresinitdir:
- require:
- file: postgresconfdir
postgresinitusers:
postgresinitdb:
file.managed:
- name: /opt/so/conf/postgres/init/init-users.sh
- source: salt://postgres/files/init-users.sh
- name: /opt/so/conf/postgres/init/init-db.sh
- source: salt://postgres/files/init-db.sh
- user: 939
- group: 939
- mode: 755
+4 -4
View File
@@ -31,7 +31,7 @@ so-postgres:
- POSTGRES_DB=securityonion
# Passwords are delivered via mounted 0600 secret files, not plaintext env vars.
# The upstream postgres image resolves POSTGRES_PASSWORD_FILE; entrypoint.sh and
# init-users.sh resolve SO_POSTGRES_PASS_FILE the same way.
# init-db.sh resolve SO_POSTGRES_PASS_FILE the same way.
- POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password
- SO_POSTGRES_USER={{ SO_POSTGRES_USER }}
- SO_POSTGRES_PASS_FILE=/run/secrets/so_postgres_pass
@@ -46,7 +46,7 @@ so-postgres:
- /opt/so/conf/postgres/postgresql.conf:/conf/postgresql.conf:ro
- /opt/so/conf/postgres/pg_hba.conf:/conf/pg_hba.conf:ro
- /opt/so/conf/postgres/secrets:/run/secrets:ro
- /opt/so/conf/postgres/init/init-users.sh:/docker-entrypoint-initdb.d/init-users.sh:ro
- /opt/so/conf/postgres/init/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh:ro
- /etc/pki/postgres.crt:/conf/postgres.crt:ro
- /etc/pki/postgres.key:/conf/postgres.key:ro
- /etc/pki/tls/certs/intca.crt:/conf/ca.crt:ro
@@ -70,7 +70,7 @@ so-postgres:
- watch:
- file: postgresconf
- file: postgreshba
- file: postgresinitusers
- file: postgresinitdb
- file: postgres_super_secret
- file: postgres_app_secret
- x509: postgres_crt
@@ -78,7 +78,7 @@ so-postgres:
- require:
- file: postgresconf
- file: postgreshba
- file: postgresinitusers
- file: postgresinitdb
- file: postgres_super_secret
- file: postgres_app_secret
- x509: postgres_crt
+16 -69
View File
@@ -39,17 +39,15 @@ postgres_wait_ready:
- require:
- docker_container: so-postgres
# Ensure the shared Telegraf database exists. init-users.sh only runs on a
# Ensure the shared Telegraf database exists. init-db.sh only runs on a
# fresh data dir, so hosts upgraded onto an existing /nsm/postgres volume
# would otherwise never get so_telegraf.
postgres_create_telegraf_db:
cmd.run:
- name: |
if ! docker exec so-postgres psql -U postgres -tAc "SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres -c "CREATE DATABASE so_telegraf"
fi
- name: /usr/sbin/so-telegraf-postgres create_db
- require:
- cmd: postgres_wait_ready
- file: postgres_sbin
# Provision the shared group role and schema once. Every per-minion role is a
# member of so_telegraf, and each Telegraf connection does SET ROLE so_telegraf
@@ -57,68 +55,26 @@ postgres_create_telegraf_db:
# on first write are owned by the group role and every member can INSERT/SELECT.
postgres_telegraf_group_role:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
CREATE ROLE so_telegraf NOLOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Telegraf (running as so_telegraf) calls partman.create_parent()
-- on first write of each metric, which needs USAGE on the partman
-- schema, EXECUTE on its functions/procedures, and write access to
-- partman.part_config so it can register new partitioned parents.
GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
-- partman creates per-parent template tables (partman.template_*) at
-- runtime; default privileges extend DML/sequence access to them.
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
-- Hourly partman maintenance. cron.schedule is idempotent by jobname.
SELECT cron.schedule(
'telegraf-partman-maintenance',
'17 * * * *',
'CALL partman.run_maintenance_proc()'
);
EOSQL
- name: /usr/sbin/so-telegraf-postgres group_role
- require:
- cmd: postgres_create_telegraf_db
- file: postgres_sbin
{% set creds = salt['pillar.get']('telegraf:postgres_creds', {}) %}
{% for mid, entry in creds.items() %}
{% if entry.get('user') and entry.get('pass') %}
{% set u = entry.user %}
{% set p = entry.pass | replace("'", "''") %}
{% set p = entry.pass %}
postgres_telegraf_role_{{ u }}:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '{{ u }}') THEN
EXECUTE format('CREATE ROLE %I WITH LOGIN PASSWORD %L', '{{ u }}', '{{ p }}');
ELSE
EXECUTE format('ALTER ROLE %I WITH PASSWORD %L', '{{ u }}', '{{ p }}');
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO "{{ u }}";
GRANT so_telegraf TO "{{ u }}";
EOSQL
- name: /usr/sbin/so-telegraf-postgres user
- env:
- ROLE_USER: {{ u | tojson }}
- ROLE_PASS: {{ p | tojson }}
- hide_output: True
- require:
- file: postgres_sbin
- cmd: postgres_telegraf_group_role
{% endif %}
@@ -130,21 +86,12 @@ postgres_telegraf_role_{{ u }}:
{% set retention = salt['pillar.get']('postgres:telegraf:retention_days', 14) | int %}
postgres_telegraf_retention_reconcile:
cmd.run:
- name: |
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman') THEN
UPDATE partman.part_config
SET retention = '{{ retention }} days',
retention_keep_table = false
WHERE parent_table LIKE 'telegraf.%';
END IF;
END
$$;
EOSQL
- name: /usr/sbin/so-telegraf-postgres retention
- env:
- RETENTION_DAYS: {{ retention }}
- require:
- cmd: postgres_telegraf_group_role
- file: postgres_sbin
{% endif %}
+41 -7
View File
@@ -7,15 +7,29 @@
. /usr/sbin/so-common
# Without pipefail, a pipeline's exit status is gzip's. A failed pg_dumpall would
# otherwise be masked by a successful gzip, silently producing a valid .gz that
# holds a truncated dump.
set -o pipefail
# Backups contain role password hashes and full chat data; keep them 0600.
umask 0077
TODAY=$(date '+%Y_%m_%d')
BACKUPDIR=/nsm/backup
BACKUPFILE="$BACKUPDIR/so-postgres-backup-$TODAY.sql.gz"
TMPFILE="$BACKUPFILE.tmp"
MAXBACKUPS=7
LOGFILE=/opt/so/log/postgres/backup.log
mkdir -p $BACKUPDIR
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOGFILE"
}
mkdir -p "$BACKUPDIR"
# Remove any temp files left behind by a previously crashed run
rm -f "$BACKUPDIR"/so-postgres-backup-*.sql.gz.tmp
# Skip if already backed up today
if [ -f "$BACKUPFILE" ]; then
@@ -27,13 +41,33 @@ if ! docker ps --format '{{.Names}}' | grep -q '^so-postgres$'; then
exit 0
fi
# Dump all databases and roles, compress
docker exec so-postgres pg_dumpall -U postgres | gzip > "$BACKUPFILE"
# Always clean up the temp file on exit; the success path clears this trap
# after the atomic rename so the finished backup is not deleted.
trap 'rm -f "$TMPFILE"' EXIT
# Retention cleanup
NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
# Dump all databases and roles, compress. Write to a temp file so the final
# filename only ever appears for a complete, verified backup.
if ! docker exec so-postgres pg_dumpall -U postgres | gzip > "$TMPFILE"; then
log "ERROR: pg_dumpall/gzip failed; backup aborted"
exit 1
fi
# Verify the compressed stream is intact before publishing it
if ! gzip -t "$TMPFILE"; then
log "ERROR: backup failed gzip integrity check; backup aborted"
exit 1
fi
# Atomically publish the verified backup
mv "$TMPFILE" "$BACKUPFILE"
trap - EXIT
log "OK: wrote $BACKUPFILE"
# Retention cleanup (only reached after a successful backup). The glob is
# restricted to finished backups so an in-progress .tmp can never be counted.
NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
while [ "$NUMBACKUPS" -gt "$MAXBACKUPS" ]; do
OLDEST=$(find $BACKUPDIR -type f -name "so-postgres-backup*" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
OLDEST=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" -printf '%T+ %p\n' | sort | head -n 1 | awk -F" " '{print $2}')
rm -f "$OLDEST"
NUMBACKUPS=$(find $BACKUPDIR -type f -name "so-postgres-backup*" | wc -l)
NUMBACKUPS=$(find "$BACKUPDIR" -type f -name "so-postgres-backup-*.sql.gz" | wc -l)
done
@@ -0,0 +1,110 @@
#!/bin/bash
set -e
# Provision Telegraf state inside the so-postgres container.
# Usage: so-telegraf-postgres <subcommand>
# create_db Ensure the so_telegraf database exists.
# group_role Provision the so_telegraf group role, telegraf/partman schemas,
# pg_partman, pg_cron, and the hourly partman maintenance job.
# user Create or update a per-minion login role granted to so_telegraf.
# Env: ROLE_USER, ROLE_PASS.
# retention Reconcile partman retention on telegraf parents.
# Env: RETENTION_DAYS.
cmd="${1:?subcommand required}"
case "$cmd" in
create_db)
if ! docker exec so-postgres psql -U postgres -tAc \
"SELECT 1 FROM pg_database WHERE datname='so_telegraf'" | grep -q 1; then
docker exec so-postgres psql -v ON_ERROR_STOP=1 -U postgres \
-c "CREATE DATABASE so_telegraf"
fi
;;
group_role)
docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf <<'EOSQL'
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'so_telegraf') THEN
CREATE ROLE so_telegraf NOLOGIN;
END IF;
END
$$;
GRANT CONNECT ON DATABASE so_telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS telegraf AUTHORIZATION so_telegraf;
GRANT USAGE, CREATE ON SCHEMA telegraf TO so_telegraf;
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- Telegraf (running as so_telegraf) calls partman.create_parent()
-- on first write of each metric, which needs USAGE on the partman
-- schema, EXECUTE on its functions/procedures, and write access to
-- partman.part_config so it can register new partitioned parents.
GRANT USAGE, CREATE ON SCHEMA partman TO so_telegraf;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO so_telegraf;
GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO so_telegraf;
-- partman creates per-parent template tables (partman.template_*) at
-- runtime; default privileges extend DML/sequence access to them.
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO so_telegraf;
ALTER DEFAULT PRIVILEGES IN SCHEMA partman
GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO so_telegraf;
-- Hourly partman maintenance. cron.schedule is idempotent by jobname.
SELECT cron.schedule(
'telegraf-partman-maintenance',
'17 * * * *',
'CALL partman.run_maintenance_proc()'
);
EOSQL
;;
user)
: "${ROLE_USER:?ROLE_USER is required}"
: "${ROLE_PASS:?ROLE_PASS is required}"
# psql does not substitute :vars inside dollar-quoted strings, so the
# conditional CREATE/ALTER is built outside any DO block and dispatched
# with \gexec. format() handles identifier/literal quoting.
docker exec -i so-postgres psql \
-v ON_ERROR_STOP=1 \
-v role_user="$ROLE_USER" \
-v role_pass="$ROLE_PASS" \
-U postgres -d so_telegraf <<'EOSQL'
SELECT format(
CASE WHEN EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = :'role_user')
THEN 'ALTER ROLE %I WITH LOGIN PASSWORD %L'
ELSE 'CREATE ROLE %I WITH LOGIN PASSWORD %L'
END,
:'role_user',
:'role_pass'
) \gexec
GRANT CONNECT ON DATABASE so_telegraf TO :"role_user";
GRANT so_telegraf TO :"role_user";
EOSQL
;;
retention)
: "${RETENTION_DAYS:?RETENTION_DAYS is required}"
# \gset + \if guards against a missing pg_partman without using a DO
# block (psql :var substitution doesn't reach into dollar-quoted code).
docker exec -i so-postgres psql \
-v ON_ERROR_STOP=1 \
-v retention_days="$RETENTION_DAYS" \
-U postgres -d so_telegraf <<'EOSQL'
SELECT CASE WHEN EXISTS (SELECT 1 FROM pg_catalog.pg_extension WHERE extname = 'pg_partman')
THEN 'true' ELSE 'false' END AS has_partman \gset
\if :has_partman
UPDATE partman.part_config
SET retention = :'retention_days' || ' days',
retention_keep_table = false
WHERE parent_table LIKE 'telegraf.%';
\endif
EOSQL
;;
*)
echo "Unknown subcommand: $cmd" >&2
exit 1
;;
esac
+1 -1
View File
@@ -261,7 +261,7 @@ strelka:
priority: 5
options:
limit: 1000
'ScanLNK':
'ScanLnk':
- positive:
flavors:
- 'lnk_file'
+1 -1
View File
@@ -99,7 +99,7 @@ strelka:
'ScanJpeg': *scannerOptions
'ScanJson': *scannerOptions
'ScanLibarchive': *scannerOptions
'ScanLNK': *scannerOptions
'ScanLnk': *scannerOptions
'ScanLsb': *scannerOptions
'ScanLzma': *scannerOptions
'ScanMacho': *scannerOptions
+1 -1
View File
@@ -1,6 +1,6 @@
telegraf:
enabled: False
output: BOTH
output: INFLUXDB
config:
interval: '30s'
metric_batch_size: 1000
+27 -2
View File
@@ -119,7 +119,7 @@ base:
- kafka
- pcap.cleanup
'*_manager or *_managerhype and G@saltversion:{{saltversion}} and not I@node_data:False':
'*_manager and G@saltversion:{{saltversion}} and not I@node_data:False':
- match: compound
- salt.master
- registry
@@ -146,6 +146,32 @@ base:
- stig
- kafka
'*_managerhype and G@saltversion:{{saltversion}} and not I@node_data:False':
- match: compound
- salt.master
- registry
- nginx
- influxdb
- postgres
- strelka.manager
- soc
- kratos
- hydra
- firewall
- manager
- sensoroni
- telegraf
- backup.config_backup
- elasticsearch
- logstash
- redis
- elastic-fleet-package-registry
- kibana
- elastalert
- utility
- elasticfleet
- kafka
'*_managerhype and I@features:vrt and G@saltversion:{{saltversion}}':
- match: compound
- manager.hypervisor
@@ -286,7 +312,6 @@ base:
- libvirt
- libvirt.images
- elasticfleet.install_agent_grid
- stig
'*_desktop and G@saltversion:{{saltversion}}':
- sensoroni
Binary file not shown.
Binary file not shown.