18150 Commits

Author SHA1 Message Date
Mike Reeves f3181b204a Remove so-telegraf-trim and update retention description
pg_partman drops old partitions hourly; row-DELETE retention is
obsolete and a confusing emergency fallback on partitioned tables.
2026-04-17 19:06:16 -04:00
Mike Reeves dd39db4584 Drop so_telegraf_trim cron.absent tombstone
feature/postgres never shipped the original cron.present, so this
cleanup state is a no-op on every fresh install. The script itself
stays on disk for emergency use.
2026-04-17 18:59:39 -04:00
Mike Reeves 759880a800 Wait for TCP-ready postgres, not the init-phase Unix socket
docker-entrypoint.sh runs the init-scripts phase with listen_addresses=''
(Unix socket only). The old pg_isready check passed there and then raced
the docker_temp_server_stop shutdown before the final postgres started.
pg_isready -h 127.0.0.1 only returns success once the real CMD binds
TCP, so downstream psql execs never land during the shutdown window.
2026-04-17 16:43:41 -04:00
Jorge Reyes f5cd90d139 Merge pull request #15786 from Security-Onion-Solutions/reyesj2-es933
add wait_for_so-elasticsearch state and split elasticsearch cluster c…
2026-04-17 14:47:11 -05:00
Mike Reeves 31383bd9d0 Make Telegraf Postgres templates idempotent
Use CREATE TABLE IF NOT EXISTS and a WHERE-guarded create_parent() so
a Telegraf restart can re-run the templates safely after manual DB
surgery. Add an explicit tag_table_create_templates mirroring the
plugin default with IF NOT EXISTS for the same reason.
2026-04-17 15:43:50 -04:00
reyesj2 ebb93b4fa7 add wait_for_so-elasticsearch state and split elasticsearch cluster configuration out of enabled.sls 2026-04-17 14:43:07 -05:00
Mike Reeves 21076af01e Grant so_telegraf CREATE on partman schema
pg_partman 5.x's create_partition() creates a per-parent template
table inside the partman schema at runtime, which requires CREATE on
that schema. Also extend ALTER DEFAULT PRIVILEGES so the runtime-
created template tables are accessible to so_telegraf.
2026-04-17 15:34:19 -04:00
Mike Reeves f11e9da83a Mark time column NOT NULL before partman.create_parent
pg_partman 5.x requires the control column to be NOT NULL; Telegraf's
generated columns are nullable by default.
2026-04-17 15:27:06 -04:00
Mike Reeves 0fddcd8fe7 Pass unquoted schema.name to partman.create_parent
pg_partman 5.x splits p_parent_table on '.' and looks up the parts as
raw identifiers, so the literal must be 'schema.name' rather than the
double-quoted form quoteLiteral emits for .table.
2026-04-17 15:22:57 -04:00
Mike Reeves 927eba566c Grant so_telegraf access to partman schema
Telegraf calls partman.create_parent() on first write of each metric,
which needs USAGE on the partman schema, EXECUTE on its functions and
procedures, and DML on partman.part_config.
2026-04-17 15:13:08 -04:00
Mike Reeves af9330a9dd Escape Go-template placeholders from Jinja in telegraf.conf 2026-04-17 15:04:37 -04:00
Mike Reeves b3fbd5c7a4 Use Go-template placeholders and shell-guarded CREATE DATABASE
- Telegraf's outputs.postgresql plugin uses Go text/template syntax,
  not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings
  were passed through to Postgres literally, producing syntax errors
  on every metric's first write. Switch to {{ .table }}, {{ .columns }},
  and {{ .table|quoteLiteral }} so partitioned parents and the partman
  create_parent() call succeed.
- Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in
  both init-users.sh and telegraf_users.sls with an explicit shell
  conditional. The prior idiom occasionally fired CREATE DATABASE even
  when so_telegraf already existed, producing duplicate-key failures.
2026-04-17 14:55:13 -04:00
Mike Reeves 5228668be0 Fix Telegraf→Postgres table creation and state.apply race
- Telegraf's partman template passed p_type:='native', which pg_partman
  5.x (the version shipped by postgresql-17-partman on Debian) rejects.
  Switched to 'range' so partman.create_parent() actually creates
  partitions and Telegraf's INSERTs succeed.
- Added a postgres_wait_ready gate in telegraf_users.sls so psql execs
  don't race the init-time restart that docker-entrypoint.sh performs.
- so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the
  setup log. Dropped the matching entry from so-log-check, which scans
  container stdout where that token never appears.
2026-04-17 13:00:12 -04:00
Mike Reeves 7d07f3c8fe Create so_telegraf DB from Salt and pin pg_partman schema
init-users.sh only runs on a fresh data dir, so upgrades onto an
existing /nsm/postgres volume never got so_telegraf. Pinning partman's
schema also makes partman.part_config reliably resolvable.
2026-04-17 10:51:08 -04:00
Mike Reeves d9a9029ce5 Adopt pg_partman + pg_cron for Telegraf metric tables
Every telegraf.* metric table is now a daily time-range partitioned
parent managed by pg_partman. Retention drops old partitions instead
of the row-by-row DELETE that so-telegraf-trim used to run nightly,
and dashboards will benefit from partition pruning at query time.

- Load pg_cron at server start via shared_preload_libraries and point
  cron.database_name at so_telegraf so job metadata lives alongside
  the metrics
- Telegraf create_templates override makes every new metric table a
  PARTITION BY RANGE (time) parent registered with partman.create_parent
  in one transaction (1 day interval, 3 premade)
- postgres_telegraf_group_role now also creates pg_partman and pg_cron
  extensions and schedules hourly partman.run_maintenance_proc
- New retention reconcile state updates partman.part_config.retention
  from postgres.telegraf.retention_days on every apply
- so_telegraf_trim cron is now unconditionally absent; script stays on
  disk as a manual fallback
2026-04-16 17:27:15 -04:00
Mike Reeves 9fe53d9ccc Use JSONB for Telegraf fields/tags to avoid 1600-column limit
High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE
ADD COLUMN on every new field name, and with all minions writing into
a shared 'telegraf' schema the metric tables hit Postgres's 1600-column
per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on
the postgresql output keeps metric tables fixed at (time, tag_id,
fields jsonb) and tag tables at (tag_id, tags jsonb).

- so-stats-show rewritten to use JSONB accessors
  ((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk
  sizes to bigint so pg_size_pretty works
- Drop regex/regexFailureMessage from telegraf_output SOC UI entry to
  match the convention upstream used when removing them from
  mdengine/pcapengine/pipeline; options: list drives validation
2026-04-16 17:02:21 -04:00
Mike Reeves f7b80f5931 Merge branch '3/dev' into feature/postgres 2026-04-16 16:37:02 -04:00
Mike Reeves f11d315fea Fix soup 2026-04-16 16:35:24 -04:00
Mike Reeves 2013bf9e30 Fix soup 2026-04-16 16:20:25 -04:00
Mike Reeves a2ffb92b8d Fix soup 2026-04-16 16:19:53 -04:00
Jorge Reyes 8b6d11b118 Merge pull request #15780 from Security-Onion-Solutions/reyesj2-es932
supress noisy warning from ES 9.3.3
2026-04-16 14:42:46 -05:00
reyesj2 ba00ae8a7b supress noisy warning from ES 9.3.3 2026-04-16 14:41:25 -05:00
Mike Reeves 470b3bd4da Comingle Telegraf metrics into shared schema
Per-minion schemas cause table count to explode (N minions * M metrics)
and the per-minion revocation story isn't worth it when retention is
short. Move all minions to a shared 'telegraf' schema while keeping
per-minion login credentials for audit.

- New so_telegraf NOLOGIN group role owns the telegraf schema; each
  per-minion role is a member and inherits insert/select via role
  inheritance
- Telegraf connection string uses options='-c role=so_telegraf' so
  tables auto-created on first write belong to the group role
- so-telegraf-trim walks the flat telegraf.* table set instead of
  per-minion schemas
- so-stats-show filters by host tag; CLI arg is now the hostname as
  tagged by Telegraf rather than a sanitized schema suffix
- Also renames so-show-stats -> so-stats-show
2026-04-16 15:40:54 -04:00
Mike Reeves c124186989 so-log-check: exclude psql ON_ERROR_STOP flag
The psql invocation flag '-v ON_ERROR_STOP=1' used by the so-postgres
init script gets flagged by so-log-check because the token 'ERROR'
matches its error regex. Add to the exclusion list.
2026-04-15 19:45:42 -04:00
Mike Reeves d24808ff98 Fix so-show-stats tag column resolution
Telegraf's postgresql output stores tag values either as individual
columns on <metric>_tag or as a single JSONB 'tags' column, depending
on plugin version. Introspect information_schema.columns and build the
right accessor per tag instead of assuming one layout.
2026-04-15 19:28:10 -04:00
Jorge Reyes 7d22f7bd58 Merge pull request #15776 from Security-Onion-Solutions/foxtrot
ES 9.3.3
2026-04-15 16:29:34 -05:00
Jorge Reyes 88582c94e8 remove foxtrot version 2026-04-15 15:04:20 -05:00
Mike Reeves cefbe01333 Add telegraf_output selector for InfluxDB/Postgres dual-write
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.

- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
  minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
  postgres.telegraf.retention_days (default 14)
2026-04-15 14:32:10 -04:00
Jorge Reyes 76a6997de2 Merge pull request #15775 from Security-Onion-Solutions/reyesj2-es932
check for addon-index templates dir before attempting to load addon i…
2026-04-14 19:27:02 -05:00
reyesj2 16a4a42faf check for addon-index templates dir before attempting to load addon index templates 2026-04-14 19:26:37 -05:00
Jorge Reyes 0e4623c728 Merge pull request #15772 from Security-Onion-Solutions/reyesj2-es932
soup to 3.1.0
2026-04-14 15:04:46 -05:00
reyesj2 d598e20fbb soup 3.1.0 2026-04-14 14:55:33 -05:00
Jason Ertel 8b0d4b2195 Merge pull request #15769 from Security-Onion-Solutions/jertel/wip
Improve test scenario for node descriptions
2026-04-13 18:43:01 -04:00
Jorge Reyes cf414423b1 Merge pull request #15770 from Security-Onion-Solutions/reyesj2-es932
enable elastic agent patch release for 9.3.3
2026-04-13 16:28:20 -05:00
reyesj2 0405a66c72 enable elastic agent patch release for 9.3.3 2026-04-13 16:27:28 -05:00
Jason Ertel da7c2995b0 include trailing numbers as an additional test 2026-04-13 17:09:10 -04:00
Jorge Reyes 696a1a729c Merge pull request #15768 from Security-Onion-Solutions/reyesj2-es932
ES 9.3.3
2026-04-13 15:02:19 -05:00
Jason Ertel 5fa7006f11 Merge pull request #15766 from Security-Onion-Solutions/jertel/wip
support minion node descriptions containing spaces
2026-04-13 15:24:45 -04:00
Jason Ertel 5634aed679 support minion node descriptions containing spaces 2026-04-13 15:19:39 -04:00
reyesj2 a232cd89cc ES 9.3.3 2026-04-13 13:36:51 -05:00
reyesj2 dd40e44530 show when addon integrations are already loaded 2026-04-13 12:36:42 -05:00
Jorge Reyes 47d226e189 Merge pull request #15765 from Security-Onion-Solutions/3/dev
3/dev
2026-04-13 10:40:38 -05:00
Jorge Reyes 440537140b Merge pull request #15764 from Security-Onion-Solutions/reyesj2-es932
elasticsearch ilm policy load script
2026-04-13 10:39:12 -05:00
reyesj2 29e13b2c0b elasticsearch ilm policy load script 2026-04-13 10:00:17 -05:00
Jorge Reyes 2006a07637 Merge pull request #15763 from Security-Onion-Solutions/reyesj2-es932
start loading addon integration index templates
2026-04-12 00:40:18 -05:00
reyesj2 abcad9fde0 addon statefile 2026-04-12 00:36:30 -05:00
reyesj2 a43947cca5 elasticsearch template load script -- for addon index templates 2026-04-12 00:23:26 -05:00
Jorge Reyes f51de6569f Merge pull request #15762 from Security-Onion-Solutions/reyesj2-es932
only append "-mappings" to component template names as needed
2026-04-11 15:42:33 -05:00
reyesj2 b0584a4dc5 only append "-mappings" to component template names as needed 2026-04-11 15:22:50 -05:00
Jorge Reyes 08f34d408f Merge pull request #15761 from Security-Onion-Solutions/reyesj2-es932
rework elasticsearch template load script -- for core templates
2026-04-11 04:42:45 -05:00