securityonion

mirror of https://github.com/Security-Onion-Solutions/securityonion.git synced 2026-05-09 12:52:38 +02:00

Author	SHA1	Message	Date
Mike Reeves	614f32c5e0	Split postgres auth from per-minion telegraf creds The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities: - pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred. - pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy. - salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove. - so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly. - postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id. - orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect. - soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.	2026-04-22 10:55:15 -04:00
Mike Reeves	d5dc28e526	Fan postgres telegraf cred for manager on every auth run The empty-pillar case produced a telegraf.conf with `user= password=` which libpq misparses ("password=" gets consumed as the user value), yielding `password authentication failed for user "password="` on every manager without a prior fan-out (fresh install, not the salt-key path the reactor handles). Two fixes: - salt/postgres/auth.sls: always fan for grains.id in addition to any postgres_fanout_minion from the reactor, so the manager's own pillar is populated on every postgres.auth run. The existing `unless` guard keeps re-runs idempotent. - salt/telegraf/etc/telegraf.conf: gate the [[outputs.postgresql]] block on PG_USER and PG_PASS being non-empty. If a minion hasn't received its pillar yet the output block simply isn't rendered — the next highstate picks up the creds once the fan-out completes, and in the meantime telegraf keeps running the other outputs instead of erroring with a malformed connection string.	2026-04-21 14:40:19 -04:00
Mike Reeves	bb71e44614	Write per-minion telegraf creds to each minion's own pillar file pillar/top.sls only distributes postgres.auth to manager-class roles, so sensors / heavynodes / searchnodes / receivers / fleet / idh / hypervisor / desktop minions never received the postgres telegraf password they need to write metrics. Broadcasting the aggregate postgres.auth pillar to every role would leak the so_postgres admin password and every other minion's cred. Fan out per-minion credentials into each minion's own pillar file at /opt/so/saltstack/local/pillar/minions/<id>.sls. That file is already distributed by pillar/top.sls exclusively to the matching minion via `- minions.{{ grains.id }}`, so each minion sees only its own postgres.telegraf.{user,pass} and nothing else. - salt/postgres/auth.sls: after writing the manager-scoped aggregate pillar, fan the per-minion creds out via so-yaml.py replace for every up-minion. Creates the minion pillar file if missing. Requires postgres_auth_pillar so the manager pillar lands first. - salt/telegraf/etc/telegraf.conf: consume postgres:telegraf:user and postgres:telegraf:pass directly from the minion's own pillar instead of walking postgres:auth:users which isn't visible off the manager.	2026-04-21 09:57:35 -04:00
Mike Reeves	3ecd19d085	Move telegraf_output from global pillar to telegraf pillar The Telegraf backend selector lived at global.telegraf_output but it is a Telegraf-scoped setting, not a cross-cutting grid global. Move both the value and the UI annotation under the telegraf pillar so it shows up alongside the other Telegraf tuning knobs in the Configuration UI. - salt/telegraf/defaults.yaml: add telegraf.output: BOTH - salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation - salt/global/defaults.yaml: remove global.telegraf_output - salt/global/soc_global.yaml: remove global.telegraf_output annotation - salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS - salt/firewall/map.jinja: read via pillar.get('telegraf:output') - salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output') - salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output - salt/postgres/tools/sbin/so-stats-show: update user-facing docs No behavioral change — default stays BOTH.	2026-04-20 16:03:02 -04:00
Mike Reeves	31383bd9d0	Make Telegraf Postgres templates idempotent Use CREATE TABLE IF NOT EXISTS and a WHERE-guarded create_parent() so a Telegraf restart can re-run the templates safely after manual DB surgery. Add an explicit tag_table_create_templates mirroring the plugin default with IF NOT EXISTS for the same reason.	2026-04-17 15:43:50 -04:00
Mike Reeves	f11e9da83a	Mark time column NOT NULL before partman.create_parent pg_partman 5.x requires the control column to be NOT NULL; Telegraf's generated columns are nullable by default.	2026-04-17 15:27:06 -04:00
Mike Reeves	0fddcd8fe7	Pass unquoted schema.name to partman.create_parent pg_partman 5.x splits p_parent_table on '.' and looks up the parts as raw identifiers, so the literal must be 'schema.name' rather than the double-quoted form quoteLiteral emits for .table.	2026-04-17 15:22:57 -04:00
Mike Reeves	af9330a9dd	Escape Go-template placeholders from Jinja in telegraf.conf	2026-04-17 15:04:37 -04:00
Mike Reeves	b3fbd5c7a4	Use Go-template placeholders and shell-guarded CREATE DATABASE - Telegraf's outputs.postgresql plugin uses Go text/template syntax, not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings were passed through to Postgres literally, producing syntax errors on every metric's first write. Switch to {{ .table }}, {{ .columns }}, and {{ .table|quoteLiteral }} so partitioned parents and the partman create_parent() call succeed. - Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in both init-users.sh and telegraf_users.sls with an explicit shell conditional. The prior idiom occasionally fired CREATE DATABASE even when so_telegraf already existed, producing duplicate-key failures.	2026-04-17 14:55:13 -04:00
Mike Reeves	5228668be0	Fix Telegraf→Postgres table creation and state.apply race - Telegraf's partman template passed p_type:='native', which pg_partman 5.x (the version shipped by postgresql-17-partman on Debian) rejects. Switched to 'range' so partman.create_parent() actually creates partitions and Telegraf's INSERTs succeed. - Added a postgres_wait_ready gate in telegraf_users.sls so psql execs don't race the init-time restart that docker-entrypoint.sh performs. - so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the setup log. Dropped the matching entry from so-log-check, which scans container stdout where that token never appears.	2026-04-17 13:00:12 -04:00
Mike Reeves	d9a9029ce5	Adopt pg_partman + pg_cron for Telegraf metric tables Every telegraf.* metric table is now a daily time-range partitioned parent managed by pg_partman. Retention drops old partitions instead of the row-by-row DELETE that so-telegraf-trim used to run nightly, and dashboards will benefit from partition pruning at query time. - Load pg_cron at server start via shared_preload_libraries and point cron.database_name at so_telegraf so job metadata lives alongside the metrics - Telegraf create_templates override makes every new metric table a PARTITION BY RANGE (time) parent registered with partman.create_parent in one transaction (1 day interval, 3 premade) - postgres_telegraf_group_role now also creates pg_partman and pg_cron extensions and schedules hourly partman.run_maintenance_proc - New retention reconcile state updates partman.part_config.retention from postgres.telegraf.retention_days on every apply - so_telegraf_trim cron is now unconditionally absent; script stays on disk as a manual fallback	2026-04-16 17:27:15 -04:00
Mike Reeves	9fe53d9ccc	Use JSONB for Telegraf fields/tags to avoid 1600-column limit High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE ADD COLUMN on every new field name, and with all minions writing into a shared 'telegraf' schema the metric tables hit Postgres's 1600-column per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on the postgresql output keeps metric tables fixed at (time, tag_id, fields jsonb) and tag tables at (tag_id, tags jsonb). - so-stats-show rewritten to use JSONB accessors ((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk sizes to bigint so pg_size_pretty works - Drop regex/regexFailureMessage from telegraf_output SOC UI entry to match the convention upstream used when removing them from mdengine/pcapengine/pipeline; options: list drives validation	2026-04-16 17:02:21 -04:00
Mike Reeves	470b3bd4da	Comingle Telegraf metrics into shared schema Per-minion schemas cause table count to explode (N minions * M metrics) and the per-minion revocation story isn't worth it when retention is short. Move all minions to a shared 'telegraf' schema while keeping per-minion login credentials for audit. - New so_telegraf NOLOGIN group role owns the telegraf schema; each per-minion role is a member and inherits insert/select via role inheritance - Telegraf connection string uses options='-c role=so_telegraf' so tables auto-created on first write belong to the group role - so-telegraf-trim walks the flat telegraf.* table set instead of per-minion schemas - so-stats-show filters by host tag; CLI arg is now the hostname as tagged by Telegraf rather than a sanitized schema suffix - Also renames so-show-stats -> so-stats-show	2026-04-16 15:40:54 -04:00
Mike Reeves	cefbe01333	Add telegraf_output selector for InfluxDB/Postgres dual-write Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH) so Telegraf can write metrics to Postgres alongside or instead of InfluxDB. Each minion authenticates with its own so_telegraf_<minion> role and writes to a matching schema inside a shared so_telegraf database, keeping blast radius per-credential to that minion's data. - Per-minion credentials auto-generated and persisted in postgres/auth.sls - postgres/telegraf_users.sls reconciles roles/schemas on every apply - Firewall opens 5432 only to minion hostgroups when Postgres output is active - Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new minions automatically on key accept - soup post_to_3.1.0 backfills users for existing minions on upgrade - so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks - so-telegraf-trim + nightly cron prune rows older than postgres.telegraf.retention_days (default 14)	2026-04-15 14:32:10 -04:00
Josh Patterson	2186872317	update telegraf lower true/false	2026-03-20 09:19:22 -04:00
Josh Patterson	7ece93d7e0	ensure bool sliders telegraf	2026-03-19 15:12:47 -04:00
Josh Patterson	c2c5aea244	ensure bool sliders for each state:enabled annotation	2026-03-19 12:35:38 -04:00
Josh Patterson	74ad2990a7	Merge remote-tracking branch 'origin/3/dev' into delta	2026-03-18 13:05:02 -04:00
Josh Patterson	e19e83bebb	allow user defined ulimits	2026-03-18 10:38:15 -04:00
Doug Burks	930985b770	update helpLink references for new documentation	2026-03-18 09:46:45 -04:00
Josh Patterson	341471d38e	DOCKER to DOCKERMERGED	2026-03-17 16:19:36 -04:00
Josh Patterson	00986dc2fd	Merge remote-tracking branch 'origin/delta' into customulimit	2026-03-17 16:04:09 -04:00
Mike Reeves	2d97dfc8a1	Add customizable ulimit settings for all Docker containers Add ulimits as a configurable advanced setting for every container, allowing customization through the web UI. Move hardcoded ulimits from elasticsearch and zeek into defaults.yaml and fix elasticsearch ulimits that were incorrectly nested under the environment key. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 15:10:42 -04:00
Josh Patterson	4dc377c99f	DOCKER to DOCKERMERGED	2026-03-17 15:06:06 -04:00
Josh Patterson	94f454c311	cleanup file.absent	2026-03-16 15:57:15 -04:00
Jason Ertel	71839bc87f	remove steno	2026-03-06 15:45:36 -05:00
Jason Ertel	2c4d833a5b	update 2.4 references to 3	2026-03-05 11:05:19 -05:00
reyesj2	12b3081a62	fix agentstatus script	2026-02-25 16:39:33 -06:00
reyesj2	a99c553ada	use logstash merged values for logstash metric collection	2026-01-30 11:40:12 -06:00
reyesj2	e5226b50ed	disable logstash metrics collection on nodes not running logstash + fleet nodes	2026-01-27 16:37:23 -06:00
Josh Patterson	00fbc1c259	add back individual signing policies	2026-01-12 09:25:15 -05:00
Josh Patterson	ee70d94e15	remove old key/crt used for telegraf on non managers	2026-01-08 17:15:35 -05:00
Josh Patterson	9960db200c	Merge remote-tracking branch 'origin/2.4/dev' into bravo	2025-12-11 17:30:43 -05:00
Josh Patterson	b9ff1704b0	the great ssl refactor	2025-12-11 17:30:06 -05:00
DefensiveDepth	5ab6bda639	Fixup logic	2025-12-10 17:16:35 -05:00
DefensiveDepth	9304513ce8	Add support for suricata rules load status	2025-12-04 12:26:13 -05:00
reyesj2	835b2609b6	telegraf - increase esindexsize.sh script timeout	2025-10-29 13:45:55 -05:00
Josh Patterson	b0a8191f59	Merge remote-tracking branch 'origin/2.4/dev' into vlb2	2025-05-19 10:02:26 -04:00
reyesj2	870a9ff80c	dedup	2025-05-16 10:24:09 -05:00
reyesj2	689db57f5f	logstash isn't running on receivers or manager when kafka is the global.pipeline	2025-05-16 10:05:38 -05:00
Josh Patterson	8c37a4454c	merge and fix conflicts	2025-05-06 11:55:42 -04:00
reyesj2	b4214f73f4	typo	2025-05-06 09:01:22 -05:00
reyesj2	b9da7eb35b	missing globals.is_manager swap	2025-05-06 08:58:47 -05:00
reyesj2	fd02950864	use globals.is_manager	2025-05-02 13:36:28 -05:00
reyesj2	044d230158	get 200 from es before collecting metrics	2025-04-30 13:05:36 -05:00
reyesj2	b918a5e256	old attempt	2025-04-29 16:05:55 -05:00
reyesj2	1ddc653a52	fix input error in agentstatus script	2025-04-29 13:40:39 -05:00
reyesj2	85f5f75c84	use salt location for es curl.config	2025-04-29 12:42:05 -05:00
reyesj2	3cb3281cd5	add metrics for es index sizes	2025-04-29 12:38:41 -05:00
Josh Patterson	142609ea67	Merge remote-tracking branch 'origin/2.4/dev' into vlb2	2025-04-24 09:41:27 -04:00
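
Commit 614f32c5e0 describes a single-writer credential store: take a lock, update a {minion_id: {user, pass}} map, write it atomically, with an idempotent add and a quiet remove. The sketch below illustrates that pattern only and is not the actual so-telegraf-cred script. It stores JSON from the standard library in place of the PyYAML safe_dump the commit mentions. The function names, the `.lock` sidecar file, the password generator, and the `so_telegraf_` + minion ID user name (taken unsanitized) are all assumptions made for illustration.

```python
import fcntl
import json
import os
import secrets
import tempfile


def _update(path, mutate):
    """Apply mutate(creds) under an exclusive lock, then write atomically."""
    # Hypothetical sidecar lock file; closing it releases the flock.
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize concurrent writers
        try:
            with open(path) as fh:
                creds = json.load(fh)
        except FileNotFoundError:
            creds = {}
        mutate(creds)
        # Write a temp file in the same directory, then rename it over the
        # target so readers never observe a half-written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as fh:
            json.dump(creds, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    return creds


def add_cred(path, minion_id):
    """Idempotent add: an existing entry is returned unchanged."""
    def mutate(creds):
        creds.setdefault(minion_id, {
            "user": "so_telegraf_" + minion_id,  # assumed naming scheme
            "pass": secrets.token_urlsafe(24),
        })
    return _update(path, mutate)[minion_id]


def remove_cred(path, minion_id):
    """Quiet remove: a missing entry is not an error."""
    _update(path, lambda creds: creds.pop(minion_id, None))
```

Calling `add_cred` twice for the same minion returns the same entry, so a caller such as a key-accept hook could run it repeatedly without rotating the password. `fcntl` is Unix-only, which matches the Linux hosts the commits target.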

276 Commits