securityonion

mirror of https://github.com/Security-Onion-Solutions/securityonion.git synced 2026-04-23 05:03:20 +02:00

Author	SHA1	Message	Date
Mike Reeves	614f32c5e0	Split postgres auth from per-minion telegraf creds The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities: - pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred. - pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy. - salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove. - so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly. - postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id. - orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect. - soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.	2026-04-22 10:55:15 -04:00
Mike Reeves	d5dc28e526	Fan postgres telegraf cred for manager on every auth run The empty-pillar case produced a telegraf.conf with `user= password=` which libpq misparses ("password=" gets consumed as the user value), yielding `password authentication failed for user "password="` on every manager without a prior fan-out (fresh install, not the salt-key path the reactor handles). Two fixes: - salt/postgres/auth.sls: always fan for grains.id in addition to any postgres_fanout_minion from the reactor, so the manager's own pillar is populated on every postgres.auth run. The existing `unless` guard keeps re-runs idempotent. - salt/telegraf/etc/telegraf.conf: gate the [[outputs.postgresql]] block on PG_USER and PG_PASS being non-empty. If a minion hasn't received its pillar yet the output block simply isn't rendered — the next highstate picks up the creds once the fan-out completes, and in the meantime telegraf keeps running the other outputs instead of erroring with a malformed connection string.	2026-04-21 14:40:19 -04:00
Mike Reeves	bb71e44614	Write per-minion telegraf creds to each minion's own pillar file pillar/top.sls only distributes postgres.auth to manager-class roles, so sensors / heavynodes / searchnodes / receivers / fleet / idh / hypervisor / desktop minions never received the postgres telegraf password they need to write metrics. Broadcasting the aggregate postgres.auth pillar to every role would leak the so_postgres admin password and every other minion's cred. Fan out per-minion credentials into each minion's own pillar file at /opt/so/saltstack/local/pillar/minions/<id>.sls. That file is already distributed by pillar/top.sls exclusively to the matching minion via `- minions.{{ grains.id }}`, so each minion sees only its own postgres.telegraf.{user,pass} and nothing else. - salt/postgres/auth.sls: after writing the manager-scoped aggregate pillar, fan the per-minion creds out via so-yaml.py replace for every up-minion. Creates the minion pillar file if missing. Requires postgres_auth_pillar so the manager pillar lands first. - salt/telegraf/etc/telegraf.conf: consume postgres:telegraf:user and postgres:telegraf:pass directly from the minion's own pillar instead of walking postgres:auth:users which isn't visible off the manager.	2026-04-21 09:57:35 -04:00
Mike Reeves	3ecd19d085	Move telegraf_output from global pillar to telegraf pillar The Telegraf backend selector lived at global.telegraf_output but it is a Telegraf-scoped setting, not a cross-cutting grid global. Move both the value and the UI annotation under the telegraf pillar so it shows up alongside the other Telegraf tuning knobs in the Configuration UI. - salt/telegraf/defaults.yaml: add telegraf.output: BOTH - salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation - salt/global/defaults.yaml: remove global.telegraf_output - salt/global/soc_global.yaml: remove global.telegraf_output annotation - salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS - salt/firewall/map.jinja: read via pillar.get('telegraf:output') - salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output') - salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output - salt/postgres/tools/sbin/so-stats-show: update user-facing docs No behavioral change — default stays BOTH.	2026-04-20 16:03:02 -04:00
Mike Reeves	31383bd9d0	Make Telegraf Postgres templates idempotent Use CREATE TABLE IF NOT EXISTS and a WHERE-guarded create_parent() so a Telegraf restart can re-run the templates safely after manual DB surgery. Add an explicit tag_table_create_templates mirroring the plugin default with IF NOT EXISTS for the same reason.	2026-04-17 15:43:50 -04:00
Mike Reeves	f11e9da83a	Mark time column NOT NULL before partman.create_parent pg_partman 5.x requires the control column to be NOT NULL; Telegraf's generated columns are nullable by default.	2026-04-17 15:27:06 -04:00
Mike Reeves	0fddcd8fe7	Pass unquoted schema.name to partman.create_parent pg_partman 5.x splits p_parent_table on '.' and looks up the parts as raw identifiers, so the literal must be 'schema.name' rather than the double-quoted form quoteLiteral emits for .table.	2026-04-17 15:22:57 -04:00
Mike Reeves	af9330a9dd	Escape Go-template placeholders from Jinja in telegraf.conf	2026-04-17 15:04:37 -04:00
Mike Reeves	b3fbd5c7a4	Use Go-template placeholders and shell-guarded CREATE DATABASE - Telegraf's outputs.postgresql plugin uses Go text/template syntax, not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings were passed through to Postgres literally, producing syntax errors on every metric's first write. Switch to {{ .table }}, {{ .columns }}, and {{ .table|quoteLiteral }} so partitioned parents and the partman create_parent() call succeed. - Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in both init-users.sh and telegraf_users.sls with an explicit shell conditional. The prior idiom occasionally fired CREATE DATABASE even when so_telegraf already existed, producing duplicate-key failures.	2026-04-17 14:55:13 -04:00
Mike Reeves	5228668be0	Fix Telegraf→Postgres table creation and state.apply race - Telegraf's partman template passed p_type:='native', which pg_partman 5.x (the version shipped by postgresql-17-partman on Debian) rejects. Switched to 'range' so partman.create_parent() actually creates partitions and Telegraf's INSERTs succeed. - Added a postgres_wait_ready gate in telegraf_users.sls so psql execs don't race the init-time restart that docker-entrypoint.sh performs. - so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the setup log. Dropped the matching entry from so-log-check, which scans container stdout where that token never appears.	2026-04-17 13:00:12 -04:00
Mike Reeves	d9a9029ce5	Adopt pg_partman + pg_cron for Telegraf metric tables Every telegraf.* metric table is now a daily time-range partitioned parent managed by pg_partman. Retention drops old partitions instead of the row-by-row DELETE that so-telegraf-trim used to run nightly, and dashboards will benefit from partition pruning at query time. - Load pg_cron at server start via shared_preload_libraries and point cron.database_name at so_telegraf so job metadata lives alongside the metrics - Telegraf create_templates override makes every new metric table a PARTITION BY RANGE (time) parent registered with partman.create_parent in one transaction (1 day interval, 3 premade) - postgres_telegraf_group_role now also creates pg_partman and pg_cron extensions and schedules hourly partman.run_maintenance_proc - New retention reconcile state updates partman.part_config.retention from postgres.telegraf.retention_days on every apply - so_telegraf_trim cron is now unconditionally absent; script stays on disk as a manual fallback	2026-04-16 17:27:15 -04:00
Mike Reeves	9fe53d9ccc	Use JSONB for Telegraf fields/tags to avoid 1600-column limit High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE ADD COLUMN on every new field name, and with all minions writing into a shared 'telegraf' schema the metric tables hit Postgres's 1600-column per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on the postgresql output keeps metric tables fixed at (time, tag_id, fields jsonb) and tag tables at (tag_id, tags jsonb). - so-stats-show rewritten to use JSONB accessors ((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk sizes to bigint so pg_size_pretty works - Drop regex/regexFailureMessage from telegraf_output SOC UI entry to match the convention upstream used when removing them from mdengine/pcapengine/pipeline; options: list drives validation	2026-04-16 17:02:21 -04:00
Mike Reeves	470b3bd4da	Commingle Telegraf metrics into shared schema Per-minion schemas cause table count to explode (N minions * M metrics) and the per-minion revocation story isn't worth it when retention is short. Move all minions to a shared 'telegraf' schema while keeping per-minion login credentials for audit. - New so_telegraf NOLOGIN group role owns the telegraf schema; each per-minion role is a member and inherits insert/select via role inheritance - Telegraf connection string uses options='-c role=so_telegraf' so tables auto-created on first write belong to the group role - so-telegraf-trim walks the flat telegraf.* table set instead of per-minion schemas - so-stats-show filters by host tag; CLI arg is now the hostname as tagged by Telegraf rather than a sanitized schema suffix - Also renames so-show-stats -> so-stats-show	2026-04-16 15:40:54 -04:00
Mike Reeves	cefbe01333	Add telegraf_output selector for InfluxDB/Postgres dual-write Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH) so Telegraf can write metrics to Postgres alongside or instead of InfluxDB. Each minion authenticates with its own so_telegraf_<minion> role and writes to a matching schema inside a shared so_telegraf database, keeping blast radius per-credential to that minion's data. - Per-minion credentials auto-generated and persisted in postgres/auth.sls - postgres/telegraf_users.sls reconciles roles/schemas on every apply - Firewall opens 5432 only to minion hostgroups when Postgres output is active - Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new minions automatically on key accept - soup post_to_3.1.0 backfills users for existing minions on upgrade - so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks - so-telegraf-trim + nightly cron prune rows older than postgres.telegraf.retention_days (default 14)	2026-04-15 14:32:10 -04:00
Josh Patterson	2186872317	update telegraf lower true/false	2026-03-20 09:19:22 -04:00
Josh Patterson	7ece93d7e0	ensure bool sliders telegraf	2026-03-19 15:12:47 -04:00
reyesj2	a99c553ada	use logstash merged values for logstash metric collection	2026-01-30 11:40:12 -06:00
reyesj2	e5226b50ed	disable logstash metrics collection on nodes not running logstash + fleet nodes	2026-01-27 16:37:23 -06:00
reyesj2	835b2609b6	telegraf - increase esindexsize.sh script timeout	2025-10-29 13:45:55 -05:00
reyesj2	870a9ff80c	dedup	2025-05-16 10:24:09 -05:00
reyesj2	689db57f5f	logstash isn't running on receivers or manager when kafka is the global.pipeline	2025-05-16 10:05:38 -05:00
reyesj2	fd02950864	use globals.is_manager	2025-05-02 13:36:28 -05:00
reyesj2	b918a5e256	old attempt	2025-04-29 16:05:55 -05:00
reyesj2	3cb3281cd5	add metrics for es index sizes	2025-04-29 12:38:41 -05:00
reyesj2	400739736d	add monitored mounts, ignores docker overlays	2025-04-23 15:02:23 -05:00
reyesj2	80b1d51f76	wrong location for global.pipeline check Signed-off-by: reyesj2 <94730068+reyesj2@users.noreply.github.com>	2024-06-13 08:50:53 -04:00
reyesj2	9c31622598	telegraf should only include jolokia config when Kafka is set as the global.pipeline Signed-off-by: reyesj2 <94730068+reyesj2@users.noreply.github.com>	2024-06-12 15:42:00 -04:00
reyesj2	59097070ef	Revert "Remove unneeded jolokia aggregate metrics to reduce data ingested to influx" This reverts commit `1c1a1a1d3f`.	2024-05-28 12:17:43 -04:00
reyesj2	1c1a1a1d3f	Remove unneeded jolokia aggregate metrics to reduce data ingested to influx Signed-off-by: reyesj2 <94730068+reyesj2@users.noreply.github.com>	2024-05-28 11:14:19 -04:00
reyesj2	15a0b959aa	Add jolokia metrics for influxdb dashboard Signed-off-by: reyesj2 <94730068+reyesj2@users.noreply.github.com>	2024-05-28 10:51:39 -04:00
reyesj2	dff609d829	Add basic read-only metric collection from Kafka Signed-off-by: reyesj2 <94730068+reyesj2@users.noreply.github.com>	2024-05-08 16:13:09 -04:00
Jason Ertel	25c39540c8	fix import stats	2023-12-11 14:48:46 -05:00
m0duspwnens	5278601e5d	manage telegraf scripts with a defaults file assigned per node type	2023-08-07 11:18:35 -04:00
Jason Ertel	46371aaaf5	Monitor all mount points for simplicity	2023-06-09 09:14:36 -04:00
m0duspwnens	20f706f165	enable/disable telegraf in ui	2023-05-11 12:12:25 -04:00
m0duspwnens	b6d55bedc8	make influxdb token accessible to all nodes	2023-03-06 13:50:17 -05:00
Jason Ertel	0eec8b22a2	influx upgrade	2023-02-09 18:27:14 -05:00
Jason Ertel	0e50d36da6	upgrade influx	2023-02-09 16:18:04 -05:00
m0duspwnens	88107fe0df	remove filebeat and redis(commented out) from telegraf config	2023-01-24 08:59:51 -05:00
Mike Reeves	308228620a	Specify Influxdb host	2022-12-22 13:05:33 -05:00
Mike Reeves	676aec7576	Add config map	2022-12-16 11:22:53 -05:00
Mike Reeves	5badfb9cf5	Fix pillar	2022-12-16 08:38:31 -05:00
Mike Reeves	8a0991afd0	Fix pillar	2022-12-15 15:05:57 -05:00
Mike Reeves	121d07733f	Merge the defaults and pillar for telegraf	2022-12-15 13:29:31 -05:00
Mike Reeves	e55086230d	Merge the defaults and pillar for telegraf	2022-12-15 13:28:29 -05:00
Mike Reeves	d37a4b14ca	Spelling error	2022-12-15 12:02:01 -05:00
Mike Reeves	fd27044471	Spelling error	2022-12-15 11:57:06 -05:00
Mike Reeves	ed87b08fc1	Spelling error	2022-12-15 10:59:07 -05:00
Mike Reeves	28e8c54443	Wire telegraf initial commit	2022-12-15 10:43:58 -05:00
m0duspwnens	b526532ab6	use global vars in states	2022-10-11 11:57:15 -04:00
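
Commit d5dc28e526 in the listing above gates the [[outputs.postgresql]] block on PG_USER and PG_PASS being non-empty, so an empty pillar never yields a `user= password=` connection string. The real change lives in a Jinja template; the Python below is only a rough sketch of the same guard. The function name `render_pg_output`, the `host` default, and the exact connection-string layout are assumptions made for illustration and are not taken from the repository.

```python
from typing import Optional


def render_pg_output(user: str, password: str, host: str = "manager") -> Optional[str]:
    """Return a Telegraf Postgres output stanza, or None when creds are missing.

    Hypothetical helper: it mirrors the idea of skipping the block entirely
    instead of emitting "user= password=", which libpq would misparse.
    """
    if not user or not password:
        # No credentials yet: render nothing so the other outputs keep running.
        return None
    # Assumed layout; values containing spaces would need libpq quoting.
    connection = f"host={host} user={user} password={password} dbname=so_telegraf"
    return '[[outputs.postgresql]]\n  connection = "' + connection + '"\n'


if __name__ == "__main__":
    print(render_pg_output("", ""))
    print(render_pg_output("so_telegraf_sensor1", "example-pass"))
```

With empty credentials the helper returns None, which corresponds to the output block not being rendered until a later highstate delivers the pillar.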

106 Commits
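
Commit 614f32c5e0 at the top of the listing describes so-telegraf-cred as the single writer for the credentials pillar: flock, atomic write, idempotent add. The sketch below illustrates that pattern under stated assumptions and is not the tool's actual code. It swaps in JSON from the standard library in place of PyYAML's safe_dump, the `add_cred` name and the `.lock` sidecar file are hypothetical, and leaving an existing entry untouched is one possible reading of "idempotent add".

```python
import fcntl
import json
import os
import tempfile


def add_cred(path: str, minion_id: str, user: str, password: str) -> bool:
    """Add one minion's {user, pass} entry; return False if it already exists."""
    with open(path + ".lock", "w") as lock:
        # Exclusive advisory lock serializes concurrent writers (Unix only).
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            with open(path) as fh:
                creds = json.load(fh)
        except FileNotFoundError:
            creds = {}
        if minion_id in creds:
            # Idempotent: an existing entry is left as it is.
            return False
        creds[minion_id] = {"user": user, "pass": password}
        # Write to a temp file in the same directory, then rename over the
        # target so readers never observe a partially written file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as out:
            json.dump(creds, out)
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp, path)
        return True
    # The lock is released when the lock file is closed on leaving the block.
```

Calling `add_cred` twice for the same minion id writes the file once and returns False the second time, so add hooks can be re-run safely.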