securityonion

mirror of https://github.com/Security-Onion-Solutions/securityonion.git synced 2026-05-09 12:52:38 +02:00

Author	SHA1	Message	Date
Mike Reeves	614f32c5e0	Split postgres auth from per-minion telegraf creds The old flow had two writers for each per-minion Telegraf password (so-minion wrote the minion pillar; postgres.auth regenerated any missing aggregate entries). They drifted on first-boot and there was no trigger to create DB roles when a new minion joined. Split responsibilities: - pillar/postgres/auth.sls (manager-scoped) keeps only the so_postgres admin cred. - pillar/telegraf/creds.sls (grid-wide) holds a {minion_id: {user, pass}} map, shadowed per-install by the local-pillar copy. - salt/manager/tools/sbin/so-telegraf-cred is the single writer: flock, atomic YAML write, PyYAML safe_dump so passwords never round-trip through so-yaml.py's type coercion. Idempotent add, quiet remove. - so-minion's add/remove hooks now shell out to so-telegraf-cred instead of editing pillar files directly. - postgres.telegraf_users iterates the new pillar key and CREATE/ALTERs roles from it; telegraf.conf reads its own entry via grains.id. - orch.deploy_newnode runs postgres.telegraf_users on the manager and refreshes the new minion's pillar before the new node highstates, so the DB role is in place the first time telegraf tries to connect. - soup's post_to_3.1.0 backfills the creds pillar from accepted salt keys (idempotent) and runs postgres.telegraf_users once to reconcile the DB.	2026-04-22 10:55:15 -04:00
Mike Reeves	f72c30abd0	Have postgres.telegraf_users include postgres.enabled postgres_wait_ready requires docker_container: so-postgres, which is declared in postgres.enabled. Running postgres.telegraf_users on its own — as the reactor orch and the soup post-upgrade step both do — errored because Salt couldn't resolve the require. Include postgres.enabled from postgres.telegraf_users so the container state is always in the render. postgres.enabled already includes telegraf_users; Salt de-duplicates the circular include and the included states are all idempotent, so repeated application is a no-op.	2026-04-21 09:35:59 -04:00
Mike Reeves	b69e50542a	Use TELEGRAFMERGED for telegraf.output and de-jinja pg_hba.conf - firewall/map.jinja and postgres/telegraf_users.sls now pull the telegraf output selector through TELEGRAFMERGED so the defaults.yaml value (BOTH) is the source of truth and pillar overrides merge in cleanly. pillar.get with a hardcoded fallback was brittle and would disagree with defaults.yaml if the two ever diverged. - Rename salt/postgres/files/pg_hba.conf.jinja to pg_hba.conf and drop template: jinja from config.sls — the file has no jinja besides the comment header.	2026-04-20 16:06:01 -04:00
Mike Reeves	3ecd19d085	Move telegraf_output from global pillar to telegraf pillar The Telegraf backend selector lived at global.telegraf_output but it is a Telegraf-scoped setting, not a cross-cutting grid global. Move both the value and the UI annotation under the telegraf pillar so it shows up alongside the other Telegraf tuning knobs in the Configuration UI. - salt/telegraf/defaults.yaml: add telegraf.output: BOTH - salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation - salt/global/defaults.yaml: remove global.telegraf_output - salt/global/soc_global.yaml: remove global.telegraf_output annotation - salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS - salt/firewall/map.jinja: read via pillar.get('telegraf:output') - salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output') - salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output - salt/postgres/tools/sbin/so-stats-show: update user-facing docs No behavioral change — default stays BOTH.	2026-04-20 16:03:02 -04:00
Mike Reeves	8225d41661	Harden postgres secrets, TLS enforcement, and admin tooling - Deliver postgres super and app passwords via mounted 0600 secret files (POSTGRES_PASSWORD_FILE, SO_POSTGRES_PASS_FILE) instead of plaintext env vars visible in docker inspect output - Mount a managed pg_hba.conf that only allows local trust and hostssl scram-sha-256 so TCP clients cannot negotiate cleartext sessions - Restrict postgres.key to 0400 and ensure owner/group 939 - Set umask 0077 on so-postgres-backup output - Validate host values in so-stats-show against [A-Za-z0-9._-] before SQL interpolation so a compromised minion cannot inject SQL via a tag value - Coerce postgres:telegraf:retention_days to int before rendering into SQL - Escape single quotes when rendering pillar values into postgresql.conf - Own postgres tooling in /usr/sbin as root:root so a container escape cannot rewrite admin scripts - Gate ES migration TLS verification on esVerifyCert (default false, matching the elastic module's existing pattern)	2026-04-20 12:36:17 -04:00
Mike Reeves	759880a800	Wait for TCP-ready postgres, not the init-phase Unix socket docker-entrypoint.sh runs the init-scripts phase with listen_addresses='' (Unix socket only). The old pg_isready check passed there and then raced the docker_temp_server_stop shutdown before the final postgres started. pg_isready -h 127.0.0.1 only returns success once the real CMD binds TCP, so downstream psql execs never land during the shutdown window.	2026-04-17 16:43:41 -04:00
Mike Reeves	21076af01e	Grant so_telegraf CREATE on partman schema pg_partman 5.x's create_partition() creates a per-parent template table inside the partman schema at runtime, which requires CREATE on that schema. Also extend ALTER DEFAULT PRIVILEGES so the runtime- created template tables are accessible to so_telegraf.	2026-04-17 15:34:19 -04:00
Mike Reeves	927eba566c	Grant so_telegraf access to partman schema Telegraf calls partman.create_parent() on first write of each metric, which needs USAGE on the partman schema, EXECUTE on its functions and procedures, and DML on partman.part_config.	2026-04-17 15:13:08 -04:00
Mike Reeves	b3fbd5c7a4	Use Go-template placeholders and shell-guarded CREATE DATABASE - Telegraf's outputs.postgresql plugin uses Go text/template syntax, not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings were passed through to Postgres literally, producing syntax errors on every metric's first write. Switch to {{ .table }}, {{ .columns }}, and {{ .table\|quoteLiteral }} so partitioned parents and the partman create_parent() call succeed. - Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in both init-users.sh and telegraf_users.sls with an explicit shell conditional. The prior idiom occasionally fired CREATE DATABASE even when so_telegraf already existed, producing duplicate-key failures.	2026-04-17 14:55:13 -04:00
Mike Reeves	5228668be0	Fix Telegraf→Postgres table creation and state.apply race - Telegraf's partman template passed p_type:='native', which pg_partman 5.x (the version shipped by postgresql-17-partman on Debian) rejects. Switched to 'range' so partman.create_parent() actually creates partitions and Telegraf's INSERTs succeed. - Added a postgres_wait_ready gate in telegraf_users.sls so psql execs don't race the init-time restart that docker-entrypoint.sh performs. - so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the setup log. Dropped the matching entry from so-log-check, which scans container stdout where that token never appears.	2026-04-17 13:00:12 -04:00
Mike Reeves	7d07f3c8fe	Create so_telegraf DB from Salt and pin pg_partman schema init-users.sh only runs on a fresh data dir, so upgrades onto an existing /nsm/postgres volume never got so_telegraf. Pinning partman's schema also makes partman.part_config reliably resolvable.	2026-04-17 10:51:08 -04:00
Mike Reeves	d9a9029ce5	Adopt pg_partman + pg_cron for Telegraf metric tables Every telegraf.* metric table is now a daily time-range partitioned parent managed by pg_partman. Retention drops old partitions instead of the row-by-row DELETE that so-telegraf-trim used to run nightly, and dashboards will benefit from partition pruning at query time. - Load pg_cron at server start via shared_preload_libraries and point cron.database_name at so_telegraf so job metadata lives alongside the metrics - Telegraf create_templates override makes every new metric table a PARTITION BY RANGE (time) parent registered with partman.create_parent in one transaction (1 day interval, 3 premade) - postgres_telegraf_group_role now also creates pg_partman and pg_cron extensions and schedules hourly partman.run_maintenance_proc - New retention reconcile state updates partman.part_config.retention from postgres.telegraf.retention_days on every apply - so_telegraf_trim cron is now unconditionally absent; script stays on disk as a manual fallback	2026-04-16 17:27:15 -04:00
Mike Reeves	470b3bd4da	Comingle Telegraf metrics into shared schema Per-minion schemas cause table count to explode (N minions * M metrics) and the per-minion revocation story isn't worth it when retention is short. Move all minions to a shared 'telegraf' schema while keeping per-minion login credentials for audit. - New so_telegraf NOLOGIN group role owns the telegraf schema; each per-minion role is a member and inherits insert/select via role inheritance - Telegraf connection string uses options='-c role=so_telegraf' so tables auto-created on first write belong to the group role - so-telegraf-trim walks the flat telegraf.* table set instead of per-minion schemas - so-stats-show filters by host tag; CLI arg is now the hostname as tagged by Telegraf rather than a sanitized schema suffix - Also renames so-show-stats -> so-stats-show	2026-04-16 15:40:54 -04:00
Mike Reeves	cefbe01333	Add telegraf_output selector for InfluxDB/Postgres dual-write Introduces global.telegraf_output (INFLUXDB\|POSTGRES\|BOTH, default BOTH) so Telegraf can write metrics to Postgres alongside or instead of InfluxDB. Each minion authenticates with its own so_telegraf_<minion> role and writes to a matching schema inside a shared so_telegraf database, keeping blast radius per-credential to that minion's data. - Per-minion credentials auto-generated and persisted in postgres/auth.sls - postgres/telegraf_users.sls reconciles roles/schemas on every apply - Firewall opens 5432 only to minion hostgroups when Postgres output is active - Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new minions automatically on key accept - soup post_to_3.1.0 backfills users for existing minions on upgrade - so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks - so-telegraf-trim + nightly cron prune rows older than postgres.telegraf.retention_days (default 14)	2026-04-15 14:32:10 -04:00

14 Commits