securityonion

mirror of https://github.com/Security-Onion-Solutions/securityonion.git synced 2026-06-22 10:18:09 +02:00

Author	SHA1	Message	Date
Mike Reeves	d5dc28e526	Fan postgres telegraf cred for manager on every auth run The empty-pillar case produced a telegraf.conf with `user= password=` which libpq misparses ("password=" gets consumed as the user value), yielding `password authentication failed for user "password="` on every manager without a prior fan-out (fresh install, not the salt-key path the reactor handles). Two fixes: - salt/postgres/auth.sls: always fan for grains.id in addition to any postgres_fanout_minion from the reactor, so the manager's own pillar is populated on every postgres.auth run. The existing `unless` guard keeps re-runs idempotent. - salt/telegraf/etc/telegraf.conf: gate the [[outputs.postgresql]] block on PG_USER and PG_PASS being non-empty. If a minion hasn't received its pillar yet the output block simply isn't rendered — the next highstate picks up the creds once the fan-out completes, and in the meantime telegraf keeps running the other outputs instead of erroring with a malformed connection string.	2026-04-21 14:40:19 -04:00
Mike Reeves	05f6503d61	Gate postgres telegraf fan-out on reactor-provided minion id postgres.auth was running an `unless` shell check per up-minion on every manager highstate, even when nothing had changed — N fork+python starts of so-yaml.py add up on large grids. The work is only needed when a specific minion's key is accepted. - salt/postgres/auth.sls: fan out only when postgres_fanout_minion pillar is set (targets that single minion). Manager highstates with no pillar take a zero-N code path. - salt/reactor/telegraf_user_sync.sls: re-pass the accepted minion id as postgres_fanout_minion to the orch. - salt/orch/telegraf_postgres_sync.sls: forward the pillar to the salt.state invocation so the state render sees it. - salt/manager/tools/sbin/soup: for the one-time 3.1.0 backfill, drop the per-minion state.apply and do an in-shell loop over the minion pillar files using so-yaml.py directly. Skips minions that already have postgres.telegraf.user set.	2026-04-21 10:05:08 -04:00
Mike Reeves	a149ea7e8f	Skip per-minion pillar fan-out when cred is already in place Every postgres.auth run was rewriting every minion pillar file via two so-yaml.py replace calls, even when nothing had changed. Passwords are only generated on first encounter (see the `if key not in telegraf_users` guard) and never rotate, so re-writing the same values on every apply is wasted work and noisy state output. Add an `unless:` check that compares the already-written postgres.telegraf.user to the one we'd set. If they match, skip the fan-out entirely. On first apply for a new minion the key isn't there, so the replace runs; on subsequent applies it's a no-op.	2026-04-21 09:59:46 -04:00
Mike Reeves	bb71e44614	Write per-minion telegraf creds to each minion's own pillar file pillar/top.sls only distributes postgres.auth to manager-class roles, so sensors / heavynodes / searchnodes / receivers / fleet / idh / hypervisor / desktop minions never received the postgres telegraf password they need to write metrics. Broadcasting the aggregate postgres.auth pillar to every role would leak the so_postgres admin password and every other minion's cred. Fan out per-minion credentials into each minion's own pillar file at /opt/so/saltstack/local/pillar/minions/<id>.sls. That file is already distributed by pillar/top.sls exclusively to the matching minion via `- minions.{{ grains.id }}`, so each minion sees only its own postgres.telegraf.{user,pass} and nothing else. - salt/postgres/auth.sls: after writing the manager-scoped aggregate pillar, fan the per-minion creds out via so-yaml.py replace for every up-minion. Creates the minion pillar file if missing. Requires postgres_auth_pillar so the manager pillar lands first. - salt/telegraf/etc/telegraf.conf: consume postgres:telegraf:user and postgres:telegraf:pass directly from the minion's own pillar instead of walking postgres:auth:users which isn't visible off the manager.	2026-04-21 09:57:35 -04:00
Mike Reeves	84197fb33b	Move postgres backup script and cron to the postgres states The so-postgres-backup script and its cron were living under salt/backup/config_backup.sls, which meant the backup script and cron were deployed independently of whether postgres was enabled/disabled. - Relocate salt/backup/tools/sbin/so-postgres-backup to salt/postgres/tools/sbin/so-postgres-backup so the existing postgres_sbin file.recurse in postgres/config.sls picks it up with everything else — no separate file.managed needed. - Remove postgres_backup_script and so_postgres_backup from salt/backup/config_backup.sls. - Add cron.present for so_postgres_backup to salt/postgres/enabled.sls and the matching cron.absent to salt/postgres/disabled.sls so the cron follows the container's lifecycle.	2026-04-21 09:42:41 -04:00
Mike Reeves	89a6e7c0dd	Tidy config.sls makedirs and postgres helpLinks - config.sls: postgresconfdir creates /opt/so/conf/postgres, so the two subdirectories under it (postgressecretsdir, postgresinitdir) don't need their own makedirs — require the parent instead. - soc_postgres.yaml: helpLink for every annotated key now points to 'postgres' instead of the carried-over 'influxdb' slug.	2026-04-21 09:39:58 -04:00
Mike Reeves	a902f667ba	Target manager by role grain in telegraf_postgres_sync orch The previous MANAGER resolution used pillar.get('setup:manager') with a fallback to grains.get('master'). Neither works from the reactor: setup:manager is only populated by the setup workflow (not by reactor runs), and grains.master returns the minion's master-hostname setting, not a targetable minion id. Match the pattern used by orch/delete_hypervisor.sls: compound-target whichever minion is the manager via role grain.	2026-04-21 09:37:35 -04:00
Mike Reeves	f72c30abd0	Have postgres.telegraf_users include postgres.enabled postgres_wait_ready requires docker_container: so-postgres, which is declared in postgres.enabled. Running postgres.telegraf_users on its own — as the reactor orch and the soup post-upgrade step both do — errored because Salt couldn't resolve the require. Include postgres.enabled from postgres.telegraf_users so the container state is always in the render. postgres.enabled already includes telegraf_users; Salt de-duplicates the circular include and the included states are all idempotent, so repeated application is a no-op.	2026-04-21 09:35:59 -04:00
Mike Reeves	37e9257698	Change so-postgres final_octet to 47	2026-04-21 09:33:47 -04:00
Mike Reeves	72105f1f2f	Drop telegraf push from new-minion orch; highstate covers it New minions run highstate as part of onboarding, which already applies the telegraf state with the fresh pillar entry we just wrote. Pushing telegraf a second time from the reactor is redundant. - Remove the MINION-scoped salt.state block from the orch; keep only the manager-side postgres.auth + postgres.telegraf_users provisioning. - Stop passing minion_id as pillar in the reactor; the orch doesn't reference it anymore.	2026-04-21 09:31:45 -04:00
Mike Reeves	ee89b78751	Fire telegraf user sync on salt/key accept, not salt/auth salt/auth fires on every minion authentication — including every minion restart and every master restart — so the reactor was re-running the postgres.auth + postgres.telegraf_users + telegraf orchestration for every already-accepted minion on every reconnect. The underlying states are idempotent, so this was wasted work and log noise, not a correctness issue. Switch the subscription to salt/key, which fires only when the master actually changes a key's state (accept / reject / delete). Match the pattern used by salt/reactor/check_hypervisor.sls (registered in salt/salt/cloud/reactor_config_hypervisor.sls) and add the result==True guard so half-failed key operations don't trigger the orchestration.	2026-04-20 19:54:06 -04:00
Mike Reeves	80bf07ffd8	Flesh out soc_postgres.yaml annotations Add Configuration-UI annotations for every postgres pillar key defined in defaults.yaml, not just telegraf.retention_days: - postgres.enabled — readonly; admin-visible but toggled via state - postgres.telegraf.retention_days — drop advanced so user-tunable knobs surface in the default view - postgres.config.max_connections, shared_buffers, log_min_messages — user-tunable performance/verbosity knobs, not advanced - postgres.config.listen_addresses, port, ssl, ssl_cert_file, ssl_key_file, ssl_ca_file, hba_file, log_destination, logging_collector, shared_preload_libraries, cron.database_name — infra/Salt-managed, marked advanced so they're visible but out of the way No defaults.yaml change; value-side stays the same.	2026-04-20 16:36:37 -04:00
Mike Reeves	b69e50542a	Use TELEGRAFMERGED for telegraf.output and de-jinja pg_hba.conf - firewall/map.jinja and postgres/telegraf_users.sls now pull the telegraf output selector through TELEGRAFMERGED so the defaults.yaml value (BOTH) is the source of truth and pillar overrides merge in cleanly. pillar.get with a hardcoded fallback was brittle and would disagree with defaults.yaml if the two ever diverged. - Rename salt/postgres/files/pg_hba.conf.jinja to pg_hba.conf and drop template: jinja from config.sls — the file has no jinja besides the comment header.	2026-04-20 16:06:01 -04:00
Mike Reeves	3ecd19d085	Move telegraf_output from global pillar to telegraf pillar The Telegraf backend selector lived at global.telegraf_output but it is a Telegraf-scoped setting, not a cross-cutting grid global. Move both the value and the UI annotation under the telegraf pillar so it shows up alongside the other Telegraf tuning knobs in the Configuration UI. - salt/telegraf/defaults.yaml: add telegraf.output: BOTH - salt/telegraf/soc_telegraf.yaml: add telegraf.output annotation - salt/global/defaults.yaml: remove global.telegraf_output - salt/global/soc_global.yaml: remove global.telegraf_output annotation - salt/vars/globals.map.jinja: drop telegraf_output from GLOBALS - salt/firewall/map.jinja: read via pillar.get('telegraf:output') - salt/postgres/telegraf_users.sls: read via pillar.get('telegraf:output') - salt/telegraf/etc/telegraf.conf: read via TELEGRAFMERGED.output - salt/postgres/tools/sbin/so-stats-show: update user-facing docs No behavioral change — default stays BOTH.	2026-04-20 16:03:02 -04:00
Mike Reeves	b6a3d1889c	Fix soup state.apply args for postgres provisioning state.apply takes a single mods argument; space-separated names are not a list, so `state.apply postgres.auth postgres.telegraf_users` was only applying postgres.auth and silently dropping the telegraf_users state. Use comma-separated mods and add queue=True to match the rest of soup.	2026-04-20 14:40:32 -04:00
Mike Reeves	1cb34b089c	Restore 3/dev soup and add postgres users to post_to_3.1.0 feature/postgres had rewritten the 3.1.0 upgrade block, dropping the elastic upgrade work 3/dev landed for 9.0.8→9.3.3: elasticsearch_backup_index_templates, the component template state cleanup, and the /usr/sbin/so-kibana-space-defaults post-upgrade call. It also carried an older ES upgrade mapping (8.18.8→9.0.8) that was superseded on 3/dev (9.0.8→9.3.3 for 3.0.0-20260331), and a handful of latent shell-quoting regressions in verify_es_version_compatibility and the intermediate-upgrade helpers. Adopt the 3/dev soup verbatim and only add the new Telegraf Postgres provisioning to post_to_3.1.0 on top of so-kibana-space-defaults.	2026-04-20 14:38:55 -04:00
Mike Reeves	1537ba5031	Merge remote-tracking branch 'origin/3/dev' into feature/postgres	2026-04-20 14:32:05 -04:00
Mike Reeves	8225d41661	Harden postgres secrets, TLS enforcement, and admin tooling - Deliver postgres super and app passwords via mounted 0600 secret files (POSTGRES_PASSWORD_FILE, SO_POSTGRES_PASS_FILE) instead of plaintext env vars visible in docker inspect output - Mount a managed pg_hba.conf that only allows local trust and hostssl scram-sha-256 so TCP clients cannot negotiate cleartext sessions - Restrict postgres.key to 0400 and ensure owner/group 939 - Set umask 0077 on so-postgres-backup output - Validate host values in so-stats-show against [A-Za-z0-9._-] before SQL interpolation so a compromised minion cannot inject SQL via a tag value - Coerce postgres:telegraf:retention_days to int before rendering into SQL - Escape single quotes when rendering pillar values into postgresql.conf - Own postgres tooling in /usr/sbin as root:root so a container escape cannot rewrite admin scripts - Gate ES migration TLS verification on esVerifyCert (default false, matching the elastic module's existing pattern)	2026-04-20 12:36:17 -04:00
Mike Reeves	3f46caaf02	Revoke PUBLIC CONNECT on securityonion database Per-minion telegraf roles inherit CONNECT via PUBLIC by default and could open sessions to the SOC database (though they have no readable grants inside). Close the soft edge by revoking PUBLIC's CONNECT and re-granting it to so_postgres only.	2026-04-17 19:10:07 -04:00
Mike Reeves	f3181b204a	Remove so-telegraf-trim and update retention description pg_partman drops old partitions hourly; row-DELETE retention is obsolete and a confusing emergency fallback on partitioned tables.	2026-04-17 19:06:16 -04:00
Mike Reeves	dd39db4584	Drop so_telegraf_trim cron.absent tombstone feature/postgres never shipped the original cron.present, so this cleanup state is a no-op on every fresh install. The script itself stays on disk for emergency use.	2026-04-17 18:59:39 -04:00
Mike Reeves	759880a800	Wait for TCP-ready postgres, not the init-phase Unix socket docker-entrypoint.sh runs the init-scripts phase with listen_addresses='' (Unix socket only). The old pg_isready check passed there and then raced the docker_temp_server_stop shutdown before the final postgres started. pg_isready -h 127.0.0.1 only returns success once the real CMD binds TCP, so downstream psql execs never land during the shutdown window.	2026-04-17 16:43:41 -04:00
Mike Reeves	31383bd9d0	Make Telegraf Postgres templates idempotent Use CREATE TABLE IF NOT EXISTS and a WHERE-guarded create_parent() so a Telegraf restart can re-run the templates safely after manual DB surgery. Add an explicit tag_table_create_templates mirroring the plugin default with IF NOT EXISTS for the same reason.	2026-04-17 15:43:50 -04:00
reyesj2	ebb93b4fa7	add wait_for_so-elasticsearch state and split elasticsearch cluster configuration out of enabled.sls	2026-04-17 14:43:07 -05:00
Mike Reeves	21076af01e	Grant so_telegraf CREATE on partman schema pg_partman 5.x's create_partition() creates a per-parent template table inside the partman schema at runtime, which requires CREATE on that schema. Also extend ALTER DEFAULT PRIVILEGES so the runtime- created template tables are accessible to so_telegraf.	2026-04-17 15:34:19 -04:00
Mike Reeves	f11e9da83a	Mark time column NOT NULL before partman.create_parent pg_partman 5.x requires the control column to be NOT NULL; Telegraf's generated columns are nullable by default.	2026-04-17 15:27:06 -04:00
Mike Reeves	0fddcd8fe7	Pass unquoted schema.name to partman.create_parent pg_partman 5.x splits p_parent_table on '.' and looks up the parts as raw identifiers, so the literal must be 'schema.name' rather than the double-quoted form quoteLiteral emits for .table.	2026-04-17 15:22:57 -04:00
Mike Reeves	927eba566c	Grant so_telegraf access to partman schema Telegraf calls partman.create_parent() on first write of each metric, which needs USAGE on the partman schema, EXECUTE on its functions and procedures, and DML on partman.part_config.	2026-04-17 15:13:08 -04:00
Mike Reeves	af9330a9dd	Escape Go-template placeholders from Jinja in telegraf.conf	2026-04-17 15:04:37 -04:00
Mike Reeves	b3fbd5c7a4	Use Go-template placeholders and shell-guarded CREATE DATABASE - Telegraf's outputs.postgresql plugin uses Go text/template syntax, not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings were passed through to Postgres literally, producing syntax errors on every metric's first write. Switch to {{ .table }}, {{ .columns }}, and {{ .table|quoteLiteral }} so partitioned parents and the partman create_parent() call succeed. - Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in both init-users.sh and telegraf_users.sls with an explicit shell conditional. The prior idiom occasionally fired CREATE DATABASE even when so_telegraf already existed, producing duplicate-key failures.	2026-04-17 14:55:13 -04:00
Mike Reeves	5228668be0	Fix Telegraf→Postgres table creation and state.apply race - Telegraf's partman template passed p_type:='native', which pg_partman 5.x (the version shipped by postgresql-17-partman on Debian) rejects. Switched to 'range' so partman.create_parent() actually creates partitions and Telegraf's INSERTs succeed. - Added a postgres_wait_ready gate in telegraf_users.sls so psql execs don't race the init-time restart that docker-entrypoint.sh performs. - so-verify now ignores the literal "-v ON_ERROR_STOP=1" token in the setup log. Dropped the matching entry from so-log-check, which scans container stdout where that token never appears.	2026-04-17 13:00:12 -04:00
Mike Reeves	7d07f3c8fe	Create so_telegraf DB from Salt and pin pg_partman schema init-users.sh only runs on a fresh data dir, so upgrades onto an existing /nsm/postgres volume never got so_telegraf. Pinning partman's schema also makes partman.part_config reliably resolvable.	2026-04-17 10:51:08 -04:00
Mike Reeves	d9a9029ce5	Adopt pg_partman + pg_cron for Telegraf metric tables Every telegraf.* metric table is now a daily time-range partitioned parent managed by pg_partman. Retention drops old partitions instead of the row-by-row DELETE that so-telegraf-trim used to run nightly, and dashboards will benefit from partition pruning at query time. - Load pg_cron at server start via shared_preload_libraries and point cron.database_name at so_telegraf so job metadata lives alongside the metrics - Telegraf create_templates override makes every new metric table a PARTITION BY RANGE (time) parent registered with partman.create_parent in one transaction (1 day interval, 3 premade) - postgres_telegraf_group_role now also creates pg_partman and pg_cron extensions and schedules hourly partman.run_maintenance_proc - New retention reconcile state updates partman.part_config.retention from postgres.telegraf.retention_days on every apply - so_telegraf_trim cron is now unconditionally absent; script stays on disk as a manual fallback	2026-04-16 17:27:15 -04:00
Mike Reeves	9fe53d9ccc	Use JSONB for Telegraf fields/tags to avoid 1600-column limit High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE ADD COLUMN on every new field name, and with all minions writing into a shared 'telegraf' schema the metric tables hit Postgres's 1600-column per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on the postgresql output keeps metric tables fixed at (time, tag_id, fields jsonb) and tag tables at (tag_id, tags jsonb). - so-stats-show rewritten to use JSONB accessors ((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk sizes to bigint so pg_size_pretty works - Drop regex/regexFailureMessage from telegraf_output SOC UI entry to match the convention upstream used when removing them from mdengine/pcapengine/pipeline; options: list drives validation	2026-04-16 17:02:21 -04:00
Mike Reeves	f7b80f5931	Merge branch '3/dev' into feature/postgres	2026-04-16 16:37:02 -04:00
Mike Reeves	f11d315fea	Fix soup	2026-04-16 16:35:24 -04:00
Mike Reeves	2013bf9e30	Fix soup	2026-04-16 16:20:25 -04:00
Mike Reeves	a2ffb92b8d	Fix soup	2026-04-16 16:19:53 -04:00
Jorge Reyes	8b6d11b118	Merge pull request #15780 from Security-Onion-Solutions/reyesj2-es932 suppress noisy warning from ES 9.3.3	2026-04-16 14:42:46 -05:00
reyesj2	ba00ae8a7b	suppress noisy warning from ES 9.3.3	2026-04-16 14:41:25 -05:00
Mike Reeves	470b3bd4da	Comingle Telegraf metrics into shared schema Per-minion schemas cause table count to explode (N minions * M metrics) and the per-minion revocation story isn't worth it when retention is short. Move all minions to a shared 'telegraf' schema while keeping per-minion login credentials for audit. - New so_telegraf NOLOGIN group role owns the telegraf schema; each per-minion role is a member and inherits insert/select via role inheritance - Telegraf connection string uses options='-c role=so_telegraf' so tables auto-created on first write belong to the group role - so-telegraf-trim walks the flat telegraf.* table set instead of per-minion schemas - so-stats-show filters by host tag; CLI arg is now the hostname as tagged by Telegraf rather than a sanitized schema suffix - Also renames so-show-stats -> so-stats-show	2026-04-16 15:40:54 -04:00
Mike Reeves	c124186989	so-log-check: exclude psql ON_ERROR_STOP flag The psql invocation flag '-v ON_ERROR_STOP=1' used by the so-postgres init script gets flagged by so-log-check because the token 'ERROR' matches its error regex. Add to the exclusion list.	2026-04-15 19:45:42 -04:00
Mike Reeves	d24808ff98	Fix so-show-stats tag column resolution Telegraf's postgresql output stores tag values either as individual columns on <metric>_tag or as a single JSONB 'tags' column, depending on plugin version. Introspect information_schema.columns and build the right accessor per tag instead of assuming one layout.	2026-04-15 19:28:10 -04:00
Jorge Reyes	7d22f7bd58	Merge pull request #15776 from Security-Onion-Solutions/foxtrot ES 9.3.3	2026-04-15 16:29:34 -05:00
Mike Reeves	cefbe01333	Add telegraf_output selector for InfluxDB/Postgres dual-write Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH) so Telegraf can write metrics to Postgres alongside or instead of InfluxDB. Each minion authenticates with its own so_telegraf_<minion> role and writes to a matching schema inside a shared so_telegraf database, keeping blast radius per-credential to that minion's data. - Per-minion credentials auto-generated and persisted in postgres/auth.sls - postgres/telegraf_users.sls reconciles roles/schemas on every apply - Firewall opens 5432 only to minion hostgroups when Postgres output is active - Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new minions automatically on key accept - soup post_to_3.1.0 backfills users for existing minions on upgrade - so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks - so-telegraf-trim + nightly cron prune rows older than postgres.telegraf.retention_days (default 14)	2026-04-15 14:32:10 -04:00
Jorge Reyes	76a6997de2	Merge pull request #15775 from Security-Onion-Solutions/reyesj2-es932 check for addon-index templates dir before attempting to load addon i…	2026-04-14 19:27:02 -05:00
reyesj2	16a4a42faf	check for addon-index templates dir before attempting to load addon index templates	2026-04-14 19:26:37 -05:00
Jorge Reyes	0e4623c728	Merge pull request #15772 from Security-Onion-Solutions/reyesj2-es932 soup to 3.1.0	2026-04-14 15:04:46 -05:00
reyesj2	d598e20fbb	soup 3.1.0	2026-04-14 14:55:33 -05:00
Jorge Reyes	cf414423b1	Merge pull request #15770 from Security-Onion-Solutions/reyesj2-es932 enable elastic agent patch release for 9.3.3	2026-04-13 16:28:20 -05:00
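
Two guards described in the log above can be sketched in Python: the host allow-list from 8225d41661 (values checked against [A-Za-z0-9._-] before SQL interpolation) and the non-empty credential gate from d5dc28e526. This is a minimal illustration, not the repository's code. The real checks live in so-stats-show and in the Jinja of salt/telegraf/etc/telegraf.conf, and the names is_safe_host and postgres_conn_string, along with the connection-string layout, are hypothetical.

```python
import re
from typing import Optional

# Character set quoted in commit 8225d41661. fullmatch() plus "+" rejects
# empty strings and anything outside the set, including trailing newlines.
_HOST_RE = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_host(host: str) -> bool:
    """Return True only if host is non-empty and uses allowed characters."""
    return _HOST_RE.fullmatch(host) is not None


def postgres_conn_string(host: str, user: str, password: str) -> Optional[str]:
    """Hypothetical helper mirroring the gate described in d5dc28e526.

    Returns None when either credential is empty, so the caller can skip
    the Postgres output entirely instead of emitting `user= password=`.
    Assumes the values contain no spaces or quotes (no escaping is done).
    """
    if not user or not password:
        return None
    return f"host={host} user={user} password={password}"


if __name__ == "__main__":
    print(is_safe_host("sensor-01.example.com"))
    print(is_safe_host("x'; DROP TABLE cpu; --"))
    print(postgres_conn_string("manager", "", ""))
```

Returning None rather than an empty or partial string keeps the decision with the caller, which matches the commit's description of leaving the output block unrendered until the pillar is populated.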


11678 Commits