10 Commits

Author SHA1 Message Date
Mike Reeves a433e9524d Move onionconfig writes out of so-yaml 2026-05-12 16:05:55 -04:00
Mike Reeves 3d11694d51 make so-yaml PG-canonical and add pillar-change reactor stack
Two coupled changes that together let so_pillar.* be the canonical
config store, with config edits driving service reloads automatically:

so-yaml PG-canonical mode
- Adds /opt/so/conf/so-yaml/mode (and SO_YAML_BACKEND env override) with
  three values: dual (legacy), postgres (PG-only for managed paths),
  disk (emergency rollback). Bootstrap files (secrets.sls, ca/init.sls,
  *.nodes.sls, top.sls, ...) stay disk-only regardless via the existing
  SkipPath allowlist in so_yaml_postgres.locate.
- loadYaml/writeYaml/purgeFile now route to so_pillar.* in postgres
  mode: replace/add/get all read+write the database with no disk file
  ever appearing. PG failure is fatal in postgres mode (no silent
  fallback); dual mode preserves the prior best-effort mirror.
- so_yaml_postgres gains read_yaml(path), is_pg_managed(path), and
  is_enabled() so so-yaml can answer "is this path PG-managed and is
  PG up" without reaching into private helpers.
- schema_pillar.sls writes /opt/so/conf/so-yaml/mode = postgres after
  the importer succeeds, so flipping postgres:so_pillar:enabled flips
  so-yaml's behavior in lockstep with the schema being live.

pg_notify-driven change fan-out
- 008_change_notify.sql adds so_pillar.change_queue + an AFTER trigger
  on pillar_entry that enqueues the locator and pg_notifies
  'so_pillar_change'. Queue is drained at-least-once so engine restarts
  don't lose events; pg_notify is just the wakeup signal.
- New salt-master engine pg_notify_pillar.py LISTENs on the channel,
  drains the queue with FOR UPDATE SKIP LOCKED, debounces bursts, and
  fires 'so/pillar/changed' events grouped by (scope, role, minion).
- Reactor so_pillar_changed.sls catches the tag and dispatches to
  orch.so_pillar_reload, which carries a DISPATCH map of pillar-path
  prefix -> (state sls, role grain set) so adding a new service to
  the auto-reload list is a one-line edit instead of a new reactor.
- Engine + reactor wiring is gated on the same postgres:so_pillar:enabled
  flag as the schema and ext_pillar config so the whole stack flips
  on/off together.

Tests: 21 new cases (112 total, all passing) covering mode resolution,
PG-managed detection, and PG-canonical read/write/purge routing with
the PG client stubbed.
2026-05-01 09:31:48 -04:00
Mike Reeves 3fad895d6a add so_pillar schema + ext_pillar wiring (postsalt foundation)
Lays the database-backed pillar foundation for the postsalt branch. Salt
continues to read on-disk SLS first; the new ext_pillar config overlays
values from the so_pillar.* schema in so-postgres.

- salt/postgres/files/schema/pillar/00{1..7}_*.sql: idempotent DDL for
  scope/role/role_member/minion/pillar_entry/pillar_entry_history/
  drift_log, secret pgcrypto helpers, RLS, pg_cron retention.
- salt/postgres/schema_pillar.sls: applies the SQL files inside the
  so-postgres container after it's healthy, configures the master_key
  GUC, and runs so-pillar-import once. Gated on
  postgres:so_pillar:enabled feature flag (default false).
- salt/salt/master/ext_pillar_postgres.{sls,conf.jinja}: drops
  /etc/salt/master.d/ext_pillar_postgres.conf with list-form ext_pillar
  queries (global/role/minion/secrets) and ext_pillar_first: False so
  bootstrap pillars on disk render before the PG overlay.
- salt/postgres/init.sls + salt/salt/master.sls: include the new states.

Both new state branches are guarded so a default install with the flag
off is a no-op.
2026-04-30 16:30:57 -04:00
Mike Reeves b69e50542a Use TELEGRAFMERGED for telegraf.output and de-jinja pg_hba.conf
- firewall/map.jinja and postgres/telegraf_users.sls now pull the
  telegraf output selector through TELEGRAFMERGED so the defaults.yaml
  value (BOTH) is the source of truth and pillar overrides merge in
  cleanly. pillar.get with a hardcoded fallback was brittle and would
  disagree with defaults.yaml if the two ever diverged.
- Rename salt/postgres/files/pg_hba.conf.jinja to pg_hba.conf and drop
  template: jinja from config.sls — the file has no jinja besides the
  comment header.
2026-04-20 16:06:01 -04:00
Mike Reeves 8225d41661 Harden postgres secrets, TLS enforcement, and admin tooling
- Deliver postgres super and app passwords via mounted 0600 secret files
  (POSTGRES_PASSWORD_FILE, SO_POSTGRES_PASS_FILE) instead of plaintext env
  vars visible in docker inspect output
- Mount a managed pg_hba.conf that only allows local trust and hostssl
  scram-sha-256 so TCP clients cannot negotiate cleartext sessions
- Restrict postgres.key to 0400 and ensure owner/group 939
- Set umask 0077 on so-postgres-backup output
- Validate host values in so-stats-show against [A-Za-z0-9._-] before SQL
  interpolation so a compromised minion cannot inject SQL via a tag value
- Coerce postgres:telegraf:retention_days to int before rendering into SQL
- Escape single quotes when rendering pillar values into postgresql.conf
- Own postgres tooling in /usr/sbin as root:root so a container escape
  cannot rewrite admin scripts
- Gate ES migration TLS verification on esVerifyCert (default false,
  matching the elastic module's existing pattern)
2026-04-20 12:36:17 -04:00
Mike Reeves 3f46caaf02 Revoke PUBLIC CONNECT on securityonion database
Per-minion telegraf roles inherit CONNECT via PUBLIC by default and
could open sessions to the SOC database (though they have no readable
grants inside). Close the soft edge by revoking PUBLIC's CONNECT and
re-granting it to so_postgres only.
2026-04-17 19:10:07 -04:00
Mike Reeves b3fbd5c7a4 Use Go-template placeholders and shell-guarded CREATE DATABASE
- Telegraf's outputs.postgresql plugin uses Go text/template syntax,
  not uppercase tokens. The {TABLE}/{COLUMNS}/{TABLELITERAL} strings
  were passed through to Postgres literally, producing syntax errors
  on every metric's first write. Switch to {{ .table }}, {{ .columns }},
  and {{ .table|quoteLiteral }} so partitioned parents and the partman
  create_parent() call succeed.
- Replace the \gexec "CREATE DATABASE ... WHERE NOT EXISTS" idiom in
  both init-users.sh and telegraf_users.sls with an explicit shell
  conditional. The prior idiom occasionally fired CREATE DATABASE even
  when so_telegraf already existed, producing duplicate-key failures.
2026-04-17 14:55:13 -04:00
Mike Reeves cefbe01333 Add telegraf_output selector for InfluxDB/Postgres dual-write
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.

- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
  minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
  postgres.telegraf.retention_days (default 14)
2026-04-15 14:32:10 -04:00
Mike Reeves da1045e052 Fix init-users.sh password escaping for special characters
Use format() with %L for SQL literal escaping instead of raw
string interpolation. Also ALTER ROLE if user already exists
to keep password in sync with pillar.
2026-04-09 21:52:20 -04:00
Mike Reeves 868cd11874 Add so-postgres Salt states and integration wiring
Phase 1 of the PostgreSQL central data platform:
- Salt states: init, enabled, disabled, config, ssl, auth, sostatus
- TLS via SO CA-signed certs with postgresql.conf template
- Two-tier auth: postgres superuser + so_postgres application user
- Firewall restricts port 5432 to manager-only (HA-ready)
- Wired into top.sls, pillar/top.sls, allowed_states, firewall
  containers map, docker defaults, CA signing policies, and setup
  scripts for all manager-type roles
2026-04-08 10:58:52 -04:00