init-users.sh only runs on a fresh data dir, so upgrades onto an
existing /nsm/postgres volume never got so_telegraf. Pinning partman's
schema also makes partman.part_config reliably resolvable.
Every telegraf.* metric table is now a daily time-range partitioned
parent managed by pg_partman. Retention drops old partitions instead
of the row-by-row DELETE that so-telegraf-trim used to run nightly,
and dashboards will benefit from partition pruning at query time.
- Load pg_cron at server start via shared_preload_libraries and point
cron.database_name at so_telegraf so job metadata lives alongside
the metrics
- Telegraf create_templates override makes every new metric table a
PARTITION BY RANGE (time) parent registered with partman.create_parent
in one transaction (1 day interval, 3 premade)
- postgres_telegraf_group_role now also creates pg_partman and pg_cron
extensions and schedules hourly partman.run_maintenance_proc
- New retention reconcile state updates partman.part_config.retention
from postgres.telegraf.retention_days on every apply
- so_telegraf_trim cron is now unconditionally absent; script stays on
disk as a manual fallback
High-cardinality inputs (docker, procstat, kafka) trigger ALTER TABLE
ADD COLUMN on every new field name, and with all minions writing into
a shared 'telegraf' schema the metric tables hit Postgres's 1600-column
per-table ceiling quickly. Setting fields_as_jsonb and tags_as_jsonb on
the postgresql output keeps metric tables fixed at (time, tag_id,
fields jsonb) and tag tables at (tag_id, tags jsonb).
- so-stats-show rewritten to use JSONB accessors
((fields->>'x')::numeric, tags->>'host', etc.) and cast memory/disk
sizes to bigint so pg_size_pretty works
- Drop regex/regexFailureMessage from telegraf_output SOC UI entry to
match the convention upstream used when removing them from
mdengine/pcapengine/pipeline; options: list drives validation
Per-minion schemas cause table count to explode (N minions * M metrics)
and the per-minion revocation story isn't worth it when retention is
short. Move all minions to a shared 'telegraf' schema while keeping
per-minion login credentials for audit.
- New so_telegraf NOLOGIN group role owns the telegraf schema; each
per-minion role is a member and inherits insert/select via role
inheritance
- Telegraf connection string uses options='-c role=so_telegraf' so
tables auto-created on first write belong to the group role
- so-telegraf-trim walks the flat telegraf.* table set instead of
per-minion schemas
- so-stats-show filters by host tag; CLI arg is now the hostname as
tagged by Telegraf rather than a sanitized schema suffix
- Also renames so-show-stats -> so-stats-show
The psql invocation flag '-v ON_ERROR_STOP=1' used by the so-postgres
init script gets flagged by so-log-check because the token 'ERROR'
matches its error regex. Add to the exclusion list.
Telegraf's postgresql output stores tag values either as individual
columns on <metric>_tag or as a single JSONB 'tags' column, depending
on plugin version. Introspect information_schema.columns and build the
right accessor per tag instead of assuming one layout.
Introduces global.telegraf_output (INFLUXDB|POSTGRES|BOTH, default BOTH)
so Telegraf can write metrics to Postgres alongside or instead of
InfluxDB. Each minion authenticates with its own so_telegraf_<minion>
role and writes to a matching schema inside a shared so_telegraf
database, keeping blast radius per-credential to that minion's data.
- Per-minion credentials auto-generated and persisted in postgres/auth.sls
- postgres/telegraf_users.sls reconciles roles/schemas on every apply
- Firewall opens 5432 only to minion hostgroups when Postgres output is active
- Reactor on salt/auth + orch/telegraf_postgres_sync.sls provision new
minions automatically on key accept
- soup post_to_3.1.0 backfills users for existing minions on upgrade
- so-show-stats prints latest CPU/mem/disk/load per minion for sanity checks
- so-telegraf-trim + nightly cron prune rows older than
postgres.telegraf.retention_days (default 14)
Postgres module now queries Elasticsearch directly via HTTP
for the chat migration (bypasses RBAC that needs user context).
Pass esHostUrl, esUsername, esPassword alongside postgres creds.
Injects the postgres superuser password from secrets pillar so
SOC can run schema migrations as admin before switching to the
app user for normal operations.
Use format() with %L for SQL literal escaping instead of raw
string interpolation. Also ALTER ROLE if user already exists
to keep password in sync with pillar.
Removed postgres from soc/defaults.yaml (shared by all nodes)
and moved it entirely into defaults.map.jinja, which only injects
the config when postgres auth pillar exists (manager-type nodes).
Sensors and other non-manager nodes will not have a postgres module
section in their sensoroni.json, so sensoroni won't try to connect.
- Create vars/postgres.map.jinja for postgres auth globals
- Add POSTGRES_GLOBALS to all manager-type role vars
(manager, eval, standalone, managersearch, import)
- Add postgres module config to soc/defaults.yaml
- Inject so_postgres credentials from auth pillar into
soc/defaults.map.jinja (conditional on auth pillar existing)