Clean up postgres telegraf cred on so-minion delete

Paired with the add path in add_telegraf_to_minion: when a minion is
removed, drop its entry from the aggregate postgres pillar and drop the
matching so_telegraf_<safe> role from the database. Without this, stale
entries and DB roles accumulate over time.

Makes rotate-password and compromise-recovery both a clean delete+add:

  so-minion -o=delete -m=<id>
  so-minion -o=add    -m=<id>

The first call drops the role and clears the aggregate pillar; the
second generates a brand-new password.

The cleanup is best-effort — if so-postgres isn't running or the DROP
ROLE fails (e.g., the role owns unexpected objects), we log a warning
and continue so the minion delete itself never gets blocked by postgres
state. Admins can mop up stray roles manually if that happens.
This commit is contained in:
Mike Reeves
2026-04-21 15:43:01 -04:00
parent 5f28e9b191
commit dbf4fb66a4
+39 -1
View File
@@ -273,7 +273,7 @@ function deleteMinionFiles () {
log "ERROR" "Failed to delete $PILLARFILE"
return 1
fi
rm -f $ADVPILLARFILE
if [ $? -ne 0 ]; then
log "ERROR" "Failed to delete $ADVPILLARFILE"
@@ -281,6 +281,43 @@ function deleteMinionFiles () {
fi
}
# Remove this minion's postgres Telegraf credential from both the aggregate
# pillar and the postgres database. Paired with add_telegraf_to_minion:
# add/delete cycle both here and in the DB. Always returns 0 so a dead or
# unreachable so-postgres doesn't block minion deletion — in that case we
# log a warning and leave the role behind for manual cleanup.
function remove_postgres_telegraf_from_minion() {
local MINION_SAFE
MINION_SAFE=$(echo "$MINION_ID" | tr '.-' '__' | tr '[:upper:]' '[:lower:]')
local PG_USER="so_telegraf_${MINION_SAFE}"
local AGGREGATE=/opt/so/saltstack/local/pillar/postgres/auth.sls
log "INFO" "Removing postgres telegraf cred for $MINION_ID"
if [[ -f "$AGGREGATE" ]]; then
so-yaml.py remove "$AGGREGATE" "postgres.auth.users.telegraf_${MINION_SAFE}" >/dev/null 2>&1 || true
fi
if docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^so-postgres$'; then
if ! docker exec -i so-postgres psql -v ON_ERROR_STOP=1 -U postgres -d so_telegraf >/dev/null 2>&1 <<EOSQL
DO \$\$
BEGIN
IF EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = '$PG_USER') THEN
EXECUTE format('REASSIGN OWNED BY %I TO so_telegraf', '$PG_USER');
EXECUTE format('DROP OWNED BY %I', '$PG_USER');
EXECUTE format('DROP ROLE %I', '$PG_USER');
END IF;
END
\$\$;
EOSQL
then
log "WARN" "Failed to drop postgres role $PG_USER; aggregate pillar entry was removed — drop manually if the role persists"
fi
else
log "WARN" "so-postgres container is not running; skipping DB role cleanup for $PG_USER"
fi
}
# Create the minion file
function ensure_socore_ownership() {
log "INFO" "Setting socore ownership on minion files"
@@ -1100,6 +1137,7 @@ case "$OPERATION" in
"delete")
log "INFO" "Removing minion $MINION_ID"
remove_postgres_telegraf_from_minion
deleteMinionFiles || {
log "ERROR" "Failed to delete minion files for $MINION_ID"
exit 1