so-repo-sync only populates /nsm/kernelrepo after the highstate, so on a
manager the file:///nsm/kernelrepo repo could be assigned before any
repodata exists, failing every dnf op. Run createrepo on the dir when
repodata/repomd.xml is missing, leaving a synced repo untouched.
During soup, so-repo-sync runs before the highstate deploys the new
repodownload.conf. On the first upgrade to a kernel-aware version the
on-disk config lacks the [securityonionkernel] section, so dnf aborts
with "Unknown repo: 'securityonionkernel'" (set -e kills soup). Guard
the kernel reposync on the section being present; the next sync after
the highstate deploys it picks it up.
Mirror the kernel repo to full parity with the main package repo so the
grid can pull the Oracle UEK8 kernel:
- setup/so-functions: securityonion_repo() emits a [securityonionkernel]
section in every branch (mirrorlist on non-airgap, https://$MSRV/kernelrepo
for airgap/minion, file:///nsm/kernelrepo/ for manager); repo_sync_local()
and create_repo() sync and build /nsm/kernelrepo.
- manager/init.sls: create /nsm/kernelrepo and deploy mirror-kernel.txt.
- nginx/enabled.sls: serve /nsm/kernelrepo at https://<repo_host>/kernelrepo.
- repo/client/oracle.sls: add so_kernel_repo, gated by
onlyif test -e /opt/so/state/nic_names_pinned so the kernel repo is only
assigned once NICs are pinned by MAC.
- update_packages(): run so-nic-pin before the dnf update that pulls the
kernel, freezing interface names and dropping the pin marker so the kernel
isn't downgraded then re-upgraded on the first highstate.
A complete mine is not enough: elasticsearch:nodes, redis:nodes,
logstash:nodes (tgt_type=pillar) and hypervisor:nodes (tgt_type=compound)
resolve their target against the master's per-minion data cache
(grains+pillar in data.p), which is populated only when a minion's pillar
is recompiled -- separately from the mine. After a reboot a node can be in
the mine (so node_data/glob sees it) yet absent from that cache, so it
fails the elasticsearch:enabled:true pillar match and is dropped from
elasticsearch:nodes -> so-elasticsearch ExtraHosts -> container recreate.
After the mine-completeness wait, run salt '*' saltutil.refresh_pillar
wait=True to synchronously cache every up node's pillar (the same lever
deploy_newnode.sls uses), then verify with salt-run cache.pillar and retry
stragglers, bounded by MINE_UPDATE_MAX_WAIT. Also log elasticsearch:nodes
alongside node_data for inspection.
Mine-backed pillars (node_data, elasticsearch:nodes, redis:nodes,
logstash:nodes, hypervisor:nodes) include a node only if it returned an
IP from the mine, and the configs they build are rebuilt fresh every
highstate. After a manager reboot with a flushed mine, the first boot
highstate could run before an up node re-reported network.ip_addrs,
dropping it from e.g. so-elasticsearch ExtraHosts and forcing a
container recreate.
After the initial broad mine.update, poll until every currently-up
minion actually has network.ip_addrs in the mine, re-pushing mine.update
to stragglers, before releasing the boot highstate. Shares the existing
MINE_UPDATE_MAX_WAIT backstop so a slow/down node never blocks boot, and
still logs the rendered node_data for inspection.
Dump the actual rendered node_data pillar (pretty-printed JSON) to the
journal instead of just a rendered/empty verdict, so the boot-time render
attempt is fully inspectable. Empty renders print false/null and still
emit the WARNING.
After the boot-time mine.update, have the manager actually render the
node_data pillar and log whether it came back populated. node_data: False
makes salt/top.sls apply the bootstrap recovery branch instead of the
manager's real config, so surfacing this in the journal makes the
condition visible before so-boot-highstate runs. Best-effort and
non-blocking: always exits 0 so highstate proceeds regardless.
so-boot-mine-update.service is a manager-only Type=oneshot unit that runs
once per boot after salt-master/salt-minion start and before
so-boot-highstate.service. It pushes mine.update to all reachable minions
so mine-backed pillars (node IPs, ES/Redis/Logstash discovery) are fresh
before the boot highstate renders them.
The helper waits for the responsive minion set to settle (plateau) rather
than for every accepted key to report up, so an intentionally powered-off
minion doesn't block the update; MAX_WAIT remains as a backstop.
The setup-complete marker is a runtime-state file, not config, so move it
to /opt/so/state/setup-complete. Updates both writers (mark_setup_complete
in setup/so-functions and the upgrade-path state in minion/init.sls) and the
three readers (so-boot-highstate.service ConditionPathExists, boot_highstate.sls
enable gate, and the so-user_sync cron gate).
Before removing from apply_hotfix function first verify that older installs < 3.1.0 are still upgradable when referencing 'so/0013_input_lumberjack_fleet.conf' via pillar. Failure to do so will prevent logstash from starting