The /nsm/kernelrepo bind mount exposed the files, but without a matching
location block external requests to /kernelrepo/ fell through to the SOC
app and returned HTML, so minions hit 'repomd.xml parser error'. Add a
/kernelrepo/ location mirroring /repo/.
During soup the grid is mid-salt-upgrade. Only assign the UEK8 kernel
repo once the node's grains.saltversion matches salt.minion.version from
minion.defaults.yaml, so the kernel repo and the update it enables don't
activate until the node is fully on the target salt.
Installing kernel-uek-core adds a UEK8 (6.x) boot entry but doesn't make
it the default, because grubby only auto-promotes within the running
kernel's flavor lineage and we cross from a 5.x kernel to the new UEK8
flavor. so-kernel-upgrade finds the newest installed 6.x UEK kernel and
grubby --set-default's it (idempotent, verifies the change, no reboot).
so-repo-sync only populates /nsm/kernelrepo after the highstate, so on a
manager the file:///nsm/kernelrepo repo could be assigned before any
repodata exists, failing every dnf op. Run createrepo on the dir when
repodata/repomd.xml is missing, leaving a synced repo untouched.
When the kernel repo is assigned but /nsm/kernelrepo isn't populated
yet, its missing repomd.xml makes every dnf/pkg operation fail (e.g.
pkg.held for salt during highstate). The kernel repo is supplementary,
so set skip_if_unavailable=1 in both the salt-managed client repo and
the four install-time bootstrap repo files; dnf ignores it until it is
populated instead of aborting. The main repo stays strict.
During soup, so-repo-sync runs before the highstate deploys the new
repodownload.conf. On the first upgrade to a kernel-aware version the
on-disk config lacks the [securityonionkernel] section, so dnf aborts
with "Unknown repo: 'securityonionkernel'" (set -e kills soup). Guard
the kernel reposync on the section being present; the next sync after
the highstate deploys it picks it up.
The install bootstrap appended the [securityonionkernel] section to the
shared /etc/yum.repos.d/securityonion.repo, but the salt state so_kernel_repo
(name securityonionkernel) manages its own canonical file
/etc/yum.repos.d/securityonionkernel.repo. At highstate both files defined the
same repo id, so dnf failed with "repository securityonionkernel is listed
more than 1 time".
Write the bootstrap kernel repo to /etc/yum.repos.d/securityonionkernel.repo
in all four securityonion_repo() branches so the id lives in exactly one file
and salt edits it in place. Mirrors how the main repo's runtime id matches its
file name.
Mirror the kernel repo to full parity with the main package repo so the
grid can pull the Oracle UEK8 kernel:
- setup/so-functions: securityonion_repo() emits a [securityonionkernel]
section in every branch (mirrorlist on non-airgap, https://$MSRV/kernelrepo
for airgap/minion, file:///nsm/kernelrepo/ for manager); repo_sync_local()
and create_repo() sync and build /nsm/kernelrepo.
- manager/init.sls: create /nsm/kernelrepo and deploy mirror-kernel.txt.
- nginx/enabled.sls: serve /nsm/kernelrepo at https://<repo_host>/kernelrepo.
- repo/client/oracle.sls: add so_kernel_repo, gated by
onlyif test -e /opt/so/state/nic_names_pinned so the kernel repo is only
assigned once NICs are pinned by MAC.
- update_packages(): run so-nic-pin before the dnf update that pulls the
kernel, freezing interface names and dropping the pin marker so the kernel
isn't downgraded then re-upgraded on the first highstate.
elastic_fleet_load_integrations_dir now buffers each concurrent job's
output (header + API response) to a per-job file and prints them in
submission order after wait, restoring the readable serial-style output
while keeping concurrent writes.
Add --retry-all-errors to the integration create/update curl calls so
transient 409 conflicts from concurrent writes to the same agent policy
are retried (curl --retry alone does not retry 409).
Fetch each agent policy once and extract integration name/package/version/id
locally via a single jq pass instead of re-fetching the identical policy JSON
1+3N times. Memoize epm/packages latest-version lookups so each package is
queried once instead of per (policy, integration). Dispatch the per-integration
dry-run+upgrade as throttled background jobs (MAX_FLEET_JOBS) with
flock-serialized output and a FAIL_FILE marker, mirroring
elastic_fleet_load_integrations_dir.
Behavior preserved: same elastic-defend-endpoints/fleet_server skips, same
AUTO_UPGRADE_INTEGRATIONS default-package gating (moved into jq, using $defaults
to avoid the jq $def keyword collision), and exit 1 on any failure so salt
retries.
Load component and index templates as throttled background jobs (max 10
concurrent) instead of sequential curl PUTs, matching the bounded-concurrency
+ flock-serialized-output pattern used by the fleet/ILM load scripts. Keeps a
wait barrier between the component phase and the index phase so index
templates never load before their referenced component templates. Failures are
tracked via per-job marker files since counter increments can't escape
background subshells.
Fetch each agent policy once per group instead of refetching the full
policy (plus a fresh Kibana session cookie) for every integration file,
and dispatch the create/update writes as throttled background jobs.
Adds elastic_fleet_load_integrations_dir and elastic_fleet_throttle to
so-elastic-fleet-common, reusing the bounded-concurrency pattern from
so-elasticsearch-ilm-policy-load. Replaces the four serial loops in the
loader with one call per agent policy.
Add so-nic-pin, which writes by-MAC persistent-net udev rules pinning each
physical NIC to its current name so a kernel upgrade can't renumber the
interfaces Security Onion binds by name (host:mainint, sensor:mainint, bond0).
Gated by the drop file /opt/so/state/nic_names_pinned: run-once on highstate,
and an admin can pre-create the marker to opt out. Wired into common/init.sls
as pin_nic_names, guarded by a matching unless.
The agent-policy enumeration passed --argjson def, creating a jq
variable $def. 'def' is a reserved keyword in jq and the deployed jq
version rejects it, so the program failed to compile and
in_use_integrations was left empty (silently disabling the in-use
upgrade guard). Rename the arg to $defaults.
A single printf per block was not actually one write() call, so
concurrent jobs still occasionally interleaved their label and response
lines. Hold an flock around just the printf (curl still runs in
parallel) so each policy's block prints intact, keeping live
completion-order streaming.
Replace the per-package decision loop (which forked ~10 processes per
package and rebuilt a growing JSON file on every add -> O(n^2)) with two
jq passes: one prints the status messages, one builds the bulk install
list. A vnum/needs() jq definition reproduces the previous
version_conversion/compare_versions and excluded/subscription/installed/
upgrade/in-use logic exactly. Also fetch each agent policy once and
extract non-default package names locally instead of re-fetching the
policy per integration (1+K -> 1 GET per policy). Install behavior is
unchanged.