Elemental
Elemental is a software stack enabling a centralized, full cloud-native OS management solution with Kubernetes.
Cluster Node OSes are built and maintained via container images through the Elemental Toolkit and installed on new hosts using the Elemental CLI.
The Elemental Operator and the Rancher System Agent enable Rancher Manager to fully control Elemental clusters, from the installation and management of the OS on the Nodes to the provisioning of new K3s or RKE2 clusters in a centralized way.
Follow our Quickstart or see the full docs for more info.
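As a sketch of that container-image workflow, a derived node OS image could look like the following; the `FROM` reference is a placeholder (not a real published tag), and the package choice is purely illustrative:

```dockerfile
# Hypothetical derived node OS image; the FROM reference is a placeholder.
FROM <registry>/rancher/elemental-teal/5.3:latest
# Layer site-specific packages on top of the base OS.
RUN zypper --non-interactive install htop && zypper clean --all
```

The rebuilt image can then be fed to the usual ISO build and installation flow.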
Support for secure boot
I believe we are missing the shim image in our base image; it is something we need in order to support Secure Boot. Some adjustments to the GRUB2 installation might also be required for Secure Boot.
Hostname is changed upon each reboot
After installation I set my hostname to `ros-node-02` and configured a K3s cluster on it with rancherd/Rancher. All was fine, but after a reboot the hostname was changed to something like `rancher-XXXXX`. I tried to reset the hostname, but the same issue occurred after the next reboot. I am able to reproduce this every time. I have a DHCP/DNS server that fixes the IP address, so it is not an IP change on each reboot. I attached journalctl logs on this issue: journalctl.log.gz
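A possible workaround, assuming Elemental's cloud-init (yip) syntax and that a stage-level `hostname` key is honored (both the file path and the schema below are assumptions), is to pin the hostname in an `/oem` config so it is reapplied on every boot:

```yaml
# /oem/hostname.yaml -- hypothetical sketch; path and schema are assumptions
name: "Persist hostname"
stages:
  initramfs:
    - hostname: ros-node-02
```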
Teal
`selinux`: SLE Micro for Rancher has support for it, but as we don't have profiles for it, it fails booting.

It is better to browse the tree directly (https://github.com/rancher-sandbox/os2/tree/teal), as most things got simplified and dropped.

Draft, as I still have to test this locally and wire up the CI.

Supersedes #115. Part of https://github.com/rancher-sandbox/os2/issues/94
nomodeset required for booting certain hardware
What steps did you take and what happened:
I'm reporting on behalf of a user who is having issues booting Elemental Teal. They were able to work around the issue by pressing `e` and adding `nomodeset` to the kernel params. They are using an AMD Ryzen-based SimplyNUC (https://simplynuc.com/product/llm2v8cy-full/).
What did you expect to happen:
The machine should boot normally without manual intervention.
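A persistent version of that workaround, assuming Elemental's GRUB config reads an `extra_cmdline` variable from a writable GRUB environment file (both the variable name and the `/oem/grubenv` path are assumptions), would be to set it once, e.g. with `grub2-editenv`, so the resulting environment file looks like:

```text
# /oem/grubenv (hypothetical path) after running:
#   grub2-editenv /oem/grubenv set extra_cmdline=nomodeset
# GRUB Environment Block
extra_cmdline=nomodeset
```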
Environment: (Asking for details and will fill in)
- OS (`cat /etc/os-release`):
- Kubernetes (`kubectl version`):

Consistent CI environment
Our current CI setup (workers/runners) is somewhat spread across internal and AWS machines. We should try to consolidate it in one place and document it properly.
Empty `/etc/resolv.conf`
Since the latest release (11th of September 2022), Elemental does not properly set the `/etc/resolv.conf` file at boot. In fact, sle-micro-for-rancher introduced `NetworkManager` in this release.

Image names and labels for elemental stack
In our current built artifacts we have the following image names and labels.
`<NUM>` is the OBS build ID and `<registry>` is the registry domain and base repository where OBS pushes to.

HELM CHART:
TEAL IMAGE:
BUILDER IMAGE:
OPERATOR IMAGE:
I see a couple of little issues here; I'd say they should be fixed before the release:

- […] `rancher` repository like all others, I doubt this was on purpose.

Beyond these two little issues, I am wondering if the schema of the `/rancher/<image>/5.3` repository is what we want. Also, it is unclear to me why we have two different repositories for the teal image; it doesn't hurt, but I don't see how this could be used.

@kkaempf @agracey @rancher/elemental are you fine with the current tags? Anything you think should be arranged?
build-iso still pulls from quay.io
`elemental build-iso` should run offline when the base image is available in the local image cache. Currently it still pulls data from quay.io (grub2 files, apparently).

Not sure why this happens. Either `elemental` looks at the wrong places in the image (or not at all :wink:).

Everything needed for ISO building should be provided by node-image and builder-image.
Receiving 503 when reinstalling Elemental Operator
What steps did you take and what happened: I reinstalled the Elemental Operator to fix a different problem.
Then I installed the operator
And installed the selector, cluster and registration. When I ran wget to get the machine registration information, Rancher returned a 503:
I receive a proper URL from `kubectl get machineregistration`, but the wget doesn't get the registration information.

What did you expect to happen: I should receive a yaml file with the registration token and cert.
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- OS (`cat /etc/os-release`): N/A
- Kubernetes (`kubectl version`): RKE2

Docs migration to Docusaurus
Docusaurus installed and initial configuration done:
The docs have been migrated 1:1 without any adaptation, so expect view glitches.
Split ros-operator into its own repo
It makes no sense to have all this mixed together with the os2 files. First, everything is clumped into a ton of setup files (the CI, the Makefile, the scripts, etc.), while ros-operator is very simple and could do with a simple repo in which we control everything, and it's much clearer where to change things.
Plus, we can release it separately instead of in a big release with the installer, the chart, the ISO, etc.
ros-operator and its chart should live in their own repo for ease of releasing, testing and updating. We also don't need os2 to test the ros-operator, so that makes it simpler.
A test repo is available at https://github.com/Itxaka/ros-operator with the full package, CI stuff, release stuff.
Action items
e2e CI: Run nightly tests both on Rancher Manager stable and latest devel versions
Right now, we only run our tests on top of the latest devel Rancher Manager version. For instance:
But I think we also need to test the latest stable version:
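One way to cover both channels, assuming the nightly job is a GitHub Actions workflow (the job and variable names below are assumptions), is a matrix over the Rancher Manager channel:

```yaml
# Hypothetical workflow fragment: run the e2e job once per Rancher Manager channel.
jobs:
  e2e:
    strategy:
      matrix:
        rancher_channel: [stable, latest]  # stable release and latest devel
```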
Failure in e2e UI tests - element .xterm-cursor-layer not found
The CI has been failing since yesterday due to a missing element: Cypress does not see the xterm cursor.
Failure: https://github.com/rancher/elemental/actions/runs/3835177495/jobs/6531093559
Wrong shutdown order (on aarch64?)
What steps did you take and what happened: [A clear and concise description of what the bug is.]
Run
on a node's root prompt.
This fails to unmount `/var/lib/rancher` and subsequently `/var`:

and then it hangs stopping a container(?) and only continues after systemd's timeout is reached.
What did you expect to happen:
An orderly and quick shutdown.
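If the root cause is ordering (the mount unit for `/var/lib/rancher` being taken down before the container runtime has stopped), one standard systemd fix is `RequiresMountsFor=`, which orders the service after the mount units for that path, so on shutdown the service stops first. The drop-in below is a sketch; the `k3s.service` unit name and the file path are assumptions:

```ini
# /etc/systemd/system/k3s.service.d/10-ordering.conf (hypothetical drop-in)
[Unit]
# Orders k3s after the mount units covering this path, so on shutdown
# k3s (and its containers) stop before /var/lib/rancher is unmounted.
RequiresMountsFor=/var/lib/rancher
```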
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- OS (`cat /etc/os-release`): HEAD as of today
- Kubernetes (`kubectl version`):

e2e: test new autogenerated seed for emulated TPM
elemental-operator v1.0.x allows only one node with emulated TPM per MachineRegistration, but the new operator v1.1.x allows more if `emulatedSeed` is set to `-1`.

A test for this should be added as soon as we no longer need to keep CI tests for operator v1.0.x.
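For reference, such a registration might look like the fragment below; the exact schema and field placement are assumptions (only the `emulatedSeed`/`-1` semantics come from the text above):

```yaml
# Hypothetical MachineRegistration fragment; schema is an assumption.
apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
  name: emulated-tpm-nodes
spec:
  config:
    elemental:
      registration:
        emulate-tpm: true
        emulatedSeed: -1   # -1: each node gets a unique autogenerated seed
```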
Revisit elemental release procedures
A few things to improve and elaborate around releases:

- a `dist/obs/` folder within the repos to include those. In `dist/obs` there could be a README.md explaining that the files in there are OBS specific and mostly used for SUSE's builds.