Elemental is an immutable Linux distribution built to run Rancher and its corresponding Kubernetes distributions, RKE2 and K3s. It is built using the Elemental-toolkit.

  • By Rancher
  • Last update: Dec 27, 2022
  • Comments: 16

Elemental


Elemental is a software stack enabling a centralized, full cloud-native OS management solution with Kubernetes.

Cluster Node OSes are built and maintained via container images through the Elemental Toolkit and installed on new hosts using the Elemental CLI.

The Elemental Operator and the Rancher System Agent enable Rancher Manager to fully control Elemental clusters, from the installation and management of the OS on the Nodes to the provisioning of new K3s or RKE2 clusters in a centralized way.

Follow our Quickstart or see the full docs for more info.


Comments (16)

  • 1

    Support for secure boot

    I believe we are missing the shim image in our base image; we need it to support Secure Boot. Some adjustments to the GRUB2 installation for Secure Boot might also be required.
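
    A minimal sketch of what that could look like at image build time, assuming a zypper-based base image (the package names are an assumption and vary by distribution):

    # Hypothetical sketch: install the shim and the EFI GRUB2 build needed
    # for a Secure Boot chain (package names assumed for a SLE/openSUSE base)
    zypper --non-interactive install shim grub2-x86_64-efi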

  • 2

    Hostname is changed upon each reboot

    After installation I set my hostname to ros-node-02 and configured a K3s cluster on it with rancherd/Rancher. All was fine, but after a reboot the hostname had been changed to something like rancher-XXXXX. I tried to reset the hostname, but hit the same issue after a reboot. I'm able to reproduce this every time.

    I have a DHCP/DNS server that fixes the IP address, so it's not an IP change upon each reboot. I have attached journalctl logs for this issue. journalctl.log.gz
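
    In case it helps anyone hitting this, pinning the hostname from an /oem cloud-config file might work around it; a minimal sketch, assuming the yip-style schema Elemental uses (the file name and stage are assumptions):

    # Hypothetical workaround sketch: persist the hostname via a cloud-config
    # file in /oem (yip-style schema; file name and stage are assumptions)
    printf 'stages:\n  initramfs:\n    - hostname: ros-node-02\n' > /oem/99_hostname.yaml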

  • 3

    Teal

    • Don't be scared by the diff; it's mostly deletions
    • Splits os2 into ros-installer; the golang code is gone
    • Base image switched to SLE Micro for Rancher, with the elemental binaries included
    • Framework files are now tracked individually and statically (we could go with git submodules, but wanted to keep it simple for now), allowing sandboxed builds
    • Adds a CI workflow which keeps the framework static files mentioned above in sync with cos-toolkit, opening up PRs
    • Should be ready to be built with OBS/IBS @kkaempf - it also replaces the os2-framework package with a single Dockerfile that can be built directly from OBS
    • Temporarily drops SELinux: SLE Micro for Rancher supports it, but since we don't have profiles for it, booting fails
    • Might need a follow-up; the PR pipeline should work, but yeah :)
    • All binaries are expected to be provided as part of the base image. This repo is now the "end" Dockerfile which just applies the customizations from the framework - so for testing, it is enough to provide a different base image with different binaries (e.g. a pinned elemental-cli version); see the sketch below
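
    A sketch of what that test flow could look like (the build argument name is an assumption; substitute whatever the Dockerfile actually declares):

    # Hypothetical sketch: swap in a custom base image carrying a pinned
    # elemental-cli, without touching anything else in the repo
    docker build --build-arg BASE_IMAGE=registry.example.com/custom-base:pinned -t os2:test .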

    It is better to browse it directly: https://github.com/rancher-sandbox/os2/tree/teal as most things were simplified or dropped.

    Draft, as I still need to test this locally and wire up the CI.

    Supersedes #115. Part of https://github.com/rancher-sandbox/os2/issues/94

  • 4

    nomodeset required for booting certain hardware

    What steps did you take and what happened:

    I'm reporting on behalf of a user who is having issues booting Elemental Teal. They were able to work around the issue by pressing e at the GRUB menu and adding nomodeset to the kernel parameters.

    They are using an AMD Ryzen-based SimplyNUC (https://simplynuc.com/product/llm2v8cy-full/)
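
    For reference, persisting that workaround might look something like this; elemental-toolkit GRUB configurations typically honor an extra_cmdline variable in the OEM grubenv, but the path and variable name here are assumptions for this release:

    # Hypothetical persistence sketch: append nomodeset to the kernel
    # command line via the GRUB environment file (path/variable assumed)
    grub2-editenv /oem/grubenv set extra_cmdline=nomodeset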

    What did you expect to happen:

    The machine should boot normally without manual intervention.

    Environment (asking the user for details; will fill in):

    • Elemental release version (use cat /etc/os-release):
    • Rancher version:
    • Kubernetes version (use kubectl version):
    • Cloud provider or hardware configuration:
  • 5

    Consistent CI environment

    Our current CI / workers / runners setup is somewhat 'spread' across internal and AWS machines. We should try to have it all in one place and properly documented.

    • Paul, Itxaka, Julien, and Loic - phrase this issue correctly and add acceptance criteria
  • 6

    Empty `/etc/resolv.conf`

    Since the latest release (September 11, 2022), Elemental does not properly set the /etc/resolv.conf file at boot. In fact, sle-micro-for-rancher introduced NetworkManager in this release.
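
    A quick check that might narrow this down; with NetworkManager, /etc/resolv.conf is normally a symlink it manages, so an empty regular file would suggest the link was never set up at boot (the exact target path can vary by configuration):

    # Hypothetical check: is /etc/resolv.conf a NetworkManager-managed symlink?
    ls -l /etc/resolv.conf
    readlink -f /etc/resolv.conf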

  • 7

    Image names and labels for elemental stack

    In our currently built artifacts we have the following image names and tags. <NUM> is the OBS build ID and <registry> is the registry domain and base repository that OBS pushes to.

    HELM CHART:

    <registry>/elemental/elemental-operator:
       * latest
       * 1.0.0
       * 1.0.0-build<NUM>
    

    TEAL IMAGE:

    <registry>/rancher/elemental-teal/5.3:
       * 1.0.0
       * 1.0.0-<NUM>
      
    <registry>/rancher/elemental-node-image/5.3:
       * latest
       * 1.0.0
    

    BUILDER IMAGE:

    <registry>/rancher/elemental-builder-image/5.3:
       * latest
       * 0.1.3
       * 0.1.3-<NUM>
    

    OPERATOR IMAGE:

    <registry>/rancher/elemental-operator/5.3:
       * latest
       * 1.0.0
       * 1.0.0-<NUM>
    

    I see a couple of little issues here which I'd say should be fixed before the release:

    • The Helm chart is not under the rancher repository like all the others; I doubt this was on purpose.
    • The Teal image has two different repositories, but they do not follow the three-tag approach of the other images. Note that including a tag with the build ID might be relevant for base OS upgrades not coming from us (CVEs in the base image); see the sketch below.
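
    For illustration, the build-ID tag is what lets a consumer pin against in-place rebuilds of the floating version tag (values are the placeholders from above):

    # Illustrative only: 1.0.0 may be rebuilt in place for base OS CVE fixes,
    # while 1.0.0-<NUM> identifies one exact build
    docker pull <registry>/rancher/elemental-teal/5.3:1.0.0-<NUM>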

    Beyond these two little issues, I am wondering if the schema of the /rancher/<image>/5.3 repository is what we want. It is also unclear to me why we have two different repositories for the Teal image; it doesn't hurt, but I don't see how this could be used.

    @kkaempf @agracey @rancher/elemental are you fine with the current tags? Anything you think it should be arranged?

  • 8

    build-iso still pulls from quay.io

    elemental build-iso should run offline when the base image is available in the local image cache. Currently it still pulls data from quay.io (grub2 files, apparently).

    Not sure why this happens; either elemental looks in the wrong places in the image, or it doesn't look there at all :wink:

    Everything needed for ISO building should be provided by node-image and builder-image.
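
    A repro sketch under those expectations; the --local flag follows elemental-cli conventions for using the local image cache, but the exact invocation here is an assumption:

    # Hypothetical repro: with the image already cached locally, this should
    # complete without any network access
    docker pull <registry>/rancher/elemental-node-image/5.3:latest
    elemental build-iso --local <registry>/rancher/elemental-node-image/5.3:latest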

  • 9

    Receiving 503 when reinstalling Elemental Operator

    What steps did you take and what happened: I reinstalled the Elemental Operator to fix a different problem.

    helm uninstall -n cattle-elemental-system elemental-operator
    kubectl delete -f registration.yaml
    kubectl delete -f selector.yaml
    kubectl delete -f cluster.yaml
    

    Then I installed the operator

    helm upgrade --create-namespace -n cattle-elemental-system --install elemental-operator oci://registry.opensuse.org/isv/rancher/elemental/stable/charts/rancher/elemental-operator-chart
    

    And installed the selector, cluster and registration. When I ran wget to get the machine registration information, Rancher returned a 503:

    wget --no-check-certificate `kubectl get machineregistration -n fleet-default your-machine -o jsonpath="{.status.registrationURL}"` -O initial-registration.yaml
    --2022-12-20 15:14:50--  https://<omitted>
    Resolving <omitted>
    Connecting to <omitted>|:443... connected.
    WARNING: cannot verify <omitted>'s certificate, issued by ‘CN=dynamiclistener-ca,O=dynamiclistener-org’:
      Unable to locally verify the issuer's authority.
    HTTP request sent, awaiting response... 503 Service Unavailable
    2022-12-20 15:15:05 ERROR 503: Service Unavailable.
    

    I receive a proper URL from kubectl get machineregistration, but the wget call doesn't return the registration information.

    What did you expect to happen: I should receive a yaml file with the registration token and cert.


    Environment:

    • Elemental release version (use cat /etc/os-release): N/A
    • Rancher version: 2.6.6
    • Kubernetes version (use kubectl version): RKE2:
    Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.8+rke2r1", GitCommit:"fdc77503e954d1ee641c0e350481f7528e8d068b", GitTreeState:"clean", BuildDate:"2022-11-10T17:56:22Z", GoVersion:"go1.18.8b7", Compiler:"gc", Platform:"linux/amd64"}
    
    • Cloud provider or hardware configuration: self-hosted hardware
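
    A triage sketch that might help here; a 503 through Rancher's proxy often means the backing service has no ready endpoints yet, so checking the operator namespace right after the reinstall is a reasonable first step (generic kubectl, not an Elemental-specific procedure):

    # Hypothetical triage: verify the operator pod is Ready and its service
    # has endpoints before requesting the registration URL
    kubectl get pods,endpoints -n cattle-elemental-system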
  • 10

    Docs migration to Docusaurus

    Docusaurus installed and initial configuration done:

    • Elemental sidebar created to match the mkdocs sidebar.
    • Header links set; the logo is still missing.
    • Footer created with the same links as mkdocs.

    The docs have been migrated 1:1 without any adaptation, so expect some visual glitches.

  • 11

    Split ros-operator into its own repo

    It makes no sense to have all this mixed together with the os2 files. Everything is clumped into a ton of files to set up, like the CI, the Makefile, the scripts, etc., while ros-operator is very simple and could do with its own repo in which we control everything and it's much clearer where to change things.

    Plus, we can release it separately instead of in a big release with the installer, the chart, the ISO, etc.

    ros-operator and its chart should live in their own repo for ease of releasing, testing and updating. We also don't need os2 to test ros-operator, which makes things simpler.

    A test repo is available at https://github.com/Itxaka/ros-operator with the full package, CI stuff, release stuff.

    Action items

    • [x] Split the operator code to https://github.com/rancher-sandbox/rancheros-operator with all the pipelines, tests, etc.
    • [x] Test releasing and QA
    • [x] Update os2 to drop the ros-operator code counterpart (adapt pipelines and CI accordingly)
  • 12

    e2e CI: Run nightly tests both on Rancher Manager stable and latest devel versions

    Right now, we only run our tests on top of the latest devel Rancher Manager version. For instance:

    root@ubuntu:~# helm search repo --devel
    rancher-latest/rancher                 	2.7.2-rc1      	v2.7.2-rc1     	Install Rancher Server to manage Kubernetes clu...
    

    But I think we also need to test the latest stable version:

    root@ubuntu:~# helm search repo
    rancher-latest/rancher                 	2.7.0        	v2.7.0     	Install Rancher Server to manage Kubernetes clu...
    
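
    The nightly matrix could then toggle the same install between the two channels; a sketch with illustrative versions (the rest of the Rancher install flags are omitted):

    # Hypothetical matrix sketch: same chart, devel vs. stable channel
    helm upgrade --install rancher rancher-latest/rancher -n cattle-system --devel --version 2.7.2-rc1
    helm upgrade --install rancher rancher-latest/rancher -n cattle-system --version 2.7.0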
  • 13

    Failure in e2e UI tests - element .xterm-cursor-layer not found

    The CI has been failing since yesterday because of a missing element: Cypress does not see the xterm cursor.


    Failure: https://github.com/rancher/elemental/actions/runs/3835177495/jobs/6531093559

  • 14

    Wrong shutdown order (on aarch64?)

    What steps did you take and what happened:

    Run

    # halt
    

    on a node's root prompt.

    This fails to unmount /var/lib/rancher and subsequently /var:

    [  OK  ] Unmounted /var/lib/kubelet…ojected/kube-api-access-g7zn8.
    [  OK  ] Unmounted /var/lib/longhorn.
    [FAILED] Failed unmounting /var/lib/rancher.
    [  OK  ] Stopped Flush Journal to Persistent Storage.
             Unmounting /etc...
             Unmounting /var/lib/kubelet...
             Unmounting /var/log...
    [  OK  ] Unmounted /etc.
    [  OK  ] Unmounted /var/lib/kubelet.
    [  OK  ] Unmounted /var/log.
             Unmounting /usr/local...
             Unmounting /var...
    [  OK  ] Unmounted /usr/local.
    [FAILED] Failed unmounting /var.
             Unmounting /run/overlay...
    [  OK  ] Unmounted /run/overlay.
    [  OK  ] Stopped target Preparation for Local File Systems.
    [  OK  ] Stopped target Swaps.
    [  OK  ] Reached target Unmount All Filesystems.
             Stopping Monitoring of LVM2… dmeventd or progress polling...
    [  OK  ] Stopped Create Static Device Nodes in /dev.
    [  OK  ] Stopped Create System Users.
    [  OK  ] Stopped Remount Root and Kernel File Systems.
    [  OK  ] Stopped Monitoring of LVM2… dmeventd or progress polling.
    

    and then it hangs while stopping a container(?)

    [   ***] A stop job is running for libcontai…c0b054e2b43bc51a3 (29s / 1min 30s)
    [  *** ] A stop job is running for libcontai…c0b054e2b43bc51a3 (29s / 1min 30s)
    [ ***  ] A stop job is running for libcontai…c0b054e2b43bc51a3 (30s / 1min 30s)
    [***   ] A stop job is running for libcontai…c0b054e2b43bc51a3 (30s / 1min 30s)
    [**    ] A stop job is running for libcontai…c0b054e2b43bc51a3 (31s / 1min 30s)
    [*     ] A stop job is running for libcontai…c0b054e2b43bc51a3 (31s / 1min 30s)
    [**    ] A stop job is running for libcontai…c0b054e2b43bc51a3 (32s / 1min 30s)
    [***   ] A stop job is running for libcontai…c0b054e2b43bc51a3 (32s / 1min 30s)
    [ ***  ] A stop job is running for libcontai…c0b054e2b43bc51a3 (33s / 1min 30s)
    [  *** ] A stop job is running for libcontai…c0b054e2b43bc51a3 (33s / 1min 30s)
    [   ***] A stop job is running for libcontai…c0b054e2b43bc51a3 (34s / 1min 30s)
    [    **] A stop job is running for libcontai…c0b054e2b43bc51a3 (34s / 1min 30s)
    [     *] A stop job is running for libcontai…c0b054e2b43bc51a3 (35s / 1min 30s)
    [    **] A stop job is running for libcontai…c0b054e2b43bc51a3 (35s / 1min 30s)
    

    and only continues after systemd's timeout is reached.
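
    One way to check what ordering systemd actually applies here (generic systemd tooling; the unit name is just the escaped mount path):

    # Hypothetical diagnostic: if the container runtime units don't show up in
    # the mount's ordering, they can outlive it at shutdown
    systemctl show var-lib-rancher.mount -p After -p Before -p RequiredBy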

    What did you expect to happen:

    An orderly and quick shutdown.


    Environment:

    • Elemental release version (use cat /etc/os-release): HEAD as of today.
    • Rancher version: 2.7.0
    • Kubernetes version (use kubectl version):
    • Cloud provider or hardware configuration:
  • 15

    e2e: test new autogenerated seed for emulated TPM

    elemental-operator v1.0.x allows only one node with an emulated TPM per MachineRegistration, but the new operator v1.1.x allows more if emulatedSeed is set to -1.

    A test for this should be added as soon as we no longer need to keep CI tests for operator v1.0.x.
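
    A sketch of flipping that knob on an existing registration; the field names are inferred from the issue text, and the exact MachineRegistration schema is an assumption:

    # Hypothetical: enable emulated TPM with the v1.1.x random-seed behavior
    # (field names assumed, check the operator's CRD for the real schema)
    kubectl -n fleet-default patch machineregistration your-machine --type merge \
      -p '{"spec":{"config":{"elemental":{"registration":{"emulate-tpm":true,"emulated-tpm-seed":-1}}}}}'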

  • 16

    Revisit elemental release procedures

    A few things to improve and elaborate on around releases:

    • [x] Not all sources are in GitHub (specs, Dockerfiles, etc.)
      • I'd suggest adding some sort of dist/obs/ folder within the repos to include those. In dist/obs there could be a README.md explaining that the files in there are OBS-specific and mostly used for SUSE's builds.
    • [x] Rebuild of RPMs on PRs
    • [ ] Release candidate tags are useless; we should probably simply consider Stable as our RC once artifacts are in registry.suse.com
    • [ ] Linked package diffs are reversed
    • [ ] Arrange the IBS project to build properly; the SR is accepted, but images are not building
    • [ ] A bot user should be used in OBS instead of my own user