embedshim
embedshim is a task runtime implementation that can be used as a plugin in containerd.
In the current shim design, a shim manages the lifecycle of the container process and lets containerd reconnect to it after a restart. A key design element of a small shim is container process monitoring, which matters at least for containers created by runC-like runtimes.
Without pidfd and the eBPF tracepoint feature, a non-parent process is unlikely to receive the exit notification in time or to read the exit code correctly after the shim dies. And in Kubernetes infrastructure, even if the containers in a pod share one shim, the VmRSS of the shim (Go runtime) is still about 8MB.
So this plugin aims to provide a task runtime implementation that uses pidfd and the eBPF sched_process_exit tracepoint to manage daemonless containers with low overhead.
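To make the pidfd half of that idea concrete, here is a minimal sketch of waiting for a process to exit through a pidfd. It is not embedshim's code. The helper names (pidfdOpen, waitExit, runAndWatch) are hypothetical, the syscall number 434 is an assumption that holds on x86_64 and arm64, and epoll stands in for poll because Go's standard syscall package has no pidfd wrapper. The pidfd only signals that the process ended; as described above, the exit code is expected to come from the tracepoint.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sysPidfdOpen is the pidfd_open(2) syscall number.
// Assumption: 434, which holds on x86_64 and arm64.
const sysPidfdOpen = 434

// pidfdOpen returns a file descriptor that refers to the process pid.
func pidfdOpen(pid int) (int, error) {
	fd, _, errno := syscall.Syscall(sysPidfdOpen, uintptr(pid), 0, 0)
	if errno != 0 {
		return -1, errno
	}
	return int(fd), nil
}

// waitExit blocks until the process behind pidfd terminates or timeoutMs
// elapses. It reports true when the exit notification arrived.
func waitExit(pidfd int, timeoutMs int) (bool, error) {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return false, err
	}
	defer syscall.Close(epfd)

	// A pidfd becomes readable once the process has terminated.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(pidfd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, pidfd, &ev); err != nil {
		return false, err
	}

	events := make([]syscall.EpollEvent, 1)
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry (the timeout restarts)
		}
		if err != nil {
			return false, err
		}
		return n == 1, nil
	}
}

// runAndWatch starts a command, watches it through a pidfd, and reaps it.
func runAndWatch(name string, args ...string) (bool, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return false, err
	}
	defer cmd.Wait() // reap the child; a non-parent monitor could not do this

	pidfd, err := pidfdOpen(cmd.Process.Pid)
	if err != nil {
		return false, err
	}
	defer syscall.Close(pidfd)
	return waitExit(pidfd, 5000)
}

func main() {
	exited, err := runAndWatch("sleep", "0.1")
	if err != nil {
		fmt.Println("pidfd not usable here:", err)
		return
	}
	fmt.Println("exit notification received:", exited)
}
```

The sketch watches its own child only so that it is easy to run; the same pidfdOpen and waitExit calls work on a pid that the caller did not fork, which is the situation a reconnecting monitor is in.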
Build/Install
embedshim compiles its BPF code with clang/llvm, so install clang/llvm first.
$ echo "deb http://apt.llvm.org/focal/ llvm-toolchain-focal main" | sudo tee -a /etc/apt/sources.lis
$ wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
$ sudo apt-get update -y
$ sudo apt-get install -y g++ libelf-dev clang lld llvm
Then clone the repo and build it.
$ git clone https://github.com/fuweid/embedshim.git
$ cd embedshim
$ git submodule update --init --recursive
$ make
$ sudo make install
The binary is named embedshim-containerd and has full functionality on Linux. You can simply replace your local containerd with it.
$ sudo install bin/embedshim-containerd $(command -v containerd)
$ sudo systemctl restart containerd
Then check the plugin with ctr.
$ ctr plugin ls | grep embed
io.containerd.runtime.v1 embed linux/amd64 ok
Status
embedshim can run containers headless or with input. It is still a work in progress, so do not use it in production.
- Support Pause/Resume
- Support Task Events (Create/Start/Exit/Delete/OOM)
Requirements
- raw tracepoint bpf >= kernel v4.18
- CO-RE BTF vmlinux support >= kernel v5.4
- pidfd polling >= kernel v5.3
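As a rough way to check a host against the list above, the following sketch reads the running kernel release and compares it with v5.4, the highest of the three minimums. The function names are hypothetical and the code is not part of embedshim. A new enough version number does not by itself prove that BTF is enabled in the kernel build.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKernelRelease extracts major and minor from a release string such as
// "5.4.0-42-generic".
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor part may carry a suffix such as "15-rc1" or "4+".
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// meetsMinimum reports whether major.minor is at least wantMajor.wantMinor.
func meetsMinimum(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	major, minor, err := parseKernelRelease(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("kernel %d.%d, meets v5.4: %v\n", major, minor, meetsMinimum(major, minor, 5, 4))
}
```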
License
- The user space components are licensed under the Apache License, Version 2.0.
- The BPF code is licensed under the GNU General Public License, Version 2.0.
Support ExecProcess in shim
bug: fd leak when deleting a created container
critest calls CreateContainer and then deletes the container, after which the fifo is leaked.
The case name is
runtime should support removing created container [Conformance]
.reproduce:
The result is from containerd v1.5.11 (using the runc-v2 shim). It is an upstream issue, but it blocks the v0.1.0 release.
pkg/exitsnoop: hold raw_tp link to prevent from GC
cilium/ebpf sets a finalizer on sys.FD (via SetFinalizer) that closes the fd on GC. Since the exec process's exit code needs the memory-type exitsnoop, we should keep a reference to the raw_tp link. Otherwise the exitsnoop will be gone and the process's exit code will be wrong.
Signed-off-by: Wei Fu [email protected]
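The effect described in this commit can be illustrated without any eBPF code. In the sketch below, fakeLink is a hypothetical stand-in for an attached raw_tp link whose finalizer releases it, and monitor is a hypothetical owner that keeps it referenced. It does not use the cilium/ebpf API; it only shows that an object stays alive, and its finalizer stays unrun, while something reachable still points at it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// fakeLink stands in for an attached raw_tp link (hypothetical type).
type fakeLink struct {
	closed *int32 // set to 1 when the finalizer runs
}

// newFakeLink mimics a resource whose finalizer releases it on collection.
func newFakeLink(closed *int32) *fakeLink {
	l := &fakeLink{closed: closed}
	runtime.SetFinalizer(l, func(l *fakeLink) { atomic.StoreInt32(l.closed, 1) })
	return l
}

// monitor keeps the link referenced for as long as the monitor is reachable.
type monitor struct {
	link *fakeLink
}

// heldLinkSurvivesGC reports whether a link held by a live monitor is still
// open after a garbage collection cycle.
func heldLinkSurvivesGC() bool {
	var closed int32
	m := &monitor{link: newFakeLink(&closed)}

	runtime.GC()
	time.Sleep(50 * time.Millisecond) // give any finalizer a chance to run

	alive := atomic.LoadInt32(&closed) == 0
	runtime.KeepAlive(m) // m, and therefore the link, is reachable up to here
	return alive
}

func main() {
	fmt.Println("held link still open after GC:", heldLinkSurvivesGC())
}
```

If the monitor did not store the link, nothing would stop the collector from running the finalizer at some later point, which is the failure mode the commit message describes.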
LICENSE/README.md: Add LICENSE
According to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/bpf/bpf_licensing.rst#n87
Signed-off-by: Wei Fu [email protected]
fix: exitCode needs to be translated before use
Current:
After:
Signed-off-by: Wei Fu [email protected]
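The before and after snippets are not shown here, so the following is only a sketch of what such a translation can look like, not the change made in this commit. It assumes the raw value uses the Linux wait-status layout (exit code in bits 8-15, terminating signal in the low 7 bits) and maps a signal death to 128 plus the signal number. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// translateExitStatus converts a raw Linux wait status into a shell-style
// exit code: the exit code for a normal exit, 128+signal for a signal death,
// and -1 for anything else (for example a stopped process).
func translateExitStatus(raw uint32) int {
	ws := syscall.WaitStatus(raw)
	switch {
	case ws.Signaled():
		return 128 + int(ws.Signal())
	case ws.Exited():
		return ws.ExitStatus()
	default:
		return -1
	}
}

func main() {
	fmt.Println(translateExitStatus(0x0100)) // normal exit with code 1
	fmt.Println(translateExitStatus(9))      // killed by SIGKILL
}
```

Using the raw value directly would report a normal exit with code 1 as 256, which is why a translation step like this is needed before the value is used.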
Feature: support exec API
The runC-like command doesn't support a two-step create-start flow for exec as it does for init. A wrapper is needed to support exec using pidfd and exitsnoop.
And maybe draft a proposal for the two-step flow in the runc community.
rewrite embedshim's task manager
Signed-off-by: Wei Fu [email protected]
.github/.golangci.yml/.go: fix linter issue
Copy .golangci.yml from containerd/containerd repo
.github:
Fixed the linter issues reported by golangci-lint
Signed-off-by: Wei Fu [email protected]
.github: update ci.yaml
.github: update ci.yaml
.github/bpf/pkg: fix Linter issue
bpf: change monitor.bpf.c to pid_monitor.bpf.c and remove example
pkg/ebpf: update go:generate
.github: support pull requests
Signed-off-by: Wei Fu [email protected]
Feature: support basic task events