
we run ~10k agent pods on k3s and went with gvisor over microvms purely for density. the memory overhead of a dedicated kernel per tenant just doesn't scale when you're trying to pack thousands of instances onto a few nodes. strict network policies and pid limits cover most of the isolation gaps anyway.
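for reference, the shape of a pod on our side looks roughly like this (a simplified sketch using the kubernetes python client; the RuntimeClass name, image, and limits are illustrative rather than our exact config, and the pid limits actually come from the kubelet's podPidsLimit rather than the pod spec):

    # simplified sketch: schedule one agent pod onto a gVisor RuntimeClass
    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="agent-0", labels={"app": "agent"}),
        spec=client.V1PodSpec(
            runtime_class_name="gvisor",            # RuntimeClass backed by runsc
            automount_service_account_token=False,  # agents get no API credentials
            containers=[client.V1Container(
                name="agent",
                image="agent-sandbox:latest",       # illustrative image name
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "100m", "memory": "128Mi"},
                    limits={"cpu": "250m", "memory": "256Mi"},
                ),
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="agents", body=pod)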



Yeah, when you run ≈10k agents instead of ≈10, you need a different solution :)

I’m curious what gVisor is getting you in your setup — of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise let the agent break out of the Kubernetes pod? Like, do you have examples you’ve observed where gVisor has saved the day?


I've used both gVisor and microvms for this (at very large scales), and there are various tradeoffs between the two.

The huge gVisor drawback is that it *drastically* slows down applications (despite startup time being faster).

For agents, startup latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, there are a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.


since we allow agents to execute arbitrary python, we treat every container as hostile. we've definitely seen logs of agents trying to crawl /proc or hit the k8s metadata api. gvisor intercepts those syscalls so they never actually reach the host kernel.
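for concreteness, the probes we see in those logs look roughly like this (a reconstruction for illustration, not actual agent code; inside gvisor the /proc it sees is the sandbox's own, and the metadata address is just the usual link-local endpoint as an example):

    # reconstructed example of the probes described above, not real agent output
    import os
    import urllib.request

    # crawl /proc for other processes' environments
    for pid in (p for p in os.listdir("/proc") if p.isdigit()):
        try:
            with open(f"/proc/{pid}/environ", "rb") as f:
                print(pid, f.read())
        except OSError:
            pass

    # poke the node/cloud metadata endpoint
    try:
        resp = urllib.request.urlopen("http://169.254.169.254/latest/meta-data/",
                                      timeout=2)
        print(resp.read())
    except OSError as exc:
        print("metadata endpoint unreachable:", exc)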

The reason virtualization approaches with a true Linux kernel are still important is that what you do allow via syscalls ultimately does result in a syscall on the host system, even if through layers of indirection. Ultimately, if you fork() in gVisor, that calls fork() on the host (btw, fork() + execve() is still expensive on gVisor).

The middle ground we’ve built is that a real Linux kernel interfaces with your application in the VM (we call it a zone), but that kernel can then make specialized and specific interface calls to the host system.

For example, with NVIDIA on gVisor, the ioctl()s are passed through directly, so an NVIDIA driver vulnerability that can cause memory corruption leads directly to corruption in the host kernel. With our platform at Edera (https://edera.dev), the NVIDIA driver runs in the VM itself, so a memory corruption bug doesn't percolate to other systems.


> Ultimately, if you fork() in gVisor, that calls fork() on the host

This isn't true. You can look at the code right here[1]: there is no code path in gVisor that calls fork() on the host. In fact, the only syscalls gVisor is allowed to make to the host are the ones listed right here in its seccomp filters[2].

[1] https://github.com/google/gvisor/blob/master/pkg/sentry/sysc...

[2] https://github.com/google/gvisor/tree/master/runsc/boot/filt...
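For a rough feel of what an allowlist like [2] does, here is a toy filter using the libseccomp Python bindings (not gVisor's own Go seccomp package): anything not explicitly allowed simply fails.

    # toy allowlist in the spirit of [2], via the libseccomp python bindings;
    # gVisor builds its real filters with its own Go seccomp package
    import errno
    import os
    import seccomp

    f = seccomp.SyscallFilter(defaction=seccomp.ERRNO(errno.EPERM))
    for name in ("read", "write", "exit", "exit_group", "rt_sigreturn"):
        f.add_rule(seccomp.ALLOW, name)
    f.load()

    try:
        os.fork()  # clone()/fork() is not on the allowlist
    except PermissionError:
        os.write(2, b"fork() blocked by the seccomp allowlist\n")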


I was more specifically referring to the fact that to implement threads, gVisor calls into the Go runtime, which does make calls to clone() (not fork()), but I see the pushback :)

I think it's a small distinction. fork() itself isn't all that useful anyways.

However, consider reading a file in gVisor. This passes through the I/O layers and ultimately ends up as a read in the host kernel, through one of the many interfaces for doing so.
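Concretely, for a trivial read like the one below, the guest-visible calls are served inside the sandbox, and only the access to the backing host file becomes a real read on the host (a simplified picture; the exact path depends on the platform and filesystem setup):

    # a trivial read inside a gVisor sandbox, with a (simplified) note on where
    # each step is handled; details vary by platform (ptrace/KVM/systrap) and
    # by filesystem setup (gofer/lisafs vs. directfs)
    with open("/etc/hostname") as f:  # open(): trapped by the platform and
                                      # resolved inside the Sentry's own VFS
        data = f.read()               # read(): served by the Sentry; only the
                                      # backing host fd sees a real host read
    print(data)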


And you see no problem in that at all? Just “throw a box around it and let the potentially malicious code run”?

Wait until they find a hole. Then good luck.


This is why you can't build these microVM systems to do just isolation; they have to provide more value than that. Observability, policy, etc.

This is a big reason for our strategy at Edera (https://edera.dev) of building hypervisor technology that eliminates the standard x86/ARM kernel overhead in favor of deep para-virtualization.

The performance of gVisor is often a big limiting factor in deployment.


Edera looks very cool! Awesome team too.

I read the thesis on arxiv. Do you see any limitations from using Xen instead of KVM? I think that was the biggest surprise for me as I have very rarely seen teams build on Xen.


I'd say the limitation has been that sometimes we have to implement things by hand. But it has also enabled us to do things that others can't, since KVM is a singular stack in many ways. For example, VFIO-PCI is largely the same across all VMMs, but we have full control over PCI passthrough on our platform, which has allowed us to do things KVM VMMs can't.

How do you compete with Nitro-based VMs on AWS with 0.5% overhead?

When running on bare metal, the CPU performance is within 1%, so usually quite well! The hardest thing is I/O, but we do a lot to help with that too.

Hey @clawsyndicate, I'd love to learn more about your use case. We are working on a product that would potentially get you the best of both worlds (microVM security and container/gVisor scalability). My email is in my profile.

This is the thesis of our research paper: a good middle ground is necessary. https://arxiv.org/abs/2501.04580

LXC containers inside a VM scale. Bonus point: LXC containers feel like a VM.

I started this with the same idea:

https://github.com/jgbrwn/vibebin



