Firecracker is a high-performance virtualization solution built to run Amazon’s serverless applications securely and with minimal resources. It now does so at immense scale.
Background
Virtualization
Initially, Lambda ran a separate VM per customer, but existing VM solutions required significant resources, which resulted in poor utilization.
2 types of hypervisors
Type 1
runs directly on the hardware (bare metal), with no host operating system underneath
Type 2
run an operating system on top of the hardware, then run the hypervisor on top of that operating system
Linux has a hypervisor built into the kernel: the Kernel-based Virtual Machine (KVM), arguably a Type 1 hypervisor.
virtio (a Linux-provided interface) allows guest kernel components to interact with the host OS. Rather than passing every guest kernel interaction directly to the host kernel, some functions (particularly device interactions) go from the guest kernel to a virtual machine monitor (VMM); QEMU is a popular example.
When a lambda is invoked, the ensuing HTTP request hits an AWS Load Balancer.
4 main components:
Workers
the component running lambda’s code
each runs many MicroVMs in “slots”; other services schedule code to run in the MicroVMs when a lambda is invoked
Frontend
entrance into the lambda system
receives invoke requests and communicates with the Worker Manager to determine where to run the lambda, then communicates directly with the Workers
Worker Manager
ensures that the same lambda is routed to the same set of Workers
keeps track of where a lambda has been scheduled previously
Placement service
makes scheduling decisions to assign a lambda invocation to a worker
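The interaction between the Frontend, Worker Manager, and Placement service can be sketched roughly as follows. This is a toy model, not AWS's implementation: every class and method name here is hypothetical, the round-robin placement policy is an assumption made for brevity, and each lambda is pinned to a single worker rather than a set of Workers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlacementService:
    """Hypothetical stand-in: picks a worker for a lambda that has no slot yet."""
    workers: list[str]
    _next: int = 0

    def place(self, function_id: str) -> str:
        # Round-robin is an assumption, not the real placement policy.
        worker = self.workers[self._next % len(self.workers)]
        self._next += 1
        return worker


@dataclass
class WorkerManager:
    """Remembers where each lambda was scheduled, so routing is sticky."""
    placement: PlacementService
    assignments: dict[str, str] = field(default_factory=dict)

    def worker_for(self, function_id: str) -> str:
        # Only ask the Placement service when the lambda has never been placed.
        if function_id not in self.assignments:
            self.assignments[function_id] = self.placement.place(function_id)
        return self.assignments[function_id]


@dataclass
class Frontend:
    """Entrance into the system: resolves a worker, then talks to it directly."""
    manager: WorkerManager

    def invoke(self, function_id: str, payload: str) -> str:
        worker = self.manager.worker_for(function_id)
        # A real Frontend would now send the request to the Worker itself.
        return f"{worker} ran {function_id}({payload})"
```

Invoking the same hypothetical function twice through `Frontend.invoke` lands on the same worker both times, which is the sticky-routing property the Worker Manager provides.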
Lambda Worker Architecture
Firecracker VM
Shim process
process inside the VM that communicates with an external sidecar called the Micro Manager
Micro Manager
a sidecar communicating over TCP with a Shim process
reports metadata received back to the Placement service
can be called from the Frontend to invoke a specific function
on function completion, receives the response from the Shim process and passes it back to the client as needed
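The request/response exchange between the Micro Manager and the Shim process might look something like the sketch below. Only the use of TCP comes from the notes above; the newline-delimited JSON framing, the message fields, and all function names are assumptions chosen for illustration, and the "function" simply upper-cases its payload.

```python
import json
import socket
import threading


def shim_serve(server: socket.socket) -> None:
    """Hypothetical Shim: accept one connection, run the function, reply."""
    conn, _ = server.accept()
    with conn, conn.makefile("r") as reader:
        request = json.loads(reader.readline())
        # Stand-in for running the customer's code inside the MicroVM.
        result = {"function": request["function"],
                  "output": request["payload"].upper()}
        conn.sendall((json.dumps(result) + "\n").encode())


def micro_manager_invoke(port: int, function: str, payload: str) -> dict:
    """Hypothetical Micro Manager: send an invoke and wait for the response."""
    with socket.create_connection(("127.0.0.1", port)) as conn, \
            conn.makefile("r") as reader:
        message = {"function": function, "payload": payload}
        conn.sendall((json.dumps(message) + "\n").encode())
        return json.loads(reader.readline())


def demo() -> dict:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    shim = threading.Thread(target=shim_serve, args=(server,))
    shim.start()
    try:
        return micro_manager_invoke(port, "hello", "world")
    finally:
        shim.join()
        server.close()
```

Calling `demo()` runs both sides on the loopback interface; in the real system the Shim sits inside the Firecracker VM and the Micro Manager outside it.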
Interesting points
Only I/O performance was inferior. The authors argue that the causes are the lack of flushing to disk and a block I/O implementation that performs I/O serially. Async I/O support with io_uring could resolve this; there is a GitHub issue about it.
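To make the serial-versus-concurrent distinction concrete, here is a small sketch that reads the same blocks one at a time and then with several reads in flight. It does not use io_uring (which the Python standard library does not expose); a thread pool is swapped in purely to illustrate overlapping requests, and the block size, block count, and function names are assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # assumed block size for the illustration


def read_serial(fd: int, blocks: int) -> list[bytes]:
    # One request at a time: each read waits for the previous one to finish.
    return [os.pread(fd, BLOCK, i * BLOCK) for i in range(blocks)]


def read_concurrent(fd: int, blocks: int, in_flight: int = 8) -> list[bytes]:
    # Several reads are outstanding at once; map() preserves block order.
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        return list(pool.map(lambda i: os.pread(fd, BLOCK, i * BLOCK),
                             range(blocks)))


def demo() -> bool:
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(BLOCK * 16))
        f.flush()
        fd = f.fileno()
        # Both strategies must return identical data; only the timing differs.
        return read_serial(fd, 16) == read_concurrent(fd, 16)
```

Both functions return the same bytes; the difference is that the concurrent version lets the storage device service multiple requests at once, which is the kind of benefit asynchronous block I/O is meant to provide.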