Brandon Philips: How the CoreOS Linux Distro Uses Cgroups
CoreOS is a new Linux distribution for servers aimed at giving all data centers the same automation capabilities and efficiencies as those seen in the massive server farms run by Google or Facebook. Its technology, combined with the upstart container engine Docker, is popularizing the idea that the Linux operating system itself can serve as a hypervisor. Lending credibility to the approach is Linux kernel developer Greg Kroah-Hartman, a CoreOS advisor.
"Kroah-Hartman says he’s been wanting to build something like CoreOS for over half a decade," writes Cade Metz in a recent Wired article featuring CoreOS.
At the heart of this functionality is cgroup -- the Linux kernel subsystem that allows process containers for resource partitioning. CoreOS CTO Brandon Philips will speak at LinuxCon and CloudOpen in New Orleans next week about cgroup.
"Until recently the only true way to isolate Linux apps from each other was with a hypervisor like KVM," Philips said. "With containers we get that isolation and ability to programmatically turn on and off virtual machines. But, they come online faster and use far fewer resources."
Here he previews his talk and discusses the benefits of CoreOS for sysadmins and developers; how CoreOS uses cgroups and systemd; how the cgroup redesign might affect CoreOS; and poses his questions for Linux kernel developers in advance of their panel discussion at LinuxCon and CloudOpen.
What is CoreOS in a nutshell, for those who are unfamiliar (or didn’t read the Wired article)?
It’s a new Linux operating system tuned for people running up to hundreds of thousands of servers. A lot of the problems they encounter are ones that Facebook or Google have already solved. They have lots of machines and developers and need to be really efficient and automate tasks. CoreOS does things a lot differently than other Linux distros.
For example: /etc is where configuration files are stored on a Linux machine. We have created etcd, a daemon that runs across all machines to share configuration data. It has a simple API and is operationally simple.
What are the benefits of CoreOS for a sysadmin? For an application developer?
For a lot of teams already doing distributed systems, this isn’t new to them. But having a dynamic registry running across your machines instead of running a bunch of static config files means you can just write into the service registry and all the machines in the cluster have access to that.
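The registry idea described here can be pictured with a toy, in-process model. The sketch below is not etcd's API: the class `ConfigRegistry`, its methods, the key `/services/db/host`, and the machine names are all hypothetical, chosen only for illustration. A real cluster would replicate the data over the network rather than hold it in one dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class ConfigRegistry:
    """Toy stand-in for a shared key-value registry (hypothetical, not etcd's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def set(self, key: str, value: str) -> None:
        # One write updates the single shared copy and notifies every watcher.
        self._data[key] = value
        for callback in self._watchers[key]:
            callback(key, value)

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key has never been written.
        return self._data.get(key)

    def watch(self, key: str, callback: Callable[[str, str], None]) -> None:
        # Register interest in a key; the callback fires on later writes.
        self._watchers[key].append(callback)


registry = ConfigRegistry()
seen_by_machines = []

# Two simulated machines watch the same key (names are illustrative).
for machine in ("machine-a", "machine-b"):
    registry.watch(
        "/services/db/host",
        lambda key, value, machine=machine: seen_by_machines.append((machine, value)),
    )

# A single write replaces editing a static config file on every machine.
registry.set("/services/db/host", "10.0.0.5")
```

After the single `set` call, both simulated machines have observed the new value, which is the property the static-config-file approach lacks.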
How does CoreOS utilize cgroups and systemd?
Really the fundamental difference between CoreOS and what people are used to is that we don’t have a package manager. Instead we use containers. What makes that possible are cgroups and namespaces. Cgroups can meter and isolate the hardware resources an individual container is able to use. And with cgroups we can run production and development software at the same time, because dev can have a much lower priority. We can safely deploy containers across machines that aren’t necessarily production.
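One way to picture "dev can have a lower priority" is the relative-weight model behind the cgroup CPU controller's `cpu.shares` setting, where groups competing for a busy CPU receive time in proportion to their weights. The sketch below only does that arithmetic and does not touch the kernel; the group names and the weights 1024 and 128 are assumptions for illustration, not values taken from CoreOS.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Approximate CPU split among groups that are all busy at once.

    Each group's fraction is its weight divided by the sum of all weights.
    When a group is idle, the others may use its unused time.
    """
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


# Hypothetical weights: production gets eight times the weight of development.
fractions = cpu_fractions({"production": 1024, "development": 128})
```

Under these assumed weights, production would receive roughly eight ninths of a contended CPU and development about one ninth, while either could use the whole CPU when the other is idle.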
How will the cgroup redesign affect what you’re trying to accomplish with CoreOS?
In the long run it will probably help us. CoreOS is built on top of systemd, and essentially the big change in cgroups is moving away from a file system that anybody can manipulate to having a gateway API. The current implementation of that gateway is systemd, and we’ve bought into systemd. We had no intention of using anything other than systemd.
Why do you need a new operating system to accomplish this for application servers? Why not build within another distro?
A big piece that caused us to pause and create our own distro is that we wanted to do updates quite a bit differently. Taking inspiration from Chrome OS, we have two file systems, A and B. While a Linux system is running on A, you can make offline changes in the B system. As soon as you’re ready to upgrade, you reboot the machine. This double-buffered update gives you a couple of advantages. Things are completely atomic; you don’t want a server to ever be in a state where things are unknown. A classic package manager will modify files all over your system while you’re doing your upgrade, which means many daemons could be in an unstable state at any point.
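The double-buffered scheme can be sketched as a small state machine. This is a simplified model under stated assumptions, not the CoreOS updater: the class `DualPartitionHost`, its methods, and the version strings are hypothetical names used only to show that the running system is untouched until reboot.

```python
from typing import Dict, Optional


class DualPartitionHost:
    """Toy model of an A/B update scheme (hypothetical, not CoreOS code)."""

    def __init__(self, version: str) -> None:
        self.slots: Dict[str, Optional[str]] = {"A": version, "B": None}
        self.active = "A"
        self.pending = False

    @property
    def passive(self) -> str:
        # The slot that is not currently booted.
        return "B" if self.active == "A" else "A"

    def running_version(self) -> Optional[str]:
        return self.slots[self.active]

    def stage_update(self, version: str) -> None:
        # Offline write: only the passive slot changes, so the running
        # system never sees a half-applied update.
        self.slots[self.passive] = version
        self.pending = True

    def reboot(self) -> None:
        # The switch happens in one step at boot time.
        if self.pending:
            self.active = self.passive
            self.pending = False


host = DualPartitionHost("1.0")
host.stage_update("1.1")
version_before_reboot = host.running_version()
host.reboot()
version_after_reboot = host.running_version()
```

In this model the previous version is still sitting in the passive slot after the reboot, which hints at why such a layout also makes it straightforward to fall back.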
You want to make sure code is up to date and control rollouts so they don’t happen during some critical time for the application. It shouldn’t be something a sysadmin should worry about every day. We wanted to make that a core piece of the distro.
If you could ask the kernel developers on stage at LinuxCon anything, what would it be?
I really think dbus is an interesting thing happening within both kernel and user space. Cgroups started as a file system and you realized a file system isn’t going to cut it. A lot of the new cgroups functionality is exposed to dbus in its first implementation. There’s talk of adding dbus into the kernel. I’m interested to hear where people think file systems today aren’t cutting it and where userspace dbus enabled daemons might be useful for future APIs.
Also, now that we are having to invent new APIs and syscalls that haven't existed in other Unix-like systems, how do you feel about versioning APIs/ABIs, or providing API previews to application and library developers, to make sure we are on the right path with a given design?
Can you give us a preview of your talk at LinuxCon?
My talk won’t be about CoreOS at all; that’s just the product I’m working on. It’s about familiarizing people with, and giving an overview of, the technologies that have made it possible to think of Linux and Linux servers as a hypervisor for containers, and the functionality that gives you.
Until recently the only true way to isolate Linux apps from each other was with a hypervisor like KVM. With containers we get that isolation and ability to programmatically turn on and off virtual machines. But, they come online faster and use far fewer resources. I’ll give people practical examples of how to use cgroups to monitor apps or isolate them or use namespaces to increase security and isolate things from the network and the file system. It’s more of a practical guide to these newer APIs and functionality.
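As one small example of the kind of monitoring mentioned here, Linux reports a process's cgroup membership in `/proc/<pid>/cgroup`, one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses that format. The function name `parse_proc_cgroup` and the sample text (including the `/system/web.service` path) are illustrative assumptions, not output captured from a CoreOS machine.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller name to the cgroup path the process belongs to."""
    membership: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split into at most three fields so colons in the path are preserved.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:
                membership[controller] = path
    return membership


# Illustrative sample in the documented line format.
SAMPLE = """\
4:memory:/system/web.service
3:cpu,cpuacct:/system/web.service
1:name=systemd:/system/web.service
"""

membership = parse_proc_cgroup(SAMPLE)
```

On a Linux machine, the same function could be fed the contents of `/proc/self/cgroup` to see which groups the current process has been placed in.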