A Conversation with Chris Mason on Btrfs: the next generation file system for Linux
If you run your data center on Linux, you have likely heard of Btrfs, the next generation file system that was recently merged into the kernel. If you haven’t, you should: it stands to make storage easier to manage, more reliable and more scalable for everyone who handles large amounts of data on Linux.
While Btrfs isn’t ready for production yet, I think it’s one of the most exciting and important developments in Linux today. I recently sat down with Oracle developer Chris Mason to discuss the file system, how he corrupted Linus’ root filesystem with his first patch (and lived to tell about it), and just how you pronounce the name of the project.
Amanda: Can you describe Btrfs? What is it and why should users care?
Chris: Btrfs is a new, next generation filesystem designed from the ground up for Linux. It aims to solve scalability problems for larger and faster storage, while also adding features that existing Linux filesystems lack.
Amanda: There are a lot of file systems to choose from in Linux. Some people might say there is too much choice. What do you think?
Chris: Linux has grown a rich infrastructure for filesystems, making it very easy to experiment and innovate with different storage technologies. So, it isn’t surprising that many different filesystem projects have found their way into the kernel.
One of the reasons we are able to sustain these projects is that Linux is used with so many different workloads and types of storage.
Amanda: What’s the status of Btrfs? I know it was merged in January 2009 for kernel 2.6.29; when will it be ready for users to run in production?
Chris: One of the earliest goals of Btrfs was to attract other companies and developers interested in working on the project. This has helped build a strong group of contributors, and we’re concentrating on stability and performance.
We have most of the features we need today for Btrfs to be usable, including the core of multi-device support, checksumming and snapshotting that are crucial because other Linux filesystems don’t provide them today. After the 2.6.32 kernel release, I expect to have things in a state where we can start collecting early adopters for heavy
Amanda: Why did you start this project? Why is Oracle supporting this project so prominently?
Chris: I started Btrfs soon after joining Oracle. I had a unique opportunity to take a detailed look at the features missing from Linux, and felt that Btrfs was the best way to solve them.
Linux is a very important platform for Oracle. We use it heavily for our internal operations, and it has a broad customer base for us. We want to keep Linux strong as a data center operating system, and innovating in storage is a natural way for Oracle to contribute.
Amanda: What are some of the key features/improvements of Btrfs over existing file systems today? What file system would you compare it to?
Chris: Btrfs integrates multi-device management at the filesystem level. The devices can be mixed in size and speed, giving the admin much more flexibility when managing large pools of storage.
The long term goal is to be able to choose allocation policies that match the data being stored to the underlying devices.
Because Btrfs maintains both data and metadata integrity checksums, it is able to detect bad copies of blocks and use the internal RAID code to pull up the correct data.
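The detect-and-repair idea can be illustrated with a minimal Python sketch. This is a toy model, not Btrfs code: the `MirroredStore` class, its two in-memory "devices", and the use of `zlib.crc32` as the checksum are all assumptions made for illustration. Each block is written to both mirrors with a checksum stored separately; a read returns the first copy that matches the checksum and overwrites any copy that does not.

```python
import zlib


class MirroredStore:
    """Toy model: every block lives on two 'devices' plus a stored checksum.

    Hypothetical illustration only; names and layout are not taken from Btrfs.
    """

    def __init__(self):
        self.mirrors = [{}, {}]  # two independent copies of each block
        self.checksums = {}      # block number -> expected CRC32

    def write(self, blockno, data):
        self.checksums[blockno] = zlib.crc32(data)
        for mirror in self.mirrors:
            mirror[blockno] = data

    def read(self, blockno):
        expected = self.checksums[blockno]
        for mirror in self.mirrors:
            data = mirror[blockno]
            if zlib.crc32(data) == expected:
                # Found a good copy: rewrite any mirror whose copy is bad.
                for other in self.mirrors:
                    if zlib.crc32(other[blockno]) != expected:
                        other[blockno] = data
                return data
        raise IOError(f"block {blockno}: every copy failed its checksum")


store = MirroredStore()
store.write(7, b"hello")
store.mirrors[0][7] = b"hellp"   # simulate silent corruption on one device
recovered = store.read(7)        # served from the good mirror, bad copy repaired
```

Without the checksum, the reader would have no way to tell which of the two mirrored copies is the correct one.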
Btrfs supports snapshots that are writable and can be snapshotted again. The copy-on-write mechanism that backs the snapshotting code makes key features possible, such as transparent compression. In future releases we plan to add online fsck, deduplication, encryption and other features that have been on admin wish lists for a long time.
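A rough way to picture writable snapshots built on copy-on-write is the Python sketch below. It is an illustrative toy, not how Btrfs is implemented: the `Volume` class and its methods are hypothetical, and the sketch copies a whole name-to-block table on snapshot, whereas a real copy-on-write filesystem would share its metadata tree as well. The point it shows is that a snapshot shares unmodified blocks with its origin, that writes to either side never alter the other, and that a snapshot can itself be snapshotted.

```python
class Volume:
    """Toy copy-on-write volume: maps names to immutable blocks (bytes).

    Hypothetical illustration only; not the Btrfs data structure.
    """

    def __init__(self, blocks=None):
        # Copy only the table of references; the blocks themselves are shared.
        self._blocks = dict(blocks or {})

    def write(self, name, data):
        # A write installs a new block here; other volumes that still
        # reference the old block are unaffected.
        self._blocks[name] = data

    def read(self, name):
        return self._blocks[name]

    def snapshot(self):
        # The snapshot is an ordinary, writable Volume.
        return Volume(self._blocks)

    def shares_block_with(self, other, name):
        return self._blocks[name] is other._blocks[name]


base = Volume()
base.write("a", b"one")
base.write("b", b"two")

snap = base.snapshot()        # writable snapshot of base
snap.write("a", b"ONE")       # diverges from base for "a" only
snap_of_snap = snap.snapshot()  # a snapshot of a snapshot
```

After these steps `base` still reads `b"one"` for `"a"`, while `snap` and `snap_of_snap` read `b"ONE"`; the untouched block `"b"` is still a single shared object.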
Ext4 and XFS are the two filesystems we use most often for comparison. They both perform very well, so they are usually our performance target during benchmarking runs.
My favorite demonstration of Btrfs’ flexibility is the conversion tool from Ext3/4 to Btrfs. The conversion tool places all of the Btrfs metadata in the free space of the Ext filesystem, and adds Btrfs extent pointers to all of the file data blocks.
The conversion maintains the original Ext metadata as a snapshot, leaving the original filesystem unmodified. Until the snapshot is deleted, the conversion can be undone, reverting things back to the original Ext filesystem.
Amanda: Here’s a fun question: how did you get started working with Linux? What’s the very first patch you had accepted?
Chris: My first Linux project was a friendly race with the Ext3 developers. At the time, Linux didn’t yet have a journaled filesystem, and I was an admin looking at the features Linux was missing before it could be used in my own data center. I ended up working on journaling code for ReiserFS, and was then able to switch to filesystem programming full time.
Back when ReiserFS was merged into mainline, I managed to corrupt Linus’ root filesystem (ext2 at the time) with a last minute patch. So far I haven’t repeated that, but each new merge window gives me another try.
Amanda: There were some performance metrics reports recently on Btrfs that weren’t that glowing in comparison to XFS or Ext4. What’s your response?
Chris: Benchmarking is one of my favorite parts of development. With the 2.6.31 merge window, we’ve fixed most of the performance bottlenecks that caused problems in those benchmark runs. But our goal isn’t to win every benchmark. Today’s filesystems perform very well, and usually when bad performance is found it gets tuned and fixed.
Btrfs is concentrating on features that can’t be implemented with Ext4 or XFS. It is important that we perform well, but I don’t expect to be at the top of every benchmark result.
Amanda: Was Btrfs created to replace Ext3/4 or do you see users still using those file systems? What about XFS?
Chris: The goal is definitely to replace Ext3 and Ext4 as the default Linux filesystem. I wouldn’t be surprised to find people holding on to the Ext series; it has a long history of stability, and not everyone needs the latest and greatest features.
XFS is likely to stay around just as long. It has been heavily tuned and optimized for high scalability, and that kind of investment takes a long time to match.
Amanda: You are a member of the Linux Foundation’s Technical Advisory Board. Can you tell me about your participation in that group and what it means?
Chris: The TAB is a great way to connect the Linux Foundation with the community. It gives a broader base of input into the issues the Linux Foundation is trying to solve, and it makes more people aware of the LF initiatives.
Amanda: Now that your employer Oracle is purchasing Sun, and with it Solaris’ ZFS file system, any plans to license under GPLv2 so developers could port it to Linux? If so, is Btrfs still as needed?
Chris: Sun has many interesting projects, and I’m looking forward to working with their R&D teams. We’re committed to continuing Btrfs development, and ZFS doesn’t change our long term plans in that area.
Amanda: To clear this up, once and for all: is it pronounced BetterFS or ButterFS?
Chris: <Grin> Definitely both.