By "core kernel," I mean code which affects the kernel as a whole and which isn't tightly associated with a single subsystem. The core of the Linux kernel is quite small, and it has also been surprisingly stable in recent times. The CPU scheduler has seen mostly incremental changes, and the core memory management code has seen few fundamental changes for years. On the other hand, there is still interest in a number of areas, including real time, asynchronous I/O, fast booting, and more.
Real-time preemption
The real-time preemption patch set seeks to provide deterministic response times with a stock Linux kernel. It works by making everything preemptible, including code (spinlock-protected critical sections, interrupt handlers) which cannot be preempted in current kernels.
Forecast: much of the real-time preemption patch set has already found its way into the mainline. The most controversial changes remain out of tree, though; in particular, preemptible spinlocks remain outside of the mainline. The real-time developers have plans to merge much of the remaining work over the course of the next year - but that has been true for a few years now. That said, the realtime developers have recently returned to this project, so things may start to pick up again. The merging of the threaded interrupt handler infrastructure for 2.6.30 was an important step in that direction.
For more information: there are a number of articles covering the evolution of this patch set, including:
- Approaches to realtime Linux
- Realtime preemption, part 2
- Realtime preemption and read-copy-update
- The beginning of the realtime preemption debate
- Clockevents and dyntick
- What's in the realtime tree (October, 2007)
- Moving interrupts to threads (October, 2008)
- The return of the realtime preemption tree (February, 2009)
- The realtime preemption endgame (August, 2009)
- The realtime preemption mini-summit (September, 2009)
Deadline scheduling
Classic Unix scheduling is priority-based, and realtime scheduling is doubly so. But priority-based scheduling does not always map well onto realtime tasks, which are generally expressed in terms of an amount of work which must be accomplished by a given deadline. Deadline schedulers express the problem in those terms: a deadline, a worst-case execution time, and, possibly, a period for recurring tasks. Beyond a better fit to the problem, deadline schedulers can guarantee that work will be done on time and refuse tasks which oversubscribe the system's abilities.
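As a rough illustration of how a deadline scheduler can refuse tasks which would oversubscribe the system, here is a minimal sketch of the classic single-CPU utilization test associated with earliest-deadline-first scheduling: a task set is accepted only if the sum of each task's worst-case execution time divided by its period does not exceed one. This is not code from the Linux deadline scheduler patches; the dl_task structure and the dl_utilization() and dl_admit() functions are hypothetical names chosen for the example, and the sketch assumes that each task's deadline is equal to its period.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task description: both values in microseconds. */
struct dl_task {
    unsigned long wcet_us;    /* worst-case execution time */
    unsigned long period_us;  /* period; assumed equal to the deadline */
};

/* Sum of wcet/period over the task set: the fraction of one CPU needed. */
double dl_utilization(const struct dl_task *set, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(set[i].period_us > 0);
        total += (double)set[i].wcet_us / (double)set[i].period_us;
    }
    return total;
}

/*
 * Return 1 if the candidate can be added without pushing total
 * utilization above 1.0 (one fully used CPU), 0 otherwise.
 */
int dl_admit(const struct dl_task *set, size_t n,
             const struct dl_task *candidate)
{
    double total = dl_utilization(set, n) + dl_utilization(candidate, 1);

    return total <= 1.0;
}
```

With tasks needing 1ms every 4ms and 2ms every 8ms already admitted, a candidate needing 1ms every 2ms brings the total utilization to exactly 1.0 and is accepted, while one needing 3ms every 4ms is refused. A real implementation would also have to deal with multiple CPUs and with deadlines shorter than the period, which this sketch ignores.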
Forecast: there is a well-advanced deadline scheduler in the works for Linux. Many troublesome issues remain to be resolved, however, including the little problem of just how one avoids priority inversion problems. So it will be a little while yet, but, presumably, we'll have a deadline scheduler sometime in 2010.
For more information:
- Deadline scheduling for Linux (October, 2009)
Memory fragmentation avoidance
The Linux virtual memory system tends to fragment physical memory over time. This fragmentation is not normally a problem, but it can get in the way when large, physically-contiguous chunks of memory are required. On highly-fragmented systems, multi-page allocations can fail, leading to degraded system functionality.
There are a few developments which address the fragmentation problem. The most prominent are:
- Lumpy reclaim, which emphasizes reclaiming physically-contiguous pages.
- Grouping of memory allocations so that those which can be moved are kept separate from those which cannot be moved.
Both techniques can make it easier for the kernel to satisfy multi-page contiguous allocations. The grouping patches, in addition, are useful for memory hotplugging, which, in turn, is a feature which can be used by virtualization solutions.
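To see why grouping matters, consider a toy model of physical memory as an array of page states. The sketch below is not kernel code; the page_state enum and the has_free_block() and has_reclaimable_block() functions are hypothetical names, and the model assumes (as a buddy-style allocator does) that a multi-page allocation needs a naturally-aligned block of 2^order pages.

```c
#include <stddef.h>

/* Hypothetical per-page states for the toy model. */
enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_UNMOVABLE };

/* Return 1 if some aligned block of 2^order pages is entirely free. */
int has_free_block(const enum page_state *mem, size_t npages, unsigned order)
{
    size_t block = (size_t)1 << order;

    for (size_t start = 0; start + block <= npages; start += block) {
        size_t i;

        for (i = 0; i < block; i++)
            if (mem[start + i] != PAGE_FREE)
                break;
        if (i == block)
            return 1;
    }
    return 0;
}

/*
 * Return 1 if some aligned block of 2^order pages holds no unmovable
 * pages. Such a block is either already free or could, in principle,
 * be emptied by reclaiming or relocating its movable pages.
 */
int has_reclaimable_block(const enum page_state *mem, size_t npages,
                          unsigned order)
{
    size_t block = (size_t)1 << order;

    for (size_t start = 0; start + block <= npages; start += block) {
        size_t i;

        for (i = 0; i < block; i++)
            if (mem[start + i] == PAGE_UNMOVABLE)
                break;
        if (i == block)
            return 1;
    }
    return 0;
}
```

Given eight pages with two unmovable pages scattered so that one lands in each four-page block, has_reclaimable_block() returns 0 for order 2 no matter how many pages are free. If the same two unmovable pages are grouped into the first block, the second block becomes a candidate for a four-page allocation once its movable pages have been dealt with.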
Forecast: Basic fragmentation avoidance patches (the ZONE_MOVABLE memory zone) and lumpy reclaim were merged for 2.6.23. Further work on active fragmentation avoidance may be merged in future kernels.
For more information:
- Fragmentation avoidance (November, 2005)
- Avoiding - and fixing - memory fragmentation (November, 2006)
- Short topics in memory management (February, 2007)
- Memory compaction (January, 2010)
Huge pages
On almost all architectures, Linux works with 4096-byte pages of memory. Most contemporary processors are able to deal with larger pages, though, and there can be significant performance advantages to using "huge pages" in some situations.
Forecast: Linux has had basic huge page support for some time, though using it requires special measures by application developers. There would be value in "transparent huge page" support, wherein applications would simply use these pages when they are available and the kernel decides that it makes sense. There is a set of transparent huge page patches in circulation, but some performance issues remain and memory management code is always slow to be merged. So I'll not try to predict just when this feature might go into the mainline.
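As an example of the sort of special measures currently required, an application can ask for huge pages explicitly by passing the MAP_HUGETLB flag to mmap() on kernels which support it. The sketch below tries that first and falls back to ordinary pages if the request fails (for example, because no huge pages have been reserved). The region structure, the region_alloc() and region_free() functions, and the 2MB huge page size are assumptions made for the example; the real huge page size depends on the architecture and configuration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2MB huge pages, as is typical on x86-64. */
#define ASSUMED_HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/* Hypothetical bookkeeping for one anonymous mapping. */
struct region {
    void *addr;
    size_t len;   /* mapped length, rounded up to the assumed huge page size */
    int huge;     /* 1 if the mapping was created with MAP_HUGETLB */
};

/* Map at least "want" bytes; return 0 on success, -1 on failure. */
int region_alloc(struct region *r, size_t want)
{
    size_t len = (want + ASSUMED_HUGE_PAGE_SIZE - 1)
                 / ASSUMED_HUGE_PAGE_SIZE * ASSUMED_HUGE_PAGE_SIZE;
    void *p = MAP_FAILED;

    r->huge = 0;
#ifdef MAP_HUGETLB
    /* First attempt: explicitly request huge pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        r->huge = 1;
#endif
    /* Fallback: the same length backed by ordinary pages. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    r->addr = p;
    r->len = len;
    return 0;
}

/* Unmap a region obtained from region_alloc(). */
void region_free(struct region *r)
{
    munmap(r->addr, r->len);
}
```

The fallback path is the important part: the application has to cope with huge pages being unavailable, which is exactly the burden that transparent huge page support would remove.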
For more information:
- Transparent huge pages (October, 2009)
- Huge pages part 1 (February, 2010)
Syslets and threadlets
"Syslets" are a means for running small programs within the kernel; they are a way to run system calls asynchronously and without exiting to user space in between. "Threadlets" are a similar mechanism for running asynchronous code in user space. In either case, the code in question will run synchronously as long as it does not block. If it must wait for something, the kernel creates a new thread (or reuses an existing, spare thread) and continues user-space execution in that thread.
The initial motivation for this patch was to enable a complete asynchronous I/O implementation without the heavy maintenance overhead of the current AIO approach. Since syslets and threadlets allow any system call to be run asynchronously, however, they have a wider application than that.
Forecast: This code currently appears to be languishing due to a combination of tricky implementation issues and a lack of pressure from users for this new feature. It may yet come back, but it's hard to say when that might be.
For more information:
- Fibrils and asynchronous system calls (January, 2007)
- Alternatives to fibrils (Syslets, February, 2007)
- Threadlets (February, 2007)
- Realtime and syslets (September, 2007)
- Simpler syslets (December, 2007)
- A new approach to asynchronous I/O (January, 2009)
Big Kernel Lock
The Big Kernel Lock (BKL) was first introduced in the 2.0 kernel as a way to minimize concurrency and make basic SMP functionality work. Ever since then, the kernel developers have been working to squeeze the BKL out of the kernel, as it poses an ongoing scalability problem. Recent difficulties have added a new urgency to this effort, with the result that more time is going into the BKL-removal task.
Forecast: The realtime preemption developers are determined to remove the BKL as part of the process of getting more of the realtime code into the mainline. The 2.6.33 development cycle saw the addition of a large set of patches eliminating the BKL from the reiserfs filesystem; various other call sites have also been taken out. We may never be entirely free of the BKL, but we're quickly getting to a point where most commonly-used code paths no longer use it.
For more information:
- The Big Kernel Lock lives on (May, 2004)
- The Big Kernel Lock strikes again (May, 2008)
- Kill BKL Vol. 2 (May, 2008)
- The realtime preemption mini-summit (September, 2009)
Fast boot
Nobody likes waiting for a system to boot under any circumstances. As Linux moves into more embedded devices, though, the boot-time problem is becoming more prominent. We expect our devices to be ready to operate as soon as we turn them on - not a minute or so later.
There has been interest in speeding up the bootstrap process for years. More recently, a focused effort associated with the Moblin project has shown that serious gains can be had in this area. Systems booting in five seconds have been demonstrated, and there is a lot of interest in reducing the boot time further. Doing this requires work both within the kernel (initializing hardware in parallel, for example) and in user space.
Forecast: some bits of fast-boot technology have been merged into the 2.6.28, 2.6.29, and 2.6.30 kernels, with more to come. This work occasionally runs into snags, so it must proceed carefully. But interest is high and distributions are picking up the work, so faster booting can be expected on a wide variety of systems in the near future.
For more information:
- Booting Linux in five seconds (September, 2008)
- An asynchronous function call infrastructure (January, 2009)
- USB and fast booting (April, 2009)

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.




