Linux 3.6 – 3.9 CPU scheduler performance

This is the first big series of benchmarks run with the high-accuracy cbenchsuite. The focus of these benchmarks was the performance of the Linux CPU scheduler across kernel versions. I used two different systems: an old AMD dual core and a newer, slightly overclocked Core i7-2600 quad core. Both systems were benchmarked with kernels 3.6, 3.7, 3.8 and 3.9-rc4. This post shows a few interesting results, but it is not a summary and it draws no conclusion. Its main purpose is to release the raw data of my measurements.


  • 10 different benchmarks, e.g. kernel compile, hackbench, fork-bench, …
  • Many more benchmark configurations, e.g. the number of threads used
  • 4 monitors used to record memory, scheduler, … statistics
  • Kernel caches and swap were reset before each run for higher accuracy
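
For readers who want to reproduce the cache and swap reset from the last point, here is a minimal Python sketch of one common way to do it on Linux. It is not taken from cbenchsuite; the function name `reset_system_state` and its `dry_run` parameter are my own assumptions for illustration. Running it for real needs root.

```python
import os
import subprocess

DROP_CACHES = "/proc/sys/vm/drop_caches"


def reset_system_state(dry_run=True):
    """Return the reset steps; execute them only if dry_run is False (needs root).

    Hypothetical helper, not part of cbenchsuite.
    """
    steps = [
        ("sync", "flush dirty pages to disk"),
        ("write 3 to " + DROP_CACHES, "drop page cache, dentries and inodes"),
        ("swapoff -a", "pull swapped-out pages back into RAM"),
        ("swapon -a", "re-enable swap, which is now empty"),
    ]
    if not dry_run:
        os.sync()
        with open(DROP_CACHES, "w") as f:
            f.write("3\n")
        subprocess.run(["swapoff", "-a"], check=True)
        subprocess.run(["swapon", "-a"], check=True)
    return steps


if __name__ == "__main__":
    # Dry run by default: only print what would be done.
    for action, reason in reset_system_state(dry_run=True):
        print(f"{action:45s} # {reason}")
```

The idea is that every run starts from the same cold state, so differences between kernels are not hidden by whatever happened to be cached from the previous run.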


Let’s start with some hackbench results. Hackbench spawns a number of groups of processes which communicate within the group. The runtime is shown in the following two charts.
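
To give an idea of what such a workload looks like, here is a heavily simplified Python sketch of the communication pattern. It is not the real hackbench code: the real tool uses several senders and receivers per group, while this sketch uses a single sender per group and pipes only. All names (`hackbench_sketch`, `_run_group`, `_receiver`, `MSG_SIZE`) are made up for illustration, and it relies on `os.fork`, so it only runs on Unix-like systems.

```python
import os
import time

MSG_SIZE = 100  # bytes per message; small enough that each pipe write is atomic


def _receiver(read_fd, expected_bytes):
    """Read from the pipe until the expected number of bytes has arrived."""
    remaining = expected_bytes
    while remaining > 0:
        chunk = os.read(read_fd, min(remaining, 65536))
        if not chunk:  # writer closed the pipe early
            break
        remaining -= len(chunk)
    return remaining == 0


def _run_group(receivers, messages):
    """One group: this process sends, `receivers` forked children read."""
    payload = b"x" * MSG_SIZE
    write_fds, pids = [], []
    for _ in range(receivers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # receiver child
            ok = False
            try:
                os.close(write_fd)
                for fd in write_fds:  # write ends inherited from earlier pipes
                    os.close(fd)
                ok = _receiver(read_fd, messages * MSG_SIZE)
            finally:
                os._exit(0 if ok else 1)
        os.close(read_fd)
        write_fds.append(write_fd)
        pids.append(pid)
    # The sender writes every message to every receiver of its group.
    for _ in range(messages):
        for fd in write_fds:
            os.write(fd, payload)
    for fd in write_fds:
        os.close(fd)
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    return all(status == 0 for status in statuses)


def hackbench_sketch(groups=2, receivers=4, messages=100):
    """Run all groups concurrently; return (elapsed seconds, all groups ok)."""
    start = time.perf_counter()
    pids = []
    for _ in range(groups):
        pid = os.fork()
        if pid == 0:  # group leader child
            ok = False
            try:
                ok = _run_group(receivers, messages)
            finally:
                os._exit(0 if ok else 1)
        pids.append(pid)
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    return time.perf_counter() - start, all(status == 0 for status in statuses)


if __name__ == "__main__":
    elapsed, ok = hackbench_sketch()
    print(f"ok={ok} runtime={elapsed:.3f}s")
```

Many short-lived processes that constantly block and wake each other up put a lot of pressure on the scheduler, which is why this kind of workload is useful for comparing kernel versions.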

Dual core system Hackbench results.

You can see that Linux 3.7.0 was the fastest kernel overall in this benchmark. With pipes, the latest kernel, 3.9, is at the same poor level as 3.6. Without pipes, 3.9 fares better, but it is still not the fastest kernel in this benchmark on the dual core.

Hackbench on a quad-core.

The results are similar to those on the dual core system. Unfortunately, this is not the best benchmark for 3.9, while 3.7 again does very well.

Let’s continue with the standard kernel compile benchmark.

Kernel compile benchmark on dual and quad core.

In this benchmark, 3.9 outperforms the other kernels significantly on the dual core, but there is no visible performance difference on the quad core system. I couldn’t find any explanation for this difference in the monitor results. However, I did find an interesting difference in the number of context switches on the dual core system.

Number of context switches while compiling the kernel with 1 thread.

You can see that the kernels show different task switching behavior, especially on the dual core system. But if you compare those differences with the performance differences above, the two do not match up in any significant way.
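
If you want to take a similar measurement yourself, the following Python sketch counts system-wide context switches around a workload by reading the `ctxt` line of `/proc/stat`. This is only one possible approach and not necessarily how the cbenchsuite monitor collects its data. The function names `parse_ctxt` and `context_switches_during` are made up for this example.

```python
def parse_ctxt(proc_stat_text):
    """Extract the total context switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def context_switches_during(workload, stat_path="/proc/stat"):
    """Run workload() and return the system-wide context switches it spanned.

    The counter covers the whole system, so other running processes
    add noise to the result.
    """
    def read_counter():
        with open(stat_path) as f:
            return parse_ctxt(f.read())

    before = read_counter()
    workload()
    return read_counter() - before


if __name__ == "__main__":
    import time
    # Example workload: sleep for a second and see what the system does meanwhile.
    print(context_switches_during(lambda: time.sleep(1)))
```

Because the counter is global, a measurement like this is only meaningful on an otherwise idle machine.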

Result links

The following links point to the generated websites, which contain many more results. There are three complete sets of visualized results available, all based on the same dataset. The first contains both systems, the second shows just the dual core, and the third only the quad core. All generated HTML files need JavaScript.
