Topic: [Updated] x264 Linux 2.6.32 scheduler benchmarks

Offline saintdev

  • Member
  • Posts: 22
[Updated] x264 Linux 2.6.32 scheduler benchmarks
« on: December 15, 2009, 12:44:59 AM »
I need to borrow some CPU time. Got an i7? Running an x86_64 2.6.32 kernel? Got x264? Check below.

If you've read the Diary of an x264 Developer entry "Open source collaboration done right", you know that my last set of benchmarks resulted in some pretty major performance improvements in the mainline Linux kernel. I've done the benchmarks again, this time with the latest released kernel (2.6.32) and the latest BFS scheduler (2.6.32-bfs311). All tests were run on a Core 2 Quad Q9300.

The results from this round are pretty much dead even (except for a hiccup in mainline with insanely fast settings nobody should ever use). Statistically, BFS leads with the veryslow settings, but only by about 4.4%. If encoding a 2-hour movie takes 7.5 hours, that saves you only about 20 minutes. Personally, this is not significant to me, as either way I will leave my encoding going overnight.
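The 20-minute figure above can be checked with a quick calculation. This is only a sketch: the function names are made up for illustration, and it shows two readings of the 4.4%, one as a direct cut in encode time and one as a gain in frames per second.

```python
def minutes_saved_time_cut(encode_hours, fraction):
    """Minutes saved if the encode time itself shrinks by `fraction`."""
    return encode_hours * 60 * fraction


def minutes_saved_fps_gain(encode_hours, fraction):
    """Minutes saved if frames per second rise by `fraction` instead."""
    total_minutes = encode_hours * 60
    return total_minutes - total_minutes / (1 + fraction)


# 7.5 hours is 450 minutes; 4.4% of that is just under 20 minutes.
print(round(minutes_saved_time_cut(7.5, 0.044), 1))
# Read as an fps gain, the saving is slightly smaller, roughly 19 minutes.
print(round(minutes_saved_fps_gain(7.5, 0.044), 1))
```

Either way the saving is small next to an overnight encode.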

There are a few interesting conclusions. First, thread counts far higher than the number of CPUs will generally not hurt performance. I tested up to 4*cpus (16 threads). Second, slower settings need higher thread counts before they reach their plateau.
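As a rough sketch of what such a thread sweep could look like (this is not the actual x264_bench.sh, and the helper names, the step of one thread, the clip filename, and the 2% tolerance are all assumptions for illustration):

```python
def thread_counts(cpus, multiplier=4):
    """Thread counts to test: 1 up to multiplier * cpus (16 on a quad core)."""
    return list(range(1, cpus * multiplier + 1))


def x264_command(threads, preset, source, output="/dev/null"):
    """Build an x264 command line for one run with an explicit thread count."""
    return ["x264", "--preset", preset, "--threads", str(threads),
            "-o", output, source]


def plateau_start(fps_by_threads, tolerance=0.02):
    """Smallest thread count whose fps is within `tolerance` of the best fps."""
    best = max(fps_by_threads.values())
    for threads in sorted(fps_by_threads):
        if fps_by_threads[threads] >= best * (1 - tolerance):
            return threads


# Example: the commands for a veryslow sweep on a 4-CPU machine.
# "soccer_4cif.y4m" is a placeholder filename, not the real clip path.
for n in thread_counts(4):
    print(" ".join(x264_command(n, "veryslow", "soccer_4cif.y4m")))
```

Feeding the measured fps per thread count into plateau_start gives one way to compare where each preset levels off.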

Time for some pretty graphs. The first graph is a line graph plotting the number of threads vs. frames per second. The second is a zoomed graph showing the plateau area with error bars.
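For anyone reproducing the error bars, one common approach is to repeat each run and plot the mean with the sample standard deviation. The sketch below assumes that approach; it is not necessarily what parse_results.py does, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_runs(fps_runs):
    """Return (mean, sample standard deviation) for repeated fps measurements."""
    if len(fps_runs) < 2:
        # A single run has no spread to report.
        return mean(fps_runs), 0.0
    return mean(fps_runs), stdev(fps_runs)
```

Each (threads, preset, scheduler) combination would get one such pair, with the second value drawn as the error bar.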

Update 3: Added tests with a 720p clip, including testing with sliced threads.

Update 2: Added SCHED_BATCH for BFS in addition to mainline at the request of CK. This mode is supposed to give a little better consistency between runs, and you can see this with the veryslow settings. Overall, this mode is slower than BFS without SCHED_BATCH and, for the medium settings, slower than mainline.

Update: Graphs have been updated to include a set of SCHED_BATCH runs, as suggested to DS by one of the kernel developers. This was done by calling "schedtool -B -e ./x264_bench.sh". This fixes the drop in frame rate with ultrafast settings on mainline and ties BFS in that case. Otherwise, it ties or is slightly slower than mainline without SCHED_BATCH.
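For reference, the schedtool call above can be wrapped from a Python driver, or the policy can be set directly with the standard library on Linux. This is a sketch with hypothetical helper names; only the schedtool invocation itself comes from the benchmark.

```python
import os
import subprocess


def batch_command(command):
    """Prefix a command with schedtool so it runs under SCHED_BATCH."""
    return ["schedtool", "-B", "-e", *command]


def run_batch(command):
    """Run a command under SCHED_BATCH without schedtool (Linux only)."""
    def set_batch():
        # Runs in the child just before exec; SCHED_BATCH uses priority 0.
        os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    return subprocess.run(command, preexec_fn=set_batch, check=True)


print(" ".join(batch_command(["./x264_bench.sh"])))
```

The scheduling policy is inherited by child processes, so every x264 run started by the script ends up under SCHED_BATCH as well.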

Veryslow settings:



Medium settings:



Ultrafast settings (lookahead completely disabled):



Scripts used:
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/x264_bench.sh
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/parse_results.py

Raw results:
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/2.6.32-bfs311_x264.log
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/2.6.32-bfs311-batch_x264.log
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/2.6.32-mainline_x264.log
http://saintdevelopment.com/benchmarks/bfs-vs-cfs/2.6.32-mainline-batch_x264.log
« Last Edit: April 24, 2011, 10:07:19 PM by saintdev »

Offline nakTT

  • Member
  • Posts: 38
Re: x264 Linux 2.6.32 scheduler benchmarks
« Reply #1 on: December 15, 2009, 12:49:47 AM »
Is it your website? I thought it was DS's site.

Offline saintdev

  • Member
  • Posts: 22
Re: x264 Linux 2.6.32 scheduler benchmarks
« Reply #2 on: December 15, 2009, 12:53:44 AM »
Quote
Is it your website? I thought it was DS's site.

Yes, the website is mine. The blog is DS's.

Offline saintdev

  • Member
  • Posts: 22
Re: x264 Linux 2.6.32 scheduler benchmarks
« Reply #3 on: December 21, 2009, 11:09:16 PM »
New graphs run on a 720p clip. These also include benchmarks of sliced threads using the zerolatency preset.

Note: I forgot to set rate control on the zerolatency tests. x264 defaults to --crf 23 in this case.

Update: Split out the bframe and non-bframe graphs for zerolatency. It was a little confusing otherwise.

--tune zerolatency --preset veryfast


--tune zerolatency --preset veryfast --bframes 3


--tune zerolatency (default/--preset medium)

--tune zerolatency --bframes 3 (default/--preset medium)


--preset ultrafast --no-scenecut --sync-lookahead 0 --qp 20 (lookahead disabled)


-B 2000 (defaults/--preset medium)


--preset veryslow --crf 20
« Last Edit: December 22, 2009, 12:14:55 AM by saintdev »

Offline saintdev

  • Member
  • Posts: 22
Re: [Updated] x264 Linux 2.6.32 scheduler benchmarks
« Reply #4 on: December 25, 2009, 12:24:58 AM »
Request: We need someone who has a CPU with hyper-threading (Core i7, preferably) to run similar tests. We really only need these on mainline; I would be interested to see how BFS performs too, but that is not necessary. Sliced threads are not necessary either, but it would be nice to have numbers for them. If you're interested, the scripts I used are below. The test clips can be found at http://media.xiph.org/video/derf/

Quote
"Soccer" 4CIF clip, 3 presets, no sliced-threads
"Shields" 720p50 clip, 7 presets, sliced-threads

Notes: These scripts output to stdout, so you will need to redirect to a file when running them. Make sure you disable any frequency scaling and stop background tasks (F@H). Note that this may take a long time: on my C2Q Q9300, the second script takes about 24 hours to complete. If you run these tests, send your results to saintdev squiggly-at-thinger gmail.com
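A small pre-flight sketch for the notes above, assuming a Linux system that exposes cpufreq through sysfs. The helper names and the log filename are hypothetical. It lists the active governor per CPU (a governor such as "performance" usually means the clock is not being scaled down) and captures a script's stdout in a log file.

```python
import subprocess
from pathlib import Path


def scaling_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU name (e.g. 'cpu0') to its current cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(sysfs_root).glob(pattern)):
        governors[path.parts[-3]] = path.read_text().strip()
    return governors


def run_to_log(command, log_path):
    """Run a command and redirect its stdout into log_path."""
    with open(log_path, "w") as log:
        subprocess.run(command, stdout=log, check=True)


# Show the governors before starting a long benchmark run.
print(scaling_governors())
# run_to_log(["./x264_bench.sh"], "my_results.log")  # placeholder log name
```

An empty dictionary from scaling_governors means the kernel exposes no cpufreq interface at that path, which may simply mean frequency scaling is not enabled.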

Offline xxthink

  • Member
  • Posts: 2
Re: [Updated] x264 Linux 2.6.32 scheduler benchmarks
« Reply #5 on: September 20, 2010, 07:58:55 PM »
Have you tested this on Linux kernel 2.6.18?

Offline saintdev

  • Member
  • Posts: 22
Re: [Updated] x264 Linux 2.6.32 scheduler benchmarks
« Reply #6 on: September 20, 2010, 08:40:44 PM »
No. Some of my hardware has no drivers in a kernel that old, so I can't test it.
Also, the old scheduler really sucked and is of no interest to me.