Author Topic: Atom optimizations appear to be causing problems  (Read 4276 times)

Offline nimble99

  • Member
  • Posts: 6
Atom optimizations appear to be causing problems
« on: November 02, 2011, 12:29:05 PM »
This is a continuation of my discussion from here:

http://forums.creativecow.net/thread/291/558

and here (the Stack Overflow question has a lot of console output for diagnostics):

http://stackoverflow.com/questions/7936738/getting-either-incorrect-output-resolution-or-fps-from-ffmpeg

The issue appears to possibly be coming from libx264, hence I have brought the discussion here.

I am receiving an RTSP stream with ffmpeg (over TCP), transcoding it to H.264 for iPhone, and then segmenting it for iPhone HTTP streaming.
This works perfectly on my MacBook Pro, but when I run it on my Atom machine (also OS X), the frame rate drops to 2 fps (from the native 10), and the encode terminates after 50 or so frames.

I am using the exact same binaries on both machines (static builds), latest stable from a week ago. This is the only difference I can find between the console outputs on both machines:

Macbook: [libx264 @ 0x101027800] using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
Atom: [libx264 @ 0x10110ec00] using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64 SlowCTZ SlowAtom

So, I assume that libx264 has some significantly different code-paths resulting from "SlowCTZ" and "SlowAtom".
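The comparison of the two log lines above can be automated. A minimal Python sketch, assuming only the "using cpu capabilities:" log format shown in this post; the helper name cpu_flags is made up for illustration and is not part of ffmpeg or x264:

```python
import re

def cpu_flags(log_line):
    """Return the set of capability flags from an x264 'using cpu capabilities' log line."""
    match = re.search(r"using cpu capabilities:\s*(.*)", log_line)
    return set(match.group(1).split()) if match else set()

# The two log lines quoted in the post.
macbook = "[libx264 @ 0x101027800] using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64"
atom = "[libx264 @ 0x10110ec00] using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64 SlowCTZ SlowAtom"

only_on_atom = sorted(cpu_flags(atom) - cpu_flags(macbook))
only_on_macbook = sorted(cpu_flags(macbook) - cpu_flags(atom))

print(only_on_atom)     # flags reported only on the Atom
print(only_on_macbook)  # flags reported only on the MacBook
```

For these two lines the first list holds SlowAtom and SlowCTZ and the second is empty, which matches the observation that the Atom only adds flags and removes none.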

I figure I could comment out the lines in common/cpu.c that add the SLOW flags... would that work? But I would prefer to just add arguments to the command line to correct this...

Any ideas?

Thank you very much!

[EDIT] Note: the frame rate drop is not due to pegging the CPU... ffmpeg is for some reason choosing to target the output to 2fps - CPU consumption is approx 5%.
« Last Edit: November 02, 2011, 02:16:31 PM by nimble99 »

Offline J_Darnley

  • Global Moderator
  • Member
  • *****
  • Posts: 397
Re: Atom optimizations appear to be causing problems
« Reply #1 on: November 02, 2011, 12:42:20 PM »
Uh... Why do you think stopping x264 from knowing that you're using an atom will help?  The encode runs so slowly because the processor is so slow.
Knowledgeable about: cmd.exe, ffmpeg, x264

Offline nimble99

« Reply #2 on: November 02, 2011, 01:43:07 PM »
Well, I assume that recognizing the CPU causes x264 to avoid certain instructions... or to take different code-paths in general to improve performance. I believe these alternative code-paths are causing behavior different from what I see on my Core2... broken behavior.

The thing is, on my Atom it only consumes 4 or 5% CPU, so even if the CPU usage doubled, tripled, or even quadrupled, I'd still be happy with that consumption.
At the moment it isn't usable at all.

I'm just making the assumption that removing the Atom specializations won't actually break anything, because the list of 'optimizations' on the Core2Duo doesn't contain anything that is NOT on the Atom... the Atom detection simply adds two more flags than on the Core2. This is pure assumption, of course, just spit-balling.

I am hoping that if I can get it to execute the exact same code-path on the Atom that it does on the Core2... then it will work exactly the same and I will get my perfect h264 output...
« Last Edit: November 02, 2011, 01:46:47 PM by nimble99 »

Offline J_Darnley

« Reply #3 on: November 02, 2011, 02:02:15 PM »
Oh, right.  Now that makes a bit more sense.  If you want to try, you can search the code for where the X264_CPU_SLOW_CTZ and X264_CPU_SLOW_ATOM flags are used.

[EDIT]  Looks like common/quant.c for the former and common/dct.c, common/pixel.c and common/x86/mc-c.c for the latter.
[EDIT]  Better yet, see common/cpu.c to prevent the detection of the atom.
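Conceptually, preventing the detection amounts to clearing two bits from the capability mask before the encoder picks its code paths. A rough Python sketch of that idea; the bit values and names below are illustrative assumptions, not x264's actual X264_CPU_* constants, and the real change would be made in C inside x264 itself:

```python
# Illustrative bit values only (assumption); x264 defines its own X264_CPU_* constants.
CPU_MMX2 = 1 << 0
CPU_SSE2_FAST = 1 << 1
CPU_SSSE3 = 1 << 2
CPU_CACHE64 = 1 << 3
CPU_SLOW_CTZ = 1 << 4
CPU_SLOW_ATOM = 1 << 5

def mask_out(detected, unwanted):
    """Clear the unwanted capability bits and leave all others untouched."""
    return detected & ~unwanted

# What the Atom reported in the log, expressed with the illustrative bits above.
atom_detected = (CPU_MMX2 | CPU_SSE2_FAST | CPU_SSSE3 | CPU_CACHE64
                 | CPU_SLOW_CTZ | CPU_SLOW_ATOM)

# Drop the two Atom-specific hints so the mask matches the Core 2 log line.
core2_like = mask_out(atom_detected, CPU_SLOW_CTZ | CPU_SLOW_ATOM)

print(bin(atom_detected))
print(bin(core2_like))
```

With the two bits cleared, the remaining mask is the same set of capabilities the MacBook reported, which is the situation the experiment is trying to reproduce.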
« Last Edit: November 02, 2011, 02:10:10 PM by J_Darnley »
Knowledgeable about: cmd.exe, ffmpeg, x264

Offline nimble99

« Reply #4 on: November 02, 2011, 02:12:11 PM »
Ok, thanks J_Darnley - I will give that a try this weekend.

If doing so corrects the issue - what would the community need (and how would I collect it) to resolve this back to a specific bug, to file it for resolution?
Obviously the ideal solution is to retain these performance optimizations, but fix the bug...

Offline nm

  • Member
  • Posts: 358
« Reply #5 on: November 02, 2011, 05:12:19 PM »
Quote from: nimble99
> [EDIT] Note: the frame rate drop is not due to pegging the CPU... ffmpeg is for some reason choosing to target the output to 2fps - CPU consumption is approx 5%.

Are you sure about this? How did you measure CPU load during encoding? Did you do it during the 27 seconds that ffmpeg was actually running?

I tried your (single-threaded) ffmpeg command on a 2 GHz Core 2 Duo with a 1080p H.264 input stream and got 6 fps out. So 2 fps on Atom for a 1600x1200 H.264 input stream isn't that far from expected processing speed.

Atom probably can't even decode your input stream at 10 fps when you only let the decoder use one thread, and you are trying to downscale and encode at the same time. Since the input is a live stream and FFmpeg can't keep up with it, a buffer probably overflows somewhere in the input chain and the pipe gets closed. That's why FFmpeg stops after a while.

Things to try and do:

1. Use more decoding and encoding threads. Make sure you are using current ffmpeg for better multithreaded decoding.
2. Set an encoding bitrate or use the CRF mode (try -crf 26). With your command line, FFmpeg defaults to 200 kbps, and the logged QPs hint at crappy quality.
3. Use x264 presets.
4. If your Atom is a single-core model, there's no chance of pulling this off. You'll need something equivalent to Core 2 Duo.
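The throughput comparison in this reply can be put into rough numbers. A back-of-the-envelope Python sketch, under the simplifying assumption that processing cost scales with the number of input pixels per second; it ignores bitrate, entropy coding, scaling, and encoder settings, so treat the figures as an order-of-magnitude guide only. The function name pixel_rate is made up for illustration:

```python
def pixel_rate(width, height, fps):
    """Input pixels per second that the pipeline has to process."""
    return width * height * fps

core2_1080p = pixel_rate(1920, 1080, 6)     # nm's test: 1080p input at 6 fps
atom_observed = pixel_rate(1600, 1200, 2)   # observed on the Atom: 2 fps
atom_required = pixel_rate(1600, 1200, 10)  # the camera's native 10 fps

print(core2_1080p, atom_observed, atom_required)
print(round(core2_1080p / atom_observed, 2))  # Core 2 Duo vs. Atom throughput
print(atom_required / atom_observed)          # speed-up needed to keep up
```

Under that assumption, the single-threaded Core 2 Duo run handled roughly three times the pixel rate the Atom managed, and the Atom would need about five times its observed throughput to keep up with the live stream.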
« Last Edit: November 02, 2011, 05:17:15 PM by nm »

Offline nimble99

« Reply #6 on: November 02, 2011, 05:37:27 PM »
Hi nm.
I watched the CPU usage during the entire process lifecycle - it didn't peg, but then I don't know whether it was actually decoding or encoding properly.
This is my hardware (dual core Atom 330): http://www.tomshardware.com/reviews/zotac-ion-atom,2300-11.html

The CPU is pretty quick - VLC has no trouble playing the stream - I haven't looked at the CPU usage while VLC renders the stream - I will check that out tonight.

What are these 'presets' that I could use? Can I load presets, then just specify the args that I want to override (such as resolution)?
My intention is specifically to stream live video to an iPhone (using the Apple segmenter), so I would need a preset geared towards that.

Thanks for the suggestions. I am an ffmpeg noob as far as tuning goes, so I will definitely try them. I will report back asap.

Offline nm

« Reply #7 on: November 03, 2011, 08:50:20 AM »
Quote from: nimble99
> I watched the CPU usage during the entire process lifecycle - it didn't peg, but then I don't know whether it was actually decoding or encoding properly.

I think it was running as it should with your command line, for a while. During the process, the overall load should be either 50 % (HT disabled) or 25 % (HT enabled), indicating that one virtual core is fully loaded. Use a monitoring tool that shows a graph over time for all CPU cores (four virtual cores on Atom 330 running in hyperthreading mode).

At 5 % load the processing speed should've been much lower than 2 fps.
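The load figures above follow from simple arithmetic: a monitor that averages over all logical cores shows one saturated core as 100 divided by the number of logical cores. A small Python sketch of that relationship; the function names are hypothetical, and it assumes the monitor reports an average across logical cores:

```python
def overall_load_percent(busy_cores, logical_cores):
    """Overall load shown when `busy_cores` logical cores are fully loaded."""
    return 100.0 * busy_cores / logical_cores

def single_core_share(overall_percent, logical_cores):
    """Fraction of one logical core that an overall load figure corresponds to."""
    return overall_percent * logical_cores / 100.0

print(overall_load_percent(1, 2))  # Atom 330, hyperthreading disabled
print(overall_load_percent(1, 4))  # Atom 330, hyperthreading enabled
print(single_core_share(5, 4))     # what a 5 % overall reading would mean
```

A 5 % overall reading on four logical cores would correspond to only about a fifth of one core, which is why it looked inconsistent with a single-threaded encode running flat out.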

Quote from: nimble99
> This is my hardware (dual core Atom 330): http://www.tomshardware.com/reviews/zotac-ion-atom,2300-11.html
>
> The CPU is pretty quick - VLC has no trouble playing the stream - I haven't looked at the CPU usage while VLC renders the stream - I will check that out tonight.

Well, maybe the stream is low-bitrate and compressed with CAVLC instead of CABAC. But I'm pretty sure it still pegs one core, which means that you need to use more decoding and encoding threads.


Quote from: nimble99
> What are these 'presets' that I could use? Can I load presets, then just specify the args that I want to override (such as resolution)?

Yes. Resolution isn't covered by the x264 presets. Other encoding parameters that are set by the preset can be overridden. With current FFmpeg/libav, you can use standard x264 presets and tunings (see x264 --fullhelp for more information). They make it much easier to set things up.

Offline nimble99

« Reply #8 on: November 03, 2011, 11:53:38 AM »
Hi nm,

Ok - I have had another play.
I recompiled libx264 and ffmpeg without the Atom detection, and the frame rate immediately bumped to 7 fps - still not quite 10.
You were absolutely right - the CPU was pegging - but just for a moment. I had to run it a few times to see it (due to the sampling resolution of top/Activity Monitor). I think it was most certainly overloaded. It still died after about 10 seconds with some form of '[libx264] error GOP corruption'.
So, I reverted to the default libx264/ffmpeg build, went to my camera, and turned the resolution down to 640x480@5fps. I did this to reduce the CPU load, because I thought perhaps the CPU load was causing, as you said, buffers to overflow or underflow.

The encode now works perfectly; it consumes about 25% on each of the 4 hyper-threads. This is pretty much what you said it should be. I wouldn't want to use much more power than this, because it is running 24x7.

The irony now is, if I have to tune the source down to 640x480 to get an acceptable CPU load, then I may as well just use -vcodec copy because I'm not getting my 1:1 iPhone 4 pixel ratio anyway!! In which case it uses 5% CPU because ffmpeg isn't really doing much.

I did test VLC again (against the 1600x1200 source), and it was using about 10%-15% CPU on each core to DECODE and render the H.264 stream. So is it expected that it takes 10 times the grunt to down-scale and re-encode the stream to 960x640?

So, until VLC 1.2 with its httplive access filter, I'll probably stick with this. I tried compiling VLC 1.2 last night but failed miserably - the forums seem to suggest the trunk is not in a buildable state - shame. If anyone out there has a running OSX build of 1.2 (no video output needed, just command-line transcoding) that they can donate to me, please, let me know!

Final result: the ffmpeg+segmenter has been running for 12 hours now, no hiccups, 25% CPU load on Atom 330 4 Hyper threads.

Thanks again nm, and J_Darnley - hopefully this thread contains enough detail to be of use to anyone else trying to do the same thing with an RTSP stream.

Offline nm

« Reply #9 on: November 03, 2011, 02:18:16 PM »
Quote from: nimble99
> Ok - I have had another play.
> I recompiled libx264 and ffmpeg without the Atom detection, and the frame rate immediately bumped to 7 fps - still not quite 10.

Well, such a speed increase is unexpected too. I might try playing around a bit with the optimizations later. Too bad I don't have an Atom CPU here.

Quote from: nimble99
> The irony now is, if I have to tune the source down to 640x480 to get an acceptable CPU load, then I may as well just use -vcodec copy because I'm not getting my 1:1 iPhone 4 pixel ratio anyway!! In which case it uses 5% CPU because ffmpeg isn't really doing much.

Yep. If you care about energy consumption, it's best to use the H.264 stream that your camera provides as-is. Or only encode when somebody is watching.

Quote from: nimble99
> I did test VLC again (against the 1600x1200 source), and it was using about 10%-15% CPU on each core to DECODE and render the H.264 stream. So is it expected that it takes 10 times the grunt to down-scale and re-encode the stream to 960x640?

Hmm. 15 % load sounds very low if VLC is using a software decoder. But maybe that's possible if the video is mostly stationary and noise-free, and is encoded efficiently.

I'd like to take a look at your 1600x1200 H.264 video, if you could upload a sample clip to MediaFire for example?
« Last Edit: November 03, 2011, 02:23:13 PM by nm »

Offline nm

« Reply #10 on: November 06, 2011, 03:25:54 PM »
Ok, I looked at your 1600x1200 stream, and it's heavily compressed with an average bitrate of 600 kbps. My 2 GHz C2D can downscale and re-encode it at 50 % load on a single core, so with some threading your Atom 330 should be able to handle it reliably.

This command line gave ~400 kbps output without causing too much additional damage to quality:

Code:
openRTSP -v -c -t "rtsp://server/stream" | ffmpeg -r 10 -i - -y -an -f mpegts -vcodec libx264 -r 10 -s 960x640 -preset veryfast -crf 28 -threads 3 -g 20 out.ts
I also enabled SlowCTZ and SlowAtom, but couldn't see any performance degradation on 64-bit Linux.
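For anyone scripting this, the ffmpeg half of the command above could be assembled from parameters, which makes it easier to try other presets, CRF values, or thread counts. A sketch in Python that only builds and prints the argument list (it does not launch ffmpeg or openRTSP); the function names are hypothetical, and the flags are exactly those from the command above:

```python
import shlex

def build_transcode_args(fps=10, size="960x640", preset="veryfast",
                         crf=28, threads=3, gop=20, output="out.ts"):
    """Build the ffmpeg argument list for re-encoding a stream read from stdin."""
    return [
        "ffmpeg",
        "-r", str(fps), "-i", "-",   # read the input from stdin at the given frame rate
        "-y", "-an",                 # overwrite the output file, drop audio
        "-f", "mpegts",
        "-vcodec", "libx264",
        "-r", str(fps), "-s", size,  # output frame rate and resolution
        "-preset", preset, "-crf", str(crf),
        "-threads", str(threads), "-g", str(gop),
        output,
    ]

def keyframe_interval_seconds(gop, fps):
    """Longest gap between keyframes when -g is the GOP length in frames."""
    return gop / fps

args = build_transcode_args()
print(shlex.join(args))
print(keyframe_interval_seconds(20, 10))
```

With -g 20 at 10 fps, the encoder emits a keyframe at least every 2 seconds, which is worth keeping in mind when the output is later cut into segments.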
« Last Edit: November 07, 2011, 03:54:20 AM by nm »

Offline nimble99

« Reply #11 on: November 06, 2011, 03:44:05 PM »
Hi nm,

Thanks for that, I will give it a try tonight and see how it performs on Atom 330 and report back.
I am also piping the output to the apple segmenter which will be additional overhead.

I am using the standard 'ffmpeg iphone' settings (something like this, taken from a cheat-sheet):
ffmpeg -y -i input -r 30000/1001 -s 480x272 -aspect 480:272 -vcodec libx264 -b 512k -bt 1024k -maxrate 4M -flags +loop -cmp +chroma -me_range 16 -g 300 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -rc_eq "blurCplx^(1-qComp)" -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -coder 0 -refs 1 -bufsize 4M -level 21 -partitions parti4x4+partp8x8+partb8x8 -subq 5 -f mp4 -pass 1 -an -title "Title" output.mp4

Is all the extra stuff in that (quality/qfactor/coder/partitions etc.) unnecessary because I am using -vcodec copy?

Offline nm

« Reply #12 on: November 07, 2011, 03:52:05 AM »
Quote from: nimble99
> Is all the extra stuff in that (quality/qfactor/coder/partitions etc.) unnecessary because I am using -vcodec copy?

Yep. You only need this when copying:

Code:
ffmpeg -r 10 -i - -y -f mpegts -vcodec copy -an -

When re-encoding, my command line should work for iPhone 3GS and later models since they can decode High Profile H.264 video at relatively high bitrates.