26 Apr 2010 conrad   » (Journeyer)

Multi-camera, multi-resolution hardware encoding (libshcodecs 1.1.0)

I just released libshcodecs 1.1.0, a user-space library for controlling Renesas SH-Mobile hardware codecs. These tools now use libuiomux and libshveu for device access, memory management, colorspace conversion and rescaling.

The big feature is that it can now do simultaneous encode and decode of multiple streams. The coolest part is that the shcodecs-record tool can capture from multiple V4L2 camera interfaces and encode several streams at different resolutions from each camera source. And it can do this without breaking a sweat:

# time shcodecs-record -P k264-v4l2-vga.ctl k264-v4l2-vga-cam2-null.ctl k264-v4l2-qvga-null.ctl k264-v4l2-qvga-cam2-null.ctl
[0] Input file: /dev/video0
[0] Output file: /dev/null
[1] Input file: /dev/video2
[1] Output file: /dev/null
[2] Input file: /dev/video0
[2] Output file: /dev/null
[3] Input file: /dev/video2
[3] Output file: /dev/null
Camera 0 resolution:  640x480
Camera 1 resolution:  640x480
[0] Encode resolution:  640x480
[1] Encode resolution:  640x480
[2] Encode resolution:  320x240
[3] Encode resolution:  320x240
Target framerate:   30.0 fps
  Encoding @ 29.48 fps  (avg 30.04 fps)
Elapsed time (capture): 33.3 s
Captured 1000 frames (30.00 fps)

Elapsed time (capture): 33.3 s
Captured 1000 frames (30.00 fps)
[0] Elapsed time (encode): 33.3 s
[0] Encoded 1000 frames (30.04 fps)
[1] Elapsed time (encode): 33.3 s
[1] Encoded 1000 frames (30.04 fps)
[2] Elapsed time (encode): 33.3 s
[2] Encoded 1000 frames (30.04 fps)
[3] Elapsed time (encode): 33.3 s
[3] Encoded 1000 frames (30.04 fps)

real    0m34.137s
user    0m1.016s
sys     0m0.748s

That's 4 simultaneous H.264 encodes using 2 VGA camera sources, encoding each into both VGA and QVGA in realtime at 30fps, and using < 2% of this 500MHz SH7724 CPU:

top - 07:45:01 up 17 min,  2 users,  load average: 0.02, 0.04, 0.05
Tasks:  51 total,   2 running,  49 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  0.6%sy,  0.0%ni, 95.2%id,  1.6%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:    248332k total,    78380k used,   169952k free,        0k buffers
Swap:        0k total,        0k used,        0k free,    24672k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1526 root      20   0 68412 1676 1256 R  1.9  0.7   0:00.89 shcodecs-record
 1482 root      20   0  2976 1188  980 R  0.3  0.5   0:07.67 top
    1 root      20   0  2372  708  620 S  0.0  0.3   0:01.63 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
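The log above shows four encoder instances sharing two cameras: encoders [0] and [2] read /dev/video0, encoders [1] and [3] read /dev/video2, and the tool reports two camera resolutions and two capture summaries rather than four. The Python below is a minimal sketch of that grouping, not libshcodecs code. It assumes that, for this purpose, each .ctl file reduces to an input device plus an encode size; `EncoderConfig` and `group_by_camera` are hypothetical names. Each device is listed once, with the downscale factor the rescaler would have to apply for every encoder it feeds.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class EncoderConfig:
    # Hypothetical stand-in for one .ctl file: input device plus encode size.
    device: str
    width: int
    height: int


def group_by_camera(configs, camera_sizes):
    """Map each input device to the encoders it feeds.

    Each entry is (encoder index, (x scale, y scale)), where the scale is
    the encode size relative to that camera's capture size.
    """
    cameras = {}  # dicts keep insertion order, so devices appear in first-use order
    for index, cfg in enumerate(configs):
        cam_w, cam_h = camera_sizes[cfg.device]
        scale = (Fraction(cfg.width, cam_w), Fraction(cfg.height, cam_h))
        cameras.setdefault(cfg.device, []).append((index, scale))
    return cameras


# The four encoders from the run above, in command-line order.
configs = [
    EncoderConfig("/dev/video0", 640, 480),
    EncoderConfig("/dev/video2", 640, 480),
    EncoderConfig("/dev/video0", 320, 240),
    EncoderConfig("/dev/video2", 320, 240),
]
sizes = {"/dev/video0": (640, 480), "/dev/video2": (640, 480)}

for device, encoders in group_by_camera(configs, sizes).items():
    for index, (sx, sy) in encoders:
        print(f"{device} -> [{index}] scale {sx} x {sy}")
```

Running it prints one line per encoder, grouped by camera: encoders 0 and 2 on /dev/video0 at scales 1 and 1/2, then encoders 1 and 3 on /dev/video2 at the same scales.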

Similarly, a single 720p encode:

# time shcodecs-record -P k264-v4l2-720p.ctl 
[0] Input file: /dev/video0
[0] Output file: /dev/null
Camera 0 resolution:  1280x720
[0] Encode resolution:  1280x720
Target framerate:   30.0 fps
  Encoding @ 31.11 fps  (avg 29.97 fps)
Elapsed time (capture): 33.4 s
Captured 1000 frames (29.97 fps)
[0] Elapsed time (encode): 33.4 s
[0] Encoded 1000 frames (29.97 fps)

real    0m33.887s
user    0m0.684s
sys     0m0.492s

which also uses < 2% of the CPU:

top - 07:53:51 up 26 min,  2 users,  load average: 0.02, 0.03, 0.03
Tasks:  50 total,   1 running,  49 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.6%sy,  0.0%ni, 96.1%id,  1.9%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    248332k total,    78168k used,   170164k free,        0k buffers
Swap:        0k total,        0k used,        0k free,    24736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1618 root      20   0 49152 1532 1256 S  1.3  0.6   0:00.93 shcodecs-record
 1482 root      20   0  2976 1188  980 R  0.6  0.5   0:11.15 top
    1 root      20   0  2372  708  620 S  0.0  0.3   0:01.63 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
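As a back-of-the-envelope comparison using only the numbers printed above: the four-stream job pushes about 23 Mpixel/s through the encoders, slightly less than the single 720p30 stream at about 27.6 Mpixel/s. The `time` totals, (user + sys) / real, give a run-wide average that includes start-up of roughly 5% CPU for the four-stream run and 3.5% for the 720p run, while the top figures of 1-2% are a single sample. The sketch counts luma pixels only and assumes both runs sustain the 30 fps they report.

```python
# Figures copied from the two runs above.
RUNS = {
    "4 streams (2x VGA + 2x QVGA)": {
        "frame_sizes": [(640, 480), (640, 480), (320, 240), (320, 240)],
        "fps": 30,
        "real": 34.137, "user": 1.016, "sys": 0.748,
    },
    "1 stream (720p)": {
        "frame_sizes": [(1280, 720)],
        "fps": 30,
        "real": 33.887, "user": 0.684, "sys": 0.492,
    },
}


def pixel_rate(frame_sizes, fps):
    """Luma pixels per second that the encoders must process."""
    return sum(w * h for w, h in frame_sizes) * fps


def cpu_share(real, user, sys):
    """Average CPU fraction over the whole run, from time(1) output."""
    return (user + sys) / real


for name, run in RUNS.items():
    mpix = pixel_rate(run["frame_sizes"], run["fps"]) / 1e6
    cpu = 100 * cpu_share(run["real"], run["user"], run["sys"])
    # Prints 23.04 Mpixel/s, 5.2% and 27.65 Mpixel/s, 3.5% respectively.
    print(f"{name}: {mpix:.2f} Mpixel/s, {cpu:.1f}% CPU averaged over the run")
```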

Of course, the CPU does so little work because it is only acknowledging interrupts and setting up the rescale and encode hardware, which does the actual work. This shows the kind of results that can be achieved when hardware manufacturers include ASIC support for video encoding ;-)

This version of shcodecs-record uses a new shcodecs_encoder_run_multiple() function which runs multiple encoder instances in a consistent order, interleaving the encoding of individual frames. This allows the encoded output to be used in a realtime streaming environment.
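The scheduling idea described here (a fixed order, one frame per encoder per pass) can be modelled in a few lines. This is a Python sketch of the round-robin interleaving only; it does not reproduce the C signature or internals of shcodecs_encoder_run_multiple(), and `FakeEncoder` and `run_interleaved` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FakeEncoder:
    # Hypothetical stand-in for one encoder instance; it only logs frames.
    name: str
    total_frames: int
    encoded: int = 0

    def encode_one_frame(self, log):
        """Encode the next frame; return False once this stream is finished."""
        if self.encoded >= self.total_frames:
            return False
        log.append((self.name, self.encoded))
        self.encoded += 1
        return True


def run_interleaved(encoders):
    """Round-robin: one frame from each active encoder per pass, in a fixed order."""
    log = []
    active = list(encoders)
    while active:
        # Keep only the encoders that still had a frame to encode on this pass.
        active = [enc for enc in active if enc.encode_one_frame(log)]
    return log


print(run_interleaved([FakeEncoder("vga", 2), FakeEncoder("qvga", 3)]))
# [('vga', 0), ('qvga', 0), ('vga', 1), ('qvga', 1), ('qvga', 2)]
```

Because every stream advances by at most one frame per pass, no stream can run ahead of the others, which is the property a live streaming consumer relies on.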

libshcodecs-1.1.0 also includes support for running encoders and decoders in parallel threads, a feature developed by Phil Edworthy of Renesas Electronics Europe. We'll be using this in some GStreamer plugins under development (gst-sh-mobile), to make it even simpler for applications to use this hardware video acceleration.

Syndicated 2010-04-23 09:06:00 (Updated 2010-04-26 09:59:34) from Conrad Parker

