Pixel binning needs less power because there are fewer pixels to process.
The simplest and least power-hungry algorithm is averaging.
Example: double the HD resolution in each direction (four times the pixels):
width: 1920 x 2 = 3840
aspect ratio of the photo: 3:2
height unit: 2/3 x 1920 = 1280 (the 1080 video lines fit within it)
height: 1280 x 2 = 2560
pixels: 3840 x 2560 = 9,830,400 (about 9.8 megapixels)
Example: triple the HD resolution in each direction (nine times the pixels):
width: 1920 x 3 = 5760
height: 1280 x 3 = 3840
pixels: 5760 x 3840 = 22,118,400 (about 22.1 megapixels)
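The same arithmetic works for any integer multiple n of HD. A small Python sketch (the function and constant names are made up for illustration) that reproduces both examples and the 4x case:

```python
# Sensor size for an integer multiple of 1080p at a 3:2 photo aspect ratio,
# following the arithmetic in the post.
HD_WIDTH = 1920
# 2/3 of 1920 = 1280 rows, which leaves room for the 1080 video rows.
PHOTO_HEIGHT_UNIT = HD_WIDTH * 2 // 3


def sensor_for_multiple(n: int) -> tuple[int, int, int]:
    """Return (width, height, total pixels) for an n-times-HD sensor."""
    width = n * HD_WIDTH
    height = n * PHOTO_HEIGHT_UNIT
    return width, height, width * height


for n in (2, 3, 4):
    w, h, total = sensor_for_multiple(n)
    print(f"{n}x: {w} x {h} = {total:,} pixels ({total / 1e6:.1f} MP)")
```

This prints 9.8 MP for 2x, 22.1 MP for 3x and 39.3 MP for 4x.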
The first example would be perfect for an APS-C camera.
And you get real RGB pixels in video mode, whereas the camera's sensor pixels are only sub-pixels (Bayer matrix).
In effect, this is sRAW.
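A minimal sketch of 2x2 binning by averaging, assuming an RGGB Bayer layout (red at even row/even column, blue at odd row/odd column). Each block becomes one true RGB pixel, so no demosaicing is needed, and the only division (averaging the two greens) is a single right shift. Whether Canon's sRAW is computed exactly this way is not something the post establishes; the function name is hypothetical.

```python
def bin_2x2_rggb(mosaic: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Collapse each 2x2 RGGB block into one (R, G, B) pixel."""
    rows, cols = len(mosaic), len(mosaic[0])
    if rows % 2 or cols % 2:
        raise ValueError("mosaic dimensions must be even")
    out = []
    for y in range(0, rows, 2):
        out_row = []
        for x in range(0, cols, 2):
            r = mosaic[y][x]                                # one red photosite
            g = (mosaic[y][x + 1] + mosaic[y + 1][x]) >> 1  # two greens, averaged by a shift
            b = mosaic[y + 1][x + 1]                        # one blue photosite
            out_row.append((r, g, b))
        out.append(out_row)
    return out


# A 2 x 4 mosaic becomes a 1 x 2 image of RGB pixels.
mosaic = [
    [100, 50, 102, 52],
    [54, 200, 56, 202],
]
print(bin_2x2_rggb(mosaic))  # [[(100, 52, 200), (102, 54, 202)]]
```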
The second example would be perfect for a full-frame camera.
However, dividing by 3 is not possible with bit-shifting, and the 3x3 blocks don't all contain the same number of R, G and B sub-pixels (Bayer matrix).
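To see why 3x3 blocks are awkward, count the photosites of each color in the blocks. The sketch below assumes an RGGB layout (helper names are hypothetical). Because 3 is odd, tiled 3x3 blocks start alternately on even and odd rows and columns, so four different block types occur.

```python
from collections import Counter


def rggb_color(y: int, x: int) -> str:
    """Color of the photosite at row y, column x, assuming an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def block_counts(y0: int, x0: int, size: int) -> dict[str, int]:
    """Count R, G and B photosites in the size x size block at (y0, x0)."""
    counts = Counter(
        rggb_color(y, x)
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )
    return {c: counts[c] for c in "RGB"}


# The four 3x3 block types that appear when the blocks are tiled.
for y0 in (0, 3):
    for x0 in (0, 3):
        print((y0, x0), block_counts(y0, x0, 3))
```

The four block types come out as 4/4/1, 2/5/2, 2/5/2 and 1/4/4 (R/G/B). By contrast, 2x2 and 4x4 blocks (which always start on even rows and columns) contain 1/2/1 and 4/8/4 every time, all powers of two.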
Finally, a resolution of 4 x 1920 by 4 x 1280 (7680 x 5120):
binning it in 4x4 blocks for video, which can also be done with bit-shifting, would give a 39.3-megapixel (full-frame) camera.
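A sketch of 4x4 binning with shifts only, again assuming an RGGB layout (function names are hypothetical). Every 4x4 block holds 4 red, 8 green and 4 blue photosites, so the averages are the channel sums shifted right by 2, 3 and 2 bits (which truncates, like integer division).

```python
def rggb_color(y: int, x: int) -> str:
    """Color of the photosite at row y, column x, assuming an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def bin_4x4_rggb(mosaic: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Average each 4x4 RGGB block into one (R, G, B) pixel using only shifts."""
    rows, cols = len(mosaic), len(mosaic[0])
    if rows % 4 or cols % 4:
        raise ValueError("mosaic dimensions must be multiples of 4")
    out = []
    for y0 in range(0, rows, 4):
        out_row = []
        for x0 in range(0, cols, 4):
            sums = {"R": 0, "G": 0, "B": 0}
            for y in range(y0, y0 + 4):
                for x in range(x0, x0 + 4):
                    sums[rggb_color(y, x)] += mosaic[y][x]
            # 4 reds -> >> 2, 8 greens -> >> 3, 4 blues -> >> 2
            out_row.append((sums["R"] >> 2, sums["G"] >> 3, sums["B"] >> 2))
        out.append(out_row)
    return out


# Photosite values 0..15 laid out row by row in a single 4x4 block.
mosaic = [[4 * y + x for x in range(4)] for y in range(4)]
print(bin_4x4_rggb(mosaic))  # [[(5, 7, 10)]]
```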
Judging by the pixel density of the 18-megapixel APS-C 7D, this could be done.
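A rough density check, assuming the nominal 1.6x Canon APS-C crop factor and ignoring small differences in sensor dimensions: scaling the 7D's 18 megapixels by the crop factor squared gives the pixel count of a full-frame sensor with the same photosite size.

```python
CROP_FACTOR_7D = 1.6              # nominal Canon APS-C crop factor
MP_7D = 18.0                      # 7D resolution, megapixels
MP_PROPOSED = 7680 * 5120 / 1e6   # 4x-HD full-frame sensor, about 39.3 MP

# Multiplying by the crop factor squared keeps the photosite size the same
# while growing the sensor area from APS-C to full frame.
mp_at_7d_density = MP_7D * CROP_FACTOR_7D ** 2

print(f"full frame at 7D density: {mp_at_7d_density:.1f} MP")
print(f"proposed 4x-HD sensor:    {MP_PROPOSED:.1f} MP")
print("proposed is less dense than the 7D:", MP_PROPOSED < mp_at_7d_density)
```

At about 46 megapixels for a 7D-density full-frame sensor, the proposed 39.3 megapixels would have somewhat larger photosites than the 7D.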
And a 9.8-megapixel sRAW (2x2 binning of that sensor) would be very good!
I see the first option as probably the best for a first-generation video/stills hybrid camera, if it can handle it. One thing that must be taken into account is getting such a huge quantity of data off the sensor fast enough. A 2x2 sampling matrix for pixel binning only lets you bin the pair of green pixels; R and B can't be binned without "cheating", so after binning you're still reading out 75% of the pixels in the camera's 16:9 area. But you would get an R, a B and a binned G value for every 2x2 block, which eliminates the need for interpolation. Detail should be much higher than what current DSLR video modes deliver.
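At 75% you're looking at pushing through about 220 million pixels per second at 30fps and 177 million pixels per second at 24fps. These figures count the full 3840 x 2560 sensor; reading only a 3840 x 2160 (16:9) area would bring them down to about 187 and 149 million. (Rolling shutter would be horrid with today's sensors; the scanning process would definitely have to be sped up.)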
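The readout rates follow from frame size x readout fraction x frame rate. A sketch (the function name is hypothetical) where a binned green pair counts as one read: 2x2 binning reads 3 values per 4 photosites, and the "cheat" of reading R and B in only every other block reads 4 values per 8 photosites.

```python
def readout_rate(width: int, height: int, reads: int, pixels: int, fps: int) -> int:
    """Values read per second when `reads` values are read for every
    `pixels` photosites (a binned green pair counts as one read)."""
    return width * height * reads * fps // pixels


scenarios = [
    ("3:2 full frame, 75%, 30 fps", 3840, 2560, 3, 4, 30),
    ("3:2 full frame, 75%, 24 fps", 3840, 2560, 3, 4, 24),
    ("3:2 full frame, 50%, 24 fps", 3840, 2560, 4, 8, 24),
    ("16:9 crop, 75%, 30 fps", 3840, 2160, 3, 4, 30),
    ("16:9 crop, 75%, 24 fps", 3840, 2160, 3, 4, 24),
]
for label, w, h, reads, pixels, fps in scenarios:
    print(f"{label}: {readout_rate(w, h, reads, pixels, fps) / 1e6:.0f} million/s")
```

The 3:2 rows roughly reproduce the 220, 177 and 118 million figures in the thread.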
If you started cutting corners and cheated by, say, scanning the R and B of only every other 2x2 block, you could cut the number of scanned pixels from 75% to 50%, producing noticeably lower numbers (118 million pixels/sec at 24fps). You'd have to interpolate the missing R and B values, but that shouldn't be too hard since all the adjacent 2x2 blocks have them.
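One way the missing values could be filled in, assuming the skipped blocks form a checkerboard (the post only says "every other" block, so the pattern is an assumption, as is the function name). Each skipped block takes the mean R and B of its measured up/down/left/right neighbors; in a checkerboard every such neighbor was read. This is plain bilinear-style averaging, not any particular camera's method.

```python
from typing import Optional

Grid = list[list[Optional[float]]]


def fill_skipped_rb(r: Grid, b: Grid) -> tuple[Grid, Grid]:
    """Fill R and B for skipped 2x2 blocks from their measured neighbors.

    r and b hold one value per 2x2 block, with None where R and B were not
    read (the binned G is assumed to be read for every block).
    """
    h, w = len(r), len(r[0])
    out_r = [row[:] for row in r]
    out_b = [row[:] for row in b]
    for y in range(h):
        for x in range(w):
            if r[y][x] is not None:
                continue
            nbrs = [
                (y + dy, x + dx)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= y + dy < h and 0 <= x + dx < w and r[y + dy][x + dx] is not None
            ]
            if not nbrs:
                raise ValueError(f"block ({y}, {x}) has no measured neighbors")
            out_r[y][x] = sum(r[ny][nx] for ny, nx in nbrs) / len(nbrs)
            out_b[y][x] = sum(b[ny][nx] for ny, nx in nbrs) / len(nbrs)
    return out_r, out_b


# 2 x 2 blocks in a checkerboard: (0, 0) and (1, 1) were read, the others skipped.
r = [[10, None], [None, 30]]
b = [[1, None], [None, 3]]
print(fill_skipped_rb(r, b))  # ([[10, 20.0], [20.0, 30]], [[1, 2.0], [2.0, 3]])
```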
118 million pixels/sec is still a lot, but I believe the 7D's dual processors can already handle those kinds of numbers (a shame they kept the 5D's scanning routine). The sensor would have to be designed to handle the heat buildup, and the scanning process would have to be sped up quite a bit to avoid what would be horrible rolling shutter. By contrast, the 5D should only need to handle about 72 million pixels per second for its video mode by omitting 2 of every 3 lines (assuming the "bin & skip" routine I think is occurring is correct).
I'm not sure how a 3x3 binning routine would work, but if the chip could be wired to bin up to 3 rows in differing patterns, I don't see why it couldn't be done. Yes, the 3x3 blocks would be unequal, with either 4 or 5 green pixels and 1, 2 or 4 red or blue pixels, but you'd still get a full RGB sample from every block. After binning you'd be looking at a scan rate of 33% of your 16:9 cropped area (three values per nine pixels). Panasonic MAY be doing something like that with its GH1, but it may be a more primitive routine.
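A sketch of 3x3 binning on an RGGB mosaic (layout and names are assumptions). Because the per-channel counts change from block to block, each channel sum is divided by its actual count with a true integer division rather than a shift; a flat test card then comes out flat, which a fixed divisor would not achieve.

```python
def rggb_color(y: int, x: int) -> str:
    """Color of the photosite at row y, column x, assuming an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def bin_3x3_rggb(mosaic: list[list[int]]) -> list[list[tuple[int, ...]]]:
    """Average each 3x3 block into one (R, G, B) pixel."""
    rows, cols = len(mosaic), len(mosaic[0])
    if rows % 3 or cols % 3:
        raise ValueError("mosaic dimensions must be multiples of 3")
    out = []
    for y0 in range(0, rows, 3):
        out_row = []
        for x0 in range(0, cols, 3):
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            for y in range(y0, y0 + 3):
                for x in range(x0, x0 + 3):
                    c = rggb_color(y, x)
                    sums[c] += mosaic[y][x]
                    counts[c] += 1
            # Counts vary per block (R/B: 1, 2 or 4; G: 4 or 5), so divide
            # by the real count instead of shifting.
            out_row.append(tuple(sums[c] // counts[c] for c in "RGB"))
        out.append(out_row)
    return out


# Flat test card: every R = 10, G = 20, B = 30.
level = {"R": 10, "G": 20, "B": 30}
mosaic = [[level[rggb_color(y, x)] for x in range(6)] for y in range(6)]
print(bin_3x3_rggb(mosaic))  # every block -> (10, 20, 30)
```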
Another option is to use an even lower-megapixel sensor and simply do a full scan of all its pixels, demosaic, and then downsample to 1080 for the final output. From a video standpoint this is the most desirable option, since the AA (anti-aliasing) filter could be used effectively, and on Bayer sensors you get the best-quality picture from oversampling and downsampling. But how many people would buy, say, a 5 or 6 megapixel camera that would surely not be cheap?
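For comparison with the binning figures, a sketch of the readout budget for the full-scan approach. The sensor size is hypothetical: 2880 x 1920 (about 5.5 MP), i.e. 1.5 x 1920 by 1.5 x 1280, following the same 3:2 pattern as the first post. The demosaic and downsample steps themselves are not sketched here.

```python
from fractions import Fraction

HD_WIDTH = 1920


def full_scan_budget(sensor_w: int, sensor_h: int, fps: int) -> tuple[int, int, Fraction, int]:
    """Read every photosite in the 16:9 crop of a 3:2 sensor.

    Returns (crop width, crop height, linear oversampling vs 1080p,
    photosites read per second).
    """
    crop_h = min(sensor_h, sensor_w * 9 // 16)
    oversample = Fraction(sensor_w, HD_WIDTH)
    return sensor_w, crop_h, oversample, sensor_w * crop_h * fps


# Hypothetical 5.5 MP sensor at 24 fps.
w, h, over, rate = full_scan_budget(2880, 1920, 24)
print(f"crop {w} x {h}, oversampling {float(over)}x, {rate / 1e6:.0f} million/s")
```

Every photosite is read and demosaiced here, so the readout is 100% of the crop, yet the total rate stays in the same range as the binned options because the sensor is so much smaller.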
Finally, you'd have to consider how to process such a huge amount of data. RAW would sidestep this but would require an insanely fast card to write it to. Another, more affordable option might be to make the highest-quality mode (2x2 or full scan) available only for, say, 30 seconds. The data could then be unloaded into a large buffer and processed bit by bit (not in real time). This is what the Casios do for their burst modes.
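A back-of-the-envelope buffer size for the 30-second idea. The bit depth is an assumption (12 bits per sample, tightly packed, no compression; the thread doesn't specify), and the rates are the 24 fps full-frame figures from above. The function name is hypothetical.

```python
def buffer_bytes(samples_per_second: int, seconds: int, bits_per_sample: int) -> int:
    """Bytes needed to hold uncompressed, tightly packed samples."""
    return samples_per_second * seconds * bits_per_sample // 8


# ASSUMPTION: 12-bit samples, packed, no compression.
for label, rate in (("75% readout, 24 fps", 176_947_200),
                    ("50% readout, 24 fps", 117_964_800)):
    size = buffer_bytes(rate, 30, 12)
    print(f"{label}: {size / 1e9:.1f} GB for 30 s")
```

Under these assumptions a 30-second clip needs roughly 5 to 8 GB of buffer before any processing, which shows why the mode would have to be time-limited.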
We'll see what happens; it'll be interesting.