My guess: at least 22.1 megapixels. That gives you 3 x 1920 = 5760 pixels across, allowing for easy binning of a 3x3 pixel block from the sensor into a single pixel of 1920x1080 video (on a 3:2 sensor, 5760 x 3840 = 22.1 million pixels). Add a few pixels to each side for video stabilization and a 24MP sensor looks very likely.
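A quick check of that arithmetic, as a sketch. It assumes a 3:2 sensor aspect ratio (typical of DSLR sensors, but not stated in the post), and `min_sensor_for_binning` is a hypothetical helper name used only for illustration.

```python
def min_sensor_for_binning(out_w, out_h, factor, aspect_w, aspect_h):
    """Smallest sensor whose width is factor * out_w, with the height set by
    the sensor's own aspect ratio (assumed, e.g. 3:2)."""
    width = factor * out_w
    height = width * aspect_h // aspect_w
    # The sensor must also be tall enough to bin the video height.
    if height < factor * out_h:
        height = factor * out_h
    return width, height, width * height / 1e6


w, h, mp = min_sensor_for_binning(1920, 1080, 3, 3, 2)
print(w, h, round(mp, 1))  # 5760 3840 22.1
```

With a 16:9 sensor instead, the same 3x binning would only need 5760 x 3240, about 18.7 MP; the 22.1 figure comes from the taller 3:2 frame.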
How is binning at 3x3 on a grid that repeats at 2x2 easy?
Photosensors are in a 2x2 repeating grid, but you're not binning a quad of photosensors; you bin just the reds together, the greens together (two sets of them), and the blues together. So yes, it is easy.
In the attached image you bin the following:
And you bin down right back into a Bayer pattern, which you can then save as-is, demosaic, or whatever.
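The same-color binning described above can be sketched in plain Python. This is a minimal illustration, assuming an RGGB mosaic stored as a list of rows and simple digital summing after readout (not on-chip charge binning); `bin_bayer` is a hypothetical name, not any camera's actual pipeline. Each output pixel sums a k x k block from one color's sub-plane, and the outputs are interleaved so the result keeps the Bayer layout. Because each color's block is taken from its own sub-plane, the blocks for different colors end up offset from one another by one photosite.

```python
def bin_bayer(raw, k=3):
    """Sum k x k blocks of same-colored photosites in a Bayer mosaic.

    raw: list of rows; height and width must be multiples of 2 * k.
    Returns a mosaic k times smaller per axis with the same CFA layout.
    """
    h, w = len(raw), len(raw[0])
    if h % (2 * k) or w % (2 * k):
        raise ValueError("dimensions must be multiples of 2 * k")
    out = [[0] * (w // k) for _ in range(h // k)]
    for oy in range(h // k):
        for ox in range(w // k):
            # CFA phase of this output pixel, e.g. (0, 0) is red in RGGB.
            py, px = oy % 2, ox % 2
            # Which k x k block of that color's sub-plane to sum.
            by, bx = oy // 2, ox // 2
            total = 0
            for dy in range(k):
                for dx in range(k):
                    # Step by 2 on the sensor to stay on the same color.
                    y = 2 * (by * k + dy) + py
                    x = 2 * (bx * k + dx) + px
                    total += raw[y][x]
            out[oy][ox] = total
    return out


# 6x6 RGGB test mosaic: R = 1, G = 2, B = 3.
raw = [[(1, 2)[x % 2] if y % 2 == 0 else (2, 3)[x % 2] for x in range(6)]
       for y in range(6)]
print(bin_bayer(raw))  # [[9, 18], [18, 27]]
```

On a 5760-pixel-wide sensor this yields a 1920-wide mosaic, which can then be demosaiced like any other raw frame.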
That method throws away half the light collected.
The 2x2 approach bins the 2 greens, 1 red, and 1 blue in each block into 1 RGB pixel. Simple, and no light is lost.
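A sketch of the 2x2 approach, assuming an RGGB layout with even dimensions. `superpixel_rgb` is a hypothetical name, and averaging the two greens is just one reasonable choice (summing them would also work).

```python
def superpixel_rgb(raw):
    """Collapse each RGGB 2x2 quad into one (R, G, B) pixel.

    G is the mean of the two green photosites; R and B are taken as-is.
    """
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out


print(superpixel_rgb([[10, 20], [30, 40]]))  # [[(10, 25.0, 40)]]
```

Note that this halves resolution in each axis, so it does not by itself turn a 5760-wide sensor into 1920-wide video.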
Rather than do the above, why not demosaic and then downscale with a simple interpolation method like bilinear? That way, you'd at least keep all the light.
I'm sorry, but you apparently don't understand what binning is.
You're saying that instead of demosaicing, you'd just use each 2x2 block of subpixels to create one RGB pixel. That is not binning, and it would really gain you nothing, though it should probably be better than line skipping.
Binning is adding together the signals from all the same-colored subpixels in a square block (the way Phase One does it is slightly more complex: each color is in a square, but the squares for each color that are combined are offset). Binning reduces noise (the main reason for doing it) and should improve low-light performance, since the "metapixel" collects photons like an equivalent-size pixel would. The tradeoff is reduced resolution (which is not an issue for video) and increased moire/aliasing, because the AA filter is effectively nullified.
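The noise argument can be illustrated with a simplified model: photon shot noise equal to the square root of the signal (in electrons) plus Gaussian read noise added in quadrature, ignoring dark current and fixed-pattern noise. The numbers below (100 e- per pixel, 3 e- read noise) are illustrative assumptions, not measurements of any sensor, and `snr` is a hypothetical helper. Summing 9 pixels triples the SNR when shot noise dominates, which is what a single pixel with 9x the area would give.

```python
import math


def snr(signal_e, read_noise_e, n_pixels=1, reads=1):
    """Signal-to-noise ratio for n_pixels summed together.

    signal_e: electrons collected per pixel.
    read_noise_e: read noise per readout, in electrons.
    reads: number of read-noise contributions; 1 models charge binning
    on the sensor, n_pixels models summing after readout.
    """
    s = n_pixels * signal_e
    noise = math.sqrt(s + reads * read_noise_e ** 2)  # shot + read, in quadrature
    return s / noise


print(round(snr(100, 3), 2))                      # 9.58, single pixel
print(round(snr(100, 3, n_pixels=9, reads=1), 2))  # 29.85, charge-binned 3x3
print(round(snr(100, 3, n_pixels=9, reads=9), 2))  # 28.73, digitally summed 3x3
```

Either way the 3x3 metapixel comes out roughly 3x better than a single pixel; reading out once per bin only adds a small extra gain at this signal level.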