My guess - greater than 22.1 megapixels. That gives you 3x1920 = 5760 pixels across, allowing for easy binning of a 3x3 pixel block from the sensor into a single pixel of 1920x1080 video... add a few pixels to each side for video stabilization and the 24MP sensor looks very likely.
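For anyone who wants to check the arithmetic, here is a quick sketch. It assumes a 3:2 sensor aspect ratio, which the post does not state but which is what makes 5760 pixels across come out at 22.1 megapixels. The function name and parameters are made up for illustration.

```python
# Assumption: the sensor is 3:2 and is binned 3x3 down to 1920 pixels wide.
def sensor_megapixels(video_width=1920, bin_factor=3, aspect=(3, 2)):
    """Return (width, height, megapixels) of the sensor area needed."""
    width = video_width * bin_factor           # 3 x 1920 photosites across
    height = width * aspect[1] // aspect[0]    # 3:2 aspect ratio (assumed)
    return width, height, width * height / 1e6

width, height, mp = sensor_megapixels()
print(width, height, round(mp, 1))  # 5760 3840 22.1
```

Only 3x1080 = 3240 of those rows are needed for a 16:9 frame, so the rest of the height is spare in video mode.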
How is binning at 3x3 on a grid that repeats at 2x2 easy?
Photosensors are in a 2x2 grid, but you're not binning a quad of photosensors; just the reds together, greens together (x2), and blues together... So yes, it is easy.
In the attached image you bin the following:
And you bin down right back into a Bayer pattern which you can then save as-is, or demosaic, or whatever.
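The attached image is not available here, so the following is only one possible reading of same-colour 3x3 binning, not necessarily the scheme in the picture. It assumes the raw data is a plain list of rows, and it relies on the fact that photosites of the same colour sit two apart in a Bayer mosaic while 3 is odd, so output photosite (oy, ox) has the same colour as raw photosite (3*oy, 3*ox). The function name is hypothetical.

```python
def bin3x3_same_color(raw):
    """Average nine same-colour photosites into one output photosite.

    raw is a Bayer mosaic given as a list of rows (an assumption of this
    sketch). The result is a smaller mosaic with the same colour layout.
    """
    h, w = len(raw), len(raw[0])
    # Each window spans 5 raw photosites, so stop 2 short of the border.
    out_h, out_w = (h - 2) // 3, (w - 2) // 3
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            total = 0
            # Step by 2 so every sample has the same colour as the start.
            for a in range(3):
                for b in range(3):
                    total += raw[3 * oy + 2 * a][3 * ox + 2 * b]
            row.append(total / 9)
        out.append(row)
    return out
```

The output can be saved as raw or passed to any ordinary demosaicer, since it is still a Bayer pattern.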
That method throws away half the light collected.
The 2x2 approach bins the 2 greens, 1 red and 1 blue in the block into 1 RGB pixel. Simple and no light lost.
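A minimal sketch of that 2x2 approach, assuming an RGGB layout (red top-left, blue bottom-right) and raw data stored as a list of rows. Neither detail is given in the thread, and the function name is made up.

```python
def bin2x2_to_rgb(raw):
    """Collapse each 2x2 Bayer quad into one (r, g, b) pixel.

    Assumes RGGB: R at (0,0), G at (0,1) and (1,0), B at (1,1).
    """
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # average the two greens
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out
```

Every photosite in each quad contributes to the output pixel, and the result is half the raw width and height.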
Rather than do the above, why not demosaic and interpolate with a simple interpolation method like bilinear? That way, you'd at least keep all the light.
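A rough sketch of that idea, under stated assumptions: RGGB layout, raw data as a list of rows at least 2x2 in size, bilinear interpolation used for the demosaic step, and a plain 3x3 box average swapped in for the downscale so that every interpolated pixel contributes. All names are hypothetical.

```python
def color_at(y, x):
    """Channel index (0=R, 1=G, 2=B) at a photosite, assuming RGGB."""
    if y % 2 == 0 and x % 2 == 0:
        return 0
    if y % 2 == 1 and x % 2 == 1:
        return 2
    return 1

def demosaic_bilinear(raw):
    """Fill in missing channels by averaging same-colour 3x3 neighbours."""
    h, w = len(raw), len(raw[0])
    rgb = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for c in range(3):
                if color_at(y, x) == c:
                    rgb[y][x][c] = float(raw[y][x])  # measured directly
                    continue
                # Neighbours of channel c, clipped at the image border.
                vals = [raw[ny][nx]
                        for ny in range(max(y - 1, 0), min(y + 2, h))
                        for nx in range(max(x - 1, 0), min(x + 2, w))
                        if color_at(ny, nx) == c]
                rgb[y][x][c] = sum(vals) / len(vals)
    return rgb

def box_downscale(rgb, factor=3):
    """Average each factor x factor block of RGB pixels into one pixel."""
    h, w = len(rgb), len(rgb[0])
    out = []
    for oy in range(h // factor):
        row = []
        for ox in range(w // factor):
            block = [rgb[oy * factor + a][ox * factor + b]
                     for a in range(factor) for b in range(factor)]
            row.append(tuple(sum(p[c] for p in block) / len(block)
                             for c in range(3)))
        out.append(row)
    return out
```

Usage would be box_downscale(demosaic_bilinear(raw)), which gives full RGB at one third of the raw width and height.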