3x3 oversampling is more a curse than a blessing in an RGB sensor with a Bayer filter matrix.
Let's consider a 6x6 block of pixels that will be downsampled to 2x2 pixels for video:
GRG RGR
BGB GBG
GRG RGR
BGB GBG
GRG RGR
BGB GBG
Hm, no. Your example is for a 3x3 pixel block, not for 6x6. One pixel is:
GR
BG
so a 6x6 block downsampled to a 2x2 block looks like:
GRGRGR GRGRGR
BGBGBG BGBGBG
GRGRGR GRGRGR
BGBGBG BGBGBG
GRGRGR GRGRGR
BGBGBG BGBGBG
GRGRGR GRGRGR
BGBGBG BGBGBG
GRGRGR GRGRGR
BGBGBG BGBGBG
GRGRGR GRGRGR
BGBGBG BGBGBG
So each pixel for HD video has the same number of subpixels: 18 G, 9 R and 9 B,
and there are no problems with downsampling.
Hmm no.

Let's go back to basics.
Each cell in an RGB sensor with a Bayer filter samples one band of wavelengths corresponding to one basic color channel: red, green or blue. The two missing colors are interpolated from surrounding samples (this process is often called "demosaicing").
Thus one cell after demosaicing becomes a full RGB triplet: one channel is sampled, the other two are interpolated.
This is called a pixel (picture element), and it's the basis for further processing.
Most digital cameras work that way.
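To make "one channel sampled, the others interpolated" concrete, here is a minimal sketch of the simplest (bilinear) way to fill in green at a non-green cell. It assumes the GR/BG layout used in this thread; the names `bayer_color`, `green_at` and the `raw` values are made up for illustration, and real cameras use more elaborate demosaicing.

```python
def bayer_color(row, col):
    """Color of the cell at (row, col) for the GR/BG layout (assumed)."""
    if row % 2 == 0:
        return "G" if col % 2 == 0 else "R"
    return "B" if col % 2 == 0 else "G"


def green_at(raw, row, col):
    """Green value at an interior cell.

    Sampled directly if the cell is green, otherwise the mean of the
    four directly adjacent cells, which are all green in a Bayer layout.
    """
    if bayer_color(row, col) == "G":
        return raw[row][col]
    neighbors = [
        raw[row - 1][col],
        raw[row + 1][col],
        raw[row][col - 1],
        raw[row][col + 1],
    ]
    return sum(neighbors) / 4.0


# Hypothetical 4x4 raw readout (one number per cell).
raw = [
    [10, 200, 12, 210],
    [50, 14, 52, 16],
    [18, 220, 20, 230],
    [54, 22, 56, 24],
]

print(green_at(raw, 1, 1))  # green cell: sampled value
print(green_at(raw, 1, 2))  # blue cell: green is interpolated
print(green_at(raw, 2, 1))  # red cell: green is interpolated
```

Red and blue would be filled in the same way from their own nearest neighbors; only border handling is left out here.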
Next, if we use a 2x2 matrix of cells we'll have two greens, one blue and one red. After combining these into one pixel we'll get an image with half the width and half the height. Each pixel of this image has all channels sampled (but with slightly different spatial resolution). There's no interpolation, and with a correct anti-alias filter we'll get near-alias-free reconstruction. This is what the Canon C300 does internally.
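A rough sketch of that 2x2 combination, assuming the GR/BG layout and a plain average of the two greens. `bin_2x2` and the `raw` values are hypothetical; this is not a description of how any particular camera implements it.

```python
def bin_2x2(raw):
    """Combine each 2x2 cell group (G R / B G) into one (R, G, B) pixel.

    Every output channel comes from real samples; nothing is interpolated.
    """
    out = []
    for r in range(0, len(raw) - 1, 2):
        out_row = []
        for c in range(0, len(raw[r]) - 1, 2):
            green = (raw[r][c] + raw[r + 1][c + 1]) / 2.0  # two green cells
            red = raw[r][c + 1]                            # one red cell
            blue = raw[r + 1][c]                           # one blue cell
            out_row.append((red, green, blue))
        out.append(out_row)
    return out


# Hypothetical 4x4 raw readout (one number per cell).
raw = [
    [10, 200, 12, 210],
    [50, 14, 52, 16],
    [18, 220, 20, 230],
    [54, 22, 56, 24],
]

pixels = bin_2x2(raw)
for row in pixels:
    print(row)
```

The 4x4 readout becomes a 2x2 image, and every output pixel is built from the same mix of cells (1 R, 2 G, 1 B).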
If we take a 3x3 matrix of cells we'll get pixels with the 3 different kinds of R/G/B ratios I've described before. This is a very bad situation, because after averaging, noise levels will vary depending on pixel position and channel.
There are cameras that do this, but they are mostly toys (USB webcams). It's very easy to spot when you do any image analysis on such images: in high-noise situations the noise has a checkerboard pattern.
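The uneven ratios are easy to check by counting. The sketch below assumes the GR/BG layout and, for the noise estimate, that every cell has the same uncorrelated noise, so averaging n cells leaves 1/sqrt(n) of it. `bayer_color` and `block_counts` are names made up for this example.

```python
import math


def bayer_color(row, col):
    """Color of the cell at (row, col) for the GR/BG layout (assumed)."""
    if row % 2 == 0:
        return "G" if col % 2 == 0 else "R"
    return "B" if col % 2 == 0 else "G"


def block_counts(top, left, height, width):
    """Number of R, G and B cells inside one block of the sensor."""
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(height):
        for c in range(width):
            counts[bayer_color(top + r, left + c)] += 1
    return counts


# The four 3x3 blocks that tile a 6x6 area.
blocks = {}
for top in (0, 3):
    for left in (0, 3):
        blocks[(top, left)] = block_counts(top, left, 3, 3)

for origin, counts in blocks.items():
    # Relative noise left after averaging n equal, uncorrelated samples.
    noise = {ch: round(1 / math.sqrt(n), 3) for ch, n in counts.items()}
    print(origin, counts, noise)
```

Under those assumptions the red channel is averaged from 2, 4, 1 and 2 cells in the four blocks, so its noise level changes from one output pixel to the next, and blue does the same with the opposite phase.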
From a reconstruction standpoint, any N*M block of cells that produces one pixel is OK as long as both M and N are even.
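A quick way to convince yourself of the even/odd rule is to count the cell mix for different block sizes. This sketch assumes the GR/BG layout and only checks that every output pixel gets the same R/G/B mix; `distinct_compositions` is a made-up helper name.

```python
def bayer_color(row, col):
    """Color of the cell at (row, col) for the GR/BG layout (assumed)."""
    if row % 2 == 0:
        return "G" if col % 2 == 0 else "R"
    return "B" if col % 2 == 0 else "G"


def distinct_compositions(n, m):
    """Distinct (R, G, B) cell counts over the N*M blocks tiling the sensor.

    The Bayer layout repeats every 2 cells, so looking at a 2x2
    arrangement of neighboring blocks covers every possible case.
    """
    seen = set()
    for block_row in range(2):
        for block_col in range(2):
            counts = {"R": 0, "G": 0, "B": 0}
            for r in range(n):
                for c in range(m):
                    counts[bayer_color(block_row * n + r, block_col * m + c)] += 1
            seen.add((counts["R"], counts["G"], counts["B"]))
    return seen


for n, m in [(2, 2), (6, 6), (2, 4), (3, 3), (3, 2)]:
    print(n, m, sorted(distinct_compositions(n, m)))
```

With both sizes even there is exactly one composition (for example 9 R, 18 G, 9 B for 6x6). As soon as either size is odd the output pixels no longer share the same mix.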