If you bin the 4 pixels of the same color, you will have a big effect on sharpness, because the binned pixel will overlap the other 2 colors. You will end up with a blurry picture. Let's face it: if binning were that easy, all camera manufacturers would have done it.
That's the point. Binning isn't as easy as one might imagine, because of the way Bayer sensors work. For Foveon-type sensors it's pretty simple. Basically it goes like this: take the information from the pixels that are designated for binning, add their values and divide by the number of binned pixels to get your result.
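To make the "add and divide" step concrete, here is a minimal Python sketch for the full-color (Foveon-style) case. The function name bin_full_color and the representation of the image as nested lists of (r, g, b) tuples are just assumptions for illustration, not how any real camera firmware does it.

```python
def bin_full_color(image, factor=2):
    """Average factor x factor blocks of an image whose pixels are (r, g, b) tuples."""
    height, width = len(image), len(image[0])
    binned = []
    # Step over the image block by block, ignoring any incomplete edge blocks.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            # Add the values per channel and divide by the number of binned pixels.
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        binned.append(row)
    return binned


image = [
    [(10, 20, 30), (30, 40, 50)],
    [(50, 60, 70), (70, 80, 90)],
]
print(bin_full_color(image))  # [[(40.0, 50.0, 60.0)]]
```

This only works so directly because every pixel already carries all three color values.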
Bayer is more complicated even without binning, since a microcluster has 2 green-, 1 blue- and 1 red-sensitive pixel (due to the different bandwidth of light that arrives at pixel level). But rather than those four pixels being "smashed" together into one final pixel color (which would itself be a kind of binning), the pixels influence each other's values. So, for example, a blue pixel can deliver a yellow color if there is no blue light but the green and red pixels next to it deliver high values.
Now you have two options to work with:
a) "Gluing together" the microcluster (2x2) into one larger pixel. This would be the easy way to get what is considered more sensitivity per pixel, but you don't really increase dynamic range and only poorly increase color sensitivity, because of the way the pixels already influence each other in current Bayer sensor technology.
b) Clustering the microclusters (2x2 clusters of 2x2). This is pretty hard to imagine, since simply "sticking together" the clusters would give you variant a), and that's not the goal. The idea is to have 4 times the information for green, 4 times for red and 4 times for blue (so a cluster has 8 green, 4 red and 4 blue pixels) interacting with each other, rather than really reducing it to four single sources of information. Let's take the picture I posted before and have a look at it: pick a single blue pixel. For the ordinary Bayer calculation, based on this layout, your information-giving neighbor pixels are the green on top, the green on the left and the red on the upper left. In a binned environment you would consider the 4 inner pixels as your "binned pixels", and the pixels arranged around them supply the information needed to produce the color. For example, take the upper-left blue one: information for red is available four times, from the upper right, lower right, upper left and lower left. These values are added up and divided by four to get the value for red. For green, the sources are the upper and the left one, the upper and the right one, the lower and the right one, and the lower and the left one, giving eight green readings (each neighbor counted twice), which are averaged to get the resulting value. The same goes for all other pixels in this 2x2 center of the cluster. The overall pixel count is reduced because the outer pixels are interleaved between lines and only serve as influencing neighbors.
The result is a slightly worse signal-to-noise ratio than in variant a), but an impressively improved color sensitivity and also a slightly increased dynamic range in higher ISO modes.
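Here is a rough Python sketch of how I read variant b), for one 4x4 cluster. It assumes an RGGB layout (red upper left, blue lower right in each 2x2 cell), which matches the neighbor description above but is still an assumption, and the names PATTERN, color_at, demosaic_pixel and bin_cluster are made up for this example. Each of the 4 inner pixels keeps its own value for its own color and takes the plain average of the surrounding pixels for the two colors it lacks.

```python
PATTERN = [["R", "G"], ["G", "B"]]  # assumed RGGB tiling of the sensor


def color_at(y, x):
    """Color filter of the pixel at row y, column x."""
    return PATTERN[y % 2][x % 2]


def demosaic_pixel(mosaic, y, x):
    """Build an (r, g, b) value for one pixel from its 8 direct neighbors."""
    own = color_at(y, x)
    samples = {"R": [], "G": [], "B": []}
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            color = color_at(y + dy, x + dx)
            if color != own:  # neighbors only supply the missing colors
                samples[color].append(mosaic[y + dy][x + dx])
    samples[own] = [mosaic[y][x]]  # own color comes from the pixel itself
    return tuple(sum(samples[c]) / len(samples[c]) for c in "RGB")


def bin_cluster(mosaic):
    """Turn a 4x4 mosaic into 2x2 full-color pixels using only the inner pixels."""
    return [[demosaic_pixel(mosaic, y, x) for x in (1, 2)] for y in (1, 2)]


# No blue light, strong red and green: every inner pixel comes out yellow.
mosaic = [
    [200, 180, 200, 180],
    [180,   0, 180,   0],
    [200, 180, 200, 180],
    [180,   0, 180,   0],
]
print(bin_cluster(mosaic)[0][0])  # (200.0, 180.0, 0.0)
```

The outer ring of the 4x4 cluster never becomes an output pixel here; it only feeds the inner four, which is where the lower pixel count comes from.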
However, neither of the options I described is how binning would work in practice on Bayer sensors, since it's a whole lot more complicated. But it should give you an idea of why it's not implemented in current "low cost" cameras: the processors just can't provide enough computing resources to do the job for variant b). Variant a), on the other hand, is just the same as what you could achieve by scaling the image down to 25% on your computer, using an algorithm that respects the average color values of each pixel that is combined. As I stated earlier, for Foveon-type sensors this method works out really well, since every pixel already provides all the information on its own. For Bayer-type sensors, this might result in some kind of blurry thing.
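For comparison, variant a) on a raw Bayer mosaic could look roughly like this in Python. Again this assumes an RGGB layout, and bin_microclusters is a made-up name. Each 2x2 cell is collapsed into a single pixel: red and blue are taken as they are, and the two greens are averaged.

```python
def bin_microclusters(mosaic):
    """Collapse each 2x2 RGGB cell of a Bayer mosaic into one (r, g, b) pixel."""
    out = []
    for y in range(0, len(mosaic) - 1, 2):
        row = []
        for x in range(0, len(mosaic[0]) - 1, 2):
            r = mosaic[y][x]                                # upper left: red
            g = (mosaic[y][x + 1] + mosaic[y + 1][x]) / 2   # the two greens
            b = mosaic[y + 1][x + 1]                        # lower right: blue
            row.append((r, g, b))
        out.append(row)
    return out


# A sharp white-to-black edge running through the middle of one cell:
# the left column is fully lit, the right column is dark.
cell = [
    [255, 0],
    [255, 0],
]
print(bin_microclusters(cell))  # [[(255, 127.5, 0)]]
```

The four samples sit at slightly different positions, so the edge in this example comes out as one orange pixel instead of a clean step from white to black. That is the kind of smearing you risk when the pixels do not each carry full color information.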