I disagree. You are considering only theoretical properties, not construction limitations. If you are comparing pixel to pixel, you're totally missing the point when one has pixels that are one-half the linear dimensions and one-fourth the area of the other.
Viewing a 22-23 MP image at 100% on a 23" HD monitor is like looking at a piece of a 60"x40" enlargement.
Viewing an 88-90MP image at 100% on the same monitor is like looking at a piece of a 120"x80" enlargement.
Viewing the 88-90MP image at 50% gives the same enlargement size as viewing the 22-23MP image at 100%. Both are the same-sized piece of a 60"x40" enlargement.
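The enlargement figures above can be checked with a little arithmetic. This is only a rough sketch: it assumes the 23" HD monitor is 1920x1080, and it uses 5760x3840 (about 22 MP) and 11520x7680 (about 88 MP) as stand-in 3:2 image sizes, none of which are stated in the thread.

```python
import math

def pixels_per_inch(diagonal_in, width_px, height_px):
    """Pixel density of a monitor from its diagonal size and resolution."""
    return math.hypot(width_px, height_px) / diagonal_in

def enlargement_inches(image_w_px, image_h_px, ppi, zoom=1.0):
    """Size in inches of the whole image if displayed at the given zoom."""
    return (image_w_px * zoom / ppi, image_h_px * zoom / ppi)

# Assumed monitor: 23" diagonal, 1920x1080
ppi = pixels_per_inch(23, 1920, 1080)

# Assumed image dimensions (3:2 aspect ratio)
small = enlargement_inches(5760, 3840, ppi)               # ~22 MP at 100%
large = enlargement_inches(11520, 7680, ppi)              # ~88 MP at 100%
large_half = enlargement_inches(11520, 7680, ppi, 0.5)    # ~88 MP at 50%

for name, (w, h) in [("22 MP @ 100%", small),
                     ("88 MP @ 100%", large),
                     ("88 MP @ 50%", large_half)]:
    print(f"{name}: {w:.0f} in x {h:.0f} in")
```

With these assumed numbers the monitor works out to roughly 96 pixels per inch, which is where the 60"x40" and 120"x80" figures come from.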
Since noise is random (that's what makes it "noise"), averaging multiple noisy pixels together makes the averaged larger pixel less noisy: the pixel-to-pixel standard deviation of the averaged pixels is lower than that of the four smaller pixels each one replaced.
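A quick simulation illustrates the averaging argument. It is a minimal sketch that assumes independent Gaussian noise on every small pixel and a flat gray patch; the signal level and noise level are made-up values for illustration only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SIGNAL = 100.0   # hypothetical true value of a flat gray patch
SIGMA = 10.0     # hypothetical per-pixel noise (standard deviation)
N = 20000        # number of averaged (large) pixels to simulate

# Four noisy small pixels for every large pixel
small_pixels = [random.gauss(SIGNAL, SIGMA) for _ in range(4 * N)]

# Average each group of four small pixels into one larger pixel
binned = [sum(small_pixels[i:i + 4]) / 4 for i in range(0, 4 * N, 4)]

std_small = statistics.pstdev(small_pixels)
std_binned = statistics.pstdev(binned)

print(f"std of small pixels:    {std_small:.2f}")
print(f"std of averaged pixels: {std_binned:.2f}")
print(f"ratio:                  {std_small / std_binned:.2f}")
```

For independent noise the ratio should come out close to 2, the square root of the four pixels averaged.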
How much noise you get is described by the signal-to-noise ratio (SNR). SNR affects the readout of the pixel. Larger pixels generally have better SNR than smaller ones. That's why an FF camera can reach higher ISO with less noise than an APS-C camera with a similar number of pixels.
But let's for now assume that the large and the small pixel have the same SNR. When you read the large pixel, you get read noise X. When you read a small pixel, you get read noise X as well (because we are assuming the same SNR). To create a large pixel from small ones, you need to read 4 of them, because the combining happens on the digital representation, not the analog one => read noise X enters 4 times, once per read. Now your digital processor has to average those 4 noisy reads. Since it has, in theory, 4 times more information, it can produce a better result than the readout from the large pixel.
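One way to put numbers on the four-reads scenario is the usual shot-noise plus read-noise model. This is a sketch under assumptions not stated in the post: the large pixel and the group of four small pixels collect the same total number of electrons, every read adds independent read noise X, and independent noise sources add in quadrature. The function names and the sample values are hypothetical.

```python
import math

def snr_single(signal_e, read_noise_e):
    """SNR of one large pixel: shot noise plus a single read."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)

def snr_binned(signal_e, read_noise_e, n=4):
    """SNR of n small pixels summed digitally.

    The same total signal is split across n pixels, and each pixel is
    read separately, so the read-noise variance is counted n times.
    """
    return signal_e / math.sqrt(signal_e + n * read_noise_e ** 2)

X = 5.0  # hypothetical read noise per read, in electrons

# Four independent reads of noise X combine in quadrature
combined_read_noise = math.sqrt(4) * X

for signal in (50, 500, 5000):  # hypothetical total electrons collected
    print(f"signal={signal:5d} e-  "
          f"large pixel SNR={snr_single(signal, X):6.2f}  "
          f"4 binned SNR={snr_binned(signal, X):6.2f}")
```

Under this model the four reads together contribute 2X of read noise to the summed value, and the gap between the two cases shrinks as the signal grows and shot noise dominates.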
But that is just theory, because large and small pixels do not have the same SNR. That is the whole reason we don't have only high-megapixel cameras with the ability to bin pixels. Using smaller pixels reduces sensitivity and dynamic range. There is also fill factor: not the whole area of a pixel is photosensitive. Fill factor is the fraction of the pixel's area that actually captures light. It depends on manufacturing technology, but again, a larger pixel has a larger fill factor than a smaller one, because the amount of electronics does not increase with pixel size. There are other technologies, like microlenses or back-illuminated sensors, which improve fill factor significantly, but they are much easier to get right on larger pixels.
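The fill factor point can be illustrated with a toy calculation. It assumes a square pixel and a fixed area taken up by electronics regardless of pixel size; the 4 square micron figure and the pixel pitches are invented for illustration and do not describe any real sensor.

```python
def fill_factor(pitch_um, electronics_area_um2):
    """Fraction of a square pixel's area left for collecting light."""
    pixel_area = pitch_um ** 2
    return max(pixel_area - electronics_area_um2, 0.0) / pixel_area

ELECTRONICS_AREA = 4.0  # hypothetical fixed electronics area, in um^2

for pitch in (6.0, 3.0):  # hypothetical large and small pixel pitch, in um
    ff = fill_factor(pitch, ELECTRONICS_AREA)
    print(f"pitch {pitch:.1f} um: fill factor {ff:.0%}")
```

With these made-up numbers, halving the pixel pitch drops the fill factor from about 89% to about 56%, before microlenses or back illumination are taken into account.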