Can you clarify? If each small pixel individually has more noise (your comment above), how do they average to less noise as a single pixel?
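It's a property of math that statistics relies on: the average of several independent noisy samples tends to land closer to the true value than any single sample does.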
Say we're imaging something half-reflective of light, and it's so dark that even a pure white object will yield only one photon in each pixel of a hi-MP sensor. So our gray object should yield half a photon per pixel. NO single pixel will have the correct answer of half-reflective, because there's no such thing as half a photon. Instead, each pixel will read either twice the real value or zero. Noise is ±100%, basically! (This is like a coin flip: we know the odds are 50% either way, but if we flip a coin just once we cannot get 50%; we only get 100% heads or 0% heads.)
Now take 4 pixels of this hi-MP sensor, each getting either 0 photons (black) or 1 (white), and sum them. Treat receipt of a photon as a coin flip. Writing 0 for no photon and 1 for a photon, the 16 equally likely possibilities are:
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Now you'll see a total of:
0 -- 1 time in 16. We're reporting this larger pixel to be 0% (black), 100% off its true value of 50%.
1 -- 4 times in 16. We're reporting this larger pixel to be 25%, 50% off its true value.
2 -- 6 times in 16. We're reporting this larger pixel to be 50%, its true value.
3 -- 4 times in 16. We're reporting this larger pixel to be 75%, 50% off its true value.
4 -- 1 time in 16. We're reporting this larger pixel to be 100% (white), 100% off its true value.
Instead of being off the correct value of 0.5 by 100% every time, as before, now we're off on average by: 100 * 1/16 + 50 * 4/16 + 0 * 6/16 + 50 * 4/16 + 100 * 1/16 = 37.5%.
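If you'd rather check that arithmetic than trust it, here is a minimal Python sketch that enumerates the same coin-flip outcomes and averages the error. The function name `mean_abs_error_percent` is made up for this illustration, and it assumes the simplified model above (each small pixel gets exactly 0 or 1 photon with equal odds).

```python
from fractions import Fraction
from itertools import product


def mean_abs_error_percent(n_pixels=4):
    """Average error of the summed pixel, as a percent of the true value 0.5.

    Assumes the simplified model: each pixel is a fair coin flip
    (0 = no photon, 1 = one photon).
    """
    true_value = Fraction(1, 2)
    # Every equally likely pattern, e.g. (0, 0, 0, 0) ... (1, 1, 1, 1).
    outcomes = list(product((0, 1), repeat=n_pixels))
    total_error = Fraction(0)
    for outcome in outcomes:
        reported = Fraction(sum(outcome), n_pixels)  # brightness of the combined pixel
        total_error += abs(reported - true_value) / true_value
    return float(100 * total_error / len(outcomes))


print(mean_abs_error_percent(1))  # 100.0
print(mean_abs_error_percent(4))  # 37.5
```

Using `Fraction` keeps the sums exact, so the result matches the hand calculation rather than being off by floating-point rounding.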
Now take a lo-MP sensor with 1/4 the resolution. Its pixels are big enough that they'll get 4 photons from a white object, so our gray object should return 2 photons on average. Treating each of those 4 possible photons as its own coin flip, the math works identically to the five cases above and their chances of happening, giving the same 37.5% noise.
So, back to sensors. An 80MP back-side sensor should capture as many photons in total as a 20MP sensor, though that means only 1/4 the photons per pixel and thus far higher noise per pixel. But average four neighboring pixels together and the noise level comes down to exactly the same as the 20MP sensor's.
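To see how the noise changes as more small pixels are averaged together, here is a rough Python sketch under the same coin-flip assumption. It uses binomial counts instead of listing every pattern. The function name `binned_noise_percent` is hypothetical, and the model ignores every other noise source in a real sensor.

```python
from math import comb


def binned_noise_percent(n_binned):
    """Expected error of n_binned averaged pixels, as a percent of the true value.

    Assumption: each small pixel is a fair coin flip (0 or 1 photon),
    imaging a 50% gray object.
    """
    true_total = n_binned / 2  # photons we "should" get across the group
    expected_abs_error = 0.0
    for photons in range(n_binned + 1):
        # Chance of seeing exactly this many photons in the group.
        probability = comb(n_binned, photons) / 2 ** n_binned
        expected_abs_error += probability * abs(photons - true_total)
    return 100 * expected_abs_error / true_total


for n in (1, 4, 16, 64):
    print(n, round(binned_noise_percent(n), 1))
```

With one pixel this gives the 100% figure and with four it gives 37.5%, matching the numbers worked out by hand; larger groups come out lower still.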