You say "with identical technology, size of sensor is all that matters". I disagree. If you simply made a bigger 7D sensor, with the same technology and same pixel density, it would have the same noise characteristics as the 7D sensor.
This is totally false. If you make the 7D sensor bigger, you'll have more of the same pixels AND about a stop and a third better noise performance, assuming constant f-stop and constant framing. That means, for the same image, you're going to have to either get closer or use a longer focal length.
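For concreteness, the "stop and a third" figure falls straight out of the crop factor. A quick sketch (the 1.6x crop factor for the 7D's APS-C sensor relative to full frame is the only assumed number):

```python
import math

crop_factor = 1.6                 # Canon APS-C (7D) relative to full frame
area_ratio = crop_factor ** 2     # ~2.56x the sensor area, so ~2.56x the light
stops = math.log2(area_ratio)     # ~1.36 stops: "a stop and a third"

print(round(area_ratio, 2), round(stops, 2))   # → 2.56 1.36
```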
The right-hand column of this image demonstrates this. It's all the same sensor (and so all the same pixels) just using different sized portions of that sensor, and reframing to keep the final image framing constant. According to what you said above, the noise performance should all be the same. It isn't, and it isn't even close. The left column demonstrates by just how much. It's exactly how much you would think - the light you've lost with cropping is the amount of noise performance you've lost.
Well, how do you re-frame to keep the image on the larger sensor the same? With the same lens/optics, you need to get closer. But then you're getting more light on the lens, and thus on the sensor. Let's assume the large sensor is twice the diagonal size of the small sensor. You'll have to halve the distance to the subject. So 4 times as much light, but also 4 times as many pixels, hence the same light on each pixel. Each pixel on the large sensor thus has the same SNR as those on the small sensor, but you could downsample, combining groups of 4 pixels to get the same image and pixel count as the smaller sensor with better noise performance by a factor of two (averaging N pixels drops noise by sqrt(N)). But really this is because you've moved closer and thereby increased the light (signal).
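The sqrt(N) claim is easy to check numerically. A minimal simulation, shot noise only (the per-pixel photon count and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mean_photons = 100                      # arbitrary per-pixel photon count

# 4000 small pixels; photon arrivals are Poisson, so noise ~ sqrt(signal)
small = rng.poisson(mean_photons, size=(1000, 4)).astype(float)
noise_per_pixel = small.std()           # ~sqrt(100) = 10

# average groups of 4 pixels, as in the downsampling described above
binned = small.mean(axis=1)
noise_binned = binned.std()             # ~10 / sqrt(4) = 5
```

With the same mean signal, halving the noise doubles the SNR, matching the factor-of-two claim above.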
On the other hand, without changing position, you could use a different lens to fill the larger sensor with the same view (so keeping the framing the same). This implies an increase in the focal length, which, for the same aperture, implies an increase in the f-stop, i.e. a reduction in the light density on the sensor. We've kept the total captured light the same but spread it over a larger area with more pixels, so the per-pixel SNR would decrease with the larger sensor. Again, you could combine pixels, downsampling, to improve the SNR. But I think only by a factor of two (again assuming the large sensor diagonal is twice that of the smaller sensor). So worse SNR as compared to the small sensor (but higher resolution due to more pixels).
One difficulty with this whole discussion is that one wants to say "Keeping everything else the same, here's what happens when you change the pixel size...". But it's actually impossible to keep everything else the same: same optics, same lens, same shooting location, same framing, same viewing size, etc. One issue raised with the original post (which was excellent, by the way) was the upsampling applied to the full-frame image. But if you want to view them so the moon is the same size on your screen in both images, you need to either upsample one or downsample the other. Otherwise one image will be bigger than the other, making comparison problematic.
One other thing. Based on my somewhat crude calculation above (maybe this is well-known to the rest of you), it seems like from an SNR standpoint (with sensor size fixed), you are better off using bigger pixels, rather than subdividing each big pixel into smaller pixels, then averaging/downsampling them to recover the same number of pixels (as with the big pixels). I'm assuming that the noise comes from the electronics downstream of the light-gathering component, so that a big pixel has the same absolute amount of noise as a small pixel (but more signal), so 4 times the area means 4 times the SNR, whereas combining pixels will add the 4 light values, but also the 4 noise values. Assuming the noise is random and independent, you'll get some noise cancellation but only a Sqrt(4)=2 factor reduction, so lower SNR than the big pixel. To put this in practical terms, you get better SNR from the HTC One's 4 MP camera than downsampling the Nokia 1020's 40 MP image to 4 MP (assuming the same sensor size and optics, which may not be the case, but you get my point).
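Under the stated assumption (a fixed amount of downstream read noise per pixel, regardless of pixel size), the factor-of-two advantage works out like this (all numbers are arbitrary, for illustration only):

```python
signal_small = 1.0    # signal collected by one small pixel (arbitrary units)
read_noise = 0.5      # same absolute read noise per pixel, any size (assumption)

# One big pixel with 4x the area: 4x the signal, same single dose of read noise
snr_big = (4 * signal_small) / read_noise                 # = 8.0

# Four small pixels summed: 4x the signal, but 4 independent noise
# sources add in quadrature -> noise grows by sqrt(4) = 2
snr_binned = (4 * signal_small) / (2 * read_noise)        # = 4.0

print(snr_big / snr_binned)   # → 2.0
```

Note this conclusion depends entirely on the read-noise-dominated assumption; it flips to a wash under shot-noise-dominated conditions, as discussed below.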
Downsampling uses averaging, not adding. If you added, then you would end up with a bunch of blown pixels. Averaging reduces noise, where adding does not. So downsampling has the exact same effect on SNR as using larger pixels or binning smaller pixels in hardware. Additionally, noise is Poisson. If you have a pixel with twice the pixel pitch, you have four times the area, and you have four times the SNR...but you still have SQRT(4) noise. A pixel twice the pitch still only has half the noise. It doesn't matter if you use a larger pixel, or bin/average smaller pixels together. It doesn't even matter if you integrate four separate frames with the same noise together. It's always the same noise in the end. A pixel four times the area, averaging four pixels together, integrating four separate frames, all have SQRT(4) the amount of noise.
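That equivalence is easy to verify with a shot-noise-only simulation (photon counts arbitrary): one pixel with 4x the area collecting Poisson(4µ) photons behaves exactly like the sum of four pixels each collecting Poisson(µ):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 100, 50000                 # arbitrary photon count and sample size

big = rng.poisson(4 * mu, n).astype(float)                   # one 4x-area pixel
binned = rng.poisson(mu, (n, 4)).sum(axis=1).astype(float)   # four pixels summed

snr_big = big.mean() / big.std()           # ~sqrt(400) = 20
snr_binned = binned.mean() / binned.std()  # ~sqrt(400) = 20 as well
# same total light -> same shot noise -> same SNR, however you slice the pixels
```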
I was a bit sloppy. I agree and was trying (sort of) to say the same thing. Averaging involves adding, then normalizing and it is the adding that affects SNR. I was imagining a low-noise situation where one was trying to get the "signal" into a reasonable range. Imagine shooting a white card in low light. You want it to be 8-bit-RGB=(256,256,256), but it is coming out at (64,64,64). To get it to the "correct" value, you would scale/normalize x4. Alternatively, you could add the values of 4 pixels together (no normalization necessary) and increase SNR at the same time. From an SNR standpoint, it is the adding step of the averaging that is significant. The divide-by-number-of-pixels can be lumped with the downstream normalizing you do via ISO/Levels/Curves or whatever since it doesn't change the SNR.
More interestingly (to me at least) is that you seem to be saying that the noise is intrinsic to the light capture at the front-end of the processing chain and not due to downstream amplification (or whatever). So (effectively) turning the photon count into (say) a voltage already has the Poisson noise (Shot Noise), and increasing the pixel area already does the adding. Thus 4 times the area generates 4 times the signal and Sqrt(4)=2 times the noise, so 2 times the SNR, before any subsequent processing (e.g. ISO-related amplification).
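In other words, with shot noise the SNR scales with the square root of the collected photon count (and hence of the area). A one-liner makes the scaling explicit:

```python
import math

def shot_noise_snr(photons):
    # Poisson statistics: signal = N, noise = sqrt(N), so SNR = sqrt(N)
    return photons / math.sqrt(photons)

# 4x the area -> 4x the photons -> 2x the SNR
print(shot_noise_snr(400) / shot_noise_snr(100))   # → 2.0
```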
This gets at the point I was trying to make in comparing the HTC One vs the Nokia Lumia. And sagittariansrock is basically asking a similar question: Is it better to have pixels 4 times bigger or have 4 times as many pixels and downsample/average them together (by a factor of 4) in low-noise situations. Sounds like you are saying that the result is the same (SNR-wise). In which case (as sagittariansrock says) it would seem to make sense to put as many pixels on the sensor as possible and simply average/downsample in high-noise (low light) situations. When light is ample, you don't downsample and you have the advantage of higher resolution.
Also, you're not quite right about a smaller aperture reducing SNR to a level below that of the smaller sensor. If you really do have two sensors, one with half the diagonal, then you could use a 100mm f/4 on the larger sensor and a 50mm f/2.8 on the smaller. That would get you identical framing. In that case, the total amount of light reaching the sensor is also identical.
Hmmm. Doubling the focal length would scale the image diagonal to produce the same framing on the larger sensor. Decreasing by one stop from 2.8 to 4 would cut the light intensity by a factor of 2. But with 4 times the area, it seems like the large sensor would capture 2 times the total light. I think the issue is that from 2.8 to 4, the aperture diameter increases by sqrt(2), not by 2 as the sensor does. So it seems like a 100mm f/5.6 would produce the same total light on the large sensor as the 50mm f/2.8 on the small sensor. In this case, when you consider the actual aperture diameter, 100/5.6 on the 100mm vs. 50/2.8 on the 50mm, you get the same diameter. If we suppose that the lens cost is mostly based on the diameter of the aperture (lens pricing being what it is, reality may be totally different, of course), the cost of the two lenses would be about the same. So even though two different lenses are involved, the cost is about the same, which makes for more of an apples-to-apples comparison lens-wise.
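The entrance-pupil arithmetic backs this up (diameter = focal length / f-number, total light proportional to pupil area for the same framing):

```python
def pupil_diameter_mm(focal_length_mm, f_number):
    return focal_length_mm / f_number

d_small = pupil_diameter_mm(50, 2.8)    # ≈ 17.9 mm on the small sensor
d_same  = pupil_diameter_mm(100, 5.6)   # ≈ 17.9 mm: same total light as above
d_f4    = pupil_diameter_mm(100, 4)     # 25 mm

# a 100mm f/4 has a larger pupil, so it gathers ~2x the total light, not the same
print(round((d_f4 / d_small) ** 2, 2))   # → 1.96
```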
But if the total light has stayed the same, while the total Poisson noise has increased with the area by SQRT(4)=2, it seems like we have lost overall SNR. If the SNR is still ample, then no big deal. Otherwise, it seems that we would be better off concentrating the light on a smaller sensor area where we would get less total noise, so for a fixed number of pixels, you'd get higher pixel SNR.
THAT right there is exactly what equivalence is all about. But...pixel size isn't a factor. Because downsampling averages (which involves first adding, yes...but then dividing) pixels together, when you NORMALIZE, pixel size doesn't matter. Two large sensor cameras with different pixel sizes are still going to gather the same amount of light for any absolute area of the subject.
Right, so for a given sensor size, both the total signal and total noise are the same. But on a pixel-by-pixel basis, as you shrink the pixels by a factor of N, the signal goes down as 1/N but the noise goes down as 1/sqrt(N) (the reverse of the averaging/downsampling/binning situation). So increasing resolution involves decreasing SNR for each pixel. But if you fix the output resolution (monitor or print) then the sensor pixel size doesn't matter as long as the sensor resolution exceeds the output resolution. If you have more sensor pixels (with more noise) you'll just average them back together for each output pixel getting back to the same SNR. On the other hand, if the output resolution is higher than the sensor resolution, then, well, how do you up-res? You have some type of resolution-noise tradeoff.
A small sensor camera, for an identically framed subject (50mm f/2.8 instead of 100mm f/4) is going to gather the same amount of light for the same absolute area as the larger sensor camera...however it's only gathering the same amount of light because of the wider aperture. Slap a 100mm f/2.8 lens on your larger sensor, and it is now gathering twice the amount of light. (Plus, there are other benefits with the larger sensor...narrower depth of field, or a wider field of view, etc.)
Of course a 100mm f/2.8 will be pricier than a 50mm f/2.8. But I'm nitpicking.
Pixel size is irrelevant. SNR, and therefore dynamic range (assuming you have no other source of noise than what is inherent to the image signal itself) and noise, are ultimately relative to total sensor area. That's it.
I think I'm finally seeing how things are getting confused. One needs to fix the output size & pixel count. Then the SNR of each output pixel is independent of the sensor pixel size---for a fixed sensor area. The SNR of each sensor pixel varies with the size of the sensor pixel, but the output pixel SNR does not. If the output is being displayed in a 1500x1000 window (on, say, a 90 dpi display), then a full frame sensor of 1500x1000 pixels will produce the same result as a 3000x2000 pixel full frame sensor. The individual pixels of the latter sensor will have lower SNR, but you'll be averaging 4 pixels to produce each output pixel and end up with the same SNR. So it all washes out. But this brings us back to the earlier question: Why isn't the finest pixel size used across all sensor sizes, since the effect of a larger pixel can be gained by downsampling (but the reverse cannot)? I assume that the cost of the electronics and/or lower yield with finer features is the reason?