Great points, everyone!
Take the case of the same camera and same lens, but two sensors of different sizes with the same design and the same spatial resolution (pixel pitch/size). The noise is the same - to say otherwise would suggest that the noise in the centre of any given sensor is higher than that around the edges. ...
Now consider two sensors, one larger than the other, with the same number of pixels. Camera and lens are still the same. The smaller sensor has more noise, but not because the "sensor is smaller" per the above, but rather because the photosites are smaller and each collects fewer photons, therefore the SNR is lower. ...
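To put rough numbers on that, here's a quick simulation sketch (the photon counts and relative pixel areas are made-up illustrative values): photon arrival is Poisson, so a photosite with 4x the area collects 4x the photons and ends up with 2x the per-pixel SNR.

```python
import numpy as np

rng = np.random.default_rng(0)
photon_flux = 1000                    # mean photons per unit area (illustrative)
snr = {}
for name, area in [("large pixel", 4.0), ("small pixel", 1.0)]:
    mean_photons = photon_flux * area
    # Shot noise: photon counts follow a Poisson distribution
    samples = rng.poisson(mean_photons, size=100_000)
    snr[name] = samples.mean() / samples.std()
    print(f"{name}: mean photons = {mean_photons:.0f}, SNR ≈ {snr[name]:.1f}")
```

With these numbers the large pixel comes out around SNR ≈ 63 and the small one around ≈ 32 - the 4x area buys a factor-of-2 (one stop) SNR advantage per pixel.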
So what about noise-reduction algorithms? They work better the more information (resolution) there is to work with, but megapixels are megapixels. It doesn't matter whether the data (say 18 MP) comes from a FF, APS-C, or smaller sensor. The algorithms work on "data", and it's irrelevant whether that data came from a larger or smaller sensor. Again, I'm taking the case where the same scene is imaged on the sensor regardless of size. The NR algorithms work better, but the image starts with more noise... the final image from the sensor with smaller photosites is still noisier and, in addition, has the NR softening, so overall the image quality is worse.
The missing factor is one discussed above, as highlighted in bold by NotABunny. What we generally care about is noise in the image, not noise from each pixel. To the point in your examples above, the missing factor is that with a smaller sensor, you have less total light hitting the sensor. The problem, in part, is that most ways of measuring sensor performance do so at the pixel level - that's how sensor performance is assessed in the QC setting, for example. So, sensitivity is measured in photoelectrons/lux-second/pixel, read noise in RMS electrons/pixel or ADU/pixel, dynamic range in stops/pixel or dB/pixel, etc. All of those measurements ignore the total image and the spatial frequency of the sensor - and that's what determines image noise (which is what we really care about).
In terms of noise reduction algorithms, that's an interesting point. As far as I know, in-camera NR processing is done only at the pixel level, i.e. each photosite is treated independently and processed as such. When NR is done in post-processing, empirically, some software does a better job than others, e.g. DxO, NoiseNinja and Topaz Denoise all do a better job of reducing noise while maintaining sharpness than Canon's DPP. I wonder if that's because, like the in-camera NR, DPP is doing NR at the pixel level (albeit with more powerful algorithms), while the superior NR programs are using some sort of nearest-neighbor analysis to reduce the noise of individual pixels based on the signal in surrounding pixels.
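As an illustration of that distinction (a toy sketch only - this is not what DPP, DxO, or the others actually implement): a pixel processed in isolation has nothing to average against, whereas even the crudest neighbourhood filter - a 3x3 box average - cuts uncorrelated noise by sqrt(9) = 3x, at the cost of softening real detail.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = np.full((64, 64), 100.0)                 # flat grey test patch
noisy = clean + rng.normal(0, 10, clean.shape)   # additive noise, sigma = 10

# "Pixel-level" NR has no spatial information: on a flat patch the best it
# can do with a single value is leave it alone, so the noise is unchanged.
pixel_level = noisy.copy()

# Neighbourhood NR: a 3x3 box average, the simplest nearest-neighbour filter.
padded = np.pad(noisy, 1, mode="edge")
neighbourhood = sum(
    padded[dy:dy + 64, dx:dx + 64] for dy in range(3) for dx in range(3)
) / 9.0

print(f"noise before: {pixel_level.std():.1f}")           # ~10
print(f"noise after 3x3 average: {neighbourhood.std():.1f}")  # ~10/3
```

On a real image the same averaging blurs edges too, which is why the good NR programs have to be much cleverer about which neighbours they trust.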
A good distinction to make here, for folks who might not realise it, is that the noise we see in any average to well-lit scene is primarily photon (shot) noise, not the electronic noise (read noise) most people think of when talking about noise. Photon noise actually increases with brightness, but only as the square root of the number of photons absorbed by the sensor, so the SNR of brighter scenes is higher than that of darker ones.
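To attach numbers to that square-root relationship (the photon counts are arbitrary): a pixel absorbing 100 photons sees shot noise of about 10 photons (SNR ≈ 10), while one absorbing 10,000 photons sees noise of about 100 - more absolute noise, but SNR ≈ 100.

```python
import numpy as np

rng = np.random.default_rng(2)
results = {}
for mean_photons in (100, 10_000):
    # Photon arrivals are Poisson, so noise = sqrt(mean)
    samples = rng.poisson(mean_photons, size=200_000)
    noise = samples.std()
    results[mean_photons] = samples.mean() / noise
    print(f"{mean_photons:>6} photons: noise ≈ {noise:.1f}, "
          f"SNR ≈ {results[mean_photons]:.1f}")
```

100x the light buys only 10x the absolute noise, hence the 10x better SNR in the brighter exposure.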
This makes sense, as well.
It also raises another issue - how ISO noise tests are commonly done. If you're shooting a test scene/chart and successively increasing the ISO, you have to decrease something else in parallel to keep the exposure the same. You can vary the aperture, but then you're not shooting the same scene because the DoF is changing (and OOF areas subjectively appear to have more noise). So, if you keep the aperture the same so you're taking the same picture, then you're using a faster shutter speed to compensate for the increased ISO - and that minimizes the contribution of shot noise. I don't really think tests done like that are fair, since they don't represent real-world use (i.e. it's not common to use ISO 3200 and a 1/4000 s shutter speed - if you have that much light, you shoot at a much lower ISO, right?).
Like NormanBates, I've done some empirical testing, in this case in the course of comparing the 7D and 5DII. When I used a constant f/8 and varied shutter speed in tandem with ISO, from 1/30 s at ISO 100 to 1/8000 s at ISO 25600, the noise certainly increased with increasing ISO, but honestly it didn't look as bad as I'd expected. But as I stated, would anyone intentionally shoot a real-world shot at ISO 25600 and 1/8000 s? When I took a different approach, keeping aperture and shutter speed constant and knocking the illumination down a stop at a time with ND filters (less light, just like when you'd use high ISO in the real world), the ISO noise got worse much faster. To me, the noise in the ISO 25600, 1/8000 s shot looks fairly similar to that in an ISO 6400, 1/60 s shot (with 6 stops of ND to simulate dim lighting).
Based on those results, I'm in the "more megapixels means more noise per pixel, but not necessarily more noise per image" camp.
So am I, although I'd state it as "smaller pixels means more noise per pixel, but not necessarily more noise per image."
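That phrasing can be sketched numerically too (the pixel counts and photon totals are invented for illustration): give a 0.25 MP big-pixel sensor and a 1 MP small-pixel sensor the same total light, and the small pixels are individually noisier - but binning the 1 MP image 2x2 down to the same output size recovers essentially the same per-image SNR.

```python
import numpy as np

rng = np.random.default_rng(3)
# Same total light on both sensors: 4000 photons per big pixel,
# 1000 per small pixel, with 4x as many small pixels.
big_pixels   = rng.poisson(4000, (500, 500))     # 0.25 MP, large photosites
small_pixels = rng.poisson(1000, (1000, 1000))   # 1 MP, small photosites

snr_big   = big_pixels.mean() / big_pixels.std()
snr_small = small_pixels.mean() / small_pixels.std()

# Bin the 1 MP image 2x2 to match the 0.25 MP output size
binned = small_pixels.reshape(500, 2, 500, 2).sum(axis=(1, 3))
snr_binned = binned.mean() / binned.std()

print(f"per-pixel SNR, big pixels:   {snr_big:.1f}")
print(f"per-pixel SNR, small pixels: {snr_small:.1f}")
print(f"per-image SNR after binning: {snr_binned:.1f}")
```

More noise per pixel on the small-pixel sensor, but near-identical noise per image once the two are compared at the same output size - which is exactly the "per pixel vs per image" distinction.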