I'll add something to your explanation of the sensor readout process. In a CMOS sensor the electrons are not emptied during readout; the pixel has to be reset after it has been read. In a CCD the charges are emptied sequentially as they are read out.
Yes, you're right. In a CMOS sensor the pixel is first measured, then emptied (reset). A CCD has to physically transfer the charge to the edge of the chip, where it is read out. That is why CCD readout is destructive while CMOS readout is not.
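To make the distinction above concrete, here is a deliberately simplified toy model of one column of pixels. It is a sketch under stated assumptions, not a physical simulation: charges are plain integers, and the functions `ccd_readout` and `cmos_readout` are hypothetical names. The CCD version removes each charge packet as it reaches the output node at the edge; the CMOS version reads every pixel in place and only empties the wells in a separate reset step.

```python
def ccd_readout(column):
    """Toy CCD: shift charge packets towards the output node at the edge.

    Each packet is removed from the array when it is measured, so the
    readout itself empties the column (destructive readout).
    """
    pixels = list(column)
    readings = []
    while pixels:
        # The packet nearest the edge is measured and leaves the array;
        # pop(0) also moves every remaining packet one step closer.
        readings.append(pixels.pop(0))
    return readings, pixels  # pixels is now empty


def cmos_readout(column):
    """Toy CMOS: measure each pixel in place, then reset in a separate step."""
    pixels = list(column)
    readings = list(pixels)      # in-pixel read; the charge is untouched
    after_read = list(pixels)    # wells still hold their charge here
    pixels = [0] * len(pixels)   # explicit reset empties the wells
    return readings, after_read, pixels


print(ccd_readout([5, 3, 8]))
print(cmos_readout([5, 3, 8]))
```

In both cases the measured values are the same; the difference the toy model shows is only where the charge goes during and after the read.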
Since it's trivial to set up, I repeated neuroanatomist's experiment, but I cannot confirm his result that the noise depends on exposure time. I set up a scene that, at f/4, 1/2000 s and ISO 6400 (with a 5D2), was well exposed at roughly 75% of the histogram. Since we are interested in the detector rather than the lens, I set the focus to infinity to blur any details of the scene (just a blank, homogeneous wall).
For the second image, I used the same settings but added a Hoya NDX400 filter and used a 1/5 s exposure instead. I chose the exposure time carefully so that the exposure would be the same (judging from the on-screen histogram), and the resulting difference in exposure time was a factor of 400, as expected for a 400x ND filter.
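The exposure arithmetic above can be checked with a small sketch. It assumes the filter's nominal attenuation of exactly 400x and unchanged aperture and ISO; real filters deviate somewhat from their nominal value, which is why matching the histogram, as described above, is the more reliable check. The function names are hypothetical.

```python
import math
from fractions import Fraction


def nd_compensated_shutter(base_shutter, nd_factor):
    """Shutter time giving the same exposure behind an ND filter that
    passes 1/nd_factor of the light (aperture and ISO unchanged)."""
    return base_shutter * nd_factor


def nd_stops(nd_factor):
    """Light reduction of the filter in stops (factors of two)."""
    return math.log2(nd_factor)


base = Fraction(1, 2000)  # exposure time in seconds without the filter
nd400 = 400               # nominal attenuation of the ND filter (assumed exact)

print(nd_compensated_shutter(base, nd400))  # 1/5
print(round(nd_stops(nd400), 2))            # 8.64
```

Using `Fraction` keeps the shutter times exact, so 1/2000 s times 400 comes out as exactly 1/5 s rather than a rounded float.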
In DPP I set noise reduction and sharpness to 0, and the white balance to "White fluorescent light" to keep it constant (it also matches the lighting conditions). Both exposures were processed with the same recipe.
The attached image shows 100% crops of the same region, the left from the 1/2000 s image and the right from the 1/5 s image. Apart from the obvious colour cast introduced by the filter, there is no discernible difference in the noise. Computing the noise (standard deviation) of the image confirms the visual impression: for the left/right parts, the noise in the R, G and B channels is 3.03/3.06, 2.43/2.40 and 3.03/3.07 respectively, i.e. left and right differ by only about 1%, which is clearly insignificant.
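A minimal sketch of the comparison described above, assuming each crop is available as a list of (R, G, B) pixel values from a uniformly lit patch (the post does not say which tool computed the standard deviation, so `channel_noise` and `relative_difference` are hypothetical helpers). The usage lines reuse only the per-channel values reported above to express the left/right difference as a percentage.

```python
import statistics


def channel_noise(pixels):
    """Per-channel noise (population standard deviation) of a crop.

    pixels is a sequence of (R, G, B) values taken from a uniform patch,
    so any variation is attributed to noise rather than scene detail.
    """
    return tuple(statistics.pstdev(channel) for channel in zip(*pixels))


def relative_difference(a, b):
    """Relative difference of b with respect to a, per channel."""
    return tuple(abs(y - x) / x for x, y in zip(a, b))


# Values reported above: 1/2000 s crop (left) and 1/5 s crop (right)
left = (3.03, 2.43, 3.03)
right = (3.06, 2.40, 3.07)

for name, d in zip("RGB", relative_difference(left, right)):
    print(f"{name}: {d:.1%}")  # R: 1.0%, G: 1.2%, B: 1.3%
```

Defocusing on a blank wall, as in the setup above, is what makes the plain standard deviation a reasonable noise estimate here; on a detailed scene it would also include real image structure.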