The entire image from the 5D2 looks cleaner because its sensor has 1.6 * 1.6 (the crop factor squared) times the area of the 7D's sensor. That is 2.56 times the area, and so 2.56 times the light for the same photo, which works out to log2(2.56) ≈ 1.36 stops cleaner. This is the expected difference.
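The arithmetic above can be checked quickly. This is only a sketch: it assumes the 7D's 1.6x crop factor is measured against the full-frame 5D2 and treats "stops" simply as log2 of the light ratio. The function names are made up for illustration.

```python
import math


def area_ratio(crop_factor: float) -> float:
    """Sensor area ratio (full frame / cropped) for a linear crop factor."""
    # The crop factor is a linear (edge-length) ratio, so area scales with its square.
    return crop_factor ** 2


def stops(light_ratio: float) -> float:
    """Express a light ratio in photographic stops (powers of two)."""
    return math.log2(light_ratio)


# Assumption: 7D = 1.6x crop relative to the full-frame 5D2.
ratio = area_ratio(1.6)
print(f"area ratio: {ratio:.2f}")
print(f"difference: {stops(ratio):.2f} stops")
```

The second printed line gives 1.36 stops. log2(2.56) is about 1.356, so "1.35" comes from truncating rather than rounding.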
This happens because images have a physical size, namely the size of the sensor. For the same photo (same framing and exposure), the larger sensor captures more light. This isn't obvious in practice because images are always scaled to the same display or paper size, just by different factors. An image from a sensor as large as the display wouldn't need to be enlarged at all (unlike one from a comparatively small FF sensor), so the entire photo would show stupefyingly low noise. Unfortunately, the lens diameter would have to grow in proportion.
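The scaling argument can be sketched numerically. The assumptions are: the same exposure (light per mm² of sensor), the same framing and aspect ratio for every sensor, and a 36 mm full-frame width. The 600 mm print width and the `compare` helper are hypothetical and chosen only for illustration.

```python
FF_WIDTH_MM = 36.0  # full-frame sensor width
CROP_FACTOR = 1.6   # 7D relative to full frame


def compare(print_width_mm: float) -> dict:
    """Return {sensor: (linear enlargement to print, light collected relative to FF)}.

    Assumes equal exposure per mm^2 of sensor and identical aspect ratios,
    so the total light collected scales with sensor area (width squared).
    """
    widths = {
        "full frame (5D2)": FF_WIDTH_MM,
        "1.6x crop (7D)": FF_WIDTH_MM / CROP_FACTOR,
        "display-sized sensor": print_width_mm,  # hypothetical sensor as wide as the print
    }
    return {
        name: (print_width_mm / w, (w / FF_WIDTH_MM) ** 2)
        for name, w in widths.items()
    }


for name, (scale, light) in compare(600.0).items():
    print(f"{name}: enlarged {scale:.1f}x, collects {light:.2f}x the light of FF")
```

With these numbers, the full-frame image is enlarged about 16.7x and the crop image about 26.7x to reach the same print. The display-sized sensor needs no enlargement and collects roughly 278 times the light of full frame.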
Just curious, but what is the tradeoff between photosite size and low-light performance? If I understand the thesis correctly, a higher photosite density wouldn't hurt in uncropped situations and would help when cropping, but what about extremely low light? How does the noise floor of larger photosites compare to that of smaller ones? Signal strength could be thought of as proportional to photosite area (L^2, where L is the photosite's edge length), but what about noise? Does it grow more slowly than L^2? If it does, then there is a tradeoff between resolution and low-light performance.