As I look at the crop/FF technology, the simplest common thread that pops out to me is this:
1. The FF sensors always have a lower pixel density, which lets them control noise and helps IQ.
2. The crop sensors, with their higher pixel densities, have poorer noise performance and limited actual resolving power.
3. To some extent, (2) above may be due to less expensive technology used in the crop sensors.
I'm using the term 'resolving power' because, as many have pointed out, increasing pixel density does not (necessarily) a higher resolution system make. Case in point: as many and as profound as the 7D's strengths are compared to the 40D, resolving power isn't the biggest "aha" discovery. It's better, but not by the leaps and bounds the pixel count would imply. Which means -- the more pixels you cram into the same space, the more physics dictates that you will have consequences (such as noise) that limit actual resolving power.
So -- and I'm making some broad hypothetical assumptions here -- if we assume that IQ limitations are dominated by the artifacts of pixel density, then it's clear why there is a benefit to longer lenses and bigger sensors, and why a crop sensor will never equal a FF sensor of the same pixel count. Also, following the same assumption, we would have to conclude that a FF sensor at 46mp (cropped to 1.6x) would perform identically to an 18mp crop sensor of the same technology, because the pixel density, and therefore the actual resolving power, would be identical.
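For anyone who wants to check where the 46mp figure comes from, here is a quick sketch of the arithmetic. It assumes the only thing that matters is pixel density, so the pixel count scales with sensor area, i.e. with the square of the crop factor. The function name is just something I made up for illustration.

```python
def full_frame_equivalent_mp(crop_mp: float, crop_factor: float) -> float:
    """Megapixels a FF sensor needs to match a crop sensor's pixel density.

    Area scales with the square of the linear crop factor, so the
    pixel count does too (assuming identical pixel pitch).
    """
    return crop_mp * crop_factor ** 2


# An 18mp sensor at a 1.6x crop factor
print(round(full_frame_equivalent_mp(18.0, 1.6), 1))  # about 46.1
```

So 18 x 1.6 x 1.6 comes out at roughly 46mp, which is the number used above.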
I would suggest that as long as the technology in the FF is itself superior (that is, the artifacts of pixel density are better controlled), we could see the crop camera offering no advantage, other than cost, over the FF in the situations we are discussing. Ajay's comparison is quite striking here, in that the 5D3, when cropped to 1.6x, should behave more like the 10mp 40D than the 18mp 7D. Moreover, if the 5D3 had offered mid-thirties mp, I bet we would see performance equivalent to the 7D's, or even superior!
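To put rough numbers on the 5D3 comparison, here is a small sketch. The megapixel figures are my assumptions from the published specs (5D3 about 22.3mp, 40D about 10.1mp, 7D about 18.0mp), and the helper name is made up. It only counts pixels left after the crop; it says nothing about how good those pixels are.

```python
def cropped_mp(full_frame_mp: float, crop_factor: float) -> float:
    """Megapixels left when a FF image is cropped to a crop sensor's field of view."""
    return full_frame_mp / crop_factor ** 2


# Assumed nominal pixel counts for the two crop bodies (in mp)
crop_bodies = {"40D": 10.1, "7D": 18.0}

# Assumed 22.3mp for the 5D3, cropped to the 1.6x field of view
remaining = cropped_mp(22.3, 1.6)

# Which crop body is closest in pixel count to the cropped 5D3?
closest = min(crop_bodies, key=lambda name: abs(crop_bodies[name] - remaining))

print(f"5D3 cropped to 1.6x: {remaining:.1f}mp, closest to the {closest}")
```

On pixel count alone the cropped 5D3 lands a bit under 9mp, which is why it should sit nearer the 40D than the 7D.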
The point that was raised about "over cropping" is interesting, i.e. how much can you throw away and still have a decent picture. I'm not a physicist, but let me suggest that the answer is still about resolving power -- the sensor that has the best resolving power will show the best image. What does that mean? To me it means that the comparison posted earlier by Ajay will continue to hold even as you crop things down further and further. The 7D will continue to show a slight edge, and as a practical matter the difference between the two will become more apparent as you crop further and further, until the resulting image is essentially a pixel-peep.
Ajay, you could demonstrate that by simply cropping the images further in post.