A particular format of sensor has no IQ. There's nothing inherently better about APS-H than APS-C or FF.
True when you're talking about IQ on the pixel level, untrue when you're talking about IQ on the image level. The 20D and 5DII have the same size pixels. Yes, the 5DII pixels are 'better' but not better enough to account for the IQ differences between the sensors. The FF sensor has better IQ because as a whole, the larger sensor gathers more total light.
I'll beg to differ with you there. I'm pretty sure that if you took the center 8MP from a 5DII image, it would be better than the 20D's image when using the same lens at the same settings. E.g., from 20m away, use a 50/1.4 and photograph the same subject with both cameras. Yes, the images will not be the same, but the center 8MP of both images should be. The center 8MP don't somehow magically benefit from the pixels around the edge of the sensor.
Yes, cropping throws away the benefit of the FF sensor gathering more total light. I'm saying that if you move closer with the FF camera (or zoom in with a zoom lens) so you're getting the same framing covering the sensor, you'll get better IQ from the FF sensor. That means there is an inherent advantage to FF, unless you go around shooting everything wider than you need and planning to crop away 60% of all your images.
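To put rough numbers on the "60%" figure, here is a quick sketch. The sensor dimensions are nominal values I'm assuming (36 x 24mm for FF, about 22.5 x 15mm for the 20D's APS-C sensor); they are not taken from anything measured in this thread.

```python
# Nominal sensor dimensions in mm. These are assumed figures for
# illustration, not measurements from the thread.
FULL_FRAME_MM = (36.0, 24.0)
APS_C_MM = (22.5, 15.0)  # roughly the 20D's sensor


def sensor_area(dims_mm):
    """Return the sensor area in square millimetres."""
    width, height = dims_mm
    return width * height


def area_ratio(larger, smaller):
    """How many times more area the larger sensor has."""
    return sensor_area(larger) / sensor_area(smaller)


def fraction_cropped_away(larger, smaller):
    """Fraction of the larger frame discarded when cropping to the smaller one."""
    return 1.0 - sensor_area(smaller) / sensor_area(larger)


if __name__ == "__main__":
    ratio = area_ratio(FULL_FRAME_MM, APS_C_MM)
    lost = fraction_cropped_away(FULL_FRAME_MM, APS_C_MM)
    print(f"FF / APS-C area ratio: {ratio:.2f}")
    print(f"Fraction discarded by cropping FF to APS-C: {lost:.0%}")
```

With those assumed dimensions the FF sensor has about 2.56 times the area, and cropping it down to the APS-C frame discards roughly 61% of the image, which lines up with the 60% above.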
But when you move or otherwise change the framing then you're undoing the "reach" aspect of crop sensors.
I also wonder whether the FF sensor does capture more light.
What determines how much light lands on the sensor is the lens.
Imagine, for instance, that you've got a lit light bulb 3m away from the camera. The amount of light that will be collected by the lens is fixed, regardless of the camera/sensor, by the size of the front element of the lens. So the 50/1.4 will provide the same amount of illumination in the lightbox on a crop sensor as it will on a full frame. Now if both sensors have the same pixel density, then the amount of captured light that represents the bulb is the same (assuming that the image of the bulb does not exceed the size of the sensor when projected into the camera by the lens). So at this point, the full frame sensor does not capture more light over the same area of the sensor than the crop sensor does. Overall the full frame sensor does gather more light, but not from the subject. Replacing a crop sensor with a full frame sensor does not somehow magically cause there to be more light present.
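For what it's worth, the light bulb case can be sketched with the thin-lens approximation, where the size of the projected image depends only on focal length and subject distance, not on what sits behind the lens. The 60mm bulb diameter is a made-up number for illustration, and a real 50/1.4 is not an ideal thin lens, so treat this as a rough check only.

```python
def magnification(focal_length_mm, subject_distance_mm):
    """Thin-lens magnification for a subject beyond the focal length."""
    if subject_distance_mm <= focal_length_mm:
        raise ValueError("subject must be farther away than the focal length")
    return focal_length_mm / (subject_distance_mm - focal_length_mm)


def image_size_mm(object_size_mm, focal_length_mm, subject_distance_mm):
    """Size of the projected image; note the sensor format never enters into it."""
    return object_size_mm * magnification(focal_length_mm, subject_distance_mm)


if __name__ == "__main__":
    BULB_DIAMETER_MM = 60.0  # hypothetical bulb size, not from the thread
    size = image_size_mm(BULB_DIAMETER_MM, 50.0, 3000.0)
    print(f"Projected bulb image: {size:.2f} mm across")
    # Compare against the short side of each (nominal, assumed) sensor.
    for name, short_side_mm in (("APS-C", 15.0), ("full frame", 24.0)):
        print(f"{name}: bulb image fits = {size < short_side_mm}")
```

Under those assumptions the bulb's image is about 1mm across and fits comfortably on either sensor, so the same lens at the same distance puts the same image of the bulb on both.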
Where it changes is when we move closer to the object to replicate the field of view that we had with the crop sensor. I'll note that in doing so, it is the increase in the size of the image on the sensor that allows more light to be captured; the exposure parameters will also change as you get closer. Because the distance to the object shrinks, we capture more light from it. That also means that using a zoom lens in a manner that does not substantially change the distance to the subject will not deliver more light.
Consider also that all sensors are the same distance from the lens' focal point.
What full frame sensors give us is better separation of detail. Of course, having said that, I feel like I'm wrong but I just can't see it.
For those who don't follow, the amount of light that reaches a given point from an object follows an inverse-square relationship with distance, meaning that at 4m from an object you receive 1/4 of the light that you do when you're 2m from it.
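Here is that relationship as a small sketch, assuming an idealised point source and a fixed collecting aperture. The 1.6 crop factor and the 20m starting distance in the second example are my own assumptions, chosen to match the Canon APS-C bodies and the shooting distance mentioned earlier.

```python
def relative_light(distance_m, reference_distance_m):
    """Light received at distance_m relative to reference_distance_m.

    Assumes an idealised point source, so intensity falls off with the
    square of the distance.
    """
    if distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance_m / distance_m) ** 2


if __name__ == "__main__":
    # The example from the post: 4m compared with 2m.
    print(f"4m vs 2m: {relative_light(4.0, 2.0):.2f}x the light")

    # Moving closer by an assumed 1.6x crop factor to match framing.
    start_m = 20.0
    closer_m = start_m / 1.6
    gain = relative_light(closer_m, start_m)
    print(f"{closer_m:.1f}m vs {start_m:.0f}m: {gain:.2f}x the light")
```

Under these assumptions, moving 1.6 times closer gathers about 2.56 times the light from the subject, which is the same factor by which the subject's image area on the sensor grows.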