First, I'd like to thank MichaelClark and PrivateByDesign for their carefully explained posts regarding equivalence.
And thanks, AlanF, for correcting me that the signal-to-noise ratio is mainly proportional to sqrt(total # of photons captured).
Also, my example with a 200mm f4 is just to simplify what I'm trying to say, rather than focusing on a specific lens I want or use.
I'd like to give my opinion on how these 3 setups compare on the same FF camera with an ideal FF sensor, with the resulting images viewed at the same size & brightness (where "ideal" just means 100% sensor quantum efficiency and 100% lens transmittance, to simplify the discussion):
#1: Using an ideal FF 200mm f4 lens, with a 2x crop
#2: Using an ideal FF 200mm f4 lens, with an ideal 2x TC
#3: Using an ideal FF 400mm f8 lens
* All will produce the same image in terms of angle of view, DOF, and OOF blur.
* All will capture the same rate of total photons per unit time for the image.
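As a rough check of the two points above, here is a minimal Python sketch. It assumes a thin-lens model focused at infinity and a 36x24 mm sensor; the function names and the `setups` table are just mine for illustration. The idea is that equal angle of view plus equal entrance pupil diameter (focal length / f-number) is what gives the same DOF, OOF blur, and photon rate.

```python
import math

# Assumption: a standard 36x24 mm full-frame sensor.
FF_DIAGONAL_MM = math.hypot(36.0, 24.0)

def diagonal_aov_deg(focal_mm, crop=1.0):
    """Diagonal angle of view for the used part of the sensor.

    Thin-lens approximation, lens focused at infinity.
    """
    used_diagonal = FF_DIAGONAL_MM / crop
    return math.degrees(2 * math.atan(used_diagonal / (2 * focal_mm)))

def entrance_pupil_mm(focal_mm, f_number):
    """Entrance pupil diameter = focal length / f-number."""
    return focal_mm / f_number

# Hypothetical table of the three setups from the post.
setups = {
    "#1 200mm f4, 2x crop": dict(focal_mm=200.0, f_number=4.0, crop=2.0),
    "#2 200mm f4 + 2x TC": dict(focal_mm=400.0, f_number=8.0, crop=1.0),
    "#3 400mm f8": dict(focal_mm=400.0, f_number=8.0, crop=1.0),
}

for name, s in setups.items():
    aov = diagonal_aov_deg(s["focal_mm"], s["crop"])
    pupil = entrance_pupil_mm(s["focal_mm"], s["f_number"])
    print(f"{name}: AOV {aov:.2f} deg, entrance pupil {pupil:.0f} mm")
```

All three rows should show the same angle of view and the same 50 mm entrance pupil, which is the sense in which they are "equivalent".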
The differences between them are:
* #1 has the image on just 25% of the sensor area, while #2 and #3 have the image on 100% of the sensor area.
* Thus #2 and #3 have 4x the full-well capacity for the image as #1 and thus can capture 4x the total amount of light for it.
* If you stop the exposure at time T (when #1 reaches full-well for the brightest area), then #1 is at "full exposure" while #2 and #3 are only 25% exposed and need a 4x ISO amplification to reach the same brightness. All three thus produce the same image brightness from the same total # of photons, and so the same level of noise, assuming the signal-to-noise ratio is mainly proportional to sqrt(total # of photons captured).
* If you allow the exposure in #2 and #3 to run four times longer (4*T), they will reach full exposure with 4x the total photons and thus a sqrt(4) = 2x better signal-to-noise ratio than #1 (but a 4x longer exposure can also show 4x more subject motion blur, if any).
* #1 will have fewer pixels in the displayed image (25% as many as #2 and #3) and thus less resolution with rougher edges, though good upsizing interpolation can bring it somewhat close to the others.
* #1 and #3 will use (typically) a similar number of lens elements, while #2 adds a significant # of extra lens elements with a slight transmission loss but a significant contrast and resolution loss.
* Since #1 uses 25% of the sensor area for the image, the remaining 75% around it is still recorded (assuming you told the camera to do so). That lets you more easily track fast-moving subjects that might otherwise move out of view, shift the image in post for a subject that moved off-center, or zoom out to get a wider image (which is very useful, but not the point of this discussion).
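To put numbers on the full-well and noise points in the list above, here is a small Python sketch in relative units. It assumes pure photon shot noise (so relative SNR = sqrt(total photons)) and the same photon rate for every setup; the `CAPACITY` table and `summarize` function are hypothetical names of mine, not anything from a real library.

```python
import math

# Relative units: 1.0 = the photons that fill setup #1's cropped region by time T.
# #2 and #3 spread the image over 4x the sensor area, so 4x the full-well capacity.
CAPACITY = {"#1 crop": 1.0, "#2 TC": 4.0, "#3 400mm f8": 4.0}

def summarize(setup, exposure_in_T):
    """Return (full-well fill fraction, ISO gain needed, relative SNR).

    Assumes every setup collects photons at the same rate and that
    noise is shot-noise limited, so relative SNR = sqrt(photons).
    """
    photons = 1.0 * exposure_in_T        # same rate for all three setups
    fill = photons / CAPACITY[setup]     # fraction of full-well used
    gain = 1.0 / fill                    # amplification to reach full brightness
    relative_snr = math.sqrt(photons)
    return fill, gain, relative_snr

for setup, exposure in [("#1 crop", 1), ("#2 TC", 1), ("#3 400mm f8", 1),
                        ("#2 TC", 4), ("#3 400mm f8", 4)]:
    fill, gain, snr = summarize(setup, exposure)
    print(f"{setup} at {exposure}*T: fill {fill:.2f}, gain {gain:.0f}x, relative SNR {snr:.1f}")
```

At T, all three end up with the same relative SNR even though #2 and #3 need 4x gain. At 4*T, #2 and #3 fill their wells and double the SNR, at the cost of the longer exposure.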
All in all, my preference (if limited to just these 3 choices) is to use:
#3 for the best quality image, assuming the subject is not moving too fast to follow, and I can afford an additional lens and am willing to carry it around.
#1 would be my 2nd choice (if #3 isn't chosen).
#2 would not be my choice, as I'd prefer #1 over it mainly for the extra 75% area coverage around the image which I could use for the 3 reasons mentioned above.