Resolution and signal-to-noise, which are linear functions of focal length and not of image area, are the factors that matter, because you can always enlarge an image. You can make the image from, say, a 200mm lens the same size as the one from a 300mm lens on the same camera by increasing the number of pixels in each dimension by 1.5 in Photoshop, and the difference between the two images will be a factor of 1.5 in resolution, not 1.5^2. Or, in the good old days of film, you would have enlarged the negative 1.5 times more to give the same size photo, but with 1.5x less resolution. Enlargement also increases noise relative to signal, but N/S depends on the square root of the area of the image, i.e. on its linear dimension.
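
To put numbers on the 200mm versus 300mm comparison, here is a rough sketch in Python. The function name `enlargement_factors` is made up for illustration, and the square-root noise rule is simply the one stated above, not a full sensor model.

```python
import math

def enlargement_factors(f_short_mm, f_long_mm):
    """Return (linear, area) enlargement needed to match the longer lens."""
    linear = f_long_mm / f_short_mm   # scale factor per dimension
    area = linear ** 2                # resulting change in image area
    return linear, area

linear, area = enlargement_factors(200, 300)

# Assumption (as stated in the post): N/S goes with the square root of the
# image area, so the noise penalty tracks the linear factor, not the area.
noise_penalty = math.sqrt(area)

print(f"linear enlargement: {linear}")
print(f"area enlargement:   {area}")
print(f"N/S penalty:        {noise_penalty}")
```

The resolution and noise penalties both come out at the linear factor, while the area factor is the larger squared figure.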

Consider an analogous example: an APS-C sensor has a crop factor of 1.6 because the image is effectively larger by 1.6 along both axes. We all say that the crop has a 1.6 times advantage in reach, not 1.6^2 (= 2.56), which is the increase in area. The S/N is worse than on FF by, in theory, a factor of 1.6, not 2.56.
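
The same point can be checked from sensor dimensions. This is only a sketch: the APS-C size used here (22.3 x 14.9 mm) is an approximate figure I am assuming for a 1.6x crop body, and exact dimensions vary by model.

```python
import math

# Sensor dimensions in mm. The APS-C figures are approximate (assumption).
FULL_FRAME_MM = (36.0, 24.0)
APS_C_MM = (22.3, 14.9)

def diagonal(dims):
    """Sensor diagonal in mm."""
    return math.hypot(dims[0], dims[1])

def sensor_area(dims):
    """Sensor area in square mm."""
    return dims[0] * dims[1]

crop_factor = diagonal(FULL_FRAME_MM) / diagonal(APS_C_MM)   # linear ratio
area_ratio = sensor_area(FULL_FRAME_MM) / sensor_area(APS_C_MM)

# Theoretical S/N penalty taken as the square root of the area ratio,
# following the rule in the text above.
snr_penalty = math.sqrt(area_ratio)

print(f"crop factor (linear): {crop_factor:.2f}")
print(f"area ratio:           {area_ratio:.2f}")
print(f"S/N penalty:          {snr_penalty:.2f}")
```

The square root of the area ratio lands back on the crop factor, which is why the linear figure is the one to quote.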

What a telephoto lens does is, in effect, let you get closer to your subject in proportion to its focal length. Take a bird and keep the ratio of focal length to distance fixed at 10mm per metre: you will get the same size image with a 200mm lens at 20m, a 300mm lens at 30m, a 500mm lens at 50m, a 600mm lens at 60m, a 700mm lens at 70m, and so on (a bird 100m away would need 1000mm). It is focal length, not focal length squared, that matters.
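
A quick check of those lens and distance pairs, as a sketch only. It assumes the usual far-subject approximation (image size is roughly subject size times focal length divided by distance, valid when the distance is much larger than the focal length), and the 0.2m bird is an arbitrary figure chosen for illustration.

```python
def image_size_mm(subject_size_m, focal_length_mm, distance_m):
    """Approximate size of the subject's image on the sensor, in mm.

    Far-subject approximation: image size ~ subject size * f / d.
    """
    return subject_size_m * focal_length_mm / distance_m

BIRD_SIZE_M = 0.2  # hypothetical bird size, for illustration only

# (focal length in mm, distance in m) pairs from the text
pairs = [(200, 20), (300, 30), (500, 50), (600, 60), (700, 70)]

sizes = [image_size_mm(BIRD_SIZE_M, f, d) for f, d in pairs]

for (f, d), size in zip(pairs, sizes):
    print(f"{f}mm at {d}m -> image about {size:.2f}mm across")
```

Every pair gives the same image size because only the ratio of focal length to distance enters the formula, and it enters linearly.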