Because if I have the camera in 1/3200 fixed shutter at widest aperture to freeze motion as best as possible, f/7.1 will require notably higher ISO setting than f/5.6 for similar brightness, and as iso goes up amount of noise increases and contrast goes down. Cropping will not reduce contrast or increase noise artifacts, it will just make the existing noise artifacts a bit larger in size.
I think that's the trouble here. You seem to understand that both the 400 5.6 and the 500 7.1 collect the same amount of light from the subject (400/5.6 ~ 500/7.1, so the entrance pupils are nearly the same size) and that the difference is just the sensor area over which that light is spread. As cropping to the same field of view negates the difference in area, the total amount of light that makes up both images is also the same.
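To put rough numbers on that, here is a small sketch of the arithmetic. The function names are my own, and it assumes idealized lenses (light per unit of sensor area proportional to 1/N², no transmission losses) on the same sensor:

```python
def entrance_pupil_mm(focal_length_mm, f_number):
    """Entrance pupil diameter: focal length divided by f-number."""
    return focal_length_mm / f_number


def light_per_area(f_number):
    """Relative light per unit of sensor area (idealized, proportional to 1/N^2)."""
    return 1.0 / f_number ** 2


pupil_400 = entrance_pupil_mm(400, 5.6)
pupil_500 = entrance_pupil_mm(500, 7.1)

# Cropping the 400 mm frame to the 500 mm field of view keeps
# (400/500)^2 of the sensor area.
crop_area_fraction = (400 / 500) ** 2

# Total light in the cropped 400 5.6 image relative to the uncropped 500 7.1 image.
light_ratio = light_per_area(5.6) / light_per_area(7.1) * crop_area_fraction

print(f"pupil 400 5.6: {pupil_400:.2f} mm")
print(f"pupil 500 7.1: {pupil_500:.2f} mm")
print(f"light in cropped 400 5.6 vs. 500 7.1: {light_ratio:.3f}")
```

The two pupils come out within about 1 mm of each other, and the ratio of total light lands within about 3% of 1, which is far too small to see.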
What you seem to misunderstand is how noise works.
In particular, the notion that increasing ISO increases noise while cropping an image just makes noise more apparent is problematic, because both actually have the same effect: making existing noise more apparent.
Especially at high ISOs, raising the ISO does not actually add any noise to your image. It only amplifies the noise already in the source signal. The noise that dominates low light photography isn't a property of the sensor. It is a property of light itself, called shot noise.
Light is noisy. When light is the signal, the amount of noise is the square root of the signal. So with more light, there is more noise in the signal - but it is less apparent, as it grows more slowly than the light. The signal-to-noise ratio (SNR or S/N) expresses how apparent the noise is and is calculated as the signal divided by the combined noise from all sources. As shot noise is by far the largest source in low light photography with short exposure times, it is the only one that matters for your question.
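As a quick illustration of that square root relationship, here is a minimal sketch. The photon counts are made-up example values, and shot noise is treated as the only noise source:

```python
import math


def shot_noise(photons):
    """Shot noise is the square root of the number of photons collected."""
    return math.sqrt(photons)


def snr(photons):
    """Signal-to-noise ratio when shot noise is the only noise source."""
    return photons / shot_noise(photons)


# Hypothetical photon counts per pixel, each 4x (two stops) more than the last.
for photons in (100, 400, 1600):
    print(f"{photons:5d} photons: noise {shot_noise(photons):5.1f}, SNR {snr(photons):5.1f}")
```

Every two stops of extra light doubles the noise but quadruples the signal, so the SNR doubles.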
ISO has no effect on shot noise. It just amplifies the noise along with the signal, but does not add to it. You can essentially ignore ISO when you want to minimize the noise you see in your images. More light is the only way to reduce the noise. As cropping the 400 5.6 image to the 500 7.1 field of view brings the amount of light in both images to the same level, the amount of noise is the same.
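If you want to convince yourself of the ISO part, a rough simulation like the following does it. It is a sketch under simplifying assumptions: ISO is modeled as a plain multiplication, read noise and clipping are ignored, and the photon count and gain are arbitrary example values:

```python
import math
import random
import statistics


def poisson_sample(rng, mean):
    """Draw one Poisson-distributed photon count (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def snr(values):
    """Mean signal divided by its standard deviation."""
    return statistics.fmean(values) / statistics.pstdev(values)


rng = random.Random(0)   # fixed seed so the run is repeatable
MEAN_PHOTONS = 50        # assumed average photons per pixel
ISO_GAIN = 4.0           # assumed amplification, i.e. two stops of ISO

photons = [poisson_sample(rng, MEAN_PHOTONS) for _ in range(20000)]
amplified = [ISO_GAIN * p for p in photons]

snr_raw = snr(photons)
snr_amplified = snr(amplified)

print(f"expected SNR (sqrt of mean): {math.sqrt(MEAN_PHOTONS):.2f}")
print(f"SNR before gain: {snr_raw:.2f}")
print(f"SNR after gain:  {snr_amplified:.2f}")
```

The amplified pixels are brighter, but signal and noise were scaled by the same factor, so the SNR does not move.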
The difference in ISO is just there to cause the brightness to be the same, because that is of course necessary to make a meaningful comparison. But it is not degrading the 500 7.1 shot to be any worse than the 400 5.6 shot. In fact, as the 500 7.1 shot is uncropped, it has more detail and therefore more headroom for further cropping - more reach essentially.
I think its not entirely doable by math because sensors respond differently to higher ISOs in terms of ugliness of artifacts and how easy to remove
I don't quite understand what you mean by this. But if you are arguing that judging how noisy an image is by SNR alone oversimplifies the visual complexity of noise, that is in my opinion an intuitive reaction to the subject, but nonetheless not correct. I originally felt the same way and investigated it myself through images. If you are skeptical of my claims, you can look deeper into what I am basing them on.
I have compiled the results in a fairly decent thread about equivalency here:
Post in thread 'Equivalency - Now with pictures!'
https://www.canonrumors.com/forum/threads/equivalency-now-with-pictures.39787/post-874838
That essentially covers the question you are asking here (focal length vs cropping). I still have to sit down and clean up (and finish!) those posts, which I haven't done yet. So excuse the rough edges, I just hope it gives you some impressions. It is not enough to just view the images, as they aren't fully self-explanatory. I am hoping to improve that eventually, but for now you should read at least the passage above each image that describes it.