Scaling down itself is an arbitrary process, as you're trying to equalise both cameras by choosing an arbitrary target size.
Within a given technological generation, same-format sensors are essentially equal in high ISO noise performance. They only appear unequal when you arbitrarily enlarge one more than the other, otherwise known as pixel peeping. Telling you to view at the same size and/or apply some NR before downsizing for print is not an attempt to make the sensors equal. They already are virtually equal in noise quantity. It's an attempt to get you away from pixel peeping (which is explicitly unequal), and also to explain how you can deal with the qualitative difference in noise from higher pixel density sensors (it's literally sharper).
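If you want to check the arithmetic yourself, here is a rough sketch in Python. It assumes an idealised sensor where photon shot noise is the only noise source (no read noise, identical efficiency), and the exposure figure and megapixel counts are made-up illustration values, not measurements of any camera.

```python
import math

FULL_FRAME_AREA_MM2 = 36 * 24  # nominal full-frame sensor area


def snr_per_pixel(photons_per_mm2, megapixels, area_mm2=FULL_FRAME_AREA_MM2):
    """Shot-noise-limited SNR of a single pixel: sqrt(photons it collects)."""
    photons_per_pixel = photons_per_mm2 * area_mm2 / (megapixels * 1e6)
    return math.sqrt(photons_per_pixel)


def snr_at_output(photons_per_mm2, megapixels, output_megapixels,
                  area_mm2=FULL_FRAME_AREA_MM2):
    """SNR after averaging down to a common output size.

    Averaging k pixels into one improves SNR by sqrt(k).
    Only meaningful when output_megapixels <= megapixels.
    """
    pixels_averaged = megapixels / output_megapixels
    return snr_per_pixel(photons_per_mm2, megapixels, area_mm2) * math.sqrt(pixels_averaged)


EXPOSURE = 5e5  # hypothetical photons per mm^2, same for both sensors

# Pixel peeping: compare one pixel of a 20 MP sensor with one pixel of a 50 MP sensor.
pixel_peep_ratio = snr_per_pixel(EXPOSURE, 20) / snr_per_pixel(EXPOSURE, 50)

# Same-size viewing: both reduced to the same 8 MP output.
same_size_ratio = snr_at_output(EXPOSURE, 20, 8) / snr_at_output(EXPOSURE, 50, 8)

print(f"per-pixel SNR ratio (20 MP / 50 MP): {pixel_peep_ratio:.2f}")
print(f"same-size SNR ratio (20 MP / 50 MP): {same_size_ratio:.2f}")
```

Under these assumptions the per-pixel gap is just the square root of the pixel-count ratio, and it disappears once both images are viewed at the same size. Real sensors add a little read noise per pixel, which is why I say "virtually" equal rather than exactly equal.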
I typically do not reference DxO because A) they are prone to error, and B) I've never cared for how they present their results (i.e. sensor scoring). But some of their tests do provide valuable insight. Do you know what their Sports test scores tell us?
That high ISO differences between FF cameras are negligible. They show a 1/3rd stop difference between the 5Ds and 5D mark IV, and maybe a half stop between the 5Ds and 1DX mark III. This is in rough agreement with what I see when looking at DPReview RAW samples. And these aren't even of the same tech generation. The 5D IV and 1DX mark III have lower pixel densities and newer tech, and that's all the improvement they can deliver. You'll find the differences to be minor within both the Nikon and Sony lines as well.
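To put those fractions of a stop in perspective, here is a small sketch that converts a difference in stops into a light ratio and an SNR ratio. It assumes a stop means a factor of two in light (or ISO) and that SNR scales with the square root of the light collected, which only holds when shot noise dominates. The function names are mine, purely for illustration.

```python
def light_ratio_from_stops(stops):
    """One stop is a factor of two in light, so n stops is 2**n."""
    return 2.0 ** stops


def snr_ratio_from_stops(stops):
    """Shot-noise-limited SNR goes as sqrt(light), so n stops is 2**(n/2)."""
    return 2.0 ** (stops / 2.0)


for label, stops in (("1/3 stop", 1 / 3), ("1/2 stop", 1 / 2)):
    print(f"{label}: light x{light_ratio_from_stops(stops):.2f}, "
          f"SNR x{snr_ratio_from_stops(stops):.2f}")
```

On that basis a 1/3 stop advantage is roughly a 12% difference in SNR and a half stop is roughly 19%, which is hard to see outside a side-by-side comparison.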
Pixel density simply does not impact high ISO noise to the degree people believe. Nor is technology improving high ISO performance very quickly, if at all with some generations. The easy gains here were made in the 2000s. We're now at the point that photon shot noise dominates by a wide margin, and the electronics are about as good as they're going to get without active cooling or CFA removal. If you're expecting the R5 to be leaps and bounds better than the 5Ds/sR at high ISO, or the R6 to be better than the R5, you've set yourself up for disappointment. (Though, knowing how human beings perceive the world, if you believe R5/R6 high ISO will be dramatically better than any FF camera before, then you will likely experience that regardless of the truth.)
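A quick sketch of what "shot noise dominates" means in numbers. Shot noise and read noise add in quadrature, so once the signal is much larger than the square of the read noise, removing the read noise entirely buys very little. The signal and read noise values below are hypothetical round numbers, not specs for any camera.

```python
import math


def total_noise(signal_e, read_noise_e):
    """Shot noise (sqrt of signal) and read noise combined in quadrature, in electrons."""
    return math.sqrt(signal_e + read_noise_e ** 2)


def snr_gain_without_read_noise(signal_e, read_noise_e):
    """Factor by which SNR would improve if read noise were eliminated."""
    shot_only = math.sqrt(signal_e)
    return total_noise(signal_e, read_noise_e) / shot_only


SIGNAL_E = 100.0      # hypothetical electrons collected by a pixel
READ_NOISE_E = 2.0    # hypothetical read noise in electrons

gain = snr_gain_without_read_noise(SIGNAL_E, READ_NOISE_E)
print(f"SNR gain from perfect electronics: x{gain:.3f}")
```

With these example numbers, perfect electronics would improve SNR by only about 2%. The deeper you go into the shadows the more read noise matters, so take this as an illustration of the midtones rather than a general law.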
Why are you downscaling B to A, not upscaling A to B?
If you upscale A to B the noise quantity will be nearly identical. The noise quality will be different because everything in A will be blurrier than it appears in B.
But why do we choose the lower-mpix sensor as the target size?
Because as a general rule people don't expect to make 48" prints from ISO 12,800 shots.