So far it seems like guys have to use downsampling and other tricks to convince themselves, that the IQ is not worse than the former 24mpx generation.
Downsampling is no trick. I think you mean that noise at 1:1 pixel viewing is worse?
That is simply thinking about resolution the wrong way. You're likely viewing your image on a monitor with a fixed size. If you view your image so that it fills the monitor and each image pixel is displayed by one monitor pixel, you are essentially looking at an extreme crop of the image.
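To put a rough number on "extreme crop", here is a minimal sketch. The 6000x4000 image (about 24 MP) and the 1920x1080 monitor are assumed example values, and `visible_fraction` is just a hypothetical helper name:

```python
def visible_fraction(image_px, monitor_px):
    """Fraction of the image area visible at 1:1 (one image pixel per monitor pixel)."""
    img_w, img_h = image_px
    mon_w, mon_h = monitor_px
    # The monitor cannot show more pixels than the image has in either direction.
    shown_w = min(img_w, mon_w)
    shown_h = min(img_h, mon_h)
    return (shown_w * shown_h) / (img_w * img_h)


# Assumed example: 24 MP image (6000x4000) on a Full HD monitor (1920x1080).
frac = visible_fraction((6000, 4000), (1920, 1080))
print(f"{frac:.1%} of the frame is visible at 1:1")  # 8.6% of the frame is visible at 1:1
```

With a higher resolution image on the same monitor, that fraction gets even smaller, so the 1:1 view is an even tighter crop.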
You are likely familiar with the concept of cropped images (or images from a crop sensor) appearing noisier. With a modern sensor, that noise mostly comes from fluctuations in the light source itself, called shot noise. It is not the fault of the sensor, but a limit of physics.
So when comparing noise between images, instead of magnifying each image until its pixels are displayed 1:1, you magnify the images so that they match each other in terms of field of view, for example by comparing shots of a test scene that show the same framing. This way, shot noise makes the same contribution to both images, and the only difference comes from the noise introduced by the sensor, that is, dark current and read noise.
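A minimal sketch of why this works, considering shot noise only (read noise and dark current are deliberately ignored). It assumes two sensors of the same size that collect the same total amount of light. The 45 MP figure, the 8 MP output size and the photon count are made-up example values, and the function names are hypothetical:

```python
import math


def snr_per_pixel(total_photons, sensor_mp):
    """Shot-noise SNR of a single sensor pixel, i.e. what you judge at 1:1."""
    photons_per_pixel = total_photons / (sensor_mp * 1e6)
    # Poisson statistics: mean N, standard deviation sqrt(N), so SNR = sqrt(N).
    return math.sqrt(photons_per_pixel)


def snr_after_resize(total_photons, sensor_mp, output_mp):
    """Shot-noise SNR per output pixel after downsampling to a common size."""
    pixels_combined = sensor_mp / output_mp
    # Averaging k independent pixels improves SNR by a factor of sqrt(k).
    return snr_per_pixel(total_photons, sensor_mp) * math.sqrt(pixels_combined)


TOTAL_PHOTONS = 2.4e11  # assumed: same light on both sensors
for mp in (24, 45):     # 45 MP is an assumed example of a higher-res sensor
    print(mp, "MP:",
          "1:1 SNR =", round(snr_per_pixel(TOTAL_PHOTONS, mp), 1),
          "| SNR at 8 MP output =", round(snr_after_resize(TOTAL_PHOTONS, mp, 8), 1))
```

At 1:1 the higher resolution sensor looks worse per pixel, but at the same output size the shot-noise SNR comes out identical, which is why any remaining difference has to come from the sensor itself.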
Of course, if the compared images have different resolutions, one will be downscaled or upscaled more than the other. For comparing sensor quality, this does not matter though. If you want to measure physics, go ahead and do your comparison at 1:1 for each image, resulting in different fields of view and different amounts of crop.
That doesn't seem to be a fair comparison though. If you had looked at the images at the same field of view, the higher resolution would simply allow the corresponding sensor to capture more detail. Which isn't such a bad thing.