Since you went and invoked Nyquist, I will ask - what is the physical phenomenon that we are sampling, and what property(ies) of that phenomenon provide the limits from which we determine the minimal frequency to adequately capture all information present, and the optimal oversampling frequency?

Now you've got me thinking and crunching some quick numerics. One basic difference is that the sinc function is bipolar, whereas optical intensity is unipolar. Also, most functions are not band-limited, unlike ideal low-pass functions such as the sinc. So if you want to capture ALL the information, you need infinitely fine sampling.

However, if we Fourier transform the 1D profile of an Airy disc and compare its energy content per frequency with that of a sinc function of the same resolution (according to the Rayleigh criterion), we need about double the sampling frequency to catch almost all of the energy. So if we sample twice as finely as the resolution given by the Rayleigh criterion, we should be fine. This sampling rate is about 15% finer than the one for MTF50%. (At least for a series of points; I don't know whether it also holds for lines.)
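For anyone who wants to repeat this kind of quick numeric check, here is a rough sketch. It is not the exact calculation behind the numbers above, just one way to set it up. Assumptions on my part: position is measured in units of wavelength times f-number, so the Airy profile has its first zero at about 1.22; the sinc with the same first zero then has its band edge at 1/(2 * 1.22), about 0.41 cycles per unit; the 99% energy threshold, the grid size and all function names are arbitrary choices. J1 is evaluated from its integral representation so that only NumPy is needed (scipy.special.j1 would do the same job).

```python
import numpy as np

RAYLEIGH = 1.2197  # first zero of the Airy profile, in units of lambda*N (assumed units)

def bessel_j1(z, m=512):
    """J1(z) via (1/pi) * integral_0^pi cos(tau - z*sin(tau)) dtau, midpoint rule."""
    tau = (np.arange(m) + 0.5) * np.pi / m
    return np.cos(tau[None, :] - z[:, None] * np.sin(tau)[None, :]).mean(axis=1)

def airy_slice(x):
    """1D cut through the Airy intensity pattern, normalised to 1 at x = 0."""
    z = np.pi * np.asarray(x, dtype=float)
    safe = np.where(z == 0.0, 1.0, z)          # avoid 0/0 at the centre
    amp = np.where(z == 0.0, 1.0, 2.0 * bessel_j1(safe) / safe)
    return amp ** 2

def energy_bandwidth(fraction=0.99, n=2048, dx=0.1):
    """Lowest frequency below which `fraction` of the spectral energy lies."""
    x = (np.arange(n) - n // 2) * dx
    spectrum = np.abs(np.fft.rfft(airy_slice(x))) ** 2
    spectrum[0] *= 0.5                         # DC bin has no mirror partner at -f
    freqs = np.fft.rfftfreq(n, d=dx)
    cumulative = np.cumsum(spectrum) / spectrum.sum()
    return freqs[np.searchsorted(cumulative, fraction)]

sinc_band_edge = 1.0 / (2.0 * RAYLEIGH)        # analytic: spectrum of sinc is a rectangle
f99 = energy_bandwidth(0.99)
print("sinc band edge        :", sinc_band_edge)
print("Airy 99% energy limit :", f99)
print("ratio                 :", f99 / sinc_band_edge)
```

The printed ratio tells you how much finer than the sinc case you have to sample to keep the chosen fraction of the energy. Changing `fraction` shows how quickly the requirement grows as you chase the last few percent.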

So using the MTF50% megapixel values as a measure sounds good. One problem, however, is that pixels have area and are not ideal sampling points. We have to compensate for that by making the pixel areas much smaller than the resolution. To avoid throwing away so much light, we will instead have to increase the sensor resolution to a multiple of the MTF50% values in order to get close to 90% of the signal energy available in the optical resolution.
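To put a number on the "pixels have area" problem, here is a small sketch that treats a pixel as a uniform box aperture, whose MTF is the magnitude of a sinc. The box model, the function name and expressing the light-sensitive width as a fraction of the pixel pitch are my own simplifications for illustration.

```python
import math

def pixel_aperture_mtf(freq, pixel_width):
    """MTF of a uniform (box) pixel aperture: |sin(pi*f*w) / (pi*f*w)|."""
    arg = math.pi * freq * pixel_width
    return 1.0 if arg == 0.0 else abs(math.sin(arg) / arg)

pitch = 1.0                      # pixel pitch, arbitrary length unit
nyquist = 1.0 / (2.0 * pitch)    # sensor Nyquist frequency for that pitch

# Contrast left at Nyquist for different light-sensitive widths (fraction of pitch)
for width_fraction in (1.0, 0.5, 0.25):
    mtf = pixel_aperture_mtf(nyquist, width_fraction * pitch)
    print(f"width = {width_fraction:4.2f} * pitch -> MTF at Nyquist = {mtf:.3f}")
```

A pixel as wide as the pitch keeps only 2/pi (about 0.64) of the contrast at the sensor's Nyquist frequency, while narrower apertures keep more contrast but collect less light. That is the trade-off described above: rather than shrinking the sensitive area, you use more, smaller pixels.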