I am no technical wizard and my answers come only from my own experience, but the answer to your question is yes and no, which is why there is so much debate and confusion over the internet. Smaller pixels gather less light, so on a per-pixel basis higher MP cameras are worse. However, lower MP cameras have a smaller native output size and higher MP cameras a larger one. Downsample that larger number of less efficient pixels to the same pixel count as the lower MP camera and you finish with the same performance in output quality terms.
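A quick sketch of the downsampling argument, under a deliberately simplified assumption: one "large" pixel covers the same area as a 2x2 block of "small" pixels, and the only noise source is photon shot (Poisson) noise. The specific photon counts are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_photons_small = 250            # assumed photons per small pixel
n_trials = 200_000

# Small-pixel sensor: four independent pixels, then binned 2x2.
# Downsampling effectively sums the light from the block.
small = rng.poisson(mean_photons_small, size=(n_trials, 4))
binned = small.sum(axis=1)

# Large-pixel sensor: one pixel gathering the light of all four.
large = rng.poisson(4 * mean_photons_small, size=n_trials)

def snr(x):
    """Signal-to-noise ratio: mean signal over standard deviation."""
    return x.mean() / x.std()

print(f"SNR, binned small pixels: {snr(binned):.1f}")
print(f"SNR, single large pixel:  {snr(large):.1f}")
```

Both come out at roughly sqrt(1000) ≈ 31.6, i.e. once the small pixels are combined down to the large pixel's resolution, the shot-noise performance matches.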
Putting it another way: if you print a very high ISO image from a 5DIV at native resolution, so 28" across at 240 dpi, and also print the same image from the 5DS at native resolution and at the same high ISO, so about 36" across, the 5DIV will show better ISO performance. However, reduce the 5DS image down to the same size as the 5DIV, from 36" to 28", and the quality of the two prints will be the same.
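For anyone checking the print sizes above, they follow directly from the published sensor widths in pixels (6720 px for the 5D Mark IV, 8688 px for the 5DS):

```python
# Print width in inches = pixel width / print resolution in dpi.
widths_px = {"5D Mark IV": 6720, "5DS": 8688}
dpi = 240

for camera, px in widths_px.items():
    print(f"{camera}: {px} px / {dpi} dpi = {px / dpi:.1f} inches wide")
```

That gives 28.0" for the 5D Mark IV and 36.2" for the 5DS, matching the 28" and 36" figures.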
Or to sum it up: in theory yes, in practice no.
FWIW, my simplistic understanding is it works this way.
Smaller pixels gather less light, but the important thing is the total light gathered by the sensor. So, the amount of light per pixel isn't necessarily the key.
Larger pixels have a greater full well capacity. When shooting in good light (so generally low ISO), the greater full well capacity can help because it allows a greater range of results (subtlety of tones) than a small pixel, which becomes full/saturated sooner. On the other hand, a smaller pixel is going to gather less light in a given amount of time, so in the end, again, it's not as simple as saying a larger pixel is necessarily better. It depends on how big the difference is between the small and large pixels.
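To put a rough number on the full-well point: one common way to express tonal range is engineering dynamic range, log2(full well capacity / read noise) in stops. The electron counts below are made-up round numbers for illustration, not measured specs for any real camera.

```python
import math

def dynamic_range_stops(full_well_e, read_noise_e):
    """Engineering dynamic range in stops: log2(full well / read noise)."""
    return math.log2(full_well_e / read_noise_e)

# Hypothetical large pixel with twice the full well of a small one,
# same read noise for both.
large_pixel = dynamic_range_stops(full_well_e=80_000, read_noise_e=5)
small_pixel = dynamic_range_stops(full_well_e=40_000, read_noise_e=5)
print(f"large pixel: {large_pixel:.1f} stops")
print(f"small pixel: {small_pixel:.1f} stops")
```

Doubling the full well buys exactly one extra stop of range per pixel, which is the kind of tonal headroom being described, before the downsampling argument above claws some of it back.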
Where a difference does come in is that larger pixels potentially mean a greater light-gathering area on the sensor. Each pixel has a "wall", and smaller pixels mean more pixels for a given sensor size, which means more "walls". Assuming the walls are the same width, more of the sensor is lost to non-light-gathering walls. So even if the pixels themselves have the same QE, the overall QE of the sensor with small pixels is lower than that of the sensor with large pixels. When there is plenty of light (so assume low ISO), the slightly lower QE doesn't matter. As the light level drops, though (think higher ISO), the difference in QE starts to become more significant.
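The "walls" argument can be sketched as a toy fill-factor calculation. Assumptions: each pixel is a square of pitch p with a non-sensitive border of fixed width w on each side; the micron values are illustrative ballpark figures, not real sensor specs.

```python
def fill_factor(pitch_um, wall_um):
    """Fraction of the pixel area that actually gathers light,
    modelling the pixel as a square with a dead border on all sides."""
    active = pitch_um - 2 * wall_um
    return (active / pitch_um) ** 2

wall = 0.5  # um, assumed the same width for both pixel sizes
for pitch in (4.1, 5.4):  # roughly 5DS-ish vs 5DIV-ish pixel pitches
    print(f"{pitch} um pixel: fill factor = {fill_factor(pitch, wall):.0%}")
```

With the same wall width, the smaller pixel loses a noticeably larger fraction of its area (about 57% active vs about 66% in this toy model), which is the overall-QE penalty described above.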
Incidentally, I understand this is essentially why BSI sensor designs are regarded as better than the older designs: BSI moves some components off the surface of the sensor that is used for light gathering, which allows the light-gathering areas to be larger while the physical size of the sensor stays the same.
However, as Sporgon has said, you have to take output size into account too. As you reduce output size, differences become harder to pick out. As you increase output size, differences get easier to see. But as you increase output size, the lower resolution sensor starts to hit its limit before the higher resolution sensor, so even if it started with an advantage, it ends up having to fight a slightly different issue once output size gets large enough.
If anyone has a better understanding than me, I'm all ears!