Pixel size doesn't matter for low light performance. Total sensor area and quantum efficiency matter. It doesn't matter how finely you divide the light you're receiving and converting into free charge. If you increase the amount of light you're receiving (more total sensor area) and increase the rate at which incident photon strikes are converted to electrons, then you have better high ISO performance. It wouldn't matter if you had 10mp, 50mp, 120mp, or 500mp.
I get this, but I've wondered whether there might be some truth to the myth, though not in the way many people imagine. While I accept that your explanation is true, it applies when using identical tech throughout the sensor. I've wondered whether it's disproportionately more expensive to make high-density sensors, and whether some compromises would be made to keep the costs of the higher MP sensors within reason. The practical result would be that higher MP sensors had worse low-light performance, but only because the sensor tech isn't identical.
There has certainly been a LOT of research into making smaller sensors (which pretty much always have smaller pixels) more sensitive to light. That research has undoubtedly cost billions. That said, most of the research into making better small pixels has been done to make ultra tiny sensors viable...the kinds of 1/3" down to around 1/8" sensors found in small compact cameras, tablets, phablets, phones, and every other device that uses a microscopic sensor. Each of those sensors usually costs a tiny fraction of what one APS-C or FF sensor costs, though, despite having considerably smaller pixels (between 1 and 2 microns these days, with a new generation of sub-micron pixel sensors coming very soon.)
The reason those sensors have problems with noise, again, isn't because of the small pixels...it's the small sensor area. They are WAY smaller than even an APS-C sensor. Anywhere from one to a couple of orders of magnitude smaller in area, sometimes more. To have enough pixels to be useful on such small sensors, the pixels themselves have to be tiny. That doesn't increase noise...all it means is that the sensor is "resolving" and/or "exhibiting" noise at a higher frequency. Blend each 2x2 block of pixels together by averaging, and you would have the same noise as a sensor with pixels twice as large (linearly, so 4x as much area...again, assuming similar tech, though within a given generation of cameras, sensor tech is usually very similar).
These tiny sensors in tiny cameras in all the tiny devices we have these days perform as well as they do because they actually use significantly better technology than what is found in our DSLRs. These tiny cameras employ some cutting edge science to increase their light gathering capacity, increase photodiode surface area, increase quantum efficiency, use per-pixel memories to increase charge capacity, etc. If a full-frame DSLR had the same kind of technology as a 1/8" sensor, we would have something like an 864mp, 15fps, ISO 1.6 million MONSTER that used color splitting (rather than color filtration), with at least 24 stops of dynamic range thanks to multi-bucket memories, digital readouts, black silicon (basically silicon that uses the same general technology as nanocoated lens elements to eliminate reflection), and a host of other advancements. A full-frame sensor in a DSLR that used the same technology as the microsensor used in the upcoming iPhone or Android would be utterly mind blowing. (Not to mention space guzzling...we would need a new kind of storage technology to handle 2.7GB per RAW.)
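To put a number on the 2x2 blending point above, here is a rough Python sketch. It is only a toy model under stated assumptions: shot noise is the only noise source, it is approximated as Gaussian with a standard deviation equal to the square root of the mean, and the 1000 electrons per small pixel is a made-up figure. The four small pixels are summed (averaging gives the same signal-to-noise ratio), and they are taken from a flat list rather than a real 2x2 grid, which makes no difference when the noise is independent per pixel.

```python
import math
import random
import statistics

def simulate_pixels(mean_electrons, count, rng):
    """Draw `count` pixel values of a flat grey patch with shot noise only.

    Shot noise is Poisson; for large means it is well approximated by a
    Gaussian with sigma = sqrt(mean), which is what is used here.
    """
    sigma = math.sqrt(mean_electrons)
    return [rng.gauss(mean_electrons, sigma) for _ in range(count)]

def snr(values):
    """Signal-to-noise ratio of a flat patch: mean divided by standard deviation."""
    return statistics.fmean(values) / statistics.stdev(values)

rng = random.Random(42)
SMALL_MEAN = 1000.0   # hypothetical electrons collected per small pixel
N_BIG = 20_000        # number of large pixels in the patch

# Large pixels: 4x the area, so 4x the electrons per pixel.
big = simulate_pixels(4 * SMALL_MEAN, N_BIG, rng)

# Small pixels: four of them cover the footprint of one large pixel.
small = simulate_pixels(SMALL_MEAN, 4 * N_BIG, rng)

# Blend every group of four small pixels into one output pixel.
binned = [sum(small[i:i + 4]) for i in range(0, len(small), 4)]

snr_small = snr(small)
snr_big = snr(big)
snr_binned = snr(binned)

print(f"small pixels, unblended: SNR {snr_small:.1f}")
print(f"large pixels:            SNR {snr_big:.1f}")
print(f"small pixels, blended:   SNR {snr_binned:.1f}")
```

Per pixel, the small pixels look noisier (roughly half the SNR), but once blended down to the same pixel size they land on essentially the same SNR as the large pixels, because the same total charge was collected over the same area.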
BTW, when I talk about noise in this context, I am pretty much referring to random sources of noise. That is primarily photon shot noise, as well as a bit of random noise from dark current and the random component of read noise. Pattern noise, which is always due to the electronics, is a different story. That is a matter of specific technological construction, materials, and sensor design. Pattern noise is usually buried very deeply within the signal, though, and unless you're lifting your shadows by many stops, it is usually a non-factor. Photon shot noise and dark current are really the big ones. In normal photography, dark current is pretty much inconsequential, as CDS (correlated double sampling) takes care of it (in astrophotography, dark current can be your worst enemy, as it accumulates with time....ugh...)
It's with this difference in noise frequency...all noise frequency, particularly random noise frequency...that image normalization matters (LTRLI will like this). Dynamic range is talked about a lot, but it's usually talked about in the context of editing latitude: "How many stops can I lift my shadows?" That is certainly a factor of dynamic range, and clearly the one that everyone cares about today. Increasing dynamic range in such a way that you gain editing latitude means reducing read noise so that the original RAW, before any scaling, has less noise in the shadows, thereby increasing the usable range of bit depth in the RAW image. Dynamic range is affected by more than just pattern read noise, however. All random sources of noise affect it as well, and that includes random noise introduced during read as well as the primary source of random noise, photon shot noise.
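For anyone who wants to see how read noise maps to stops, here is a small Python sketch using the common engineering definition of dynamic range: log2 of the full-well charge over the noise floor. The function names, the 20,000 electron full well, and the 8 and 3 electron read noise figures are all made up for illustration, not measurements of any camera. Independent random noise sources at the floor are combined in quadrature; photon shot noise is left out because it scales with the signal rather than setting a fixed floor, so this only covers the "editing latitude" side of the story.

```python
import math

def noise_floor(read_noise_e, dark_noise_e=0.0):
    """Combine independent random noise sources in quadrature (electrons RMS)."""
    return math.sqrt(read_noise_e ** 2 + dark_noise_e ** 2)

def dynamic_range_stops(full_well_e, read_noise_e, dark_noise_e=0.0):
    """Stops between a saturated pixel and the noise floor."""
    return math.log2(full_well_e / noise_floor(read_noise_e, dark_noise_e))

FULL_WELL = 20_000  # hypothetical full-well capacity in electrons

# Two hypothetical sensors that differ only in read noise.
for read_noise in (8.0, 3.0):
    stops = dynamic_range_stops(FULL_WELL, read_noise)
    print(f"read noise {read_noise:.0f} e-: about {stops:.1f} stops")
```

With everything else equal, cutting the read noise is what buys the extra stops in the shadows.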
In order to compare the noise of cameras with different size sensors, one must normalize their outputs: scale them to the same size. It really doesn't matter if you scale up or down, though scaling down to a common target is usually the approach taken. Say you downsampled the images from a number of cameras, all with different sensor sizes but all with the same pixel count, to the same image size, say 2000 pixels on the long side. You'll find that the larger the sensor, the lower the noise. If we instead had a set of cameras where the larger sensors had fewer pixels and the smaller sensors had more pixels, we would still see that the larger sensor had less noise...however we would also find that the smaller sensors had more detail. The thing about detail is, especially when there is a lot of it, it tends to drown out noise. This is a perceptual matter...the noise of the smaller sensors with smaller pixels is still higher, statistically speaking (i.e. if it were measured), but that higher level of noise is more readily recognized in smooth areas, gradients and solid areas (i.e. background bokeh) than in detailed ones.
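The normalization argument can be sketched in a few lines of Python. This assumes an idealized, shot-noise-limited case: the same exposure (electrons collected per square millimeter, a hypothetical figure that already folds in quantum efficiency), no read noise, and a downsample that simply averages native pixels into output pixels. The sensor areas are approximate nominal values, and the function names and pixel counts are only for illustration.

```python
import math

def per_pixel_snr(area_mm2, native_pixels, electrons_per_mm2):
    """Shot-noise SNR of one native pixel: sqrt of the electrons it collects."""
    electrons_per_pixel = area_mm2 * electrons_per_mm2 / native_pixels
    return math.sqrt(electrons_per_pixel)

def normalized_snr(area_mm2, native_pixels, electrons_per_mm2, target_pixels):
    """SNR per output pixel after downsampling to `target_pixels`.

    Averaging k native pixels improves SNR by sqrt(k), so the native
    pixel count cancels out and only the total charge matters.
    """
    pixels_averaged = native_pixels / target_pixels
    return per_pixel_snr(area_mm2, native_pixels, electrons_per_mm2) * math.sqrt(pixels_averaged)

EXPOSURE = 1.0e6          # hypothetical electrons collected per mm^2
TARGET = 2000 * 1333      # output image, 2000 pixels on the long side

# (name, approximate area in mm^2, hypothetical native pixel count)
cameras = [
    ("full frame, 20mp", 864.0, 20e6),
    ("full frame, 50mp", 864.0, 50e6),
    ("APS-C, 20mp", 332.0, 20e6),
    ("1/3 inch, 8mp", 17.3, 8e6),
]

for name, area, pixels in cameras:
    native = per_pixel_snr(area, pixels, EXPOSURE)
    scaled = normalized_snr(area, pixels, EXPOSURE, TARGET)
    print(f"{name:18s} native SNR {native:5.1f}   normalized SNR {scaled:5.1f}")
```

The two full-frame entries differ at the native pixel level but come out identical once normalized, while the smaller sensors fall behind by the square root of the area ratio.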
The perceptual factor is difficult to nail down, and it's highly subjective, but it does play a role in whether we as humans THINK one camera is noisier than another. This is actually one of the big problems with the 7D. It still has a very high resolution sensor...its pixels are still a lot smaller than those of the 5D III, 6D, and most other DSLRs on the market, with the exception of less than a handful (i.e. the 70D, a couple of Nikon APS-C cameras). The reason the 7D is perceived as noisy is that it has a tendency to be a bit soft. It's got a "strong" AA filter (personally, I think it's just right for the job it was designed to do, but it does blur more than a lot of AA filters on newer cameras these days), and that strong AA filter eliminates a certain amount of high frequency detail...high frequency detail that would otherwise drown out noise. (The other problem is that the 7D doesn't actually gather as much light as newer counterparts, even including some of the lower end Rebels that ended up with the same sensor...the 7D can only gather a charge of about 20ke- per pixel, vs. say the 70D, which gathers nearly 27ke- per pixel...per SMALLER pixel, which indicates the 70D is gathering almost 50% more light than the 7D within the same sensor area). The 7D isn't necessarily much noisier than its counterparts and competitors...it just SEEMS noisier because it's a bit softer, and that softer detail has a harder time drowning out noise with meaningful information. I also think, in practice, that the 7D's noise is more difficult to clean up, as photon shot noise isn't "crisp" and just per-pixel...it kind of "bleeds" into multiple pixels (probably because of the AA filter).
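The "almost 50% more light within the same sensor area" figure can be checked with some quick arithmetic, sketched here in Python. The per-pixel charge numbers are the ones quoted above, taken at face value as a proxy for light gathered. The pixel counts (18.0mp for the 7D, 20.2mp for the 70D) are nominal spec-sheet values that are not stated in the post, and the two sensors are assumed to have the same area, which is only approximately true.

```python
# Charge per pixel as quoted in the post (electrons).
FULL_WELL_7D = 20_000
FULL_WELL_70D = 27_000

# Assumption (not from the post): nominal pixel counts, with both
# APS-C sensors treated as having the same area.
PIXELS_7D = 18.0e6
PIXELS_70D = 20.2e6

# Per pixel, the 70D holds this much more charge than the 7D.
per_pixel_ratio = FULL_WELL_70D / FULL_WELL_7D

# Over the whole sensor, the 70D also has more (smaller) pixels,
# so the per-area advantage is larger than the per-pixel one.
per_area_ratio = (FULL_WELL_70D * PIXELS_70D) / (FULL_WELL_7D * PIXELS_7D)

print(f"per pixel:       {per_pixel_ratio:.2f}x")
print(f"per sensor area: {per_area_ratio:.2f}x")
```

So the per-pixel gain is about a third, and it is the extra pixels packed into the same area that push the total to roughly one and a half times.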
Anyway, when it comes to sensors of the same size, the biggest differences are usually quantum efficiency and read noise (and, for some applications, dark current). The Sony Exmor, for example, is a superior sensor in all three of those categories. It has quite a bit more Q.E. than any Canon sensor (by as much as 15%), it has significantly lower read noise, and it actually also has less dark current (which only really matters for longer exposures.) Full frame Exmors are still the same area as the sensors in the 5D III and 1D X, but they gather a lot more light, and they introduce far less noise into the deep shadows. That's the only real difference. Assuming one created an exposure where the lowest pixel level was well above the read noise floor...you would find little significant difference between cameras with these sensors that actually had anything to do with the sensor (you would find differences, but if you really looked into the reasons for those differences, I am willing to bet good money you would find the AF system, metering system, frame rate, and ability of the photographer to work quickly with the camera to change settings, find their subject, focus it, etc. as the key factors driving the differences in IQ.)
I had an increasingly tough time getting my 7D to focus consistently...using the 5D III is EFFORTLESS...it practically works itself, and when I need to do anything, it's like it knows my mind. It's that factor right there, the ability to expend little effort using a camera to get good results, that makes Canon king of the DSLR. Canon is at the pinnacle of DSLR design. Their current generation of cameras is truly exquisite when it comes to making it easy, making it effortless, for the photographer to be a photographer, instead of a camera operator. I put off the 5D III for a good long while, largely because I wanted to see what the 7D II turned out to be. I rather regret that decision now, as even if the 7D II turns out to be phenomenal, and is just as effortless to use as the 5D III or 1D X...I spent an extra year hassling around with the 7D when I didn't really have to.
If you want low noise, go with a bigger frame, regardless of pixel count or size. If you want more detail, go with a smaller frame and more pixels. That's all that should really go into the decision of whether to get a FF camera or an APS-C camera. Once you've picked one of those two things, then it's time to figure out which of all the other features will best serve your needs...and in my experience, it's all those other factors that are WAY, WAY more important. "Effortless"....that should really be Canon's new ad campaign. That's what Canon's current cameras do for you...they make photography effortless. I couldn't really give a crap about the minutiae of IQ when I can just point and shoot and the camera just does what I need it to.