Now that they have gapless microlenses and such, you don't lose much by going to smaller photosites. Any individual photosite is noisier if it is smaller, but taken all together it's reasonably close to the same, within reason.
Not quite. You still lose FWC (full-well capacity) when moving to a smaller pixel. Since large-pixel and small-pixel sensors both use gapless microlenses these days, microlenses on sensors with smaller pixels really don't level the playing field like they did when they were first introduced. Microlenses improve Q.E. by increasing the number of photons that actually make it all the way to the photodiode, but photodiode capacity scales with area...and in that respect, all else being equal (which is pretty much the case these days), larger pixels are still better.
There are other technologies that can still improve Q.E. on sensors with smaller pixels. Lightpipe tech for FSI sensors, BSI sensors, weaker CFAs, etc. are all techniques used on higher density sensors that can still help level the playing field. Even with those technologies, FWC of higher density sensors is usually less than 40k electrons/pixel, whereas FWC of lower density sensors gets as high as 100k electrons/pixel. Granted, you can always downsample a higher resolution image and reduce noise, but generally you buy a high resolution camera for a reason, and the final output is usually upscaled, not downscaled.
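To put those FWC figures in perspective, here is a rough sketch assuming a purely shot-noise-limited pixel (read noise and everything else ignored), where the best possible SNR at saturation is the square root of the electron count. The function name `peak_snr` is made up for illustration, and the 40k and 100k values are just the ballpark numbers quoted above, not measurements of any particular sensor.

```python
import math

def peak_snr(fwc_electrons):
    """Shot-noise-limited SNR at saturation: signal N over noise sqrt(N)."""
    return math.sqrt(fwc_electrons)

small_pixel = peak_snr(40_000)    # ballpark FWC for a high density sensor
large_pixel = peak_snr(100_000)   # ballpark FWC for a low density sensor

# Per-pixel advantage of the larger pixel, expressed in stops of SNR.
stops = math.log2(large_pixel / small_pixel)

print(f"small pixel peak SNR: {small_pixel:.1f}")
print(f"large pixel peak SNR: {large_pixel:.1f}")
print(f"per-pixel difference: {stops:.2f} stops")
```

Under that assumption the per-pixel gap works out to roughly two thirds of a stop. This is a per-pixel comparison only; it says nothing yet about what happens when the images are viewed at the same output size.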
In the end, once 180nm technology normalizes across the board for brands and sensor sizes/densities, I think the choices will boil down to two key things: resolution at the cost of noise or IQ at the cost of resolution. Personally, I'm fine with those choices...you can always own two cameras for different purposes. 
As I said, each individual photosite does worse, but taken together....
And I also said "do reasonably close to" as well as "within reasonable differences between photosite sizes" not "exactly the same" for "any possible difference in relative photosite scale".
With current tech, a 40MP cam, judged overall rather than at 100% view photosite by photosite, seems to have only very modestly worse high ISO performance than a 12MP cam built on the same tech, and it might even do a trace better for low ISO DR. You gain far more from the extra detail/reach than you lose from the likely insignificant drop in high ISO performance.
The whole point is that you don't need two different cams for two different purposes. You can just own the high MP cam. When you care more about detail, print super large, view at 100%, etc. When you care about noise, just print or view at the same scale you'd be limited to with the lower MP cam. You very nearly get the best of both worlds.
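A minimal sketch of the "same output scale" comparison, under some loudly stated assumptions: both sensors have the same area and collect the same total number of photoelectrons over a given patch of the image, the only noise sources are shot noise and per-pixel read noise, and both sensors have a read noise of 3 e- per pixel (a made-up illustrative figure, not a spec). The function `snr_same_output_area` is hypothetical. The 40MP patch is modeled as 40/12 small pixels summed together to cover the area of one 12MP pixel.

```python
import math

def snr_same_output_area(signal_e, read_noise_e, n_pixels):
    """SNR of a patch holding signal_e electrons in total, summed over n_pixels.

    Shot noise variance equals the total signal; each pixel that is summed
    contributes its own read noise variance.
    """
    shot_var = signal_e
    read_var = n_pixels * read_noise_e ** 2
    return signal_e / math.sqrt(shot_var + read_var)

READ_NOISE_E = 3.0        # assumed, identical for both sensors
PIXEL_RATIO = 40 / 12     # small pixels per large pixel at equal sensor area

for label, signal in (("bright patch", 10_000), ("deep shadow", 20)):
    snr_12mp = snr_same_output_area(signal, READ_NOISE_E, 1)
    snr_40mp = snr_same_output_area(signal, READ_NOISE_E, PIXEL_RATIO)
    print(f"{label}: 12MP {snr_12mp:.2f}, 40MP downsampled {snr_40mp:.2f}")
```

In this toy model the two come out nearly identical wherever shot noise dominates, and the high MP sensor only falls behind in deep shadows, where the summed read noise starts to matter. How big that shadow penalty is on real cameras depends on the actual read noise of each sensor, which this sketch does not know.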