Pixel size is irrelevant. SNR, and therefore dynamic range (assuming you have no other source of noise than what is inherent to the image signal itself) and noise are ultimately relative to total sensor area. That's it.
Uhmm... except when pixel size is not irrelevant.
I have to disagree with you, somewhat, on one point: dynamic range will become limited when pixels become too small, because their full-well capacity then decreases by more than just the ratio of their surface area.
I say this because, I suspect, the vertical dimension (depth) of the photodiode will have some aspect ratio limit with regard to its surface area. When the surface area becomes too small, the depth will have to shrink too, and that will limit the full-well capacity per unit of surface area, decreasing maximum DR. You'll still be able to reduce noise levels quite effectively by binning/averaging, either in hardware or software, but you'll reach a lower maximum when the pixel geometry gets too small.
I suspect something like a 40MP smartphone camera may be an example.
EDIT: Actually, we're already there to varying degrees.
Since many sensor systems are already counting individual electrons, smaller pixels are just gonna be DR-limited. 14 bits at 1 count per electron is only 2^14 = 16384 e-.
Small pixels are useful even with full-well counts well below that, like 2^10, but that's already a DR of 10 stops or less. When you start averaging them, you're not gonna gain quite all of that DR back. And then when you hit the aspect ratio limit for the photodiode, the DR curve will really drop off.
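To put rough numbers on the averaging point, here is a minimal sketch in Python. It is a toy model, not measured sensor data, and it rests on assumptions that are mine: DR in stops is log2(full well / read noise), read noise is 1 e- per pixel regardless of pixel size, full well scales with pixel area (so no aspect ratio penalty yet), and binning is a plain sum of separately read pixels, so capacity adds linearly while uncorrelated read noise adds in quadrature. The function names are made up for illustration.

```python
import math

def dr_stops(full_well_e, read_noise_e):
    """Dynamic range in stops for one pixel: log2(full well / read noise)."""
    return math.log2(full_well_e / read_noise_e)

def binned_dr_stops(full_well_e, read_noise_e, n_pixels):
    """DR in stops after summing n_pixels identical pixels that were each read out.

    Assumption: capacity adds linearly, uncorrelated read noise adds in quadrature.
    """
    total_full_well = n_pixels * full_well_e
    total_read_noise = math.sqrt(n_pixels) * read_noise_e
    return math.log2(total_full_well / total_read_noise)

big_full_well = 2 ** 14   # 16384 e-, the 14-bit example from the post
read_noise = 1.0          # assumed: 1 e- per pixel, independent of pixel size
n = 16                    # split the big pixel's area into 16 small pixels
small_full_well = big_full_well / n  # 1024 e- = 2^10, assuming area scaling only

print("one big pixel:       ", dr_stops(big_full_well, read_noise), "stops")
print("one small pixel:     ", dr_stops(small_full_well, read_noise), "stops")
print("16 small pixels, sum:", binned_dr_stops(small_full_well, read_noise, n), "stops")
```

Under these assumptions the binned group gets back to 12 stops, not the 14 of the single big pixel: each doubling of the pixel count costs half a stop. Binning in the charge domain before readout, as some CCDs do, would read the sum only once and avoid that penalty. An aspect ratio limit would show up as small_full_well dropping below big_full_well / n, which lowers the binned figure further.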
Perhaps a resident math-whiz could graph that curve for a demo... (nudge, hint-hint)