But I think that *is* essentially what is going on, because DxOmark's print score is based on a normalization to 8 megapixels. I'd think that would buy you a gain of a stop or so (something like log2(sqrt(36 megapixels / 8 megapixels)), which works out to about 1.08 stops).
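For anyone who wants to plug in other sensors, here is that arithmetic as a tiny Python sketch. The function name `stops_gained` is just something I made up for illustration, and it assumes the ideal case where noise is uncorrelated between pixels and averages down as the square root of the pixel count.

```python
import math


def stops_gained(native_mp: float, target_mp: float) -> float:
    """Stops of SNR gained by downsampling from native_mp to target_mp.

    Assumes ideal, uncorrelated noise that averages down as sqrt(N).
    """
    return math.log2(math.sqrt(native_mp / target_mp))


print(f"{stops_gained(36, 8):.2f} stops")  # prints "1.08 stops"
```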

That's the theoretical part, which would work in an ideal situation with ideal noise characteristics and real numbers instead of quantized integers.

AFAIK, Nikon RAWs currently clamp any digitized negative values to zero (compare that to Canon having a bias value of 2048 in the data). This roughly halves the stdev of the dark frame image captures in Nikon's case, in the end inflating the "measured DR" by roughly one stop.

I don't follow this at all. Luminance isn't negative, so why would it make sense to have negative numbers? If this clipping really takes place, does it show up on the SNR curves? I don't really buy that they can inflate the estimated dynamic range by clipping relatively high values (one problem with this is that it leaves some dynamic range on the table). I see other problems with this line of reasoning. For one, you don't halve the standard deviation by throwing away half the distribution, because what remains is heavily skewed (i.e. the left tail is bounded and the right isn't). I might be missing something, but the above looks like nonsense to me.
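One way to put a number on the standard deviation question is a toy simulation. The sketch below is only that: it assumes zero-mean Gaussian read noise around the black level, no quantization, and a sigma of 4 picked arbitrarily, none of which comes from real Nikon or Canon data. The function name `clamped_stdev_ratio` is hypothetical. For an ideal Gaussian the closed-form ratio is sqrt(1/2 - 1/(2*pi)), roughly 0.58, which says nothing about what an actual camera pipeline does.

```python
import math
import random


def stdev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def clamped_stdev_ratio(sigma: float = 4.0, n: int = 50_000, seed: int = 0) -> float:
    """Ratio of the stdev after clamping negatives to zero to the stdev before.

    Assumption: the dark-frame noise is zero-mean Gaussian with the given sigma.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, sigma) for _ in range(n)]
    clamped = [max(0.0, v) for v in raw]  # negative values become 0
    return stdev(clamped) / stdev(raw)


print(f"clamped / unclamped stdev: {clamped_stdev_ratio():.3f}")
```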

Another result of this is that for values of low magnitude, oversampling the individual pixel values in SW does not result in the expected behavior of noise converging towards zero. Since the noise-converging-to-zero is a key assumption in the whole "increase-DR-by-binning" scenario, it's quite trivial to notice that the theory doesn't hold water in this case.
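The averaging claim in the paragraph above can be tried out in the same kind of toy model. This sketch assumes zero-mean Gaussian noise with sigma of 1 and no quantization; `binned_dark_stats` is a made-up name. Under those assumptions the mean of the binned dark values settles near sigma / sqrt(2*pi), about 0.40, when negatives are clamped, and near 0 when they are not, while the spread of the binned values still shrinks with bin size in both cases.

```python
import math
import random


def binned_dark_stats(sigma=1.0, bin_size=16, n_bins=5000, seed=0, clamp=True):
    """Return (mean, stdev) of dark-frame values after averaging bin_size pixels.

    Assumption: per-pixel noise is zero-mean Gaussian; if clamp is True,
    negative samples are set to zero before averaging.
    """
    rng = random.Random(seed)
    binned = []
    for _ in range(n_bins):
        samples = [rng.gauss(0.0, sigma) for _ in range(bin_size)]
        if clamp:
            samples = [max(0.0, s) for s in samples]
        binned.append(sum(samples) / bin_size)
    mean = sum(binned) / n_bins
    spread = math.sqrt(sum((b - mean) ** 2 for b in binned) / n_bins)
    return mean, spread


for clamp in (True, False):
    mean, spread = binned_dark_stats(clamp=clamp)
    print(f"clamp={clamp}: mean={mean:.3f}, stdev={spread:.3f}")
```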

When we talk about how "theory" plays out in the real world, it is far from "trivial".

In the case of signal to noise and its application to dynamic range -- even if we fail to realize the "theoretical" blackpoint due to quantization error (because the actual noise is less than the quantization error), we still increase usable dynamic range.

Suppose, for example, our "shadow noise level" (noise at a signal level of 1) is 1, so 1 on our scale corresponds to the blackpoint. If we average, theoretically we could reduce the blackpoint, but our error is stuck at 1 due to quantization. That, if I understand it, is your argument.

But let's step up a couple of stops. At a signal level of 4, our noise level is 2 (proportional to the square root of the signal), so at this level we would improve signal to noise by binning (that is, quantization error isn't the limiting factor).

So you will gain usable dynamic range by increasing resolution, even if quantization places a kind of floor on your blackpoint.
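A rough simulation of the signal-level-4 example above, as a sketch rather than a model of any real sensor. It assumes Gaussian noise with a standard deviation of 2 around a signal of 4, rounding to integers as the only quantization, no clamping of negatives, and 2x2 binning (4 pixels per bin). The name `binning_snr_gain` is invented for this post. With noise that is well above one quantization step, the binned SNR should come out at roughly twice the single-pixel SNR.

```python
import math
import random


def snr(values):
    """Mean divided by population standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean / math.sqrt(var)


def binning_snr_gain(signal=4.0, noise=2.0, bin_size=4, n_bins=20_000, seed=0):
    """Return (single-pixel SNR, binned SNR) for integer-quantized pixels.

    Assumptions: Gaussian noise, quantization by rounding, no clamping.
    """
    rng = random.Random(seed)
    pixels = [round(rng.gauss(signal, noise)) for _ in range(bin_size * n_bins)]
    binned = [
        sum(pixels[i:i + bin_size]) / bin_size
        for i in range(0, len(pixels), bin_size)
    ]
    return snr(pixels), snr(binned)


single, binned = binning_snr_gain()
print(f"single-pixel SNR: {single:.2f}, binned SNR: {binned:.2f}")
```

Changing `signal` and `noise` to values near or below one quantization step is the interesting experiment for the blackpoint end of the argument.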