I just thought I would share some info on the raw files Michael shared. I decided to load them up in PixInsight, an extremely powerful astrophotography processing tool. It has amazing tools, many of them for noise reduction, and it also has a statistics module. The cool thing about PI is that you can operate on your images in linear space (without any additional processing applied, so it's "pure", untainted data that has not been affected by RAW editor tone curves or anything like that). It can also process data in a wide variety of numerical formats, including 8/10/12/14/16-bit integer as well as 32-bit and 64-bit float. So it makes for a nice platform for analyzing data in an objective manner.
To that end, here is some data I was able to generate using PI's statistics module in linear mode, first from the original images without any edits, and then from some versions after various attempts at noise reduction. I used a PixInsight feature, Previews, to cordon off certain regions of each image and compute statistics on just those regions. Most of the statistics I'll be providing will have a name in the form of <imageFile> -> <previewName>, just so you know what you're looking at.
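For anyone who wants to reproduce this kind of table outside of PI, here is a minimal sketch of the same data points computed for one channel of a cropped region. It uses only the Python standard library. The function name `region_stats` and the sample values are made up for illustration, and I am assuming avgDev is the mean absolute deviation from the median and MAD is the median absolute deviation from the median. PixInsight's own implementation may differ, for example by applying normalization factors to avgDev and MAD.

```python
import statistics

def region_stats(values):
    """Summary statistics for one color channel of a cropped region."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    variance = statistics.variance(values)  # sample variance (n - 1 divisor)
    abs_dev = [abs(v - median) for v in values]  # deviations from the median
    return {
        "count (px)": len(values),
        "mean": mean,
        "median": median,
        "variance": variance,
        "stdDev": variance ** 0.5,
        "avgDev": statistics.fmean(abs_dev),  # assumed definition, unnormalized
        "MAD": statistics.median(abs_dev),    # assumed definition, unnormalized
        "minimum": min(values),
        "maximum": max(values),
    }

# Hypothetical 8-bit-scale samples, NOT taken from the actual raw files.
sample = [160.0, 164.0, 166.0, 168.0, 172.0]
for name, value in region_stats(sample).items():
    print(f"{name:12s} {value:.3f}")
```

In practice you would feed it the R, G, and B values of a preview region separately, one call per channel.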
First off, the linear images, when opened up in PI, definitely show that swatch 1 was not anywhere close to the clipping point, in either image. Here are a couple crops showing just the strip, as rendered by default in PI's linear mode, along with statistical details from swatch #1:
5D III
_3H7C0105->Preview04
Data Point    R           G           B
count (%)     100.0000    100.0000    100.0000
count (px)    63784       63784       63784
mean          166.227     154.809     177.034
median        165.973     154.607     176.774
variance      46.421      31.791      38.432
stdDev        6.813       5.638       6.199
avgDev        7.068       5.867       6.444
MAD           7.609       6.328       6.951
minimum       135.272     131.047     147.743
maximum       191.163     175.872     201.148
D750
DSC_1424->Preview06
Data Point    R           G           B
count (%)     100.0000    100.0000    100.0000
count (px)    62748       62748       62748
mean          191.694     172.764     185.102
median        189.837     171.012     183.233
variance      58.980      43.680      44.843
stdDev        7.680       6.609       6.696
avgDev        7.971       6.870       6.953
MAD           8.065       6.738       6.738
minimum       176.241     158.728     167.611
maximum       218.949     194.171     205.922
The statistics are given per color channel: R, G, and B. As you can see from the maximum data point, the white level sits at roughly 175 to 220 levels (on an 8-bit scale) depending on the camera and channel, meaning there was really a TON of headroom left on each sensor. This explains why the swatches became indiscernible from each other somewhere between swatch 17 and 22, when I expected them to remain visible beyond that.
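To put a number on how far below the clipping point those maxima sit, here is a small sketch that converts each maximum into a percentage of full scale and into stops. It assumes the clipping point is 255 on the 8-bit scale the statistics are reported in; the `headroom_stops` helper is just a name I picked for illustration.

```python
import math

CLIP_LEVEL = 255.0  # assumption: clipping point on the 8-bit readout scale

# Swatch #1 maxima, copied from the two tables above.
maxima = {
    "5D III": {"R": 191.163, "G": 175.872, "B": 201.148},
    "D750": {"R": 218.949, "G": 194.171, "B": 205.922},
}

def headroom_stops(level, clip=CLIP_LEVEL):
    """Stops between a linear level and the clipping point."""
    return math.log2(clip / level)

for camera, channels in maxima.items():
    for channel, level in channels.items():
        percent = 100.0 * level / CLIP_LEVEL
        stops = headroom_stops(level)
        print(f"{camera} {channel}: {percent:.1f}% of full scale, "
              f"{stops:.2f} stops below clipping")
```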
To clarify what's going on here: I know that swatch 1 appeared to be "clipped" to Michael. Both the cameras themselves, and any RAW editor he would have used, would have shown the white swatch #1 to be nearly or actually pure white because of tone curves. PixInsight does not apply any kind of curve unless I tell it to, so the images I've shared above are truly, 100% linear space.
As a side note...you should notice that the D750 crop is more reddish, and appears a bit brighter. It seems the red channel is slightly brighter in this image. I am not sure why, possibly a difference in white balance. The D750 image is indeed exposed a little bit more than the 5D III image. Again, if you look at the maximum statistic, you can see that there is a difference of ~30 levels in red, ~20 levels in green, and ~5 levels in blue. Such a discrepancy should be eliminated for the most objective test results. I would say that swatch #1 should be clipped, with all levels at 255, 255, 255. Swatch 2 should be as close as possible to clipping without actually clipping in any channel, and the channels should deviate as little as possible (both across channels and across cameras...so if you get something like 250, 250, 250 +/- 3 in both cameras, that should suffice).
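The per-channel gap between the two cameras, and the "250 +/- 3" target for swatch 2, can be checked with a few lines. This is only a sketch: the `within_tolerance` helper and its defaults are my own illustration of the criterion described above, not anything PixInsight provides.

```python
# Swatch #1 maxima, copied from the two tables above.
maxima = {
    "5D III": {"R": 191.163, "G": 175.872, "B": 201.148},
    "D750": {"R": 218.949, "G": 194.171, "B": 205.922},
}

# How much brighter the D750 maximum is in each channel.
difference = {
    channel: maxima["D750"][channel] - maxima["5D III"][channel]
    for channel in ("R", "G", "B")
}
for channel, delta in difference.items():
    print(f"{channel}: D750 is {delta:.3f} levels higher")

def within_tolerance(levels, target=250.0, tolerance=3.0):
    """True if every channel level is within +/- tolerance of the target."""
    return all(abs(level - target) <= tolerance for level in levels)

# Hypothetical swatch 2 readings for a re-shot exposure.
print(within_tolerance([250.0, 251.5, 248.2]))  # all within 3 levels of 250
print(within_tolerance([255.0, 255.0, 255.0]))  # clipped, outside the target
```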
This may be an opportunity to further explain, with a visual backup, something I've tried to explain in the past about shadow pushing. Note how much of each strip, regardless of whether it is the 5D III or D750, is black? I don't just mean dark, but black. To my eyes, with my screen brightness turned up to maximum, in linear space, I could not differentiate any swatches from 22 through 41 for the D750, and I could not differentiate swatches 17 through 41 for the 5D III. That means that, in the case of both cameras, half or more of the dynamic range by default renders as black on screen. It isn't actually black. As you'll see in a moment when I "stretch" the data, there is a lot of tonal detail in those "black" swatches, which are most definitely not black. It just looks black. That is why we push shadows.
(Caveat: Since the highlight swatches were underexposed, once properly lit I suspect that the "indistinguishable from black" zone would shift down to swatches 25-29 or around there...however even in that case, one solid third of the tonal range available in the Stouffer T4110 step wedge shows up black by default, and some of it (not necessarily all of it) would need to be pushed in order to contribute usefully to the final photograph...and this is even more true with a standard S-type contrast curve applied...that would again darken some of those swatches, maybe bringing the range back to somewhere between 22-25.)
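A quick back-of-the-envelope sketch shows why so many steps land in "black" when the data is linear. It rests on two assumptions that are not stated above: that each step of the wedge adds 0.10 of optical density (one third of a stop), and that step 1 sits exactly at the 8-bit clipping level of 255. Real measurements will differ, so treat the numbers as illustrative only.

```python
DENSITY_PER_STEP = 0.10  # assumption: density increment per step of the wedge
WHITE_LEVEL = 255.0      # assumption: step 1 exposed right at clipping

def linear_level(step, white=WHITE_LEVEL):
    """Ideal linear 8-bit level of a given step, with no noise or flare."""
    return white * 10 ** (-DENSITY_PER_STEP * (step - 1))

for step in (1, 2, 11, 17, 22, 31, 41):
    print(f"step {step:2d}: {linear_level(step):9.4f} levels")
```

Under those assumptions every ten steps costs a factor of ten in linear signal, so by step 22 the ideal level is already down to roughly 2 out of 255, which is why nothing past that point is visible without a stretch.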
Anyway, moving on. The next set of statistics is from the same linear images as shown above, only in this case the statistics are from swatch 41. There are two interesting things I glean from these statistics: noise levels and maximum level. I've also included a couple more images. These images demonstrate PixInsight's "screen stretch" function, which applies a default MTF stretch curve to bring up the deep tones. The white swatches get blown out; however, screen stretch is normally intended for use with astro images, which pretty much don't have "whites"...almost all the data in an astro image falls into swatches 39-41.

Anyway, the screen stretch aims to bring up the darkest tones to a common level, and it is fully automated (in other words, I did not do anything personally to bias the results), so it is a good visual indication of the differences between each camera.
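For the curious, here is a minimal sketch of a midtones transfer function (MTF) of the kind a screen stretch is built on. The formula is the rational MTF as I understand it from PixInsight's documentation, so treat it as an assumption rather than the exact code PI runs. The automatic screen stretch also picks its shadow clipping point and midtones balance from image statistics, and that part is not reproduced here. The midtones balance of 0.01 below is just an illustrative value.

```python
def mtf(midtones, x):
    """Midtones transfer function for a normalized pixel value x in [0, 1].

    Maps 0 -> 0, 1 -> 1, and the midtones balance value -> 0.5.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return ((midtones - 1.0) * x) / ((2.0 * midtones - 1.0) * x - midtones)

# Illustrative midtones balance: a very aggressive shadow lift.
balance = 0.01
for value in (0.0005, 0.001, 0.01, 0.1, 0.5, 1.0):
    print(f"{value:.4f} -> {mtf(balance, value):.4f}")
```

A midtones balance of 0.5 leaves the data unchanged, and the smaller the balance, the harder the deep tones are lifted, which is also why the bright swatches blow out.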
5D III
_3H7C0105->Preview01
Data Point    R          G          B
count (%)     92.3782    95.3265    91.3499
count (px)    18956      19561      18745
mean          0.282      0.178      0.170
median        0.261      0.167      0.163
variance      0.026      0.007      0.010
stdDev        0.163      0.085      0.100
avgDev        0.160      0.081      0.098
MAD           0.156      0.081      0.098
minimum       0.008      0.008      0.004
maximum       1.268      0.642      0.825
D750
DSC_1424->Preview01
Data Point    R           G           B
count (%)     100.0000    100.0000    100.0000
count (px)    23896       23896       23896
mean          19.070      9.470       10.899
median        19.074      9.471       10.899
variance      0.001       0.000       0.001
stdDev        0.038       0.019       0.023
avgDev        0.037       0.019       0.021
MAD           0.035       0.012       0.017
minimum       18.895      9.370       10.794
maximum       19.241      9.572       11.027
So, first statistic, maximum. There is clearly a large difference between the two cameras here. At this level, despite the D750 being slightly more exposed than the 5D III, a difference in maximum level of 10-20 levels is FAR more meaningful than a difference of 20-30 levels in the highlights. I honestly do not know if this is simply because the D750 has more dynamic range or not. Whatever the cause, the difference in maximum level between the 5D III and D750 here is massive. More massive than I would have assumed given the exposure levels of both cameras. I would be very, very interested in seeing what changes here if the exposures were fixed (i.e. step 1 fully clipped, step 2 close to but not actually clipped). I suspect the 5D III maximum level would increase, but so would the D750 maximum level. Would they deviate even more? Interesting stuff. (Michael, do you think you could rerun the D750, 5D III, and 7D II step wedge tests to correct the exposures?)
Second statistic, stdDev. This is the noise level, the standard deviation from the mean. The D750's stdDev is roughly a quarter of the 5D III's in every channel (0.038 vs. 0.163 in red, 0.019 vs. 0.085 in green, 0.023 vs. 0.100 in blue), which works out to more than an order of magnitude difference in variance. I think the numbers speak for themselves...the D750 has significantly lower noise than the 5D III. I don't think anyone should be surprised by that...however I do think it is quite telling, as it can explain why the D750 steps from 35-41 look so clean...there is very low noise.
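The size of the noise gap can be worked out directly from the swatch 41 stdDev rows in the two tables above. This is just arithmetic on the published figures: the ratio of standard deviations, and its square as an estimate of the variance ratio (the variance rows themselves are rounded too coarsely to divide).

```python
# Swatch 41 stdDev values, copied from the two tables above.
std_dev = {
    "5D III": {"R": 0.163, "G": 0.085, "B": 0.100},
    "D750": {"R": 0.038, "G": 0.019, "B": 0.023},
}

ratios = {
    channel: std_dev["5D III"][channel] / std_dev["D750"][channel]
    for channel in ("R", "G", "B")
}

for channel, ratio in ratios.items():
    print(f"{channel}: stdDev ratio {ratio:.2f}x, "
          f"variance ratio about {ratio ** 2:.1f}x")
```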
As for the darkest steps with a discernible edge: with the screen stretch, I can see an edge between 35/36 on the 5D III, and I can see an edge between 37/38 on the D750. Because I was curious, I threw in some previews on swatch 41 of the stretched versions of both images, and here are the stdDev and maximums for each:
5D III
_3H7C0105_clone->Preview01
Data Point    R          G          B
mean          59.503     59.127     55.742
median        59.735     58.634     56.661
variance      699.516    478.584    642.829
stdDev        26.448     21.877     25.354
maximum       152.770    136.280    150.739
D750
DSC_1424_clone->Preview01
Data Point    R         G         B
mean          58.039    54.884    54.096
median        58.082    54.366    53.284
stdDev        11.211    6.536     6.822
maximum       97.654    84.568    87.623
Again, I think the statistics speak for themselves. The mean/median values are very similar for both swatches (screen stretch aims to achieve a normalized result, so we should expect both images to end up with about the same mean), while there is a big difference in stdDev and maximums. The higher stdDev of the 5D III ultimately pushes the brightest pixels brighter than on the D750...that IS what noise is, the deviation around the mean, and the larger the deviation, the brighter the brightest pixels will be.
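That relationship between stdDev and the brightest pixel is easy to demonstrate with simulated data. The sketch below assumes plain Gaussian noise (real sensor noise is not purely Gaussian) and applies the same random draws to two swatches that share nearly the same mean but have different standard deviations. The mean and stdDev figures are rounded from the stretched red-channel rows above; everything else is made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
# One shared set of standard-normal draws, roughly one per pixel in a preview.
draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

def brightest_pixel(mean, std_dev):
    """Brightest simulated pixel for a swatch with the given mean and stdDev."""
    return max(mean + std_dev * z for z in draws)

noisy = brightest_pixel(59.5, 26.4)  # roughly the stretched 5D III red channel
clean = brightest_pixel(58.0, 11.2)  # roughly the stretched D750 red channel

print(f"higher stdDev: brightest pixel {noisy:.1f}")
print(f"lower stdDev:  brightest pixel {clean:.1f}")
```

With essentially the same mean, the swatch with the larger deviation always ends up with the brighter outliers, which is the same pattern the maximum rows show.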
Finally, I ran some noise reduction on both sets of images. PixInsight has some phenomenal NR tools. One of my favorites is TGVDenoise. This is a Total Generalized Variation algorithm, and it uses a slightly different kind of math to effectively wipe out noise entirely down to a certain frequency, while generally leaving lower frequencies untouched. It is very good at removing per-pixel noise without affecting the lower frequencies.
I've applied TGVDenoise to the two images below, then stretched them with a screen stretch so you could see the results:
5D III
D750
TGV is configured according to the statistics of each image. I used the same settings for both images, with the exception of the Edge Protection setting, which I configured according to the stdDev statistic. By doing this, the same frequencies of noise are removed from each image, leaving behind the rest. I believe the results are once again pretty self-explanatory. Aside from the vertical banding, there is clearly some other level of lower frequency noise present in the 5D III image. I personally refer to this as color blotchiness. Even with the complete obliteration of high frequency noise, things don't really change for the 5D III...the color noise is still very intrusive.
For a little bit of subjective opinion: as far as the first swatch where noise becomes unacceptably intrusive, without any NR, I would say swatch 25 on the 5D III. On the D750, it isn't particularly objectionable to me all the way to swatch 41, but if I had to pick an eminently usable cutoff point, I'd say around swatch 32 or 33. With NR, I would say swatch 27 on the 5D III, and swatch 37 on the D750.
One final thing. I thought it would be useful to show an example of what I think a properly exposed step wedge should look like rendered as unprocessed linear data. Michael, if you do generate some additional images, I think you should aim for something like this: