I think the number I have is related to video, because they specifically mentioned that it's faster than the R3.
I shall look to see if there's a consistent correlation between ES and video numbers.
This is how photo vs highest res video readout compares for recent models (I am assuming a measurement error of about 1 ms because of some small inconsistencies between the sources I am getting the values from, so treat these as approximate values). Each line is photo readout (bit depth) - video readout:
R3: 5 ms (14 bit) - 10 ms
R5: 17 ms (12 bit) - 16 ms
R6ii: 14 ms (12 bit) - 14 ms
R6: 20 ms (12 bit) - 30 ms
a1: 4 ms (14 bit) - 15 ms
a9: 6 ms (12 bit?) - 23 ms
Z9: 4 ms (14 bit?) - 15 ms
a7s3: 20 ms (14 bit) - 9 ms
a7iv: 67 ms (14 bit) - 27 ms
What we know: going from 14 bit to 12 bit halves your readout time (this is shown by Sony cameras that have ES modes for both 14 and 12 bit). Video is usually read at 12 bit (the X-H2s has a mode that is an exception here... and don't believe Sony when they talk about 16 bit video with their hybrid cameras).
So, we can see that most cameras use their fastest readout in photo mode while they slow down in video. Of the ones I selected here, the R3, R6, a1, a9 and Z9 fall into this category. The stacked sensor cameras are interesting cases because they shoot 14 bit, meaning that, in theory, they could be even faster in photo mode if they shot 12 bit (I don't know if there is any other factor limiting it).
When it comes to the other cameras, they seem to mostly match their video and photo (at 12 bit) speeds. a7iv and a7s3 have half the readout speed in photo when compared to video, but those measurements are from 14 bit modes (it's the mode available for Sony cameras when they shoot single shot).
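To make the 14 bit vs 12 bit comparison a bit easier to follow, here is a rough Python sketch that takes the values from the list above and estimates what the photo readout would be at 12 bit. It leans entirely on the assumption that 14 bit to 12 bit halves the readout time; the names (`READOUTS`, `photo_ms_at_12_bit`) are just made up for illustration, and the bit depths marked with "?" in the list are taken at face value.

```python
# (model, photo_ms, photo_bits, video_ms), copied from the list in this post.
# Values are approximate (about 1 ms of measurement error).
READOUTS = [
    ("R3", 5, 14, 10),
    ("R5", 17, 12, 16),
    ("R6ii", 14, 12, 14),
    ("R6", 20, 12, 30),
    ("a1", 4, 14, 15),
    ("a9", 6, 12, 23),    # bit depth was marked uncertain in the list
    ("Z9", 4, 14, 15),    # bit depth was marked uncertain in the list
    ("a7s3", 20, 14, 9),
    ("a7iv", 67, 14, 27),
]


def photo_ms_at_12_bit(photo_ms, photo_bits):
    """Estimate the 12-bit photo readout time.

    Assumption: dropping from 14 bit to 12 bit halves the readout time.
    Values already measured at 12 bit are returned unchanged.
    """
    return photo_ms / 2 if photo_bits == 14 else photo_ms


for model, photo_ms, bits, video_ms in READOUTS:
    estimate = photo_ms_at_12_bit(photo_ms, bits)
    print(f"{model:5s} photo at 12 bit ~{estimate:.1f} ms, video {video_ms} ms")
```

Under that assumption the a7s3 lands at about 10 ms in photo against 9 ms in video, and the a7iv at about 33.5 ms against 27 ms.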
The R6 is a particular case: it has a relatively fast readout in photo for a non-stacked sensor camera of its time, but a very slow one in video (PS.: about 33 ms is the slowest readout that still allows 30 fps video; in fact this is the reason the a7rv cannot shoot 8K at 30 fps, since its readout is at around 40 ms in 8K, which limits it to 24 fps). My guess here is heat management. I believe the manufacturers limit the readout speed in video to avoid overheating. The case of the R6 reinforces this idea, because it's a camera that overheats even at 4K 24 fps. In fact, the R6 is able to do "full sensor" (actually 1.06 crop) readout at 60 fps with about 15 ms, taking into account the small crop plus the 16:9 video crop. I believe Canon pushed past the heat limitation here to be able to offer full sensor readout 4K60, knowing that it would overheat even faster. They could've done the same for 4K24/30 but decided against it because it would have put even stricter limitations on its most important video mode.
In summary, I believe that video readout speed doesn't tell the whole story. There seem to be other bottlenecks besides how fast the sensor can be read during video, bottlenecks that manufacturers need to take into account before they decide on limitations there. So if the R5ii has a readout speed 30% faster than the R3 in video, this would mean 7 ms RS in video (which is fast enough to allow full sensor readout at 120 fps, btw). For photo, this only tells us that it is no slower than 7 ms in 12 bit, or 14 ms in 14 bit, but it can be anything faster than that. So, we cannot draw any conclusion about its photo RS.
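For anyone who wants to redo the R5ii estimate, here is a small Python sketch of the arithmetic. It reads "30% faster" as "30% less readout time", which is how the 7 ms figure comes out; the other possible reading (1.3x the speed) is included for comparison. The function names are hypothetical, and the 14 bit figure again assumes that 14 bit takes twice as long as 12 bit.

```python
R3_VIDEO_MS = 10.0  # R3 video readout from the list in this post (approximate)


def shorter_by(readout_ms, fraction):
    """Readout time after cutting it by `fraction` (0.30 means 30% less time)."""
    return readout_ms * (1 - fraction)


def sped_up_by(readout_ms, fraction):
    """Readout time if the readout *rate* is higher by `fraction` (0.30 means 1.3x)."""
    return readout_ms / (1 + fraction)


# Reading used in this post: 30% less time, so about 7 ms.
r5ii_video_ms = shorter_by(R3_VIDEO_MS, 0.30)

# Alternative reading: 1.3x the speed, so a bit under 8 ms.
r5ii_video_alt_ms = sped_up_by(R3_VIDEO_MS, 0.30)

# Slowest 14-bit photo readout consistent with the video figure,
# assuming 14 bit takes twice as long as 12 bit.
r5ii_photo_14bit_slowest_ms = r5ii_video_ms * 2

print(round(r5ii_video_ms, 1), round(r5ii_video_alt_ms, 1),
      round(r5ii_photo_14bit_slowest_ms, 1))
```

Either reading stays under the 8.3 ms needed for 120 fps, so the conclusion about video does not depend on which one is meant.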
For those who are not familiar with the math, the fps vs RS requirement works like this: to shoot X fps, you need to read the sensor in at most 1/X seconds. So, 30 fps = 1/30 s = 0.033 s = 33 ms. 60 fps = 1/60 s = 16.7 ms. 120 fps = 1/120 s = 8.3 ms. The a7rv has about 40 ms of RS in 8K, so it cannot reach the necessary speed to shoot 8K30, hence it is limited to 24 fps in 8K. The a7siii is just a tad too slow to reach full sensor readout at 4K120, hence it has a small crop to be able to get there.
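The same math as a short Python sketch, in case it helps. The function names are made up, and the list of frame rates is only the ones mentioned in this post (24, 30, 60, 120), not a complete list of video modes. It also only checks the readout time and ignores every other bottleneck (heat, processing, and so on).

```python
def frame_budget_ms(fps):
    """Longest readout time (in ms) that still fits inside one frame at `fps`."""
    return 1000.0 / fps


def max_fps(readout_ms, rates=(24, 30, 60, 120)):
    """Highest frame rate from `rates` that the given readout time allows.

    Returns None if the readout is too slow even for the lowest rate.
    """
    possible = [rate for rate in rates if readout_ms <= frame_budget_ms(rate)]
    return max(possible) if possible else None


for fps in (24, 30, 60, 120):
    print(f"{fps} fps needs a readout of at most {frame_budget_ms(fps):.1f} ms")

print(max_fps(40))  # a7rv in 8K, about 40 ms: too slow for 30 fps
print(max_fps(7))   # a 7 ms readout fits inside the 120 fps budget
```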