Not saying Canon would ever do this, but projectors have been using this technique in reverse for years to get a 4K image from 1920x1080 DLP/LCD chips. So, it would certainly be doable in a camera. Again, not saying it's what Canon has done here, just saying it's not anywhere near as farfetched an idea as you seem to think.
I actually had no idea that projectors did that - very interesting. I learned something! With that said, I believe there is a considerable difference between doing that for projection of video versus capture of video.
In projection, the projector is repeating parts of the same source frame at a refresh rate faster than human perception - it plays four parts of each frame, which line up perfectly, per frame of video. In capture, the four pieces of each frame are not the same, because the subject can move during capture. Also, to capture 8K at 30 fps, you'd be limited to a shutter speed of 1/120th of a second or faster (four 4K sub-frames per 8K frame), which doesn't line up with common practice. The typical guideline (the 180-degree shutter rule) is a shutter speed of twice the frame rate - so for 30 fps you'd want a 1/60th of a second exposure per frame, not the 1/120th of a second (or faster, from a practical perspective) forced here.
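The shutter-speed constraint above can be sketched as a quick back-of-the-envelope calculation. This is just an illustration of the arithmetic, assuming the four sensor-shifted sub-exposures must happen sequentially within each output frame interval:

```python
# Shutter-speed constraint for building one 8K frame from four
# sequential sensor-shifted 4K captures (assumption for illustration).

output_fps = 30   # target 8K frame rate
sub_frames = 4    # sensor-shifted 4K captures per 8K frame

# Each sub-exposure must fit inside a quarter of the frame interval:
max_exposure = 1 / (output_fps * sub_frames)

# Common 180-degree shutter guideline: exposure = 1 / (2 * frame rate)
preferred_exposure = 1 / (2 * output_fps)

print(f"Fastest usable exposure per sub-frame: 1/{round(1 / max_exposure)} s")
print(f"180-degree rule exposure at 30 fps:    1/{round(1 / preferred_exposure)} s")
```

The forced 1/120 s (or faster) ceiling is half the 1/60 s exposure the 180-degree rule would call for at 30 fps, which is the mismatch described above.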
Also, those projectors move millions of micro mirrors to a limited number of set positions to achieve that outcome. Since digital projectors use a moving micro mirror system anyway, adding additional positions to each micro mirror was not likely a quantum leap in technology. The above proposal for capture was to move the whole sensor using the IBIS system to capture the extra pixels, which would be a very different proposal as you're moving a whole lot more machinery than millions of tiny mirrors.
But let's assume you could overcome all those issues. To output 8K at 30 fps you'd need the same data throughput as 4K at 120 fps. To use sensor shift to capture the four 4K frames needed to make one 8K frame, that 4K sensor would have to record at 120 fps. So using sensor shift to capture higher-resolution video wouldn't actually reduce the data throughput requirement; it would only reduce the sensor resolution requirement.
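The throughput equivalence falls out of the pixel counts. A minimal check, assuming standard 7680x4320 (8K) and 3840x2160 (4K UHD) frame sizes:

```python
# Pixel throughput: native 8K/30 vs. four sensor-shifted 4K captures at 120 fps.
EIGHT_K = 7680 * 4320   # ~33.2 MP per 8K frame
FOUR_K = 3840 * 2160    # ~8.3 MP per 4K frame

native_8k_30 = EIGHT_K * 30    # pixels per second, native 8K at 30 fps
shifted_4k_120 = FOUR_K * 120  # pixels per second, 4K sensor at 120 fps

# Since one 8K frame holds exactly four 4K frames' worth of pixels,
# the two pixel rates come out identical.
print(native_8k_30 == shifted_4k_120)  # True
```

Either way the camera has to read out and process roughly a billion pixels per second, which is why the scheme saves nothing on throughput.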
So not only would it be super hard to get a camera to capture this way, and likely produce an inferior output, but there would also be no data throughput benefit to doing it; the only savings would be in sensor resolution. That is a lot of engineering to overcome for a pretty limited benefit, all things considered (in my opinion anyway!).