So nine is not made of a 3x3 grid, but instead a "+" shaped set of photos?
I think a 3x3 grid is much more likely than a + shape.
As others have mentioned, if the shifts all stay within the area of a single pixel, you'll get about 400MP of data, but not 9x the resolution of the original 45MP. The result will be somewhat like a sensor with an overly strong antialiasing filter. I also think you'll see almost zero additional color resolution, because your data will still consist of clusters of same-colored samples.
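To put a rough number on the "overly strong antialiasing filter" comparison, here is a minimal 1D sketch. It assumes a 100% fill factor and three shift positions per axis (so 3x3 = 9 shots in 2D); the function name `pixel_shift_1d` and the test pattern are made up for illustration and don't describe any particular camera.

```python
import numpy as np

def pixel_shift_1d(scene, pixel_width):
    """Interleave `pixel_width` captures, each shifted by one fine step.

    Every output sample is the mean of the scene over the footprint of one
    full-size pixel, so the interleaved result is the scene box-filtered
    over `pixel_width` fine samples (assumes 100% fill factor).
    """
    n = len(scene) - pixel_width + 1
    return np.array([scene[i:i + pixel_width].mean() for i in range(n)])

# Finest detail a 3x denser grid could represent: alternating dark/bright.
scene = np.tile([0.0, 1.0], 12)
shifted = pixel_shift_1d(scene, 3)

contrast_in = scene.max() - scene.min()       # full contrast in the scene
contrast_out = shifted.max() - shifted.min()  # contrast after pixel shift
print(contrast_in, contrast_out)
```

In this toy model the finest line pattern survives with only a third of its original contrast. You do get 3x the samples per axis, but each sample is still averaged over a full-size pixel.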
There are very strong parallels between this and the crazy-MP cameras in Samsung (and others, I'm sure) phones. They don't have a standard Bayer filter. The last-generation 108MP sensor had a Bayer filter built from 3x3 pixel blocks, so each square of the color filter covered 9 pixels. This makes their pixel binning (to 12MP) much easier, as they can just connect a group of adjacent pixels together to read out one color. I'm not sure how aggressive the AA filter is on these, but I'd expect it to be closer to 12MP of resolving power than 108MP. The current version is even more extreme, with a 4x4 grid for 200MP natively binned to 12.5MP.
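For anyone who wants to see the binning arithmetic, here is a small sketch. It assumes binning is a plain average over each n x n block of same-colored pixels; real sensors may sum charge or combine pixels differently, and `bin_blocks` is a hypothetical helper, not anything from Samsung.

```python
import numpy as np

def bin_blocks(raw, n):
    """Average each n x n block of `raw` into one output pixel."""
    h, w = raw.shape
    if h % n or w % n:
        raise ValueError("image dimensions must be multiples of n")
    # Split each axis into (block index, position inside block), then
    # average over the two "inside block" axes.
    return raw.reshape(h // n, n, w // n, n).mean(axis=(1, 3))

raw = np.arange(36, dtype=float).reshape(6, 6)  # stand-in for raw sensor data
binned = bin_blocks(raw, 3)                     # 6x6 -> 2x2

# Pixel-count arithmetic from the post: 3x3 and 4x4 binning.
mp_108_binned = 108 / (3 * 3)   # 12.0
mp_200_binned = 200 / (4 * 4)   # 12.5
print(binned.shape, mp_108_binned, mp_200_binned)
```

Because every pixel in a block sits under the same color filter, the binned output looks like an ordinary Bayer mosaic at the lower resolution.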
There are also some parallels to how most consumer 4K/UHD projectors (and many early 1080p rear-projection TVs) work. TI calls it XPR for DLP (it was often referred to as wobulation back in the 1080p days). They're mostly 1080p imaging chips that are shifted either just diagonally for pseudo-4K, or in a square for less-pseudo-but-still-kind-of-pseudo-4K. The fill factor of the original 1080p image is quite high, so the shifted images overlap a lot and the result is not a true 4K image. A side benefit of this, at least on DLP, is a reduction of the screen door effect. A sharply focused DLP projector can exhibit quite high-contrast black lines between pixels (at least when viewed ridiculously closely). With pixel shifting, the black lines are only down to 50% brightness, so the effect is significantly diminished.
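The 50% figure falls out of a simple time-average. A minimal 1D sketch, assuming a two-position (diagonal-style) shift of half a pixel pitch and a made-up 90% fill factor; the numbers are illustrative and not taken from any real DLP chip.

```python
import numpy as np

PITCH = 10  # fine steps per pixel pitch (illustrative)
GAP = 1     # fine steps of dark gap between mirrors (assumed 90% fill)

# One all-white frame: bright mirror area followed by a black gap, repeated.
one_pixel = np.array([1.0] * (PITCH - GAP) + [0.0] * GAP)
frame_a = np.tile(one_pixel, 8)

# Second sub-frame, shifted by half a pixel pitch.
frame_b = np.roll(frame_a, PITCH // 2)

# The eye integrates the two rapidly alternating sub-frames.
perceived = (frame_a + frame_b) / 2

print(frame_a.min(), perceived.min(), perceived.max())
```

Without shifting, the gaps are fully black. With the two sub-frames averaged, each gap in one frame is covered by lit mirror area in the other, so the darkest point only drops to half brightness.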