As far as I can remember, the only GaAs supercomputer actually produced was Cray-3, which had an order of magnitude less GFLOPs performance than my phone's GPU has.
But keep dreaming.
(I used to work with GaAs in Alferov's lab at about that time. GaAs is absolutely unsuitable for making modern CPUs or GPUs. No wonder Cray went bankrupt after Cray-3.)
---
To put it in as simple terms as you can understand, the Crays ALL had Gallium Arsenide in their chipsets BUT UNLIKE TODAY, the original Seymour Cray-started company DID NOT HAVE the seven nanometre ion beam or electron beam etching machines we have! Now we etch NORMAL CMOS-style CISC (Complex Instruction Set Computing) circuit pathways onto Gallium Arsenide (and Gallium Nitride too !!!!) substrates! AND while Seymour Cray had to cool his system with Liquid Nitrogen/Liquid Helium supercooling, which is INCREDIBLY EXPENSIVE to do, we don't! Today? Your smartphone has more processing power than even the largest Cray supercomputer of 1985!
Gallium Arsenide NEEDS high power (higher voltage and amperage) so the circuit pathways are WIDER (usually 300 to 400 nanometres wide) so the chips are physically MUCH MUCH LARGER !!! And those not-in-my-realm-of-understanding Metallization/Contact interface issues were always the largest problem with GaAs! Using gold metallurgy is a BIG contact problem, while going BACK to the old days of aluminum has allowed the engineers to make WORKABLE and ACTUAL GaAs Digital Logic circuits EQUIVALENT to a CMOS substrate CISC chip!
This also means that by having wider line traces and aluminum doping/contact surfaces, we can bump up the frequency without causing too many RF/EMF induction or propagation issues, RFI/EMI noise-related issues or the electron tunneling issues that modern CMOS circuits are bumping into. If one uses substrate-based cooling where microchannels of a non-cavitating, low-meniscus-forming cooling fluid (that also doesn't react with the substrate metallurgy/dopants!) are circulated IN-BETWEEN and UNDERNEATH the main line traces, we can run circuits as high as TWO TERAHERTZ !!!
The ceramic-encased chips themselves simply use low-cost high-purity Mineral Oil immersion baths and COTS (Commercial Off The Shelf) condenser technology for general heat removal, which means it's relatively CHEAP to run! AND since BC Hydro is only 9 to 12 cents per kilowatt hour if you buy electricity in bulk and at yearly contract prices, the running costs are almost irrelevant!
For the main 128-bits wide combined CISC-based CPU/GPU/DSP super-server chip, we run the chip at a 60 GHz clock frequency giving us a sustained 475 TeraFLOPS. The Convolution Filter-oriented external Array/Vector Processor chip uses SIMD/MIMD processing styles, allowing ONE single command to start the simultaneous simple math and convolution filter processing of blocks of data that are arranged in a 2D-XY array of 65,536 by 65,536 elements (i.e. the SquareOf( 2^16 ) ). Each element can be an 8-bit Boolean State (YES/NO/MAYBE/POSSIBLY NO/POSSIBLY YES/etc.) or an up-to 128-bits wide Signed/UnSigned Integer, Fixed Point or Floating Point number, which means I can process 4 BILLION array items ALL AT ONCE using a single command.
This is VERY HANDY for things such as Hi-Pass and Lo-Pass filters, 2D-XY SOBEL edge detection, bitwise types of AND-OR-XOR-NOT-SHIFT LEFT-SHIFT-RIGHT-SPIN-FLIP-INVERT-CLIP LEFT-CLIP RIGHT-SET SPECIFIED BITS-REVERSE BITS operations and other MASSIVELY PARALLEL simple math-specific operations against BIG numeric datasets.
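For anyone who has not met these filters before, here is a minimal plain-Python sketch of 3x3 Sobel edge detection on a tiny 5x5 grid. It only illustrates the per-pixel weighted-sum idea and says nothing about how the chip does it in hardware. The function names and the test image are made up for this example.

```python
# Illustrative sketch only: 3x3 Sobel edge detection on a tiny grid.
# All names and the test image are hypothetical, not taken from any real system.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def weighted_sum_at(image, row, col, kernel):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += image[row + dr][col + dc] * kernel[dr + 1][dc + 1]
    return total


def sobel_magnitude(image):
    """Return |Gx| + |Gy| for every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = weighted_sum_at(image, r, c, SOBEL_X)
            gy = weighted_sum_at(image, r, c, SOBEL_Y)
            out[r][c] = abs(gx) + abs(gy)
    return out


# 5x5 test image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
for row in edges:
    print(row)
```

A sequential CPU walks those two loops one pixel at a time; the SIMD idea is that every pixel's weighted sum is independent of the others, so in principle they can all be computed at once.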
This also means I can process GIGANTIC blocks of geographic information system (GIS) and mapping-oriented bitmap-based data (i.e. a much bigger and higher resolution version of Google Maps!) in mere microseconds rather than trying to use some linearly processing Intel XEON or AMD EPYC server chip that can only process 512 bits of data per runtime instruction WHILE I can do 4 BILLION+ 128-bits wide numbers in one instruction cycle!
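To put rough numbers on that comparison, here is a small Python sketch that works out how big one 65,536 by 65,536 block of 128-bit elements is, and how many 512-bit vector instructions it would take just to touch every element once. The 512-bit figure is the one quoted above; the variable names are made up for the example.

```python
# Back-of-envelope sizing of one array block, using only the figures quoted above.
SIDE = 2 ** 16            # 65,536 elements per side
ELEMENT_BITS = 128        # widest element type mentioned
CPU_VECTOR_BITS = 512     # per-instruction width quoted for the server CPUs

elements = SIDE * SIDE                              # elements in one block
block_bits = elements * ELEMENT_BITS                # total bits in one block
block_gib = block_bits // 8 // 2 ** 30              # block size in GiB
vector_instructions = block_bits // CPU_VECTOR_BITS # 512-bit steps to cover it

print(f"{elements:,} elements per block")
print(f"{block_gib} GiB per block at 128 bits per element")
print(f"{vector_instructions:,} 512-bit instructions to touch each element once")
```

So one full-width block is 64 GiB of data, and a 512-bit-per-instruction CPU needs about a billion instructions for a single pass over it.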
AND .... When we network all these chips together using a custom dense-wave fibre optics based networking system to attach ALL chips together (the optical networking part is BUILT-INTO the chip itself!) into "Symmetric Processing Array Groups", it means we have a 119 ExaFLOPS SUSTAINED 128-bits wide supercomputer that BLOWS AWAY the U.S.-based SUMMIT supercomputer and ALL THE OTHER Top-500 systems COMBINED !!!!! AND it all fits in a physical space about the same size as a typical high school basketball court or gym!
AND FINALLY !!! All those WEIGHTED Extended-Boolean-State logic circuits ALSO allow us to run the world's MOST SOPHISTICATED and LARGEST molecular/electro-chemical simulation of human neuro-connective tissue, allowing us, for the FIRST TIME, to have a general purpose Whole Brain Emulation system that at this very moment is learning just like a child, teenager and PhD-level human does! Simple Trial and Error and intrinsic connective-association learning (i.e. via teaching 24/7/365 by multiple simultaneous instructors) gets us to 160 IQ and above super-intelligence which can do ANYTHING YOU CAN DO AND MUCH MUCH MORE !!!!!
.
Here is the Math:
64k by 64k array block = 4,294,967,296 array elements
(each 128-bits wide Int/FP/FXP/Boolean/Pixel) using a 9x9 convolution filter
at around 10 microseconds response time per SIMD instruction = 100,000 available blocks of time in one second
= 429,496,729,600,000 array elements processed per second or about 429.5 Trillion FILTER Operations per second!
If you want to include the 9x9 convolution filter, that is 81 addition operations (i.e. the kernel part)
plus 81 multiplication operations (i.e. the weighting part), a final range limit comparison
(i.e. 2 compare operations), 2 final clipping or rounding operations and a possible 16 final
division/multiplication/addition/subtraction/root/square operations for rectification and
setting of up-to-16 local register results for EACH total filter operation (i.e. the setting of
up-to-16 colour/alpha/metadata pixel channel values), so that is a total of 182 math operations
in each convolution filter!
That means in ACTUAL PetaFLOPS we are looking at 78,168,404,787,200,000 128-bits
wide math operations per second or about 78.17 PetaFLOPS in ONE single array/vector processor chip!
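The per-chip arithmetic above can be re-run with a few lines of Python. This is only a sanity check of the multiplication, taking the 10 microsecond instruction time and the operation counts exactly as quoted; the variable names are made up for the example.

```python
# Re-run of the per-chip arithmetic, using the figures exactly as quoted above.
SIDE = 2 ** 16                       # 64k by 64k array block
INSTRUCTIONS_PER_SECOND = 100_000    # one SIMD instruction per ~10 microseconds

# Operation count for one 9x9 filter, itemised as in the text.
additions = 81
multiplications = 81
range_compares = 2
clip_or_round = 2
final_channel_ops = 16
ops_per_filter = (additions + multiplications + range_compares
                  + clip_or_round + final_channel_ops)

elements_per_block = SIDE * SIDE
filters_per_second = elements_per_block * INSTRUCTIONS_PER_SECOND
ops_per_second = filters_per_second * ops_per_filter

print(f"{ops_per_filter} math operations per filter")
print(f"{filters_per_second:,} filter operations per second")
print(f"{ops_per_second:,} math operations per second")
print(f"= {ops_per_second / 1e15:.2f} x 10^15 per second (peta scale)")
```

Everything here is exact integer arithmetic until the last line, which only rescales the result by 10^15 for printing.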
AND since we have MANY of these chips in multiple racks (around 1400 chips so far
with some held in reserve) we are looking at around 119 ExaFLOPS sustained!
SO YUP it really IS the world's FASTEST supercomputer by MANY MANY TIMES !!!!
Note: The 119 ExaFLOPS is dependent on clock speed which can actually exceed
TWO THz but normally runs a tad under that! The Minimum horsepower reading
is around 109 ExaFLOPS up to 1.5x that (163 ExaFLOPS peak) when running at
higher clock speeds and faster cooling rates! The sustained 119 reading is for
128-bits wide Floating Point number calculation benchmarks!
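For the multi-chip totals, here is a short Python sketch that scales the per-chip figure by the chip count. It assumes exactly 1400 chips (the text says "around 1400") and simple linear scaling with no networking overhead, which is an assumption made only for this example. The 1.5x factor is the peak multiplier quoted in the note.

```python
# Scaling the quoted per-chip figure up to the whole machine.
# Assumptions for illustration: exactly 1400 chips, perfectly linear scaling.
OPS_PER_CHIP = 78_168_404_787_200_000   # per-chip operations per second, as quoted
CHIP_COUNT = 1400                        # "around 1400 chips so far"

baseline_ops = OPS_PER_CHIP * CHIP_COUNT
peak_ops = baseline_ops * 3 // 2         # the 1.5x peak multiplier from the note

print(f"baseline: {baseline_ops / 1e18:.1f} x 10^18 operations per second")
print(f"peak:     {peak_ops / 1e18:.1f} x 10^18 operations per second")
```

With these inputs the baseline comes out at about 109.4 ExaFLOPS and the 1.5x peak at about 164.2, roughly in line with the 109 and 163 readings quoted in the note.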
.
Does THAT WORK as an explanation for you?
.
P.S. This company has a LOT MORE RESEARCH AND DEVELOPMENT MONEY than Seymour Cray ever had!
.
And Will Canon, Sony, Fuji, Panasonic, Pentax EVER put this into one of their cameras?
WE SHALL SEE SOON ENOUGH !!!!!!!!!!
.
(Edit: fixed my bad math --- oops! Good thing the A.I is a LOT smarter than I am!)
.