Founder of REX Computing here; I highly recommend checking out my interview on the Microarch Club podcast linked elsewhere on the thread; will also answer questions on this thread if anyone has them.
The teaser reminds me a lot of other high performance/high efficiency architecture redesigns that failed because of the unreasonable effort required to squeeze out a useful fraction of the promised gains, e.g. the Transputer and Cell. Can you link to written documentation of how existing code can be ported? I doubt you can just recompile ffmpeg or libx264, but what level of toolchain support can early adopters expect? Does it require manually partitioning code+data and mapping it to the on-chip network topology?
We had a basic LLVM backend that supported a slightly modified clang frontend and a basic ABI. We tried to make it drastically easier for both the programmer and compiler to handle memory by having all memory (code+data) be part of a global flat address space across the chip, with guarantees being made to the compiler by the NoC on the latency of all memory accesses across one or multiple chips. We tested this with very small programs that could fit in the local memory of up to two chips (128KB of memory), but in theory it could have scaled up to the 64 bit address space limit. Compilation time for programs was long, but fully automated, specifically to improve upon problems faced by Cell and other scratchpad memory architectures… some of our original funding in 2015 from DARPA was actually for automated scratchpad memory management techniques on Texas Instruments DSPs and Cell (our paper: https://dl.acm.org/doi/pdf/10.1145/2818950.2818966)
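To make the "automated scratchpad memory management" idea concrete, here is a minimal sketch of the core technique that compilers for such machines emit: explicitly staging tiles of a large array into a fixed-size local buffer, double-buffered so the next tile's copy overlaps with compute on the current one. This is illustrative only; the buffer sizes and names are assumptions (the 128KB figure mirrors the local-memory size mentioned above), and a real NoC would use DMA with known latency rather than `memcpy`.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core local memory, mirroring the 128KB figure above.
 * These names are illustrative, not REX's actual toolchain or ABI. */
#define SCRATCHPAD_BYTES (128 * 1024)
#define TILE_ELEMS (SCRATCHPAD_BYTES / sizeof(int32_t) / 2) /* two halves for double buffering */

static int32_t scratchpad[2][TILE_ELEMS]; /* stand-in for on-chip SRAM */

/* Sum a large array the way a scratchpad compiler would schedule it:
 * stage tile i+1 into the idle half while computing on tile i. */
int64_t scratchpad_sum(const int32_t *global_mem, size_t n)
{
    int64_t acc = 0;
    size_t buf = 0;
    size_t copied = n < TILE_ELEMS ? n : TILE_ELEMS;
    memcpy(scratchpad[buf], global_mem, copied * sizeof(int32_t));

    for (size_t off = 0; off < n; ) {
        size_t cur = copied;
        size_t next_off = off + cur;
        size_t next = 0;
        /* "DMA" the next tile into the other buffer (modeled as memcpy). */
        if (next_off < n) {
            next = (n - next_off) < TILE_ELEMS ? (n - next_off) : TILE_ELEMS;
            memcpy(scratchpad[buf ^ 1], global_mem + next_off,
                   next * sizeof(int32_t));
        }
        /* Compute only ever touches local memory, so access latency
         * is fixed and known to the scheduler. */
        for (size_t i = 0; i < cur; i++)
            acc += scratchpad[buf][i];
        off = next_off;
        copied = next;
        buf ^= 1;
    }
    return acc;
}
```

The point of the pattern is that every load in the inner loop hits local memory with a statically known latency, which is what lets a VLIW-style scheduler pack instructions tightly instead of stalling on a cache miss.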
This was all designed a decade ago, and REX has effectively been in hibernation since the end of 2017, after we successfully taped out our 16-core test chip in 2016 but were unable to raise additional funding to continue. I have continued to work on architectures that leverage scratchpad memories in different ways, including on cryptocurrency and machine learning ASICs, most recently at my current startup, Positron AI (https://positron.ai)
> Thomas Sohmers joins to discuss dropping out of high school at age 17 to start a chip company, lessons from the successes and failures of past processor architectures, the history of VLIW, and the new AI hardware appliances he and his team are building at Positron AI.
Vaguely reminds me of the Adapteva Epiphany RISC multi-processors from the old Parallella Kickstarter project, and presumably others, but that's the one I played with for a while.
I'm not sure how this project's interconnect differs; they do say theirs is revolutionary, so maybe that's the difference.
The premise of simplifying the architecture, the focus on memory, the reliance on software, even the fact that you can stack a ton of chips per node, all sounds very much like Groq. I wonder if this is another case of multiple discovery.