Is CUDA tied so closely to Nvidia hardware and architecture that its abstractions wouldn't make sense on other platforms? I know very little about hardware and low-level software.
As an API, CUDA isn't really that hyper-specific to NVIDIA hardware.
But a lot of the most useful libraries are closed source and available only on NVIDIA hardware.
You could probably get most open-source CUDA running on other vendors' hardware without crazy work. But you'd spend a ton more work getting to parity on the ecosystem, plus lawyer fees when NVIDIA comes after you.
The kind of CUDA you or I would write is not very hardware-specific (a few constants here and there), but the kind of CUDA behind cuBLAS, with a million magic flags, inline PTX ("GPU assembly"), and exploitation of driver/firmware hacks, is. It's like the difference between numerics code in C and numerics code in C with tons of inline assembly for each of a number of specific processors.
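To make that concrete, here's a toy sketch (mine, not from cuBLAS) of the same fused multiply-add written both ways. The portable version lets the compiler pick instructions; the second spells it out as inline PTX, and this one instruction is the mild end of what tuned library kernels do per architecture:

    // Portable flavor: plain CUDA C++, the compiler chooses instructions.
    __global__ void axpy_portable(float a, const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // "Tuned" flavor: the same multiply-add as inline PTX. Real library
    // kernels layer many of these, plus arch-specific build flags, per GPU.
    __global__ void axpy_ptx(float a, const float* x, float* y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float out;
            asm("fma.rn.f32 %0, %1, %2, %3;"
                : "=f"(out) : "f"(a), "f"(x[i]), "f"(y[i]));
            y[i] = out;
        }
    }

Both compile with nvcc, but only the first has any realistic hope of being retargeted to other hardware by something like HIP.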
You can see something similar if you buy datacenter-grade CPUs from AMD or Intel and compare their per-model optimized BLAS builds and compilers against plain OpenBLAS, or swap them across vendors. The difference is not world-ending, but it can be around 50% in some cases.
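The reason swapping works at all is that the BLAS interface itself is standard; which implementation you get is decided at link time. A minimal sketch (generic CBLAS, nothing vendor-specific assumed):

    /* Host-only C: the CBLAS call is identical whether you link
     * OpenBLAS or a vendor build, e.g.
     *   cc gemm.c -lopenblas    vs.    cc gemm.c -lmkl_rt   */
    #include <cblas.h>

    /* C = A * B for n x n row-major matrices. */
    void gemm(int n, const double* A, const double* B, double* C) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    }

Same source, potentially very different throughput depending on which library the linker picks.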
Thanks