It's 1.

It means that a developer can use their relatively low-powered Apple device (with UMA) to develop for deployment on Nvidia's relatively high-powered systems.

That's nice to have for a range of reasons.
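
For anyone curious what that workflow looks like in practice, here's a minimal sketch using MLX's Python API (illustrative only; it assumes an MLX build with the CUDA backend on the Nvidia side). The point is that the script itself never mentions Metal or CUDA, so the same code runs on the laptop's GPU and on the deployment box:

    import mlx.core as mx

    # Use whatever GPU backend this MLX build targets:
    # Metal on an Apple Silicon laptop, CUDA on an NVIDIA machine.
    mx.set_default_device(mx.gpu)

    # A toy compute graph: matrix multiply followed by a softmax.
    a = mx.random.normal((4096, 4096))
    b = mx.random.normal((4096, 4096))
    probs = mx.softmax(a @ b, axis=-1)

    # MLX is lazy; eval() forces the computation on the selected device.
    mx.eval(probs)
    print(probs.shape, probs.dtype)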

If Apple cannot do their own implementation of CUDA due to copyright, the second-best option is this: getting developers to build for MLX (which runs on their laptops) and still get NVIDIA hardware support.

Apple should do a similar thing for AMD.


I thought that the US Supreme Court decision in Google v. Oracle and the Java reimplementation provided enough case precedent to allow companies to re-implement something like CUDA APIs?

https://www.theverge.com/2021/4/5/22367851/google-oracle-sup...

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....


Exactly, and see also ROCm/HIP, which is AMD's reimplementation of CUDA for their GPUs.

Reimplementation of CUDA C++, not CUDA.

CUDA is a set of four compilers (C, C++, Fortran, and Python JIT DSLs), a bytecode and two compiler backend libraries, a collection of compute libraries for the languages listed above, plugins for Eclipse and Visual Studio, and a graphical GPU debugger and profiler.


There's ZLUDA for AMD, which actually implements CUDA, but it's still quite immature.

It would be great for Apple if enough developers took this path and Apple could later release datacenter GPUs that support MLX without CUDA.

It's the other way around. If Apple released data center GPUs then developers might take that path. Apple has shown time and again they don't care for developers, so it's on them.

What is the performance penalty compared to a program in native CUDA?

"relatively high powered"? there's nothing faster out there.

Relative to what you can get in the cloud or on a desktop machine.

I wonder what Apple would have to do to make Metal + its processors run faster than Nvidia? I guess it's all about the interconnects, really.

Right now, for LLMs, the only limiting factor on Apple Silicon is memory bandwidth. There hasn’t been progress on this since the original M1 Ultra. And since abandoning UltraFusion, we won’t see progress here anytime soon either.
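
To put a rough number on the bandwidth argument (a back-of-the-envelope sketch with illustrative figures, not benchmarks): single-stream token generation is memory-bound because each generated token has to stream essentially all of the weights from memory, so the ceiling is roughly bandwidth divided by model size in bytes.

    # Rough upper bound for memory-bound single-stream decoding:
    #   tokens/sec <= memory_bandwidth / model_size_in_bytes
    # The numbers below are illustrative assumptions, not measurements.

    def max_tokens_per_sec(bandwidth_gb_s, params_billion, bytes_per_param):
        model_bytes = params_billion * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / model_bytes

    # M1 Ultra class: ~800 GB/s unified memory bandwidth (spec figure),
    # running a 70B-parameter model quantized to ~4 bits (~0.5 bytes/param).
    print(max_tokens_per_sec(800, 70, 0.5))  # ~23 tokens/s ceiling

That's why adding compute without moving memory bandwidth doesn't help much for LLM inference.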

Have they abandoned UltraFusion? Last I’d heard, they’d just said something like “not all generations will get an Ultra chip” around the time the M4 showed up (the first M chip lacking an Ultra variation), which makes me think the M5 or M6 is fairly likely to get an Ultra.

This is like saying the only limiting factor on computers is the von Neumann bottleneck.

Is this true per watt?

It doesn't matter for a lot of applications. But fair enough, for a big part of them it is either essential or a nice-to-have. It's completely off the point, though, if we are comparing the fastest compute no matter what.

...fastest compute no matter watt

Relative to the Apple hardware, the Nvidia hardware is high-powered.

I appreciate that English is your second language after your Hungarian mother tongue. My comment reflects upon the low- and high-powered compute of the Apple vs. Nvidia hardware.
