The M1 Max's GPU can only make use of about 90GB/s out of the 400GB/s Apple advertises. If the AMD chip can make better use of its 200GB/s then, as you say, it will manage better LLM tokens per second. You can't just look at which chip has the wider/faster memory bus.
This mainly shows that you need to watch out when it comes to unified architectures. The sticker bandwidth might not be what you can get for GPU-only workloads. Fair point. Duly noted.
But my overarching point still stands: LLM inference is bound by memory bandwidth, and 200GB/s is not very much (especially for the higher-RAM variants, where the models you'd want to run are larger).
If the M1 Max's GPU really only gets 90GB/s, that just means it's a poor choice for LLM inference too.
https://www.anandtech.com/show/17024/apple-m1-max-performanc...
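To see why bandwidth dominates: during decode, each generated token has to stream roughly the entire weight set through the GPU, so usable bandwidth divided by model size gives an upper bound on tokens per second. A rough sketch (the 35GB model size and both bandwidth figures are illustrative assumptions, not benchmarks):

```python
# Back-of-envelope estimate for bandwidth-bound LLM decode speed.
# Each token streams ~all model weights, so:
#   tokens/s <= usable_bandwidth / model_size

def est_tokens_per_s(usable_bw_gbs: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s for a bandwidth-bound model."""
    return usable_bw_gbs / model_gb

model_gb = 35.0  # assumed: a ~70B-parameter model at 4-bit quantization

for name, bw in [("GPU with ~90GB/s usable", 90.0),
                 ("GPU with 200GB/s usable", 200.0)]:
    print(f"{name}: at most ~{est_tokens_per_s(bw, model_gb):.1f} tok/s")
```

Under those assumptions, 90GB/s caps out around 2.6 tok/s and 200GB/s around 5.7 tok/s, which is why the usable (not sticker) bandwidth is what matters.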