The M1 Max's GPU can only make use of about 90GB/s out of the 400GB/s Apple advertises. If the AMD chip can make better use of its 200GB/s then, as you say, it will manage better LLM tokens per second. You can't just look at which chip has the wider/faster memory bus.
This mainly shows that you need to watch out when it comes to unified architectures. The sticker bandwidth might not be what you can get for GPU-only workloads. Fair point. Duly noted.
But my overarching point still stands: LLM inference is bound by memory bandwidth, and 200GB/s is not very much (especially for the higher-RAM variants, where the models you'd want to run are larger).
If the M1 Max's GPU really only gets 90GB/s, that just means it's a poor choice for LLM inference too.
https://www.anandtech.com/show/17024/apple-m1-max-performanc...
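To see why bandwidth dominates: during decode, each generated token has to stream roughly the entire weight set through the GPU, so usable bandwidth divided by model size gives an upper bound on tokens per second. A rough sketch (the 35GB model size and both bandwidth figures are illustrative assumptions, not benchmarks):

```python
# Back-of-envelope estimate for bandwidth-bound LLM decode speed.
# Each token streams ~all model weights, so:
#   tokens/s <= usable_bandwidth / model_size

def est_tokens_per_s(usable_bw_gbs: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s for a bandwidth-bound model."""
    return usable_bw_gbs / model_gb

model_gb = 35.0  # assumed: a ~70B-parameter model at 4-bit quantization

for name, bw in [("GPU with ~90GB/s usable", 90.0),
                 ("GPU with 200GB/s usable", 200.0)]:
    print(f"{name}: at most ~{est_tokens_per_s(bw, model_gb):.1f} tok/s")
```

Under those assumptions, 90GB/s caps out around 2.6 tok/s and 200GB/s around 5.7 tok/s, which is why the usable (not sticker) bandwidth is what matters.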