It is possible to use multiple memory ranks to reduce the bus width required for a given amount of memory. Nvidia demonstrated that this is doable with GDDR6X on the RTX 3090. The RTX 3090 has a 384-bit bus with 24 memory ICs, despite needing only 12 to reach 384-bit. That means every two chips share one 32-bit interface, which is a dual-rank configuration. If you look at the history of computer memory, you can find many examples of multi-rank configurations. I also recall LR-DIMMs being another way of achieving this.
Achieving 128GB of VRAM with a 256-bit bus (which seems like a reasonable bus width) would mean some multiple of 8 chips. If Micron, Samsung or SK Hynix made 128Gb GDDR7 chips, then 8 would suffice. The highest density available right now seems to be 24Gb, although 32Gb seems likely to follow (and would likely come sooner if a large customer such as Intel asked for it), so they would just need 32 chips in a quad-rank configuration to reach 128GB.
This assumes that there is no limit in the GDDR7 specification that prevents quad-rank configurations. If there is, and it still supports dual rank like GDDR6X did, then a 512-bit bus could be used instead. It would likely be extremely pricey and require a new chip tape-out with many more IO logic transistors to handle the additional bus width (and IO logic transistor scaling is dead, so the die area would be huge), but it is hypothetically possible. Given how much people are willing to pay for more VRAM, it could make business sense to do.
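To make the chip-count arithmetic above concrete, here is a minimal sketch in Python. The 32Gb density and the quad-rank and dual-rank layouts are the assumptions discussed above, not shipping products:

```python
# Chip-count and capacity math for multi-rank GDDR configurations.
# Densities and rank counts below are this thread's assumptions,
# not announced products.

def vram_config(bus_width_bits, ranks, chip_density_gbit, chip_if_bits=32):
    """Return (chip count, capacity in GB) for a bus populated with
    chips that each expose a chip_if_bits-wide interface."""
    channels = bus_width_bits // chip_if_bits
    chips = channels * ranks
    capacity_gb = chips * chip_density_gbit / 8  # 8 bits per byte
    return chips, capacity_gb

# RTX 3090: 384-bit bus, dual rank, 8Gb GDDR6X -> (24 chips, 24GB)
print(vram_config(384, ranks=2, chip_density_gbit=8))

# Hypothetical: 256-bit bus, quad rank, 32Gb GDDR7 -> (32 chips, 128GB)
print(vram_config(256, ranks=4, chip_density_gbit=32))

# Hypothetical fallback: 512-bit bus, dual rank, 32Gb GDDR7 -> (32 chips, 128GB)
print(vram_config(512, ranks=2, chip_density_gbit=32))
```

Either way, the chip count comes out the same (32); the difference is whether the loading is spread across a wider bus or stacked deeper on each channel.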
Even if there is no limit in the GDDR7 specification that prevents quad rank, the GPU's memory IO logic would need to support it, and if it does not, they would need to redesign that and do a new chip tape-out in addition to a new board design. This would also be very expensive, although not as expensive as going to a 512-bit memory interface.
In summary, adding more memory would cost more, and it would not improve competitiveness in the target market for these cards, which I imagine is the main reason they do not do it.
By the way, the reason Nvidia implemented support for two chips per channel is that they wanted to reach 48GB of VRAM on the workstation variant of the 3090, the RTX A6000 (non-Ada). I do not know why they used 24x 8Gb chips rather than 12x 16Gb on the 3090, although if I had to guess, it had something to do with rank interleaving.
Having four chips per channel is exactly why this is implausible. DDR5 can barely operate with four ranks per channel, at severely reduced speeds. Pulling that off with GDDR6 or GDDR7 is not something we can presume to be possible without specific evidence. The highest-density configurations possible for LPDDR5x are dual-rank and byte mode (one chip per 8 bits of the memory bus, so two chips ganged together to populate a 16-bit channel) — and that still operates at less than half the speed of GDDR6.
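To put rough numbers on that speed gap, here is a quick comparison using ballpark per-pin data rates for commonly shipping speed grades (illustrative figures, not spec citations):

```python
# Ballpark per-pin data rates for commonly shipping speed grades;
# illustrative figures, not spec citations.
lpddr5x_gtps = 8.533  # a common high-end LPDDR5x grade
gddr6_gtps = 20.0     # a high-end GDDR6 grade

ratio = lpddr5x_gtps / gddr6_gtps
print(f"LPDDR5x per-pin rate is about {ratio:.0%} of GDDR6")  # about 43%
```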
I've not seen any proposals for buffering LPDDR or GDDR, so an analog to LRDIMMs is not a readily-available technology.
GDDR is the memory technology that operates at the edge of what's possible for per-pin bandwidth. Loading that memory bus down with many ranks is not something we can expect to be achievable by just putting down more pads on the PCB.
> DDR5 can barely operate with four ranks per channel, at severely reduced speeds.
That is objectively false. See, for instance, V-color’s Threadripper RAM[0]. If 96GB quad-rank modules @ 6000MHz in octa-channel count as “barely operating”, maybe we have different definitions of operation requirements.
As a side note, their quad-channel 3-rank RAM[1] hits 8000MHz out of the box. Admittedly only 24GB modules, but still.
In that case, we would need a 512-bit memory bus to do this using the 32Gb GDDR7 chips that should be on the market in the near future. This would be very expensive, but it should be possible, or do you see a reason why that cannot be done either?
That said, I am not an electrical engineer (although I work alongside one and have had a minor role in picking low-end components for custom PCBs). I think that if Intel were to make a GPU with 128GB of VRAM using GDDR7 in the next year or two, the engineer who does the trace routing to make it possible should set up a GoFundMe page for people to send beer money.
I think the goalposts may have shifted a bit, from “why hasn't Intel made such a card” to “why is Intel not (publicly) working on such a card to be released in a year or two”.
In terms of what would have been feasible for Intel to bring to market in 2024, the cheapest option for 128GB capacity would probably have been ~8.5Gb/s LPDDR5x on a 256-bit bus, but to at least match the bandwidth of the chip they just launched, it would have made more sense to use a 512-bit bus and bump the die size back up to ~half the reticle limit like their previous generation die with a 256-bit bus. So they would have had a quite slow but high-capacity GPU with a manufacturing cost equal to at least an RTX 4080, before adding in the cost of all that DRAM. And if they had started working on that chip as soon as LLaMA went public, they might have been able to deliver it by now.
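Back-of-the-envelope bandwidth for those two hypothetical configurations (the ~8.5Gb/s LPDDR5x rate is the one proposed above; the B580 comparison figure is its commonly cited ~456GB/s):

```python
def peak_bandwidth_gbps(bus_width_bits, data_rate_gtps):
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gtps

print(peak_bandwidth_gbps(256, 8.5))  # 272.0 GB/s: the cheapest 128GB option
print(peak_bandwidth_gbps(512, 8.5))  # 544.0 GB/s: vs ~456 GB/s on the B580
```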
It's no surprise at all that such a risky niche product did not emerge from a division of Intel that is lucky to not have been liquidated yet.
In hindsight, I misread you as saying that 128GB of RAM on a “basic GPU” is not technically feasible. My reply was to say it is feasible.
Intel is rumored to have a B770 GPU in development, but it was running late and then was delayed to next year since it had yet to tape out, so they are now launching their B580 and B570 graphics cards, which had been ready to go for a while. That is why the bus width appears to have dropped across generations. Presumably, if they made a 512-bit bus version, it would be a 9-series card. They certainly left room for it in their lineup, but as far as leaks are concerned, there is not much hope for one. I do not expect them to use anything other than GDDR7 on their Battlemage cards.
As for a high-memory ARC card, I am of the opinion that such a product would sell well among the local LLaMA community. There might even be more sales of a high-memory ARC card for inference than of the regular ARC cards for gaming, given that their discrete graphics sales peaked at 250,000 units in Q1 2023 before collapsing, which can be confirmed using the data here:
The market for high-memory GPUs is surely bigger than that. That said, Intel is likely pricing its ARC GPUs at a loss once R&D costs are considered. This is likely intended to help them break into a new market, although it has not been going well for them so far. I would guess that they are at least a generation away from profitability.
Intel intends for its Gaudi 3 accelerators to be used for this rather than the ARC line. Those coincidentally have 128GB of RAM, but they use HBM rather than a DDR variant. Qualcomm, on the other hand, made its own accelerator with 128GB of LPDDR4x RAM:
If my math is right, Qualcomm went with a 1024-bit memory bus and some incorrect rounding (rounding 137.5 to 138 before multiplying by 4) to reach their stated bandwidth figure. Qualcomm is not selling it through the PC parts supply chain, so I have no clue how much it costs, but I assume that it is expensive. I assume that they used LPDDR4x to be able to ship a product at all: they were likely too late in securing HBM supply, and even if they had secured some, they would not be able to scale production to meet demand growth since Nvidia is buying all of the HBM that it can.
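A sketch reproducing that rounding argument (the 137.5GB/s per quarter of the bus is my inference above, not a published Qualcomm number):

```python
# Reproducing the rounding argument above. The per-256-bit figure is
# this comment's inference, not a published Qualcomm number.
per_quarter_gbps = 137.5            # GB/s over one 256-bit quarter of a 1024-bit bus
print(per_quarter_gbps * 4)         # 550.0 -> multiplying first
print(round(per_quarter_gbps) * 4)  # 552   -> rounding 137.5 to 138 first
```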