It is possible to use multiple memory ranks to reduce the bus width required for a given amount of memory. Nvidia demonstrated that this is doable with GDDR6X on the RTX 3090. The RTX 3090 has a 384-bit bus with 24 memory ICs, despite needing only 12 to reach 384-bit. That means every two chips share one 32-bit interface, which is a dual-rank configuration. If you look at the history of computer memory, you can find many examples of multi-rank configurations. I also recall LR-DIMMs being another way of achieving this.
Achieving 128GB of VRAM with a 256-bit bus (which seems like a reasonable bus width) would mean some multiple of 8 chips. If Micron, Samsung or SK Hynix made 128Gb GDDR7 chips, then 8 would suffice. The highest density available right now seems to be 24Gb, although 32Gb seems likely to follow (and would likely come sooner if a large customer such as Intel asked for it), so they would just need 32 chips in a quad-rank configuration to reach 128GB.
This assumes that there is no limit in the GDDR7 specification that prevents quad-rank configurations. If there is, and it still supports dual rank like GDDR6X did, then a 512-bit bus could be used instead. It would likely be extremely pricey and require a new chip tape-out with many more IO logic transistors to handle the additional bus width (and IO logic transistor scaling is dead, so the die area would be huge), but it is hypothetically possible. Given how much people are willing to pay for more VRAM, it could make business sense to do.
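To make the chip-count arithmetic above concrete, here is a minimal sketch in Python. The 32Gb density and the quad-rank and dual-rank layouts are the assumptions discussed above, not shipping products:

```python
# Chip-count and capacity math for multi-rank GDDR configurations.
# Densities and rank counts below are this thread's assumptions,
# not announced products.

def vram_config(bus_width_bits, ranks, chip_density_gbit, chip_if_bits=32):
    """Return (chip count, capacity in GB) for a bus populated with
    chips that each expose a chip_if_bits-wide interface."""
    channels = bus_width_bits // chip_if_bits
    chips = channels * ranks
    capacity_gb = chips * chip_density_gbit / 8  # 8 bits per byte
    return chips, capacity_gb

# RTX 3090: 384-bit bus, dual rank, 8Gb GDDR6X -> (24 chips, 24GB)
print(vram_config(384, ranks=2, chip_density_gbit=8))

# Hypothetical: 256-bit bus, quad rank, 32Gb GDDR7 -> (32 chips, 128GB)
print(vram_config(256, ranks=4, chip_density_gbit=32))

# Hypothetical fallback: 512-bit bus, dual rank, 32Gb GDDR7 -> (32 chips, 128GB)
print(vram_config(512, ranks=2, chip_density_gbit=32))
```

Either way, the chip count comes out the same (32); the difference is whether the loading is spread across a wider bus or stacked deeper on each channel.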
Even if there is no limit in the GDDR7 specification that prevents quad rank, the GPU's memory IO logic would need to support it, and if it does not, they would need to redesign that and do a new chip tape-out in addition to a new board design. This would also be very expensive, although not as expensive as going to a 512-bit memory interface.
In summary, adding more memory would cost more, and it would not improve competitiveness in the target market for these cards, which I imagine is the main reason they do not do it.
By the way, the reason Nvidia implemented support for two chips per channel is that they wanted to reach 48GB of VRAM on the workstation variant of the 3090, the RTX A6000 (non-Ada). I do not know why they used 24x 8Gb chips rather than 12x 16Gb on the 3090, although if I had to guess, it had something to do with rank interleaving.
Having four chips per channel is exactly why this is implausible. DDR5 can barely operate with four ranks per channel, at severely reduced speeds. Pulling that off with GDDR6 or GDDR7 is not something we can presume to be possible without specific evidence. The highest-density configurations possible for LPDDR5x are dual-rank and byte mode (one chip per 8 bits of the memory bus, so two chips ganged together to populate a 16-bit channel) — and that still operates at less than half the speed of GDDR6.
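To put rough numbers on that speed gap, here is a quick comparison using ballpark per-pin data rates for commonly shipping speed grades (illustrative figures, not spec citations):

```python
# Ballpark per-pin data rates for commonly shipping speed grades;
# illustrative figures, not spec citations.
lpddr5x_gtps = 8.533  # a common high-end LPDDR5x grade
gddr6_gtps = 20.0     # a high-end GDDR6 grade

ratio = lpddr5x_gtps / gddr6_gtps
print(f"LPDDR5x per-pin rate is about {ratio:.0%} of GDDR6")  # about 43%
```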
I've not seen any proposals for buffering LPDDR or GDDR, so an analog to LRDIMMs is not a readily-available technology.
GDDR is the memory technology that operates at the edge of what's possible for per-pin bandwidth. Loading that memory bus down with many ranks is not something we can expect to be achievable by just putting down more pads on the PCB.
> DDR5 can barely operate with four ranks per channel, at severely reduced speeds.
That is objectively false. See, for instance, V-color’s Threadripper RAM[0]. If 96GB quad-rank modules @ 6000MHz in octa-channel count as “barely operating”, maybe we have different definitions of operation requirements.
As a side note, their quad-channel 3-rank RAM[1] hits 8000MHz out of the box. Admittedly only 24GB modules, but still.
In that case, we would need a 512-bit memory bus to do this using the 32Gb GDDR7 chips that should be on the market in the near future. This would be very expensive, but it should be possible, or do you see a reason why that cannot be done either?
That said, I am not an electrical engineer (although I work alongside one and have had a minor role in picking low-end components for custom PCBs). I think that if Intel were to make a GPU with 128GB of VRAM using GDDR7 in the next year or two, the engineer who does the trace routing to make it possible should set up a GoFundMe page for people to send beer money.
I think the goalposts may have shifted a bit, from “why hasn't Intel made such a card” to “why is Intel not (publicly) working on such a card to be released in a year or two”.
In terms of what would have been feasible for Intel to bring to market in 2024, the cheapest option for 128GB capacity would probably have been ~8.5Gb/s LPDDR5x on a 256-bit bus, but to at least match the bandwidth of the chip they just launched, it would have made more sense to use a 512-bit bus and bump the die size back up to ~half the reticle limit like their previous generation die with a 256-bit bus. So they would have had a quite slow but high-capacity GPU with a manufacturing cost equal to at least an RTX 4080, before adding in the cost of all that DRAM. And if they had started working on that chip as soon as LLaMA went public, they might have been able to deliver it by now.
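Back-of-the-envelope bandwidth for those two hypothetical configurations (the ~8.5Gb/s LPDDR5x rate is the one proposed above; the B580 comparison figure is its commonly cited ~456GB/s):

```python
def peak_bandwidth_gbps(bus_width_bits, data_rate_gtps):
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gtps

print(peak_bandwidth_gbps(256, 8.5))  # 272.0 GB/s: the cheapest 128GB option
print(peak_bandwidth_gbps(512, 8.5))  # 544.0 GB/s: vs ~456 GB/s on the B580
```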
It's no surprise at all that such a risky niche product did not emerge from a division of Intel that is lucky to not have been liquidated yet.
In hindsight, I misread you as saying that 128GB of RAM on a “basic GPU” is not technically feasible. My reply was to say it is feasible.
Intel is rumored to have a B770 GPU in development, but it was running late and then was delayed to next year since it had yet to tape out, so they are now launching their B580 and B570 graphics cards, which had been ready to go for a while. That is why the bus width appears to have dropped across generations. Presumably, if they made a 512-bit bus version, it would be a 9-series card. They certainly left room for it in their lineup, but as far as leaks are concerned, there is not much hope for one. I do not expect them to use anything other than GDDR7 on their Battlemage cards.
As for a high-memory ARC card, I am of the opinion that such a product would sell well among the local LLaMA community. There might even be more sales of a high-memory ARC card for inference than of the regular ARC cards for gaming, given that their discrete graphics sales peaked at 250,000 units in Q1 2023 before collapsing, which can be confirmed using the data here:
The market for high-memory GPUs is surely bigger than that. That said, Intel is likely pricing its ARC GPUs at a loss once R&D costs are considered. This is likely intended to help them break into a new market, although it has not been going well for them so far. I would guess that they are at least a generation away from profitability.
Intel intends for its Gaudi 3 accelerators to be used for this rather than the ARC line. Those coincidentally have 128GB of RAM, but they use HBM rather than a DDR variant. Qualcomm, on the other hand, made its own accelerator with 128GB of LPDDR4x RAM:
If my math is right, Qualcomm went with a 1024-bit memory bus and some incorrect rounding (rounding 137.5 to 138 before multiplying by 4) to reach their stated bandwidth figure. Qualcomm is not selling it through the PC parts supply chain, so I have no clue how much it costs, but I assume that it is expensive. I assume that they used LPDDR4x to be able to ship a product at all: they were likely too late in securing HBM supply, and even if they had secured some, they would not be able to scale production to meet demand growth since Nvidia is buying all of the HBM that it can.
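A sketch reproducing that rounding argument (the 137.5GB/s per quarter of the bus is my inference above, not a published Qualcomm number):

```python
# Reproducing the rounding argument above. The per-256-bit figure is
# this comment's inference, not a published Qualcomm number.
per_quarter_gbps = 137.5            # GB/s over one 256-bit quarter of a 1024-bit bus
print(per_quarter_gbps * 4)         # 550.0 -> multiplying first
print(round(per_quarter_gbps) * 4)  # 552   -> rounding 137.5 to 138 first
```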