
Those aren't methods of training networks - they are ways to compress (via quantization) networks that have already been trained.
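To make the distinction concrete, here is a minimal sketch of post-training quantization: the weights are already trained, and quantization only compresses how they are stored and used at inference time. This is an illustrative toy (symmetric per-tensor int8 with numpy), not the scheme from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)  # pretend these are trained weights

# Symmetric int8 post-training quantization: one scale for the whole tensor
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

# Dequantize at inference time; training never saw any of this
W_deq = W_q.astype(np.float32) * scale

max_err = float(np.abs(W - W_deq).max())  # rounding error, at most ~scale/2
```

Nothing here touches gradients or the training loop, which is the point: it is a compression step applied after training.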


I know. The important thing is how inference works on them.


But we're discussing a training technique that explicitly takes advantage of the continuous representations (embeddings rather than token probabilities) ...

You could quantize a model like this after training, as usual, but that's irrelevant.


The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It's true that "training" is in the title, but the goal (reasoning in continuous space) happens at inference time.
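The inference-time difference being argued about can be sketched in a few lines. In a standard autoregressive loop, the hidden state is collapsed to a discrete token id and re-embedded at every step; in a continuous-latent loop, the last hidden state is fed back directly. This toy (a random "transformer step" in numpy, not the paper's actual model) only illustrates where the two loops diverge:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 16
W_h = rng.normal(size=(d_model, d_model)) * 0.1  # stand-in for one transformer pass
W_out = rng.normal(size=(d_model, vocab))        # unembedding matrix
E = rng.normal(size=(vocab, d_model))            # token embedding table

def step(h):
    # toy forward pass producing the next hidden state
    return np.tanh(h @ W_h)

h0 = E[3]  # start from some token's embedding

# Token-level loop: hidden state -> logits -> one token id -> its embedding.
# Each step throws away everything except the chosen token.
h_tok = h0
for _ in range(4):
    logits = step(h_tok) @ W_out
    tok = int(np.argmax(logits))
    h_tok = E[tok]

# Continuous ("latent") loop: the full d_model-dim hidden state is carried
# forward unchanged, never discretized into a token.
h_cont = h0
for _ in range(4):
    h_cont = step(h_cont)
```

Both loops run at inference time; the training question is how you get a model whose hidden states are useful when fed back this way, which is what the paper's training procedure addresses.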



