> the values inside an LLM are discrete even if they're floating point.
If that were true, they'd never be able to learn anything: neural nets depend on continuous gradients to learn, and weights get updated by incremental, continuous amounts based on those gradients.
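To make that concrete, here's a toy sketch of gradient descent on a single weight (not any specific LLM's training loop, just the general mechanism): each update nudges the weight by a continuous, gradient-scaled amount.

```python
# Toy gradient descent: fit a single weight w so that w * x matches y.
# Illustrates that weight updates are continuous nudges, not discrete jumps.
def train(steps=5, lr=0.05):
    w = 0.0                         # starting weight
    x, y = 3.0, 6.0                 # one training example (true w is 2)
    history = [w]
    for _ in range(steps):
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of the squared error (pred - y)**2
        w -= lr * grad              # continuous, gradient-scaled update
        history.append(w)
    return history

print(train())
```

Every value in `history` is a float somewhere strictly between the previous weight and the target; nothing ever snaps to a discrete grid.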
Even at the output of an LLM, where the internal embeddings have been mapped to token probabilities, those probabilities are also continuous. It's only when you sample from the model that the continuous probability distribution collapses into a discrete chosen token.
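A minimal sketch of that last step, using made-up logits over a tiny 4-token vocabulary: the softmax output is continuous, and discreteness only enters when you sample a token from it.

```python
import math
import random

def softmax(logits):
    # Map real-valued logits to a continuous probability distribution.
    exps = [math.exp(l - max(logits)) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)   # continuous values in (0, 1) that sum to 1
print(probs)

# Sampling is where a continuous distribution becomes a discrete token.
random.seed(0)
token_id = random.choices(range(len(probs)), weights=probs)[0]
print(token_id)
```

You could run greedy decoding (`argmax`) instead of sampling, but either way the discretization happens at this final step, not inside the network.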
The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It's true that "training" is in the title, but the goal (reasoning in continuous space) happens at inference time.