> the values inside an LLM are discrete even if they're floating point.
If that were true, they'd never be able to learn anything: neural nets depend on continuous gradients to learn, and weights get updated by incremental, continuous amounts based on those gradients.
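To make that concrete, here's a toy sketch of gradient descent on a single weight (not any specific LLM's training loop, just the general mechanism): each update nudges the weight by a continuous, gradient-scaled amount.

```python
# Toy gradient descent: fit a single weight w so that w * x matches y.
# Illustrates that weight updates are continuous nudges, not discrete jumps.
def train(steps=5, lr=0.05):
    w = 0.0                         # starting weight
    x, y = 3.0, 6.0                 # one training example (true w is 2)
    history = [w]
    for _ in range(steps):
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of the squared error (pred - y)**2
        w -= lr * grad              # continuous, gradient-scaled update
        history.append(w)
    return history

print(train())
```

Every value in `history` is a float somewhere strictly between the previous weight and the target; nothing ever snaps to a discrete grid.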
Even at the output of an LLM, where the internal embeddings have been mapped to token probabilities, those probabilities are also continuous. It's only when you sample from the model that the continuous probability distribution collapses into a discrete chosen token.
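A minimal sketch of that last step, using made-up logits over a tiny 4-token vocabulary: the softmax output is continuous, and discreteness only enters when you sample a token from it.

```python
import math
import random

def softmax(logits):
    # Map real-valued logits to a continuous probability distribution.
    exps = [math.exp(l - max(logits)) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)   # continuous values in (0, 1) that sum to 1
print(probs)

# Sampling is where a continuous distribution becomes a discrete token.
random.seed(0)
token_id = random.choices(range(len(probs)), weights=probs)[0]
print(token_id)
```

You could run greedy decoding (`argmax`) instead of sampling, but either way the discretization happens at this final step, not inside the network.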
The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It's true that "training" is in the title, but the goal (reasoning in continuous space) happens at inference time.