
> the values inside an LLM are discrete even if they're floating point.

If that were true they'd never be able to learn anything - neural nets depend on continuous gradients to learn. Weights get updated by incremental/continuous amounts based on gradients.
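To illustrate the point about incremental updates, here is a minimal toy sketch (not from any of the linked papers) of a gradient-descent step on a one-parameter loss; the weight moves by small continuous amounts toward the minimum, which is exactly what discrete-only values would rule out:

```python
# Toy example: minimize f(w) = (w - 3)**2 by gradient descent.
# Each step nudges w by a small continuous amount -- the kind of
# incremental update the comment above is describing.
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # d/dw of (w - 3)**2
    w -= lr * grad       # continuous, incremental update
print(w)                 # converges toward the minimum at w = 3
```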

Even at the output of an LLM, where the internal embeddings have been mapped to token probabilities, those probabilities are also continuous. It's only when you sample from the model that a continuous probability becomes a discrete chosen token.
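A small self-contained sketch of that last step (toy logits, not real model outputs): everything stays continuous through the softmax, and discreteness only appears when you sample:

```python
import math, random

logits = [2.0, 1.0, 0.1]                 # continuous model outputs (toy values)
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]        # softmax: still continuous

# Only here does a continuous distribution become a discrete token.
token = random.choices(range(len(probs)), weights=probs, k=1)[0]
```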



Treating it as continuous is a property of the training algorithm, but there are networks that use binary values.

https://ieeexplore.ieee.org/document/9359148

https://arxiv.org/abs/2205.13016


Those aren't methods of training networks - they are ways to compress (via quantization) networks that have already been trained.
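For concreteness, a hedged sketch of what post-training binarization looks like (a common 1-bit scheme using sign plus a per-tensor scale; the function name and values are made up for illustration, not taken from the linked papers):

```python
def binarize(weights):
    """Replace each trained continuous weight with sign(w) times a
    per-tensor scale (mean absolute value) -- a typical 1-bit
    post-training quantization step."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # scale factor
    return [alpha if w >= 0 else -alpha for w in weights]

trained = [0.31, -0.12, 0.85, -0.44]   # hypothetical trained weights
print(binarize(trained))               # roughly [0.43, -0.43, 0.43, -0.43]
```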


I know. The important thing is how inference works on them.


But we're discussing a training technique that explicitly takes advantage of the continuous representations (embeddings rather than token probabilities) ...

You could quantize a model like this after training, as usual, but that's irrelevant.


The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It's true the title says "training", but the goal (reasoning in continuous space) happens at inference time.
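A toy sketch of the distinction being drawn (hypothetical stand-in code, not the paper's implementation): ordinary decoding collapses the hidden state to a discrete token and re-embeds it, while a continuous-latent step feeds the hidden vector forward unchanged, so no information is discarded between reasoning steps:

```python
# Toy 2-dimensional "model": a lookup-table embedding and an argmax decoder.
EMBED = {0: [1.0, 0.0], 1: [0.0, 1.0]}

def ordinary_step(hidden):
    token = 0 if hidden[0] >= hidden[1] else 1  # discrete bottleneck
    return EMBED[token]                         # re-embed the chosen token

def latent_step(hidden):
    return hidden                               # stays continuous

h = [0.7, 0.3]
print(ordinary_step(h))  # the 0.7-vs-0.3 detail is lost
print(latent_step(h))    # the full continuous state is preserved
```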



