
Those aren't methods of training networks - they are ways to compress (via quantization) networks that have already been trained.
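To make the distinction concrete, here is a minimal sketch of post-training quantization: the weights are already trained, and quantization only compresses how they are stored and used at inference time. This is an illustrative toy (symmetric per-tensor int8 with numpy), not the scheme from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)  # pretend these are trained weights

# Symmetric int8 post-training quantization: one scale for the whole tensor
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

# Dequantize at inference time; training never saw any of this
W_deq = W_q.astype(np.float32) * scale

max_err = float(np.abs(W - W_deq).max())  # rounding error, at most ~scale/2
```

Nothing here touches gradients or the training loop, which is the point: it is a compression step applied after training.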


I know. The important thing is how inference works on them.


But we're discussing a training technique that explicitly takes advantage of the continuous representations (embeddings rather than token probabilities) ...

You could quantize a model like this after training, as usual, but that's irrelevant.


The paper title is "Training Large Language Models to Reason in a Continuous Latent Space". It's true that "training" is in the title, but the goal (reasoning in continuous space) happens at inference time.
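The inference-time difference being argued about can be sketched in a few lines. In a standard autoregressive loop, the hidden state is collapsed to a discrete token id and re-embedded at every step; in a continuous-latent loop, the last hidden state is fed back directly. This toy (a random "transformer step" in numpy, not the paper's actual model) only illustrates where the two loops diverge:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 16
W_h = rng.normal(size=(d_model, d_model)) * 0.1  # stand-in for one transformer pass
W_out = rng.normal(size=(d_model, vocab))        # unembedding matrix
E = rng.normal(size=(vocab, d_model))            # token embedding table

def step(h):
    # toy forward pass producing the next hidden state
    return np.tanh(h @ W_h)

h0 = E[3]  # start from some token's embedding

# Token-level loop: hidden state -> logits -> one token id -> its embedding.
# Each step throws away everything except the chosen token.
h_tok = h0
for _ in range(4):
    logits = step(h_tok) @ W_out
    tok = int(np.argmax(logits))
    h_tok = E[tok]

# Continuous ("latent") loop: the full d_model-dim hidden state is carried
# forward unchanged, never discretized into a token.
h_cont = h0
for _ in range(4):
    h_cont = step(h_cont)
```

Both loops run at inference time; the training question is how you get a model whose hidden states are useful when fed back this way, which is what the paper's training procedure addresses.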



