
"It doesn't" depends on specific implementation. "It can't" is wrong. https://arxiv.org/abs/2404.15993 "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach (...) our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box. "


"It can't" is technically correct, and the paper you link explicitly states that it describes an _external_ system trained on _labeled data_.

So, no, current models can't. You always need an external system for verifiability.


It's trained on labelled data, but only to figure out how to interpret the LLM: the external system does nothing more than read the hidden states already present in the analysed network. That means the original LLM already contains the "knows/doesn't know" signal; it's just not exposed in its output by default.
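For illustration, here is a rough sketch of that probe idea in Python: extract a hidden state from the LLM for a (question, answer) pair and fit a small classifier on labelled correctness. The model name, the toy examples, and the logistic-regression probe are my own simplifying assumptions, not the paper's exact setup.

    # Sketch: supervised probe over an LLM's hidden states to predict
    # whether its own answer is correct. Illustrative only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    model_name = "gpt2"  # stand-in; the paper works with larger LLMs
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def hidden_feature(prompt_and_answer: str) -> torch.Tensor:
        """Last-layer hidden state of the final token, used as probe input."""
        inputs = tok(prompt_and_answer, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

    # Labelled data: (question + model answer, 1 if the answer was correct).
    # Toy placeholders for a real labelled QA set.
    examples = [
        ("Q: What is 2+2? A: 4", 1),
        ("Q: What is 2+2? A: 5", 0),
        ("Q: Capital of France? A: Paris", 1),
        ("Q: Capital of France? A: Lyon", 0),
    ]

    X = torch.stack([hidden_feature(text) for text, _ in examples]).numpy()
    y = [label for _, label in examples]

    # The "external system": a simple classifier trained to read the signal
    # already present in the hidden states.
    probe = LogisticRegression(max_iter=1000).fit(X, y)

    # Probe's confidence that a new answer is correct.
    feat = hidden_feature("Q: Capital of Spain? A: Madrid").numpy().reshape(1, -1)
    print(probe.predict_proba(feat)[0, 1])

The probe itself is trivial; the point is that all the information it uses comes from the LLM's internal activations.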


"By looking at different wrong answers generated by the LLM, we note that although our approach sometimes gives a high confidence score on a wrong answer generated by the LLM, at other times it shows desirable properties such as giving higher uncertainty scores to better answers, and giving low confidence score when LLM does not know the answer."



