Conjecture that is likely true, and damning for large language models if it is: an LLM trained strictly on truth will still confabulate, because the interpolation process will break the bindings in what it saw, and it will continue to fabricate.
Further conjecture: if the above is (as I suspect) true, we will never see fully honest LLMs (though they might be used as components in larger systems).
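To make the "broken bindings" worry concrete, here is a minimal toy sketch (Python/NumPy, with invented one-hot encodings and invented facts; nothing like how a real LLM represents anything): two true statements are blended by interpolation, and reading each slot out independently can reassemble a subject and an attribute that were never bound together in training.

```python
# Toy sketch, not an LLM: a "fact" is a one-hot subject slot (dims 0-1)
# concatenated with a one-hot attribute slot (dims 2-3). All names and
# encodings are invented for illustration.
import numpy as np

subjects   = ["Paris", "Rome"]
attributes = ["capital of France", "capital of Italy"]

def encode(subj_idx: int, attr_idx: int) -> np.ndarray:
    vec = np.zeros(4)
    vec[subj_idx] = 1.0        # subject slot
    vec[2 + attr_idx] = 1.0    # attribute slot
    return vec

# The only bindings "seen in training": Paris/France and Rome/Italy.
seen = {(0, 0), (1, 1)}
blend = 0.5 * encode(0, 0) + 0.5 * encode(1, 1)

rng = np.random.default_rng(0)
novel, trials = 0, 10_000
for _ in range(trials):
    # Small slot-independent noise stands in for whatever tips each readout.
    noisy = blend + rng.normal(scale=0.05, size=4)
    decoded = (int(np.argmax(noisy[:2])), int(np.argmax(noisy[2:])))
    novel += decoded not in seen

print(f"{novel / trials:.0%} of decodes pair a subject and attribute "
      "never bound together in training (e.g. Paris / capital of Italy)")
```

Roughly half the decodes land on a never-seen pairing: nothing in the interpolated vector itself records which subject went with which attribute, so an independent readout of each slot is free to recombine them.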
@garymarcus what kinds of data/contexts would you expect to trigger these phantom associations? Humans could be said to do it when they extrapolate without enough information, but at least we’re sometimes aware of it.
I haven’t done a proper study @jerelev but i would expect it to be a fairly regular occurrence, particularly in long texts (short texts might simply be directly regurgitated) and on items that are low frequency.
@garymarcus Perhaps the space between objective truths is not a smooth surface across which you can interpolate. It seems likely that concepts can be more or less smoothly interpolated, but truth operates on abstract connections between concepts. This looks more like a discrete, symbolic landscape.
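As a sketch of that "discrete landscape" intuition (again with invented toy encodings, and no claim about how any real model represents facts): the straight line between two true encodings varies smoothly, but the intermediate points are not the encoding of any statement, and snapping them back to a discrete claim happens abruptly rather than smoothly.

```python
# The raw vectors on the path vary continuously; the decoded claim does not.
import numpy as np

subjects, attributes = ["Paris", "Rome"], ["France", "Italy"]

def encode(subj: str, attr: str) -> np.ndarray:
    vec = np.zeros(4)
    vec[subjects.index(subj)] = 1.0
    vec[2 + attributes.index(attr)] = 1.0
    return vec

a, b = encode("Paris", "France"), encode("Rome", "Italy")

for t in np.linspace(0.0, 1.0, 5):
    point = (1 - t) * a + t * b                    # a smooth path in vector space...
    subj = subjects[int(np.argmax(point[:2]))]     # ...that must be snapped back
    attr = attributes[int(np.argmax(point[2:]))]   # to a single discrete claim
    print(f"t={t:.2f}  raw={np.round(point, 2)}  decoded: {subj} / {attr}")
# The decoded claim jumps from one fact to the other near the midpoint,
# with nothing meaningful in between.
```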
@garymarcus “will we never see fully honest LLMs”
Doesn’t being honest or dishonest imply intention?
Can a greedy likelihood maximizing algorithm (LLM next token prediction) be dishonest?
@sbraun didn’t really mean to ascribe intention. a more careful phrasing would have been “i never expect to see an llm that sticks only to things that are derivable from the ground truth it was trained on”
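On the "greedy likelihood maximizing" framing: a toy bigram decoder (far cruder than an LLM, since it forgets all but the previous word, and trained on an invented corpus of true sentences) illustrates that the decoding objective contains no truth predicate. Each step emits the most likely continuation, yet the result can be a sentence that is false and never appeared in the training data.

```python
# Greedy decoding over bigram counts: likelihood maximization with no notion
# of honesty. Corpus is invented for illustration; every training sentence is true.
from collections import Counter, defaultdict

corpus = [
    "paris is in france",
    "lyon is in france",
    "rome is in italy",
]

# Count next-word frequencies conditioned on the previous word.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split() + ["<end>"]
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def greedy_decode(start: str, max_len: int = 10) -> str:
    """At each step, emit the single most likely next word."""
    out = [start]
    while out[-1] in bigrams and len(out) < max_len:
        nxt = bigrams[out[-1]].most_common(1)[0][0]
        if nxt == "<end>":
            break
        out.append(nxt)
    return " ".join(out)

print(greedy_decode("rome"))
# -> "rome is in france": every step maximizes likelihood given the previous
# word, yet the sentence is false and never appeared in the training data.
```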