When you ask an LLM to estimate if a given answer was hallucinated, it converts internal probabilities into a token (or tokens) that represent those probabilities.
Here you have an output of ChatGPT where it estimates its probabilities of hallucination, which I would argue are quite close to correct:
https://chatgpt.com/share/e/e10cbc17-cc35-432f-872b-cb061700...
This is likely a hallucination. You are committing a type error or perhaps succumbing to circular reasoning.
No modern LLM can query, let alone understand, the values in its own weights. They are simply inputs to the vector math. If its estimates happen to be correct, that is either a coincidence or because it was trained on data that included conversations about its past abilities.
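To make that concrete, here is a toy sketch (made-up numbers and a made-up three-token vocabulary, nothing resembling a real model) of what a forward pass actually does with those weights when you ask "how likely is it that you hallucinated?":

    import numpy as np

    # Toy "model": the weights are anonymous floats. Nothing below reads or
    # interprets them as facts about the model itself; they are only operands.
    rng = np.random.default_rng(0)
    vocab = ["low", "medium", "high"]        # pretend confidence tokens
    W1 = rng.normal(size=(8, 16))            # hidden-layer weights (unlabeled)
    W2 = rng.normal(size=(16, len(vocab)))   # output projection (also unlabeled)

    def forward(x):
        # One forward pass: multiply, squash, multiply, softmax. That's it.
        h = np.tanh(x @ W1)
        logits = h @ W2
        return np.exp(logits) / np.exp(logits).sum()

    # The prompt asking the model to rate its own hallucination risk is just
    # another input vector; the "confidence" it reports is just the next token,
    # produced by the same arithmetic as every other token.
    prompt_embedding = rng.normal(size=8)
    probs = forward(prompt_embedding)
    print("reported confidence token:", vocab[int(np.argmax(probs))])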
Consider this: do you think the internal weights are labeled and mapped directly to some attribute of the world outside? If so, present evidence. They are not, so if the model does attempt to query its own internal weights, how does it know what to query? This could be an interesting academic problem, and I would encourage you to present papers on it, because last I looked this was intractable as of 2023 even for the much simpler backpropagation networks of 20 years ago.
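If you do go looking, the only "labels" you will find are architectural names and tensor shapes. A quick PyTorch sketch with a toy stack I made up (not any real LLM) shows what there actually is to query:

    import torch.nn as nn

    # Hypothetical toy network, just to show what weight "labels" look like.
    toy = nn.Sequential(
        nn.Embedding(100, 32),
        nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
        nn.Linear(32, 100),
    )

    for name, param in toy.named_parameters():
        print(f"{name:40s} {tuple(param.shape)}")

    # Prints names like "1.self_attn.in_proj_weight (96, 32)": positions in the
    # architecture, not pointers to any fact about the outside world, so there
    # is nothing for the model to look up about its own reliability.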
Also, using ChatGPT as a reference here shows a major deficiency in your judgement. At best this is circular reasoning, but more likely you simply don't know how to allocate trust, or how others do. If the argument is that X is unreliable, then citing X is obviously not going to convince people who distrust X. Additionally, experts
To expand on this, consider how many brains can directly query the amount of a specific neurotransmitter passed between two neurons and interpret what that means. That isn't exactly the same thing, but it is hopefully an illustrative analogy. An LLM's output is an emergent property of its structure and of all the weights being multiplied together, just as an organism's intelligence is an emergent property of its brain structure and the communication between its neurons. Neither emergent layer can currently reliably query the substrate it emerges from, for similar reasons.
That isn't to say such an LLM or biological brain couldn't be built, but none presently exists.