The problem is that an LLM is a language model, not a model of objective reality, so the best it can do is estimate the probability of a particular sentence appearing in the language, not the probability that the sentence is a true statement about that reality.
They seem to think they can use these confidence measures to filter out output the model isn't confident is correct, but a language contains infinitely many highly probable sentences that are false in reality. An LLM has no way of distinguishing unlikely from false, or likely from true.
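To make that concrete, here's a rough sketch using GPT-2 via the Hugging Face `transformers` library (the model choice and the example sentences are just illustrative). It scores a sentence by its average per-token log-probability under the language model, and nothing in that score reflects truth: a fluent falsehood can easily outscore an awkwardly worded truth.

```python
# Sketch: LM probability measures fluency/typicality, not truth.
# Assumes torch and transformers are installed.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_prob(sentence: str) -> float:
    """Average log P(token | preceding tokens) under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns mean cross-entropy over tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item()  # negate: higher = more probable to the model

# A fluent false statement vs. a clumsy true one (illustrative only).
false_but_fluent = "The Great Wall of China is visible from the Moon."
true_but_awkward = "Moon-viewed, the Great Wall of China visible is not."

print(avg_log_prob(false_but_fluent))  # typically the higher score
print(avg_log_prob(true_but_awkward))
```

The model will usually assign the higher score to the false sentence, because it reads like typical English, which is exactly the point: the score is a statement about the language, not about the world.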