LLMs believe false statements even after explicit warnings that they're false
Fine-tuning tests show "bias ... toward confidently representing the claims as true."
Tech news from the best sources
Fine-tuning tests show "bias ... toward confidently representing the claims as true."
But training on "synthetic stories" that model good AI behavior can help.
Overtuning can cause models to "prioritize user satisfaction over truthfulness.”