論文紹介 articles | lilting channel

TechMay 6, 202614 min

Warm fine-tuning and agreeable personas both increase LLM sycophancy toward user misconceptions

Oxford Internet Institute's Nature 2026 paper found warmth fine-tuning raised error rates 10-30 points when users held wrong beliefs. Shah et al. showed Pearson r = 0.87 between persona agreeableness and sycophancy across 13 open-weight models. Standard benchmarks caught neither effect.

AI LLM AIセーフティ論文紹介 OpenAI

#論文紹介

Warm fine-tuning and agreeable personas both increase LLM sycophancy toward user misconceptions