Uncovering Unfaithful CoT in Deceptive Models

Independent research featured on the front page of LessWrong.