AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.
AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.