Monday, January 15, 2024
AI models can be trained to deceive, and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.