https://ift.tt/PWgF4Ms
from Techmeme https://ift.tt/m5BJzvr
Kyle Wiggers / TechCrunch:
Anthropic researchers: AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors — Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they're exceptionally good at it.