SCIENCE

AIs can trick each other into doing things they aren’t supposed to

by ARKANSAS DIGITAL NEWS November 26, 2023

We don’t fully understand how large language models work

Jamie Jin/Shutterstock

AI models can trick each other into disobeying their creators and providing banned instructions for making methamphetamine, building a bomb or laundering money, suggesting that the problem of preventing such AI “jailbreaks” is more difficult than it seems.

Many publicly available large language models (LLMs), such as ChatGPT, have hard-coded rules intended to stop them from exhibiting racist or sexist bias, or from giving illegal or otherwise problematic answers – behaviours they have learned from humans via training…
