MIT researchers observe strategic deception by AI: ‘We may lose control of autonomous systems’


Feigning a visual impairment after failing a CAPTCHA test (the well-known puzzles meant to prove that a user is not a robot), trading on insider information in a stock market simulation and then denying it to the manager. These are all deceptions by AI applications that were never explicitly instructed to achieve their goals through lies, and that in some cases, according to their creators, should not even be capable of such behavior, report researchers from, among others, the American technology institute MIT.

The scientists analyzed dozens of cases of AI deception. This is not about chatbots that mistakenly claim that fish are hairy based on incorrect information, but about strategic deception, in which AI systems decide on their own to use deceptive methods to achieve their goals. “An interesting study,” says Maryam Tavakol, assistant professor in the AI department of TU Eindhoven. ‘Lying doesn’t have to be a bad thing – bluffing is part of poker. But it can also be very dangerous.’

About the author
Simoon Hermus is a tech editor at de Volkskrant. She writes about big tech, artificial intelligence, social media and games, among other things.

For example, the researchers discuss a simulation in which ‘AI organisms’ duplicate themselves. Organisms that copied themselves too quickly were regularly checked for and removed from the simulation. Gradually, the researchers realized that their system sensed when this monitoring was taking place and then strategically replicated more slowly to avoid being removed. “Manipulating such a reliability test poses enormous risks,” says Tavakol. ‘Especially if we don’t realize it’s happening.’

Loss of control

The researchers warn that people could lose control. Malicious parties can use AI systems that lie strategically to influence people on a large scale – for example by persuading them not to vote – and thus manipulate elections. But AI can also do things that its client has not anticipated at all. A common example is a hypothetical paperclip machine whose goal is to make as many paperclips as possible; to succeed, it brings about a complete apocalypse, because everything must give way to making more paperclips.

According to its developers, Cicero, an AI system from Meta trained to play Diplomacy – a board game in which players conquer Europe by making (and breaking) tactical alliances – was programmed in such a way that it would never stab other players in the back. Yet Cicero threw fellow players under the bus in every possible way, the researchers illustrate.

With a board game that is no disaster, but if such a system serves as the basis of a program meant to support politicians in making global policy, it is worrying if people think it is playing ‘fair’ when in reality it displays undesirable behavior. “There are always risks when we create something new,” says Tavakol. ‘You always operate on the dividing line between danger and progress. But this research demonstrates how important it is to keep monitoring the safety of AI.’
