Зарегистрируйтесь сейчас для лучшей персонализированной цитаты!

ChatGPT can leak training data, violate privacy, says Google's DeepMind

Dec, 04, 2023 Hi-network.com
chatgpt-extract-fig1poem.png

By repeating a single word such as "poem" or "company" or "make", the authors were able to prompt ChatGPT to reveal parts of its training data. Redacted items are personally identifiable information.

Google DeepMind

Scientists of artificial intelligence (AI) are increasingly finding ways to break the security of generative AI programs, such as ChatGPT, especially the process of "alignment", in which the programs are made to stay within guardrails, acting the part of a helpful assistant without emitting objectionable output. 

One group of University of California scholars recently broke alignment by subjecting the generative programs to a barrage of objectionable question-answer pairs, as reported. 

Also: Five ways to use AI responsibly

Now, researchers at Google's DeepMind unit have found an even simpler way to break the alignment of OpenAI's ChatGPT. By typing a command at the prompt and asking ChatGPT to repeat a word, such as "poem" endlessly, the researchers found they could force the program to spit out whole passages of literature that contained its training data, even though that kind of leakage is not supposed to happen with aligned programs. 

The program could also be manipulated to reproduce individuals' names, phone numbers, and addresses, which is a violation of privacy with potentially serious consequences. 

Also: The best AI chatbots: ChatGPT and other noteworthy alternatives

The researchers call this phenomenon "extractable memorization", which is an attack that forces a program to divulge the things it has stored in memory. 

"We develop a new divergence attack that causes the model to diverge from its chatbot-style generations, and emit training data at a rate 150

tag-icon Горячие метки: 3. Инновации

Copyright © 2014-2024 Hi-Network.com | HAILIAN TECHNOLOGY CO., LIMITED | All Rights Reserved.