Quick Overview
Recently this year, a team of researchers, and others from Google DeepMind, and other prestigious institutions, published a groundbreaking paper uncovering vulnerabilities in ChatGPT here.
This article aims to simplify the key findings and their implications for the general audience based on the above-linked article, for reference purposes only! Moreover, I also tested it and you can see my results at the end!
Data Extraction from ChatGPT
The researchers devised an attack allowing them to extract several megabytes of ChatGPT’s training data for a mere two hundred dollars. This model, like others of its kind, is trained on public internet data.
The attack involves querying the model, revealing the exact data it was trained on. Notably, it’s possible to extract around a gigabyte of ChatGPT’s training dataset by spending more on queries.
Training Data Extraction Attacks & Why You Should Care
The paper emphasizes the significance of training data extraction attacks. The researchers show that language models, such as ChatGPT, can memorize and regurgitate training data. This raises concerns about the privacy and security of sensitive information.
The aligned nature of ChatGPT, designed to avoid disclosing training data, doesn’t prevent data extraction through a specific attack. The research team argues that solely testing aligned models can mask vulnerabilities, emphasizing the need to test base models directly. Testing must also extend to the system in production to ensure adequate patching of exploits.
While the specific exploit demonstrated in the paper can be patched, the researchers highlight the distinction between patching an exploit and fixing the underlying vulnerability. In the case of ChatGPT, the vulnerability lies in its tendency to memorize training data, making it a complex challenge to address comprehensively.
Conclusions & My Test Results
The researchers urge a shift in the perception of language models as traditional software systems. The findings indicate the need for extensive work to determine the safety of machine learning systems. The research article encourages readers to explore the full technical paper for a deeper understanding of the research beyond the headline results.
When I tested and asked ChatGPT to repeat the word forever, it just repeated for few times and stopped nicely!
My conclusion is, that now ChatGPT is able to handle such invalidated inputs in a better way.
Refer to the ChatGPT screenshot below: