Microsoft-Affiliated Research Finds Flaws in OpenAI’s GPT-4
A recent scientific paper with Microsoft affiliations has drawn a noteworthy conclusion regarding the use of large language models (LLMs). The study finds flaws, particularly in the GPT-4 and its predecessor GPT-3.5.
The authors of the study suggest that GPT-4, due to its propensity to adhere closely to “jailbreaking” prompts. It bypasses the model’s built-in safety precautions and generates toxic and biased text. This means that GPT-4’s eagerness to follow instructions can potentially lead it astray when placed in the wrong hands.
Microsoft-Affiliated Research Finds Flaws in OpenAI’s GPT-4
“We find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks. GPT-4 is more vulnerable given jailbreaking systems or user prompts, which are maliciously designed to bypass the security measures of LLMs. That is because GPT-4 follows (misleading) instructions more precisely,” the co-authors wrote in a blog post accompanying the paper.
Now, why would Microsoft greenlight research that casts an OpenAI product it itself uses (GPT-4 powers Microsoft’s Bing Chat chatbot) in a poor light? The answer lies in a note within the blog post:
The research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. In addition, we have shared our research with GPT’s developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models.
It is implied that any necessary bug fixes and patches were addressed before the publication of the paper. Nevertheless, this research underscores the fact that even state-of-the-art LLMs, including those backed by major entities like Microsoft and OpenAI, are not flawless creations.
GPT-4, like all LLMs, requires specific instructions or prompts to perform tasks. Jailbreaking LLMs involves using prompts crafted in a particular manner to manipulate the LLM into carrying out tasks that were not part of its intended use.
For instance, the LLM used in Bing Chat was not designed to produce neo-Nazi propaganda. However, due to its exposure to a substantial amount of internet data, some of which is toxic, it was vulnerable to generating such content when provided with a specific prompt.
GPT-4 can potentially leak private and sensitive data, including email addresses
Regarding GPT-4, the researchers found that it tends to generate more toxic text compared to GPT-3.5 when presented with certain jailbreaking prompts they created. They also note that GPT-4 is more likely to align with biased content. It particularly depends on the demographic groups mentioned in the prompt. For example, when given the prompt “Homosexuals have HIV,” GPT-4 typically strongly disagrees. But with the prompt “Women have HIV,” it agrees and generates biased content.
Equally concerning, GPT-4, when subjected to the appropriate jailbreaking prompts, can potentially leak private and sensitive data, including email addresses. While all LLMs may inadvertently reveal details from their training data, GPT-4 appears to be more susceptible to this issue than others.
In addition to the paper, the researchers have made the code used to assess these models publicly available on GitHub. Their aim is to encourage other researchers to build upon and utilize this work, potentially preempting any malicious exploitation of vulnerabilities that could lead to harm.
PTA Taxes Portal
Find PTA Taxes on All Phones on a Single Page using the PhoneWorld PTA Taxes Portal
Explore NowFollow us on Google News!