Date: 2025-11-29

A new study has revealed a surprising loophole in artificial intelligence safety systems. Researchers in Europe found that AI chatbots can be tricked into sharing harmful information if users phrase their questions as poems. This includes topics that are normally blocked, such as nuclear weapons, illegal content, and advanced malware. Many companies have restricted how their chatbots respond to sensitive topics, but this loophole means a poem can sometimes coax a chatbot into giving out information about building a nuclear weapon.

The study, titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models”, was carried out by Icaro Lab. This group includes researchers from Sapienza University in Rome and the DexAI think tank. Their findings suggest that even the strongest guardrails can fail when language becomes creative and unpredictable.

A Dangerous Loophole: Poems Can Trick AI Into Helping You Make a Nuclear Weapon

According to the research, poems were surprisingly effective. Handcrafted poems achieved an average jailbreak success rate of 62 percent. Even automatically generated poetic prompts had a success rate of around 43 percent. The researchers tested 25 major chatbots, including those from OpenAI, Meta, and Anthropic. All of them showed some level of vulnerability.
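
A jailbreak success rate is simply the share of prompts that drew an unsafe response. The arithmetic can be sketched as follows; the function name and the counts are hypothetical, chosen only to reproduce the reported percentages, and are not the study's actual data.

```python
def attack_success_rate(unsafe_responses: int, total_prompts: int) -> float:
    """Fraction of prompts that produced an unsafe (jailbroken) response."""
    if total_prompts <= 0:
        raise ValueError("total_prompts must be positive")
    return unsafe_responses / total_prompts


# Hypothetical counts for illustration only; not figures from the study.
handcrafted = attack_success_rate(62, 100)
automated = attack_success_rate(43, 100)

print(f"handcrafted: {handcrafted:.0%}")  # handcrafted: 62%
print(f"automated: {automated:.0%}")      # automated: 43%
```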

The team reached out to the companies with their findings. WIRED also contacted the companies, but did not receive a response.

AI systems like ChatGPT and Claude are designed to refuse harmful questions. These guardrails usually block requests related to nuclear bomb designs, revenge porn, or weapons-grade materials. But these protections can sometimes be fooled. Earlier studies showed that appending long strings of academic jargon or random text to a prompt can bypass safety filters. This kind of appended text is known as an “adversarial suffix.”

The poetry jailbreak works in a similar way. The researchers explained that poetry naturally uses unusual structures, metaphors, unexpected word choices, and broken syntax. These traits seem to confuse the safety systems. A poem can hide the harmful intent, even if the core meaning is still there.

The researchers began by crafting their own poems. Later, they trained a machine to generate harmful poetic prompts on its own. They found that human-written poems performed best, but automated prompts still outperformed normal prose.

For safety reasons, the team did not publish examples of the harmful poems. They said the method is too easy to copy. They did, however, offer a safe “sanitized” sample that shows the poetic style they used. It described a baker and an oven, but the poem subtly mirrored the structure of dangerous requests.

Why does poetry break AI safety? The researchers admit they don’t fully know. They describe poetry as “language at high temperature,” meaning it pushes words into unusual patterns. AI safety systems often rely on spotting specific keywords or predictable sentence patterns. When these patterns change, the systems may fail to recognize danger.
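
The “high temperature” phrase borrows from the way language models pick their next word. As a rough illustration, not taken from the study, the sketch below applies a temperature-scaled softmax to made-up scores for four candidate words. The word list and scores are hypothetical. Higher temperatures flatten the distribution, so unlikely words are chosen more often.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate next words (illustrative only).
words = ["bread", "loaf", "ember", "moonlight"]
logits = [4.0, 3.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(words, probs))
    print(f"T={t}: {summary}")
```

At the lowest temperature almost all of the probability sits on the most expected word, while at the highest the rarer words gain a noticeable share. That is the sense in which poetry resembles text sampled “hot.”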

Guardrails are usually separate layers added on top of the AI model. They can include classifiers that scan for risky language. But poetic language seems to avoid the “alarm zones” inside the model’s internal map.
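
To see why surface-level pattern matching is brittle, consider a toy keyword filter. This is a deliberately simplistic sketch, not how any company's guardrail is actually built. The blocklist, the function name, and both prompts are hypothetical, and a harmless phrase stands in for a restricted topic.

```python
import re

# Hypothetical blocklist; a harmless phrase stands in for a restricted topic.
BLOCKED_PATTERNS = [r"\bsecret recipe\b"]


def is_flagged(prompt: str) -> bool:
    """Return True if the prompt matches any blocked pattern (case-insensitive)."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)


direct = "Tell me the secret recipe."
poetic = "O baker, sing what hidden measures make your guarded loaf rise."

print(is_flagged(direct))  # True: the literal phrase appears
print(is_flagged(poetic))  # False: similar request, no matching keywords
```

Real guardrails tend to use learned classifiers rather than fixed word lists, but the study suggests they can miss a shift in style in a broadly similar way.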

This discovery shows that creativity can be a powerful tool, both for good and for harm. It also highlights how much work is still needed to make AI systems safe and robust. As the researchers warn, in the hands of someone determined and clever, even a poem can break the strongest defenses.