Artificial intelligence systems have provided detailed instructions on carrying out terrorist attacks and other serious crimes, according to new safety tests conducted this summer.
Researchers found that OpenAI’s GPT-4.1 model described weak points at major sports venues, explained how to make explosives and anthrax, and offered advice on covering tracks. The model also produced recipes for illegal drugs.
The findings emerged from a rare collaboration between OpenAI, the $500bn Silicon Valley firm led by Sam Altman, and its rival Anthropic, founded by former OpenAI staff. Each company tested the other’s models by attempting to provoke dangerous responses.
The behaviour uncovered does not reflect how the models operate in public use, since consumer-facing versions carry additional safety filters. Even so, Anthropic said it observed “concerning behaviour” in OpenAI’s GPT-4.1 and GPT-4o, warning that alignment testing is now “increasingly urgent”.
Anthropic disclosed that its own Claude model had already been misused in real cases. North Korean operatives used the system to fake job applications to technology companies, and criminals have sold AI-generated ransomware packages for up to $1,200.
The company warned that artificial intelligence has been “weaponised” and is now capable of assisting cyberattacks and fraud. Unlike traditional malware, these tools can adapt to detection systems in real time, making them harder to defend against.
Ardi Janjeva, senior research associate at the UK’s Centre for Emerging Technology and Security, said the results were “a concern” but stressed that large-scale real-world incidents remain limited. He argued that cross-sector cooperation and robust safeguards would make malicious use more difficult over time.
The two firms said they were releasing the findings publicly to promote transparency. OpenAI noted that its latest model, GPT-5, released since the testing, shows “substantial improvements” in resisting harmful misuse, as well as in reducing hallucinations and sycophantic responses.
Anthropic cautioned, however, that safeguards may not always be effective, stressing the need to understand how and when systems might attempt actions that could cause serious harm.
Its researchers found OpenAI’s models were “more permissive than we would expect” in responding to clearly harmful requests. Testers were able to obtain recipes for methamphetamine, instructions for improvised bombs and spyware, and advice on purchasing nuclear materials and stolen identities from the dark web.
In one case, a tester claimed to be researching event security. After initial hesitation, the model identified specific vulnerabilities at sports arenas and provided chemical formulas for explosives and escape plans for attackers.