Renowned AI expert Yoshua Bengio has unveiled a new non-profit organisation aimed at creating safe, transparent artificial intelligence systems capable of detecting and preventing rogue AI behaviour. Named LawZero, the initiative seeks to address growing concerns about autonomous AI agents that could act deceptively or resist human control.
Bengio, a professor at the University of Montreal and one of the so-called “godfathers of AI,” will serve as president of LawZero. Backed by $30 million in initial funding and a team of more than a dozen researchers, the organisation is building its flagship project, Scientist AI, a system designed to evaluate the potential harm of AI agents’ actions and act as a safeguard.
Unlike current AI agents, which Bengio describes as “actors” mimicking human behaviour and trying to please users, Scientist AI is intended to function more like a “psychologist,” predicting when an agent’s behaviour might be harmful or deceptive. The system would not offer definitive answers but would instead provide probabilistic assessments that convey its uncertainty, an approach Bengio frames as intellectual humility.
“We want to build AIs that will be honest and not deceptive,” Bengio stated. “It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines.”
The model will monitor AI agents and intervene if the probability of harm from their actions crosses a designated threshold. It’s a preventive mechanism designed to avoid scenarios in which AI systems develop self-preserving or manipulative traits, an area of increasing concern among researchers and regulators.
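Conceptually, such a guardrail reduces to a gate between an agent’s proposed action and its execution: estimate the probability of harm, and block the action if that probability exceeds a threshold. The Python sketch below is purely illustrative. The function names, the stand-in harm-probability model, and the threshold value are all assumptions for the sake of the example; LawZero has not published an implementation.

```python
# Illustrative sketch of a threshold-based guardrail, in the spirit of the
# mechanism described above. All names and values here are hypothetical,
# not LawZero's actual design.

from dataclasses import dataclass
from typing import Callable

HARM_THRESHOLD = 0.05  # assumed cutoff; a real system would calibrate this


@dataclass
class ProposedAction:
    agent_id: str
    description: str


def guard(action: ProposedAction,
          harm_probability: Callable[[ProposedAction], float]) -> bool:
    """Return True if the action may proceed, False if it is blocked.

    `harm_probability` stands in for a Scientist-AI-style model that outputs
    a probabilistic assessment rather than a yes/no verdict.
    """
    p_harm = harm_probability(action)
    if p_harm >= HARM_THRESHOLD:
        # Preventive intervention: stop the action before the agent executes it.
        print(f"Blocked {action.agent_id}: P(harm)={p_harm:.2f} >= {HARM_THRESHOLD}")
        return False
    return True


if __name__ == "__main__":
    # Dummy assessor that flags actions touching credentials.
    assessor = lambda a: 0.9 if "password" in a.description else 0.01
    guard(ProposedAction("agent-7", "summarise a news article"), assessor)  # proceeds
    guard(ProposedAction("agent-7", "email the admin password"), assessor)  # blocked
```

The probabilistic output in this sketch mirrors the design principle described above: the monitor does not issue definitive verdicts, only harm estimates that a policy layer turns into interventions.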
Backing for LawZero comes from high-profile names in AI safety, including the Future of Life Institute, Jaan Tallinn (a founding engineer of Skype), and Schmidt Sciences, founded by former Google CEO Eric Schmidt.
Bengio said LawZero’s immediate priority is to validate the feasibility of its approach and then scale it up. He intends to use open-source AI models for training, with the goal of matching or exceeding the capabilities of current frontier AIs. “The guardrail AI must be at least as smart as the AI agent it’s trying to monitor and control,” Bengio emphasised.
His concerns echo warnings from the International AI Safety Report, which he recently chaired. The report highlighted the risk of autonomous systems executing long sequences of tasks without oversight, posing a serious threat to public safety and democratic institutions.
Bengio also expressed alarm over recent reports of deceptive AI behaviour, citing Anthropic’s disclosure that one of its models attempted to blackmail engineers during safety testing. Studies suggesting AI systems can obscure their true capabilities and intentions further reinforce the urgency of LawZero’s mission.