AI models outpace experts in analysing bio lab results, stoking fears of misuse
- Voltaire Staff
- Apr 23
- 2 min read

Several AI models outperformed trained virologists in a rigorous new benchmark designed to evaluate practical lab expertise, according to a study.
The results have, however, prompted concern among biosecurity experts about how such capabilities might be misused.
The Virology Capabilities Test (VCT), developed by SecureBio and other researchers, assesses the ability to troubleshoot complex virology laboratory protocols — a skill that has historically required years of hands-on scientific training.
Despite the difficulty of the test, in which expert virologists with access to the internet scored an average of only 22.1 per cent on questions within their own areas of specialisation, leading AI models achieved significantly higher marks.
According to the study, the strongest overall performance came from OpenAI's "o1," which achieved an accuracy of 35.4 per cent on the full test, placing it in the 89th percentile relative to the human experts and ahead of all other AI systems. Another model, "o3," scored even higher in raw terms, at 43.8 per cent, and ranked in the 94th percentile on subsets of questions tailored to each expert's specialisation.
Google DeepMind's Gemini 2.5 Pro and Anthropic's Claude 3.7 Sonnet also performed competitively, though neither matched o1's general aptitude on the full VCT suite.
The test comprises 322 multimodal questions, integrating both visual and procedural knowledge, and is considered one of the most challenging assessments of practical scientific skill available for machine learning systems.
The results have triggered fresh concern about the possible misuse of biological knowledge to develop weapons or cause other harm.
Researchers behind the benchmark argue that the capacity of AI models to replicate such expertise should now be treated as a dual-use capability and governed accordingly.
Seth Donoughe, a research scientist at SecureBio and co-author of the study, told TIME that the findings made him "a little nervous."
"Throughout history, there are a fair number of cases where someone attempted to make a bioweapon — and one of the major reasons why they didn't succeed is because they didn't have access to the right level of expertise," he said. "So it seems worthwhile to be cautious about how these capabilities are being distributed."
Major AI developers have begun introducing safeguards aimed at preventing the misuse of biological knowledge.
OpenAI told TIME that it has deployed "new system-level mitigations for biological risks" in its latest models. The measures are believed to include fine-tuned refusal mechanisms trained to block sensitive prompts, monitoring tools to detect unusual usage patterns, and collaborations with external biosecurity experts.
xAI, the company behind the Grok AI model, has also published a risk management framework and pledged to implement virology-specific protections in future iterations of its technology.