Modern Australian
The Times

New ‘AI scientists’ are improving – but reveal their fundamental limits

  • Written by Karin Verspoor, Dean, School of Computing Technologies, RMIT University

Many of the most exciting discoveries in science involve highly specialised knowledge and making connections between far-flung facts. Scientists must combine deep analysis with broad reasoning strategies.

As in many information-rich tasks, researchers are looking to artificial intelligence (AI) systems to speed up their work. AI tools may be able to support key steps such as generating ideas, reviewing existing work and analysing data.

The latest systems use large language models (LLMs) to allow scientists to interact naturally and directly with the vast body of knowledge captured in words in the scientific literature.

But as two new systems described in papers just published in Nature show, when it comes to science, language alone can only go so far.

What AI is doing to science

A number of organisations, such as Sakana AI, are trying to automate the entire scientific process. To date, these efforts have largely focused on computer science, where “experiments” mainly involve designing and writing code.

However, the Agents4Science conference organised at Stanford last October showcased a broader range of AI-generated papers. They covered topics from mechanical engineering and protein design to a system called BadScientist, which deliberately produced “convincing but unsound” research.

I have previously raised concerns about the impacts of AI scientists on the scientific ecosystem. Recent work validates these concerns, showing increased quantity but lower quality of both papers and peer reviews, identifying fabricated references in published works, finding fabricated and misleading images, and more.

What scientists are doing with AI

AI systems clearly can’t be trusted to conduct the full process of science on their own. But how about using AI to help scientists get more done more quickly?

This is the intent of the two new systems described in Nature: Robin, made by the non-profit FutureHouse, and Co-Scientist, from Google DeepMind.

Both systems aim to accelerate scientific discovery, working in collaboration with a scientist. Both are also “multi-agent” AI systems, meaning they are built as a collection of specialised agents each targeting specific steps of the scientific discovery process, coordinated by a “supervisor” agent.
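To make the “supervisor” idea concrete, here is a minimal sketch of the general pattern, not the actual design of either system. The agent names, the fixed pipeline and the placeholder functions are all assumptions for illustration; in a real system each agent would call an LLM or a tool rather than return a formatted string.

```python
from typing import Callable, Dict, List

# An "agent" here is just a function from text to text (an assumption;
# real agents would wrap LLM calls, literature search or data analysis).
Agent = Callable[[str], str]


def generate_agent(question: str) -> str:
    """Hypothetical agent that proposes a hypothesis for a question."""
    return f"hypothesis for: {question}"


def review_agent(hypothesis: str) -> str:
    """Hypothetical agent that critiques a proposed hypothesis."""
    return f"reviewed({hypothesis})"


class Supervisor:
    """Coordinates specialised agents by running them in a fixed order."""

    def __init__(self, agents: Dict[str, Agent], pipeline: List[str]) -> None:
        self.agents = agents
        self.pipeline = pipeline

    def run(self, question: str) -> str:
        # Pass the output of each step to the next agent in the pipeline.
        result = question
        for step in self.pipeline:
            result = self.agents[step](result)
        return result


supervisor = Supervisor(
    agents={"generate": generate_agent, "review": review_agent},
    pipeline=["generate", "review"],
)
output = supervisor.run("Which existing drugs might treat disease X?")
print(output)
```

The point of the pattern is separation of concerns: each agent handles one step, and the supervisor decides what runs when. The scientist still supplies the question and judges the result.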

The agents that make up Co-Scientist aim to mirror abstract cognitive tasks. For example, a “reflection agent” acts as a critical scientific peer reviewer, assessing the quality of a hypothesis. “Ranking agents” debate research hypotheses in “tournaments”, using multiple interacting LLMs to simulate a discussion about the relative merits of two hypotheses.

Robin’s agents, on the other hand, are more tuned to specific tasks relevant to drug repurposing, aiming to identify new drugs for a given disease. One agent focuses on selecting experimental tests, while another analyses complex biomedical data.

How do the results stack up?

Co-Scientist can assess the quality of its generated proposals using the Elo rating system, which is best known for ranking chess players. Co-Scientist’s self-ratings of the novelty and impact of its outputs align quite well with the preferences of human experts and judgements by other LLM systems.
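For readers unfamiliar with Elo, the sketch below shows the standard chess-style update applied to a single head-to-head comparison between two hypotheses. It illustrates the general method only, not Co-Scientist’s implementation: the starting rating of 1200, the K-factor of 32 and the hypothesis names are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (A, B) ratings after one comparison.

    k controls how far a single result moves the ratings (assumed value).
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two hypotheses start level; hypothesis A wins the simulated debate.
new_a, new_b = update_elo(1200.0, 1200.0, a_won=True)
print(new_a, new_b)  # 1216.0 1184.0
```

Repeating this over many pairwise comparisons produces a ranking in which a win against a highly rated hypothesis counts for more than a win against a weak one.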

In a drug repurposing experiment, Co-Scientist selected 30 drug candidates as promising treatments for a kind of cancer called acute myeloid leukemia. Expert (human) oncologists refined the list, and five drugs were tested in the lab. Of these, three showed some positive results and one seemed to show particular promise.

Other experiments showed the potential of Co-Scientist to explore combinations of multiple drugs.

Notably, the predictions of Co-Scientist were not compared with the plethora of targeted computational and machine learning methods for drug repurposing that have been developed over decades of computational biology research. This means we don’t know whether the new general-purpose tool outperforms more specific AI approaches.

Both systems stop short of validating their hypotheses directly, which would involve real physical experiments. Both also rely heavily on human input to define the key scientific question, sense-check predictions, and prioritise predictions for further investigation.

Co-Scientist focuses primarily on generating hypotheses through elaborate reasoning agents, leaving validation and interpretation to subsequent steps. Robin also uses an agent to analyse data produced from real-world experiments.

Robin was used to propose 30 drug candidates for a condition called dry age-related macular degeneration. The top five were selected for testing.

Robin also made proposals for the experiments, with several suggestions overridden by the human scientists. Through several rounds of brainstorming and analysis, two drugs were identified as promising.

Testing of Robin’s individual agents showed that those that dug through earlier research were better at the task than general-purpose LLMs. The analytical agent did less well on questions about statistics and bioinformatics, and relied heavily on human-supplied prompts.

The limits of language alone

AI can help scientists to navigate the vast amount of documented knowledge humans have acquired over the millennia. The use of computation to find patterns in large datasets, to integrate dispersed information, and to drive new discoveries from existing literature has contributed to scientific progress for decades.

New models such as Robin and Co-Scientist represent a shift towards working directly in the realm of the language of science, rather than the realm of raw data. This allows more natural collaborations between scientist and machine, through language-based “discussions”.

However, more natural doesn’t necessarily mean more effective. Language-based communication can be imprecise and ambiguous, whereas science must be specific.

Models that combine the best of these worlds are on the horizon. These aim to link structured quantitative data to the concepts and relationships that describe the core facts beneath it.

Such models ground scientific reasoning in the structure of knowledge. They allow scientific evidence ranging from genomic sequences and protein structures to cellular imaging to be connected.

Words are how science is communicated. AI tools that facilitate making sense of the information that is hidden in all of those words are surely valuable. But the complexity of the natural world means that AI (co-)scientists will only be truly effective when they can go beyond connecting words together, to modelling the full complexity of the systems those words describe.


Read more https://theconversation.com/new-ai-scientists-are-improving-but-reveal-their-fundamental-limits-283281
