ChatGPT-4 bungles 83% of paediatric diagnoses

Voltaire Staff
Jan 5, 2024

A recent study in JAMA Pediatrics has revealed that ChatGPT-4 had an error rate of 83 per cent in diagnosing paediatric medical cases.

According to the study, the latest version of the AI chatbot struggled badly with difficult medical cases involving children.

The low success rate implies that human paediatricians won't be replaced by AI any time soon, according to the study's authors, who emphasised the importance of clinical experience.

In the study, conducted at Cohen Children’s Medical Center in New York, researchers found that ChatGPT-4 is not ready for paediatric diagnosis, which requires additional consideration of the patient's age and is complicated by young children's limited ability to articulate their symptoms.

The researchers tested ChatGPT-4 against 100 paediatric cases published in medical journals. The AI's performance was evaluated by comparing its answers to those of qualified physician-researchers.

Out of these 100 cases, ChatGPT-4 provided the correct diagnosis in only 17, was plainly wrong in 72, and did not fully capture the diagnosis in the remaining 11. The 72 wrong and 11 incomplete answers together make up the 83 per cent error rate.
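
For readers who want to check the arithmetic, here is a minimal sketch of how the headline figure follows from the case counts reported above. It assumes that diagnoses which were not fully captured are counted as errors, which is how the 83 per cent figure appears to be reached; the variable names are illustrative only.

```python
# Case counts as reported in the article (100 cases in total).
correct = 17
incorrect = 72
partial = 11  # "did not fully capture the diagnosis"

total = correct + incorrect + partial

# Assumption: partially captured diagnoses are counted as errors,
# which is how the 83 per cent headline figure appears to be reached.
error_rate = (incorrect + partial) / total
accuracy = correct / total

print(f"total cases: {total}")
print(f"error rate: {error_rate:.0%}")
print(f"accuracy: {accuracy:.0%}")
```

Counting only the plainly wrong answers as errors would instead give 72 per cent, so the headline number depends on how the incomplete diagnoses are treated.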

The study identified weaknesses in ChatGPT and suggested ways to make it a more useful tool in clinical care. Many doctors see the integration of AI chatbots into healthcare as inevitable, despite the challenges.

The medical field was an early adopter of AI technologies, with both failures and successes. While there have been instances of algorithmic racial bias, AI has also proved useful in automating administrative tasks and interpreting medical images.

The potential for AI to solve complex diagnostic problems has generated interest in developing it into a helpful tool that does not depend on having a brilliant medical expert on hand.

One notable weakness was ChatGPT's difficulty in recognising known relationships between conditions. For instance, in one case it failed to connect autism and scurvy: neuropsychiatric conditions such as autism can lead to restricted diets, which in turn can cause vitamin deficiencies.

The researchers suggest that training the AI with accurate medical literature and providing real-time access to medical data could improve its diagnostic accuracy.