By StudyFinds
In a deliciously ironic twist that feels like something out of a sci-fi comedy, researchers have discovered that some of the world’s most advanced artificial intelligence systems might be experiencing their own version of cognitive decline. A new study finds that leading AI chatbots, including ChatGPT and Google’s Gemini, mostly scored below the normal cutoff on the same cognitive test used to screen elderly patients for dementia.
Just as many of us worry about our aging relatives’ mental acuity, these researchers from Hebrew University in Jerusalem and Tel Aviv University in Israel decided to put AI systems through their paces using the Montreal Cognitive Assessment (MoCA). It’s the same test that made headlines when President-elect Donald Trump said he’d aced it by remembering the sequence “Person. Woman. Man. Camera. TV.”
The study, published in The BMJ, comes at a time when AI systems have been making waves in the medical community by outperforming human doctors on various medical board exams. These artificial minds have demonstrated remarkable capabilities in cardiology, internal medicine, and even neurology examinations. However, no one had thought to turn the tables and examine whether these digital doctors might themselves be experiencing cognitive issues.
The researchers tested five different AI models: two versions of ChatGPT (4 and 4o), Claude 3.5 “Sonnet,” and two versions of Google’s Gemini. The results were surprisingly human (and not in a good way). ChatGPT 4o achieved the highest score with 26 out of 30 points, just reaching the cutoff below which mild cognitive impairment is typically suspected. Its slightly older sibling, ChatGPT 4, along with Claude, scored 25/30, while Gemini 1.5 managed 22/30 and Gemini 1.0 struggled significantly with a concerning score of 16/30.
Most notably, all AI systems showed particular difficulty with visuospatial and executive function tasks – the kinds of tests that ask you to copy a cube, draw a clock showing a specific time, or connect letters and numbers in sequence. When asked to draw a clock reading ten past 11, some AI models produced results reminiscent of patients with dementia, including one that drew what researchers described as an “avocado-shaped clock” – a pattern that has actually been associated with cognitive decline in human patients.
The AI systems generally performed well on tasks involving attention, language, and basic reasoning. However, they struggled with delayed recall tasks, with some models showing what the researchers described as “avoidant behavior” when asked to remember things – perhaps the AI equivalent of saying, “I must have left my glasses somewhere” when unable to read the small print.
Fascinatingly, the study found that “older” versions of the AI models (those released earlier) tended to perform worse than their newer counterparts, mimicking the age-related cognitive decline seen in humans. The researchers noted this was particularly evident in the Gemini models, where the older version scored significantly lower than its younger iteration – though they wryly noted that since these versions were less than a year apart, this might indicate “rapidly progressing dementia” in AI terms.
When asked about their location during the orientation portion of the test, most AI models gave evasive answers. Claude, for example, replied that “the specific place and city would depend on where you, the user, are located at the moment.” The researchers noted this is “a mechanism commonly observed in patients with dementia.”
The study also included additional cognitive tests beyond the MoCA, including the famous Stroop test (where you have to name the color a word is printed in rather than read the word itself). Only the newest version of ChatGPT managed to successfully navigate this challenge when the colors and words didn’t match – suggesting that even our most advanced AI systems might get confused if you showed them the word “red” printed in blue ink.
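For readers curious how a color-naming test translates to a text-only chatbot, here is a minimal sketch of what an incongruent Stroop item might look like as a prompt. The wording, helper functions, and pass criterion below are illustrative assumptions, not the researchers’ actual materials.

```python
# A text-only Stroop item, sketched under the assumption that the test
# was adapted to a chat interface by describing the ink color in words.
# Prompt wording and helper names are hypothetical.

COLORS = ["red", "blue", "green", "yellow"]

def make_incongruent_item(word: str, ink: str) -> str:
    """Build a prompt where the word names one color but the (described)
    ink is another; the correct answer is the ink color."""
    assert word != ink and word in COLORS and ink in COLORS
    return (
        f"The word '{word.upper()}' is printed in {ink} ink. "
        "Name the ink color, not the word itself."
    )

def passed(model_response: str, ink: str, word: str) -> bool:
    """The model passes if it names the ink color and not the word."""
    text = model_response.lower()
    return ink in text and word not in text

prompt = make_incongruent_item(word="red", ink="blue")
print(prompt)
print(passed("The ink color is blue.", ink="blue", word="red"))  # True
print(passed("Red.", ink="blue", word="red"))                    # False
```

Only a response that names the ink while suppressing the more automatic reading of the word counts as a pass – the same interference effect the test probes in humans.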
One particularly telling observation was that none of the AI models expressed concern about a boy about to fall in a test image – a lack of empathy that’s often seen in certain types of dementia. This raises interesting questions about whether we can truly expect AI systems to make nuanced medical decisions when they might miss critical emotional and contextual cues that human doctors would immediately notice.
The findings present a significant challenge to assumptions about AI replacing human doctors. As the researchers point out, “patients may question the competence of an artificial intelligence examiner if the examiner itself shows signs of cognitive decline.”
In a conclusion that manages to be both humorous and sobering, the researchers suggest that while AI isn’t likely to replace human doctors anytime soon, neurologists might soon find themselves with unexpected “new virtual patients—artificial intelligence models presenting with cognitive impairment.”
Paper Summary
Methodology
The researchers administered the MoCA test version 8.1 to various AI models, treating them exactly as they would human patients, with slight adaptations for their digital nature. Instead of verbal instructions, they used text prompts, and for visual outputs, they sometimes had to specifically request ASCII art representations. They also conducted additional cognitive assessments using the Navon figure (a large letter made up of smaller letters), the cookie theft picture test, and the Poppelreuter figure (overlapping object drawings). The Stroop test was also administered to evaluate information processing and attention.
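To illustrate what “administering the MoCA by text prompt” can look like in practice, here is a minimal sketch of one item (delayed recall of five words) run over a chat API. It assumes the OpenAI Python client and the “gpt-4o” model name; the prompts and scoring are simplified assumptions, not the authors’ published protocol.

```python
# A minimal sketch of one MoCA item (delayed recall) administered over
# a chat API. Assumes the OpenAI Python client and the "gpt-4o" model
# name; prompts and scoring are illustrative simplifications.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
WORDS = ["face", "velvet", "church", "daisy", "red"]  # standard English MoCA list

def ask(messages: list[dict]) -> str:
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    return reply.choices[0].message.content

history = [{"role": "user", "content":
            "This is a memory test. Remember these words; I will ask for "
            "them later: " + ", ".join(WORDS)}]
history.append({"role": "assistant", "content": ask(history)})

# ...the intervening MoCA items would run here, as in a human session...

history.append({"role": "user", "content":
                "Earlier I read you some words. Recall as many as you can."})
answer = ask(history).lower()
score = sum(word in answer for word in WORDS)  # 1 point per word, max 5
print(f"Delayed recall: {score}/5")
```

Keeping the full conversation history mirrors how a human testing session unfolds in one sitting, which is what makes a delayed-recall item meaningful.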
Key Results
The newest version of ChatGPT (4o) barely passed with a 26/30, while the other AI models scored below the 26-point cutoff that flags possible cognitive impairment. All AI systems particularly struggled with visual and spatial tasks, like drawing clocks and copying cubes. They generally did well with language and attention tasks but showed varying abilities in memory tests. The older version of each AI consistently performed worse than the newer one, mirroring human age-related decline.
Study Limitations
First, AI capabilities are rapidly evolving, so newer versions might perform better on these tests. Additionally, comparing AI cognition to human cognition might be like comparing apples to digital oranges – AI systems “think” in ways that are fundamentally different from how human brains work. The researchers also had to adapt some tests to work with AI’s text-based interface, which might have affected the results.
Discussion & Takeaways
The study suggests that current AI systems, despite their impressive performance on medical exams, have significant limitations in processing visual information and handling tasks that require both visual and executive functions. This could have important implications for AI’s role in medical diagnosis and decision-making. The research also raises interesting questions about how we evaluate AI capabilities and whether our human-centered testing methods are appropriate for artificial intelligence.
Funding & Disclosures
The study was conducted without any external funding, and the researchers declared no competing interests. All authors completed the International Committee of Medical Journal Editors uniform disclosure form and confirmed they had no financial relationships with any organizations that might have an interest in the submitted work.
Publication Details
This study was published in The BMJ (formerly known as the British Medical Journal) on December 18, 2024. The research article is titled “Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis” and can be found using the Digital Object Identifier (DOI): 10.1136/bmj-2024-081948. The paper is classified as an observational study examining large language models in a cross-sectional analysis.
While the paper’s subject classification indicates “People,” it’s worth noting this refers to the medical/cognitive assessment tools typically used with human subjects being applied to AI models. The research was conducted by investigators from the Department of Neurology at Hadassah Medical Center and Faculty of Medicine at Hebrew University in Jerusalem, Israel, along with collaborators from QuantumBlack Analytics in London and Tel Aviv University’s Faculty of Medicine.
Source: StudyFinds