{"id":267972,"date":"2025-01-02T14:03:35","date_gmt":"2025-01-02T14:03:35","guid":{"rendered":"https:\/\/michigandigitalnews.com\/index.php\/2025\/01\/02\/ai-chatbots-fail-to-diagnose-patients-by-talking-with-them\/"},"modified":"2025-06-25T17:09:50","modified_gmt":"2025-06-25T17:09:50","slug":"ai-chatbots-fail-to-diagnose-patients-by-talking-with-them","status":"publish","type":"post","link":"https:\/\/michigandigitalnews.com\/index.php\/2025\/01\/02\/ai-chatbots-fail-to-diagnose-patients-by-talking-with-them\/","title":{"rendered":"AI chatbots fail to diagnose patients by talking with them"},"content":{"rendered":"<p> [ad_1]<br \/>\n<\/p>\n<div id=\"\">\n<figure class=\"ArticleImage\">\n<div class=\"Image__Wrapper\"><img fetchpriority=\"high\" decoding=\"async\" class=\"Image\" width=\"1350\" height=\"900\" alt=\"\" src=\"https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg\" sizes=\"(min-width: 1288px) 837px, (min-width: 1024px) calc(57.5vw + 55px), (min-width: 415px) calc(100vw - 40px), calc(70vw + 74px)\" srcset=\"https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=300 300w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=400 400w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=500 500w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=600 600w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=700 700w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=800 800w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=837 837w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=900 900w, 
https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1003 1003w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1100 1100w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1200 1200w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1300 1300w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1400 1400w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1500 1500w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1600 1600w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1674 1674w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1700 1700w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1800 1800w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=1900 1900w, https:\/\/images.newscientist.com\/wp-content\/uploads\/2024\/12\/31161538\/SEI_234567038.jpg?width=2006 2006w\" loading=\"eager\" fetchpriority=\"high\" data-image-context=\"Article\" data-image-id=\"2462365\" data-caption=\"Don\u2019t call your favourite AI \u201cdoctor\u201d just yet\" data-credit=\"Just_Super\/Getty Images\"\/><\/div><figcaption class=\"ArticleImageCaption\">\n<div class=\"ArticleImageCaption__CaptionWrapper\">\n<p class=\"ArticleImageCaption__Title\">Don\u2019t call your favourite AI \u201cdoctor\u201d just yet<\/p>\n<p class=\"ArticleImageCaption__Credit\">Just_Super\/Getty Images<\/p>\n<\/div>\n<\/figcaption><\/figure>\n<\/p>\n<p>Advanced artificial intelligence models <a 
href=\"https:\/\/www.newscientist.com\/article\/2461549-is-ai-finally-ready-to-replace-your-doctor\/\">score well<\/a> on professional medical exams but still flunk one of the most crucial physician tasks: talking with patients to gather relevant medical information and deliver an accurate diagnosis.<\/p>\n<p>\u201cWhile large language models show impressive results on multiple-choice tests, their accuracy drops significantly in dynamic conversations,\u201d says <a href=\"https:\/\/pranavrajpurkar.com\/\">Pranav Rajpurkar<\/a> at Harvard University. \u201cThe models particularly struggle with open-ended diagnostic reasoning.\u201d<\/p>\n<p>That became evident when researchers developed a method for evaluating a clinical AI model\u2019s reasoning capabilities based on simulated doctor-patient conversations. The \u201cpatients\u201d were based on 2000 medical cases primarily drawn from professional US medical board exams.<\/p>\n<p>\u201cSimulating patient interactions enables the evaluation of medical history-taking skills, a critical component of clinical practice that cannot be assessed using case vignettes,\u201d says <a href=\"https:\/\/scholar.google.com\/citations?user=Rt0kd5wAAAAJ&amp;hl=en\">Shreya Johri<\/a>, also at Harvard University. The new evaluation benchmark, called CRAFT-MD, also \u201cmirrors real-life scenarios, where patients may not know which details are crucial to share and may only disclose important information when prompted by specific questions\u201d, she says.<\/p>\n<p><span class=\"js-content-prompt-opportunity\"\/><\/p>\n<p>The CRAFT-MD benchmark itself relies on AI. OpenAI\u2019s GPT-4 model played the role of a \u201cpatient AI\u201d in conversation with the \u201cclinical AI\u201d being tested. GPT-4 also helped grade the results by comparing the clinical AI\u2019s diagnosis with the correct answer for each case. Human medical experts double-checked these evaluations. 
They also reviewed the conversations to check the patient AI\u2019s accuracy and see if the clinical AI managed to gather the relevant medical information.<\/p>\n<p>Multiple experiments showed that four leading large language models \u2013 OpenAI\u2019s GPT-3.5 and GPT-4 models, Meta\u2019s Llama-2-7b model and Mistral AI\u2019s Mistral-v2-7b model \u2013 performed considerably worse on the conversation-based benchmark than they did when making diagnoses based on written summaries of the cases. OpenAI, Meta and Mistral AI did not respond to requests for comment.<\/p>\n<p>For example, GPT-4\u2019s diagnostic accuracy was an impressive 82 per cent when it was presented with structured case summaries and allowed to select the diagnosis from a multiple-choice list of answers, falling to just under 49 per cent when it did not have the multiple-choice options. When it had to make diagnoses from simulated patient conversations, however, its accuracy dropped to just 26 per cent.<\/p>\n<p>And GPT-4 was the best-performing AI model tested in the study, with GPT-3.5 often coming in second, the Mistral AI model sometimes coming in second or third and Meta\u2019s Llama model generally scoring lowest.<\/p>\n<p>The AI models also failed to gather complete medical histories a significant proportion of the time, with leading model GPT-4 only doing so in 71 per cent of simulated patient conversations. 
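The evaluation loop described above – one model role-playing the patient, the model under test conducting the interview and committing to a diagnosis, and a grader comparing that diagnosis with the case's answer – can be sketched as follows. This is a hypothetical illustration, not the CRAFT-MD code: every function, class and rule here is a stand-in stub (in the actual study, GPT-4 played the patient and helped grade, with human experts double-checking).

```python
# Hypothetical sketch of a CRAFT-MD-style evaluation loop. Each role would
# be an LLM call in a real system; stubs here make the control flow concrete.

def patient_reply(case, question):
    # Patients "may only disclose important information when prompted by
    # specific questions": reveal a fact only if the question mentions it.
    q = question.lower()
    for topic, fact in case["facts"].items():
        if topic in q:
            return fact
    return "Nothing unusual that I can think of."

class ScriptedClinician:
    """Stand-in for the clinical AI under test: asks a fixed list of
    history-taking questions, then diagnoses via keyword rules."""

    def __init__(self, questions, rules):
        self.questions = list(questions)
        self.rules = rules  # keyword found in transcript -> diagnosis

    def interview(self, case, max_turns=5):
        # Multi-turn history-taking: one patient answer per question.
        return [patient_reply(case, q) for q in self.questions[:max_turns]]

    def diagnose(self, transcript):
        text = " ".join(transcript).lower()
        for keyword, diagnosis in self.rules.items():
            if keyword in text:
                return diagnosis
        return "unknown"

def evaluate(cases, clinician):
    # Accuracy = fraction of cases where the committed diagnosis matches
    # ground truth (the study used GPT-4 plus human experts as graders).
    correct = sum(
        clinician.diagnose(clinician.interview(case)) == case["truth"]
        for case in cases
    )
    return correct / len(cases)

# Two toy cases standing in for the 2000 board-exam-derived vignettes.
cases = [
    {"facts": {"outdoors": "I was hiking and later found a tick bite.",
               "rash": "There is a circular red rash on my leg."},
     "truth": "Lyme disease"},
    {"facts": {"chest": "I get chest tightness when climbing stairs."},
     "truth": "angina"},
]

clinician = ScriptedClinician(
    questions=["Have you spent time outdoors recently?",
               "Do you have any rash?",
               "Any chest pain?"],
    rules={"tick": "Lyme disease", "chest tightness": "angina"},
)

print(evaluate(cases, clinician))  # 1.0 for this toy setup
```

The keyword-gated `patient_reply` mimics the behaviour the study highlights: key facts surface only when the clinician asks the right question, so a model that skips a line of questioning never sees the information it needs.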
Even when the AI models did gather a patient's relevant medical history, they did not always produce the correct diagnosis.

Such simulated patient conversations represent a "far more useful" way to evaluate AI clinical reasoning capabilities than medical exams, says [Eric Topol](https://www.scripps.edu/faculty/topol/) at the Scripps Research Translational Institute in California.

Even if an AI model eventually passes this benchmark, consistently making accurate diagnoses based on simulated patient conversations, that would not necessarily make it superior to human physicians, says Rajpurkar. He points out that medical practice in the real world is "messier" than in simulations: it involves managing multiple patients, coordinating with healthcare teams, performing physical exams and understanding "complex social and systemic factors" in local healthcare situations.

"Strong performance on our benchmark would suggest AI could be a powerful tool for supporting clinical work – but not necessarily a replacement for the holistic judgement of experienced physicians," says Rajpurkar.

[Source link](https://www.newscientist.com/article/2462356-ai-chatbots-fail-to-diagnose-patients-by-talking-with-them/)