ChatGPT, an artificial intelligence chatbot, correctly answered less than half of the questions from a commonly used study resource for physicians preparing for board certification in ophthalmology, according to a study led by St. Michael's Hospital.
The study, published in JAMA Ophthalmology, found that ChatGPT had an accuracy rate of 46% when first tested in January 2023. When the same test was run a month later, its accuracy had improved by more than 10 percentage points.
The release of ChatGPT to the public in November 2022 sparked excitement about its potential uses in medicine and exam preparation, along with concerns about the spread of incorrect information and cheating in academic settings. ChatGPT is freely accessible to anyone with an internet connection and responds to questions in a conversational way.
Dr. Rajeev H. Muni, who led the study, cautioned that although AI systems such as ChatGPT may come to play a significant role in medical education and clinical practice, they should be used responsibly. At present, he noted, ChatGPT does not answer enough multiple-choice questions correctly to be a substantial aid in preparing for board certification.
The study used a set of multiple-choice questions from a free trial of OphthoQuestions, a question bank commonly used to prepare for the board certification exam in ophthalmology. To ensure that previous conversations did not influence ChatGPT's responses, the researchers cleared the chat history and used a new ChatGPT account for each question. They excluded questions requiring image or video input, since ChatGPT accepts only text.
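The researchers worked through ChatGPT's public web interface, but the same fresh-conversation protocol can be sketched programmatically. The snippet below is a minimal illustration using the OpenAI Python SDK, not the study's actual method: the question records and the `requires_media` flag are hypothetical, and the model name is an assumption, since the article does not specify one. Each question is sent in a brand-new conversation so that no prior context carries over.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical question records; the real study drew 125 text-based
# questions from an OphthoQuestions free trial.
QUESTIONS = [
    {"text": "Which muscle is innervated by the trochlear nerve? A) ... D) ...",
     "requires_media": False},
    {"text": "Identify the lesion shown in the image. A) ... D) ...",
     "requires_media": True},  # excluded: ChatGPT accepts text input only
]

def ask_in_fresh_conversation(question_text: str) -> str:
    """Send a single question with no prior messages, so earlier
    exchanges cannot influence the answer (the study's rationale for
    clearing history and using a new account per question)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption; the study used the public web version
        messages=[{"role": "user", "content": question_text}],
    )
    return response.choices[0].message.content

answers = [
    ask_in_fresh_conversation(q["text"])
    for q in QUESTIONS
    if not q["requires_media"]  # drop image/video questions
]
```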
In the first round of testing in January 2023, ChatGPT correctly answered 58 of 125 text-based multiple-choice questions, an accuracy rate of 46%. Its performance improved in the second round in February 2023, when it answered 58% of the questions correctly.
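As a quick sanity check on the arithmetic, using only the figures reported above:

```python
total = 125
correct_jan = 58

acc_jan = correct_jan / total   # 0.464 -> the reported 46%
acc_feb = 0.58                  # February accuracy as reported

print(f"January accuracy: {acc_jan:.0%}")                           # 46%
print(f"Gain: {(acc_feb - acc_jan) * 100:.1f} percentage points")   # 11.6
```

The roughly 12-point gain is what the article summarizes as an improvement of "more than 10 percentage points."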
Dr. Marko Popovic, a co-author of the study and resident physician in the Department of Ophthalmology and Vision Sciences at the University of Toronto, said that although ChatGPT answered many of the board certification questions incorrectly, it still holds potential for medical education, and that he expects its knowledge base to improve quickly.
The study found that ChatGPT's multiple-choice selections were similar to those of ophthalmology trainees. It chose the most popular answer among trainees 44% of the time, the second most popular 22% of the time, the second least popular 18% of the time, and the least popular only 11% of the time.
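One way to picture this comparison is to rank each question's options by how many trainees chose them and then see where ChatGPT's selection falls. The sketch below is illustrative only: the trainee percentages are invented, and the study's actual analysis method is not described in this article.

```python
def popularity_rank(trainee_shares: dict[str, float], chatgpt_choice: str) -> int:
    """Rank ChatGPT's chosen option by trainee popularity:
    1 = most popular among trainees, 4 = least popular."""
    ordered = sorted(trainee_shares, key=trainee_shares.get, reverse=True)
    return ordered.index(chatgpt_choice) + 1

# Hypothetical question: 55% of trainees picked B, and so did ChatGPT.
rank = popularity_rank({"A": 0.10, "B": 0.55, "C": 0.25, "D": 0.10}, "B")
print(rank)  # 1 -> would count toward the "most popular answer" 44% bucket
```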
ChatGPT's accuracy also varied widely by topic. It answered 79% of general medicine questions correctly but performed far worse on ophthalmology subspecialty questions, answering only 20% of oculoplastics questions and none of the retina questions correctly. The authors noted that its accuracy in these niche subspecialties could improve in the future.