In the near future, the number of patients suffering from eye diseases is expected to increase dramatically due to the aging of the population. In such a scenario, early recognition and correct management of eye diseases are essential to preserving vision and enhancing quality of life.
Deep integration of artificial intelligence (AI) in ophthalmology may help achieve this aim, as it has the potential to speed up the diagnostic process and to reduce the human resources required.
AI is a branch of computer science concerned with developing algorithms that attempt to simulate human intelligence.
The concept of AI was first introduced in 1956. Since then, the field has made remarkable progress to the point that it has been defined as “the fourth industrial revolution in mankind's history”.
The terms artificial intelligence, machine learning, and deep learning (DL) have been used at times as synonyms; however, it is important to distinguish the three.
Artificial intelligence is the most general term, referring to the “development of computer systems able to perform tasks by mimicking human intelligence, such as visual perception, decision making, and voice recognition”.
Machine learning, which emerged in the 1980s, refers to a subfield of AI that allows computers to improve at performing tasks with experience, or to “learn on their own without being explicitly programmed”.
Finally, deep learning refers to a “subfield of machine learning composed of algorithms that use a cascade of multilayered artificial neural networks for feature extraction and transformation”.
The term “deep” refers to the many hidden layers in its neural networks: the benefit of having more layers of analysis is the ability to analyze more complicated inputs, including entire images.
In other words, DL uses representation learning methods with multiple levels of abstraction to process input data and generate outputs without the need for manual feature engineering, automatically recognizing the intricate structures embedded in high-dimensional data.
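As a concrete, deliberately tiny illustration, the Python sketch below (written with PyTorch; the layer sizes and class names are hypothetical assumptions, not any published architecture) shows how stacked convolutional layers extract progressively more abstract features from an image before a final layer outputs class probabilities:

```python
import torch
import torch.nn as nn

class TinyFundusCNN(nn.Module):
    """Illustrative multilayer CNN: stacked convolutional layers extract
    progressively more abstract image features; a final linear layer maps
    them to disease classes (e.g., referable vs non-referable DR)."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges and textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level motifs (vessels, lesions)
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # higher-level structure
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)  # learned representation, no hand-crafted features
        return self.classifier(feats)

# One forward pass on a dummy 224x224 RGB fundus image.
model = TinyFundusCNN()
logits = model(torch.randn(1, 3, 224, 224))
probs = torch.softmax(logits, dim=1)  # class probabilities
```

Real systems such as those discussed below are far deeper and are trained on large labeled image sets, but the principle is the same: learned features replace hand-crafted ones.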
Deep learning has been widely reported to achieve automated screening and diagnosis of common vision-threatening diseases, such as diabetic retinopathy (DR), glaucoma, age-related macular degeneration (AMD), and retinopathy of prematurity (ROP).
Automatic retinal image analysis (ARIA) is a complex task with significant diagnostic applications for a host of retinal, neurological, and vascular diseases.
A number of approaches to the automatic analysis of retinal images have been studied over the past two decades, but the recent success of DL for a range of computer vision and image analysis tasks has now permeated medical imaging and ARIA.
Since 2016, major improvements have been reported using DL discriminative methods (deep convolutional neural networks or autoencoder convolutional networks) and generative methods, in combination with other image analysis methods. These approaches have demonstrated the ability of algorithms to perform on par with ophthalmologists and retinal specialists for tasks such as automated classification, diagnosis, and segmentation.
Since Helmholtz’s pioneering invention of the ophthalmoscope in the 19th century, direct and indirect ophthalmoscopy have served as the standard methodology for the diagnostic assessment and management of retinal diseases.
As image acquisition instrumentation evolved, technologies such as echography and ultrawidefield fundus photography have emerged to assist or, in some cases, supplant standard funduscopic examination.
Concurrently, ongoing advances in AI have prompted great interest in automated retinal image analysis in clinical and research settings.
Ophthalmology, and specifically the field of retina, has the opportunity to capitalize on the offerings of AI given the myriad clinical data and multimodal imaging routinely performed.
Artificial intelligence broadly refers to the field of computer science concerned with a computer’s ability to carry out complex tasks that typically require human intelligence, including visual processing, pattern recognition, and decision making.
Machine learning (ML), a subset of AI, involves a computer system “learning” associations between the input and output data provided and subsequently adjusting its model to make better predictions about new data.
Deep learning is a subset of machine learning that distinguishes itself from traditional ML by the type of data it uses and the methodology by which the system learns.
Deep learning eliminates much of the predefined feature extraction commonly employed in machine learning, which in turn permits the use of unstructured data and reduces dependence on human input.
Additionally, in deep learning, “hidden layers” are added between the input and output layers, permitting a more intricate evaluation of the input data. Thus, a deep learning system can independently formulate decisions or associations, often in ways that are not transparent to human experts.
By combining input data with weights and biases determined by the system, deep learning neural networks produce novel yet accurate perceptions, classifications, and delineations of various data sets.
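A toy worked example may help make the weights-and-biases mechanics concrete. In the NumPy sketch below (randomly initialized, untrained parameters, purely illustrative), each layer computes a weighted sum of its inputs plus a bias and applies a nonlinearity; stacking such layers is what makes the network “deep”:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One hidden layer: weighted sum of inputs plus bias, then a
    ReLU nonlinearity. Stacking such layers yields a 'deep' network."""
    return np.maximum(0.0, W @ x + b)

x = rng.normal(size=4)                           # toy input features
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # first hidden layer
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)    # second hidden layer
W3, b3 = rng.normal(size=(2, 8)), np.zeros(2)    # output layer (2 classes)

h1 = layer(x, W1, b1)
h2 = layer(h1, W2, b2)
logits = W3 @ h2 + b3
probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> class probabilities
```

Training consists of iteratively adjusting these weights and biases to reduce prediction error; the toy example omits that step.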
Given the increasing burden of diabetic retinopathy (DR) and its far-reaching public health and societal implications, DR screening has been a major focus of AI efforts.
Although DR is the leading cause of vision loss among working-age adults in the United States, only 40% of diabetic patients obtain their recommended annual screening.
Artificial intelligence screening tools may address existing gaps in access and resources and better distinguish patients who have referable disease and require treatment from the wider population pool.
A pivotal 2016 study by Google Inc. highlighted the potential of deep learning in DR screening by showing that referable DR could be identified from a single fundus photo with a sensitivity of 97.5% and a specificity of 98.5%.
Since then, autonomous AI systems have been developed and approved for DR screening. In April 2018, IDx-DR (IDx) became the first fully autonomous AI-based medical device approved by the US Food and Drug Administration (FDA) to detect “more than mild diabetic retinopathy” (mtmDR).
This system uses 45-degree fundus photos, which are uploaded and analyzed using cloud-based software for the detection of mtmDR. The images can be taken during the patient’s primary care appointment and the patient referred to an ophthalmologist if mtmDR is detected.
In the pivotal prospective study using this system, sensitivity was 87.2% and specificity was 90.7% for the detection of mtmDR.
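Sensitivity and specificity summarize a screening test’s behavior on the two sides of a confusion matrix. The short Python sketch below works through the arithmetic with hypothetical counts, not drawn from any of the trials discussed here:

```python
# Hypothetical confusion-matrix counts for a DR screening program
# (illustrative only, not actual trial data).
tp, fn = 180, 20   # diseased eyes: correctly flagged vs missed
tn, fp = 720, 80   # healthy eyes: correctly cleared vs falsely flagged

sensitivity = tp / (tp + fn)   # fraction of true disease the screen catches
specificity = tn / (tn + fp)   # fraction of healthy eyes correctly cleared
ppv = tp / (tp + fp)           # chance that a positive screen is truly disease

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} PPV={ppv:.1%}")
# -> sensitivity=90.0% specificity=90.0% PPV=69.2%
```

Note that even at 90% specificity, a sizable fraction of positive screens are false positives when disease prevalence is modest, which is why the referral burden discussed later matters.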
Additional systems, including iGradingM (Medalytix/Emis Health), EyeArt (EyeNuk Inc.), and Retmarker (Retmarker Ltd.), have been approved in Europe for the automated screening of DR using fundus photography.
In an assessment of 102,856 fundus photos from 20,258 patients, the EyeArt and Retmarker systems achieved sensitivities of 93.8% and 85.0%, respectively, for referable retinopathy, and 99.6% and 97.9%, respectively, for proliferative disease.
Both the Retmarker and EyeArt systems have been validated to have acceptable sensitivity to capture referable retinopathy when compared to human graders, potentially making them a cost-effective alternative to manual grading alone.
In the pivotal clinical study using the EyeArt system, a sensitivity of 96% and specificity of 88% were found for detection of mtmDR. This system gained FDA clearance in the United States for automated DR screening in August 2020.
Other imaging modalities, such as optical coherence tomography (OCT) and OCT angiography (OCTA), are actively being investigated for DR screening and management.
Prognostic models have been developed to predict retinal response to anti-VEGF treatments in patients with macular edema through the analysis of OCT images.
Substantial challenges exist in the screening and diagnosis of retinopathy of prematurity given the disease’s clinical variability and limited access to trained screening specialists.
Management decisions hinge on the location and stage of vascular findings, as well as the presence of plus disease.
Given that the Early Treatment for Retinopathy of Prematurity study identified plus disease as one of the most important parameters for identifying treatment-level ROP, great emphasis has been placed on identifying it on screening examinations.
Creating a standardized ROP screening system that is both reliable and repeatable has become a major goal, and computer-based image analysis stands to make a significant impact.
Multiple algorithms have demonstrated promise for detecting plus or pre-plus disease, achieving accuracies of 95% and outperforming human ROP experts evaluating the same data set.
The DeepROP system incorporates both ROP zone and stage into its classification model and grades images as either normal, minor ROP, or severe ROP.
The i-ROP system is capable of categorizing fundus photos into type 1, type 2, and pre-plus ROP with probability scores of 0.96 and 0.91 for detecting type 1 ROP and clinically significant ROP, respectively.
The i-ROP score has been shown to be noninferior to human diagnosis when identifying vascular changes in pre-plus and plus disease.
These programs demonstrate that deep learning may minimize the interobserver variability that has challenged ROP screening and may play a continued role in screening, particularly in resource-limited settings.
Timely detection and treatment of AMD, specifically neovascular AMD, often leads to better visual outcomes.
In-office examinations, as well as home monitoring tools such as the Amsler grid and portable devices (Foresee Preferential Hyperacuity Perimeter; Reichert Technologies), have customarily been employed for detecting AMD progression.
Although AI DR screening relies primarily on fundus photography, AI systems in the context of AMD have focused heavily on OCT images.
Trained neural networks have demonstrated strong accuracy in the differentiation of OCT images of normal and AMD patients, with a sensitivity and specificity of 92.64% in normal patients and 93.69% in AMD patients.
In the context of diagnosing exudative AMD, AI systems have achieved upwards of 91.0% accuracy, and 95.5% accuracy in predicting the need for injection treatment.
Software has also detected subretinal and sub-RPE fluid with high accuracy. Fundus image models have also demonstrated promising results.
The DeepSeeNet program outperformed retina specialists in accuracy (0.671 vs 0.599), sensitivity (0.590 vs 0.512), and specificity (0.930 vs 0.916) when classifying eyes by the AREDS severity score.
The program has also demonstrated the capability to identify geographic atrophy with an accuracy comparable to human graders. Artificial intelligence models have also been used in a prognostic context in AMD.
A predictive model using OCT features and demographic factors from 495 fellow eyes in the HARBOR trial differentiated converting from nonconverting eyes with performance of 0.68 for the development of choroidal neovascularization and 0.80 for geographic atrophy.
Models have also been developed with the purpose of predicting visual acuity response to treatment with anti-VEGF therapy using baseline OCT images.
Sickle cell disease is one of the most common inherited diseases and has various ocular manifestations in the retina, warranting careful monitoring and treatment.
OCTA features such as blood vessel diameter and tortuosity, vessel perimeter index, foveal avascular zone area, contour irregularity, and parafoveal avascular density have been used to train algorithms to identify retinopathy with an average accuracy of 95%.
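A minimal sketch of this feature-based approach is shown below; the feature values are synthetic stand-ins for the OCTA measurements listed above, and the random forest classifier (scikit-learn) is an illustrative assumption rather than the method used in the cited studies:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-ins for six OCTA-derived features (vessel diameter,
# tortuosity, vessel perimeter index, FAZ area, contour irregularity,
# parafoveal avascular density); real studies extract these from images.
n = 200
X = rng.normal(size=(n, 6))
y = (X[:, 3] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # 1 = retinopathy

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.1%}")
```

In practice the feature vectors would come from OCTA image processing, and cross-validation gives a first estimate of how well the features separate retinopathy from healthy eyes.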
Algorithms have also been able to differentiate between mild sickle cell retinopathy (stage II) and severe sickle cell retinopathy (stage III) with an accuracy of 97%. Machine learning tools have also been applied to screen for systemic risk factors and disease.
Models have accurately determined a patient’s age, sex, smoking status, and systolic blood pressure from a single fundus photograph; the same algorithms may also predict a patient’s 5-year risk of developing a major adverse cardiac event.
Although the diagnostic accuracy of AI programs is impressive, algorithms may have a relatively high false-positive rate, which may generate unneeded referrals. However, because these referrals result only in clinical examinations, no unnecessary treatments would occur.
Screening programs must have high sensitivity to be clinically safe, and their specificity must be high enough to be clinically useful. Furthermore, the development of functional algorithms relies on the quality and abundance of source data.
Homogeneous data may bias models, particularly when they are generalized to underrepresented populations. Thus, the training set must be diverse and include various subsets of the population at large to develop an algorithm with widespread applicability; a per-subgroup check of the kind sketched below can make such gaps visible.
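One simple safeguard is to report performance separately for each subgroup during validation. The sketch below, using toy labels and a hypothetical grouping variable, computes per-subgroup sensitivity so that drops in underrepresented groups surface before deployment:

```python
import numpy as np

def sensitivity_by_group(y_true, y_pred, groups):
    """Report sensitivity separately for each population subgroup so that
    drops in underrepresented groups are visible before deployment."""
    out = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)   # diseased eyes in this subgroup
        out[g] = float(y_pred[mask].mean()) if mask.any() else float("nan")
    return out

# Toy example with a hypothetical subgroup label standing in for demographics.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1])
groups = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
print(sensitivity_by_group(y_true, y_pred, groups))  # sensitivity per group, e.g. A: 0.75, B: ~0.67
```

The same breakdown can be applied to specificity or any other metric of interest.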
Additionally, despite the common goal of accurately identifying disease, there are currently no standardized protocols for image capture or image analysis, inherently resulting in variability that limits usability.
Furthermore, obtaining images of sufficient quality for grading is paramount because systems are unable to assess images below a certain quality threshold. The National Institutes of Health, through its collaborative community projects, is working on these areas of unmet need for various ophthalmic conditions.
Collaboration across countries and organizations, as well as extensive data sharing and open-source algorithms, will help ensure relevant and useful AI systems in the future for screening and for determining treatment prognosis in ophthalmic conditions.
As the capabilities of AI evolve, more commercially available products, not only for DR but also for other diseases, will begin to appear.
Together with current routine clinical practice, AI and deep learning offer potential avenues to improve the screening, diagnosis, and management of retinal disease.