First-of-Its-Kind Platform Evaluates AI Fairness and Accuracy in Diabetic Eye Disease Screening

November 27, 2025

A groundbreaking real-world evaluation platform has been developed to assess the fairness, accuracy, and trustworthiness of commercial artificial intelligence (AI) algorithms used in diabetic eye disease screening. Designed to support NHS integration, the platform enables the first independent, head-to-head assessment of its kind, aiming to eliminate commercial bias and ensure equitable diagnostic performance across diverse populations.

Addressing the Gaps in AI Assessment for Healthcare

While the NHS currently selects AI algorithms based on cost-effectiveness and parity with human performance, broader challenges, such as the need for robust infrastructure and large-scale fairness testing, remain unmet. Notably, many AI-powered medical devices have not been evaluated for algorithmic fairness across different ethnicities, an oversight that can result in health disparities.

Study Overview and Methodology

The study, published in The Lancet Digital Health, was led by Professor Alicja Rudnicka (City St George's, University of London) and Adnan Tufail (Moorfields Eye Hospital NHS Foundation Trust), in collaboration with Kingston University and Homerton Healthcare NHS Trust.

Key features of the study include:

       • Dataset: 1.2 million retinal images from 202,886 screening visits in the North East London Diabetic Eye Screening Programme, one of the UK's most diverse screening cohorts.

       • Participants: 25 companies were invited to take part, and 8 CE-marked AI algorithms were evaluated.

       • Process: AI systems were tested in a secure 'trusted research environment' with no access to human grading data. Human evaluations (used as the benchmark) followed NHS protocol and were carried out by up to three trained professionals.

Performance Across Ethnic and Disease Severity Groups

The AI systems demonstrated the following diagnostic performance:

       • Overall accuracy: 83.7% to 98.7% for detecting diabetic eye disease requiring clinical attention.

       • Moderate-to-severe cases: 96.7% to 99.8% accuracy.

       • Proliferative disease (advanced cases): 95.8% to 99.5% accuracy.

Importantly, performance was consistent across ethnic groups in a cohort that was 32% white, 17% Black, and 39% South Asian, making this the first large-scale evaluation to confirm equitable AI outcomes across diverse populations.
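
For readers curious how a per-group comparison of this kind can be computed, here is a minimal Python sketch. It is an illustration only, not the study's method: the article does not say how its accuracy figures were calculated, so the sketch assumes the measure is the share of human-graded disease cases that the AI also flagged. The function name, the group labels "A" and "B", and every record below are made up for the example and are not study data.

```python
from collections import defaultdict


def rate_by_group(records):
    """Per group, the fraction of human-graded disease cases the AI also flagged.

    Each record is (group, human_positive, ai_positive). This definition of
    the rate is an assumption made for illustration.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, human_positive, ai_positive in records:
        if human_positive:  # count only cases the human benchmark graded as disease
            total[group] += 1
            if ai_positive:
                flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


# Invented toy records, NOT data from the study.
toy_records = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", False, True),
]

rates = rate_by_group(toy_records)
# A small gap between the best and worst group suggests consistent performance.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

In this toy data both groups have the same rate, so the gap is zero. A real fairness evaluation would also need confidence intervals and far larger samples before calling any difference meaningful.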

Broader Vision: National AI Infrastructure for Eye Screening

Looking ahead, the researchers envision expanding the platform into a centralized, national AI infrastructure that would:

       • Host NHS-approved algorithms.

       • Allow screening centers to securely upload retinal images.

       • Return AI-analyzed results directly into patient electronic health records.

       • Reduce duplicative infrastructure and promote nationwide consistency in care.

Reference:

Accuracy of automated retinal image analysis systems (ARIAS) to triage for human grading to detect diabetic retinopathy in a large-scale, multiethnic national screening programme, The Lancet Digital Health (2025). DOI: 10.1016/j.landig.2025.100914