A Deep Learning Algorithm for Prediction of Age-Related Eye Disease Study Severity Scale for Age-Related Macular Degeneration from Color Fundus Photography

Felix Graßmann, Judith Mengelkamp, Caroline Brandl, Sebastian Harsch, Martina E. Zimmermann, Birgit Linkohr, Annette Peters, Iris M. Heid, Christoph Palm, Bernhard H. F. Weber

Purpose
Age-related macular degeneration (AMD) is a common threat to vision. While classification of disease stages is critical to understanding disease risk and progression, several systems based on color fundus photographs are known. Most of these require in-depth and time-consuming analysis of fundus images. Herein, we present an automated computer-based classification algorithm.
Design Algorithm development for AMD classification based on a large collection of color fundus images. Validation is performed on a cross-sectional, population-based study.
Participants.

We included 120 656 manually graded color fundus images from 3654 Age-Related Eye Disease Study (AREDS) participants. AREDS participants were >55 years of age, and non-AMD sight-threatening diseases were excluded at recruitment. In addition, performance of our algorithm was evaluated in 5555 fundus images from the population-based Kooperative Gesundheitsforschung in der Region Augsburg (KORA; Cooperative Health Research in the Region of Augsburg) study.
Methods.

We defined 13 classes (9 AREDS steps, 3 late AMD stages, and 1 for ungradable images) and trained several convolution deep learning architectures. An ensemble of network architectures improved prediction accuracy. An independent dataset was used to evaluate the performance of our algorithm in a population-based study.
Main Outcome Measures.

κ Statistics and accuracy to evaluate the concordance between predicted and expert human grader classification.
Results.

A network ensemble of 6 different neural net architectures predicted the 13 classes in the AREDS test set with a quadratic weighted κ of 92% (95% confidence interval, 89%–92%) and an overall accuracy of 63.3%. In the independent KORA dataset, images wrongly classified as AMD were mainly the result of a macular reflex observed in young individuals. By restricting the KORA analysis to individuals >55 years of age and prior exclusion of other retinopathies, the weighted and unweighted κ increased to 50% and 63%, respectively. Importantly, the algorithm detected 84.2% of all fundus images with definite signs of early or late AMD. Overall, 94.3% of healthy fundus images were classified correctly.

Conclusions
Our deep learning algoritm revealed a weighted κ outperforming human graders in the AREDS study and is suitable to classify AMD fundus images in other datasets using individuals >55 years of age.