Automation of clinical assessments via machine learning and paired comparisons

Abigail Sellen

Neurology | April 2015, Vol 84

Objectives: To develop a method for automated clinical assessment of motor dysfunction by training machine learning algorithms with fine-grained clinical judgment derived from pairwise comparisons of video recordings of patients.
Background: Many clinical assessment scales in neurology suffer from reliability and sensitivity issues because they require absolute, qualitative judgments from clinicians. Conversely, quantitative surrogate measures may not convey the same clinically relevant information. Supervised machine learning algorithms can map automated surrogate measurements onto the clinical assessment scale and thus create a clinical rating automatically. However, patient numbers are often too limited for standard machine learning approaches, and the finer-grained output of the algorithm cannot be validated against a coarse-grained clinical assessment scale. Here, we focus on the Expanded Disability Status Scale (EDSS) for assessment of motor dysfunction in multiple sclerosis (MS).
Methods: EDSS sub-scores are predicted by applying a bespoke supervised machine learning algorithm to motion patterns extracted from video recordings of patients performing pre-defined movements. For fine-grained capture of clinical judgment, we use paired comparisons of patient videos by neurologists. To avoid having to compare all possible pairs of videos, we use a variant of the TrueSkill™ algorithm to present only the most informative pairs of videos to neurologists for comparison.
Results: Our novel ensemble-based machine learning algorithm provides robust prediction of the EDSS sub-scores from the Finger-to-Nose Test. The paired comparisons of recorded movement videos by expert neurologists show a finer-grained clinical judgment than the EDSS sub-scores provide, offering a potential new validation standard.
Conclusions: Our generic approach automates clinical assessment by combining depth sensor recordings with video analysis and supervised machine learning, and validates the resulting clinically relevant information against fine-grained neurological judgment captured through pairwise comparisons. It can overcome the limitations of current clinical ratings in reliably capturing subtle changes in disability.