Title: MLCommons Launches MedPerf, a New Platform to Evaluate AI Medical Models
The healthcare industry is increasingly embracing AI, with 80% of healthcare organizations already having an AI strategy in place, and an additional 15% planning to launch one, according to a survey by Optum. In response to the growing demand for AI in healthcare, MLCommons has developed a new platform called MedPerf to benchmark and evaluate AI medical models.
With the proliferation of medical models in the market, it has become challenging to determine which models actually perform as advertised. Many medical models are trained with data from limited clinical settings, leading to biases and harmful impacts, particularly on minority patient populations.
MedPerf aims to establish a reliable and trusted way to benchmark and evaluate medical models. Designed to be used by healthcare organizations rather than vendors, MedPerf allows hospitals and clinics to assess AI models on demand. It utilizes federated evaluation to remotely deploy models and evaluate them on-premises while protecting patient privacy.
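To make the idea of federated evaluation concrete, here is a minimal sketch in Python. It is not MedPerf's actual API: the `Site` class, the `federated_evaluate` function, the toy threshold model, and the records are all hypothetical, invented for illustration. The point it shows is that the model travels to each site, the patient records stay where they are, and only summary metrics come back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; none of this is MedPerf's API.
Model = Callable[[float], int]   # maps one feature value to a predicted label
Record = Tuple[float, int]       # (feature value, true label)


@dataclass
class Site:
    """A hospital or clinic that holds private records on-premises."""
    name: str
    records: List[Record]

    def evaluate(self, model: Model) -> Dict[str, float]:
        # Runs locally at the site: raw records are read here and
        # only aggregate numbers are returned to the caller.
        correct = sum(1 for x, y in self.records if model(x) == y)
        return {"n": len(self.records), "accuracy": correct / len(self.records)}


def federated_evaluate(model: Model, sites: List[Site]) -> Dict[str, Dict[str, float]]:
    """Send the model to every site and collect per-site metrics."""
    return {site.name: site.evaluate(model) for site in sites}


def threshold_model(x: float) -> int:
    """Toy stand-in for a medical model: predicts 1 when x >= 0.5."""
    return 1 if x >= 0.5 else 0


# Made-up records; the second site stands in for a different patient population.
sites = [
    Site("hospital_a", [(0.9, 1), (0.1, 0), (0.7, 1), (0.2, 0)]),
    Site("clinic_b", [(0.6, 0), (0.4, 1), (0.8, 1), (0.3, 0)]),
]

report = federated_evaluate(threshold_model, sites)
for name, metrics in report.items():
    print(name, metrics)
```

In this toy setup the model scores perfectly at the first site and only 50% at the second. Keeping the results per site, rather than pooling them into one number, is what makes that kind of gap visible.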
MedPerf is the product of a two-year collaboration led by the Medical Working Group, with input from over 20 companies and more than 20 academic institutions, including Google, Amazon, IBM, Intel, Brigham and Women’s Hospital, Stanford, and MIT.
In a recent test, MedPerf hosted the NIH-funded Federated Tumor Segmentation (FeTS) Challenge, which involved evaluating 41 different models across 32 healthcare sites on six continents. The results showed reduced performance of the models at sites with different patient demographics, revealing the biases within them.
MLCommons sees MedPerf as a foundational step towards accelerating medical AI through open and scientific approaches, although the platform primarily focuses on evaluating models that analyze radiology scans. MLCommons encourages AI researchers to validate their own models using the platform and urges data owners to register their patient data to make MedPerf’s testing more robust.
MedPerf addresses the issue of medical model bias, but other obstacles to implementing AI in healthcare remain. Incorporating AI into the daily routines of doctors and nurses, and dealing with complex care-delivery and technical systems, are still difficult. A report from Duke University highlights this gap between AI marketing and the practical implementation of the technology.
The concerns surrounding AI in healthcare are reflected in a poll by Yahoo Finance, which found that 55% of healthcare practitioners believe the technology is not yet ready for use, and only 26% believe it can be trusted.
While MedPerf offers a valuable tool for benchmarking and evaluating medical models, the safe deployment of these models requires continuous auditing by vendors, customers, and researchers. Benchmarks alone do not provide a complete picture, and thorough testing is essential to ensure responsible use of AI in healthcare.