Introducing CohortFinder: A Revolutionary Tool for Data-Driven Digital Pathology and Imaging
CohortFinder is an innovative open-source data-partitioning tool for digital pathology and imaging. It identifies potential batch-effect groups and ensures that each group is proportionally represented when a cohort is partitioned into training and testing sets. By doing so, CohortFinder improves the reliability of machine-learning models trained on batch-effect-laden datasets and makes downstream workflows more efficient.
The key to CohortFinder’s approach is its ability to detect batch effects a priori, using computationally derived quality control metrics generated by open-source tools such as HistoQC and MRQy. These metrics describe the presentation of digital pathology and imaging data, allowing CohortFinder to group images with similar characteristics and then partition the cohort so that each group is proportionally represented in both the training and testing sets.
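The idea described above can be illustrated with a minimal sketch. It is not CohortFinder's implementation: plain k-means stands in for whatever grouping method the tool uses, the two quality control features are hypothetical placeholders for metrics a tool such as HistoQC or MRQy might report, and the function names are invented for illustration.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=50, seed=0):
    """Tiny k-means (assumed stand-in clustering); returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def proportional_split(labels, test_fraction=0.25, seed=0):
    """Split indices so every group contributes the same fraction to testing."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, label in enumerate(labels):
        by_group[label].append(idx)
    train, test = [], []
    for members in by_group.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical QC features per image: [brightness, contrast].
# Two visibly different "sites" play the role of batch-effect groups.
features = [[0.1 * i, 0.0] for i in range(8)] + [[10 + 0.1 * i, 5.0] for i in range(8)]
labels = kmeans(features, k=2)
train_idx, test_idx = proportional_split(labels, test_fraction=0.25)
print("train:", train_idx)
print("test:", test_idx)
```

Because the split is drawn within each group rather than over the whole cohort, neither set can end up dominated by images from a single batch-effect group.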
In addition to improving data partitions, CohortFinder helps users rapidly identify representative samples for downstream workflows, such as annotation. Moreover, as our understanding of batch effects and quality control metrics evolves, CohortFinder can incorporate more sophisticated metrics to further enhance the performance of machine-learning models.
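One plausible way to pick representative samples, sketched here under assumptions, is to take the image closest to the centre of each batch-effect group. The selection rule, the function name, and the single-feature data below are all hypothetical and are not taken from CohortFinder itself.

```python
from collections import defaultdict


def pick_representatives(points, labels):
    """Return {group label: index of the sample nearest that group's centroid}."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    representatives = {}
    for label, members in groups.items():
        # Centroid = per-feature mean over the group's members.
        centroid = [
            sum(col) / len(members)
            for col in zip(*(points[i] for i in members))
        ]
        # Representative = member with the smallest squared distance to it.
        representatives[label] = min(
            members,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], centroid)),
        )
    return representatives


# Hypothetical one-dimensional QC feature and precomputed group labels.
qc_values = [[0.0], [1.0], [2.0], [10.0], [11.0], [15.0]]
group_labels = [0, 0, 0, 1, 1, 1]
print(pick_representatives(qc_values, group_labels))
```

An annotator could then start with one image per group instead of browsing the full cohort.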
To evaluate the efficacy of CohortFinder, three diverse deep-learning use cases in digital pathology and medical imaging were selected: tubule segmentation on kidney whole-slide images, adenocarcinoma detection on colon whole-slide images, and rectal cancer segmentation on MR images. In rigorous internal patient-level cross-validation and external testing, CohortFinder demonstrated its ability to yield optimal data partitions and improve model performance across these use cases.
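For readers unfamiliar with patient-level cross-validation, the sketch below shows the general idea: all images from one patient are kept in the same fold, so a model is never tested on a patient it saw during training. The function name, fold count, and patient identifiers are illustrative assumptions and do not reproduce the evaluation protocol used in the study.

```python
import random


def patient_level_folds(patient_ids, n_folds=3, seed=0):
    """Build (train, test) index lists with no patient shared between the two."""
    rng = random.Random(seed)
    patients = sorted(set(patient_ids))
    rng.shuffle(patients)
    # Assign whole patients, not individual images, to folds in round-robin order.
    fold_of = {patient: i % n_folds for i, patient in enumerate(patients)}
    folds = []
    for fold in range(n_folds):
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == fold]
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != fold]
        folds.append((train, test))
    return folds


# Hypothetical image list: each entry is the patient an image belongs to.
image_patients = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6"]
for train_idx, test_idx in patient_level_folds(image_patients, n_folds=3):
    print("train:", train_idx, "test:", test_idx)
```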
Overall, CohortFinder represents a significant advancement in digital pathology and imaging, offering researchers a powerful tool to enhance the reliability and accuracy of machine-learning models trained on batch-effect-laden datasets. With its open-source nature and user-friendly interface, CohortFinder is poised to drive innovation and progress in digital pathology and imaging research.
For more information on CohortFinder and to access the source code, visit cohortfinder.com.