Researchers have introduced a dataset that addresses color and texture variation in histopathology images, a problem that limits how well machine learning models generalize in medical applications. The dataset, PathoLogy Images of Scanners and Mobile phones (PLISM), includes 46 human tissue types stained under 13 different hematoxylin and eosin conditions and captured with 13 imaging devices.
Histopathological images often exhibit color and texture heterogeneity due to differences in staining conditions and imaging devices across hospitals. This variability hinders the robustness of machine learning models when exposed to out-of-domain data. To mitigate this issue, the PLISM dataset provides precisely aligned image patches from various domains to allow accurate evaluation of color and texture properties.
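As a rough illustration of why patch-level alignment matters, the sketch below compares two aligned patches of the same tissue region taken from different domains. It is a minimal example under stated assumptions, not the dataset's own tooling: the function names `channel_means` and `aligned_color_shift`, the nested-list patch representation, and the pixel values are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# A patch is represented here as rows of (R, G, B) pixels with values 0-255.
# This representation is an assumption for the sketch, not the PLISM format.


def channel_means(patch):
    """Return the mean R, G and B value over all pixels of one patch."""
    pixels = [px for row in patch for px in row]
    return tuple(mean(px[c] for px in pixels) for c in range(3))


def aligned_color_shift(patch_a, patch_b):
    """Mean absolute per-pixel difference per channel between two patches.

    A pixel-wise comparison is only meaningful when both patches show the
    same tissue region, which is what precise alignment provides.
    """
    same_shape = len(patch_a) == len(patch_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(patch_a, patch_b)
    )
    if not same_shape:
        raise ValueError("patches must have identical shape")
    pairs = [
        (px_a, px_b)
        for row_a, row_b in zip(patch_a, patch_b)
        for px_a, px_b in zip(row_a, row_b)
    ]
    return tuple(mean(abs(a[c] - b[c]) for a, b in pairs) for c in range(3))


# Toy 2x2 patches with made-up values: same region, two hypothetical devices.
scanner_patch = [[(200, 100, 150), (180, 90, 140)],
                 [(220, 120, 160), (190, 95, 145)]]
phone_patch = [[(190, 110, 150), (170, 100, 140)],
               [(210, 130, 160), (180, 105, 145)]]

print(channel_means(scanner_patch))
print(aligned_color_shift(scanner_patch, phone_patch))
```

Without alignment, only aggregate statistics such as `channel_means` could be compared, and a difference in tissue content could be mistaken for a difference in staining or device.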
The dataset covers a range of colors comparable to that of existing datasets, and it includes images captured by both whole-slide scanners and smartphones. Because images from different domains are provided at the patch level, researchers can analyze how imaging devices and staining conditions affect machine learning algorithms. The PLISM dataset is intended to support the development of machine learning models that are robust to domain shift in histological image analysis.
The work builds on advances in digital pathology enabled by whole-slide scanners, which capture high-resolution digital images of complete specimens. Combined with progress in deep learning, these scanners have made it possible to develop artificial intelligence applications that support pathologists in tasks such as predicting patient prognosis and informing treatment decisions from whole-slide images.
Color and texture heterogeneity in digital histology images poses a significant challenge. It stems from inconsistencies in the tissue preparation, staining, and scanning steps that precede a whole-slide image. Variations in hematoxylin and eosin formulations, exposure to light, and the differing imaging properties of scanners all contribute to the color and texture variation observed in histopathological images. Capturing histological images with smartphones introduces further variability in image quality, which complicates analysis.
To address these challenges, researchers developed the PLISM dataset as a resource for evaluating domain shift in digital pathology. Pre-training convolutional neural networks on the PLISM dataset was observed to help address domain shift, which points toward more robust machine learning models for histological image analysis. Because the dataset combines diverse imaging devices and staining conditions, it also offers insight into how these factors affect the performance of AI algorithms across domains.