Tags: Deep Learning · Image Classification · Histology · ResNet · DINOv2 · Anomaly Detection · University Project

HISTOLOGICAL IMAGE CLASSIFICATION: AN2DL SECOND CHALLENGE

A patch-based deep learning approach to categorize diseased tissue image slides into molecular subtypes, leveraging ResNet50 and adaptive patch extraction.

December 10, 2025

Histological Image Classification

This project was developed as part of the Artificial Neural Networks and Deep Learning (AN2DL) course at Politecnico di Milano. Following the first challenge, our team tackled a complex computer vision task in the medical domain. Below is an overview of our methodology and results.

The Problem: Tissue Molecular Subtypes

This challenge focused on categorizing diseased tissue image slides into four distinct molecular subtypes: Luminal A, Luminal B, HER2(+), and Triple Negative.

We were provided with 691 low-magnification whole-slide images (WSIs) along with binary masks segmenting the diseased regions. The dataset presented several challenges, including a modest class imbalance and severe outliers (irrelevant "Shrek" images and synthetic "green stain" artifacts), which we identified using an image retrieval system based on DINOv2 [4].
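The retrieval side of this cleanup can be sketched as follows. Assuming DINOv2 feature vectors have already been extracted for every slide (a real run would use the `facebookresearch/dinov2` checkpoints), images whose nearest neighbours are all far away in embedding space get flagged as outliers. The scoring itself is plain NumPy; the synthetic data below is purely illustrative:

```python
import numpy as np

def knn_outlier_scores(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each embedding by its mean cosine distance to its k nearest
    neighbours; isolated images (outliers) receive high scores."""
    # L2-normalise so that dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)        # exclude self-similarity
    # Mean similarity to the k most similar images, turned into a distance.
    topk = np.sort(sims, axis=1)[:, -k:]
    return 1.0 - topk.mean(axis=1)

# Toy example: 10 clustered embeddings plus one far-away "outlier".
rng = np.random.default_rng(0)
emb = rng.normal(0, 0.05, size=(10, 8)) + np.ones(8)
emb = np.vstack([emb, -np.ones(8)])        # the outlier, at index 10
scores = knn_outlier_scores(emb, k=3)
print(scores.argmax())                     # → 10
```

Ranking the dataset by this score surfaces the handful of images worth eyeballing, rather than requiring a manual pass over all 691 slides.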

Our Approach

Because high-resolution medical images cannot be simply downscaled without losing crucial cellular details, our pipeline heavily relied on strategic patch extraction and transfer learning.

1. Data Cleaning & Pre-processing

We completely discarded the irrelevant outlier images. We also fixed the "green stain" artifacts by thresholding the saturation channel in the HSV color space using Otsu's method [5], substituting the artifacts with surrounding background pixels.
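A minimal NumPy sketch of this cleaning step, assuming RGB input and a roughly uniform background (the saturation channel and Otsu's threshold are computed by hand here rather than via OpenCV, and artifact pixels are filled with the median background colour as a stand-in for "surrounding background pixels"):

```python
import numpy as np

def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Otsu's method [5]: pick the threshold maximising between-class variance."""
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                       # class-0 probability per threshold
    w1 = 1.0 - w0
    mu = np.cumsum(p * centers)             # cumulative means
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    return centers[np.nanargmax(between)]

def remove_green_stain(rgb: np.ndarray) -> np.ndarray:
    """Mask over-saturated artifact pixels and fill them with background colour."""
    rgb = rgb.astype(np.float64) / 255.0
    c_max, c_min = rgb.max(axis=2), rgb.min(axis=2)
    # HSV saturation: (max - min) / max, with black pixels defined as 0.
    sat = np.where(c_max > 0, (c_max - c_min) / np.maximum(c_max, 1e-8), 0.0)
    mask = sat > otsu_threshold(sat)
    out = rgb.copy()
    out[mask] = np.median(rgb[~mask].reshape(-1, 3), axis=0)
    return np.rint(out * 255).astype(np.uint8)
```

Saturated synthetic green sits far from the near-grey histology background in the saturation histogram, which is what makes a global Otsu threshold work here.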

2. Patch Extraction Strategy

To avoid resizing the original images, we generated 224×224 training crops. We experimented with several strategies:

  • ROIs Center-Crop: Crops centered on each connected component in the mask.
  • ROIs Sliding Window: Tiling on the bounding box of connected components, retaining patches with ≥50% tissue overlap.
  • Grid-Crop: Sliding window over the entire WSI.
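The tissue-overlap filter shared by the sliding-window strategies can be sketched in a few lines of NumPy (for the ROI variant the same check would simply run inside each connected component's bounding box; parameter defaults are illustrative):

```python
import numpy as np

def sliding_window_patches(mask: np.ndarray, patch: int = 224,
                           stride: int = 224, min_tissue: float = 0.5):
    """Return (row, col) anchors of patches whose overlap with the binary
    tissue mask is at least `min_tissue` (0.5 reproduces the >=50% rule)."""
    h, w = mask.shape
    coords = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            window = mask[r:r + patch, c:c + patch]
            if window.mean() >= min_tissue:    # fraction of tissue pixels
                coords.append((r, c))
    return coords
```

The returned anchors are then used to cut the actual 224×224 crops out of the full-resolution image, so no global resizing is ever needed.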

Since the provided binary masks sometimes missed diseased tissue, we also trained a U-Net architecture to generate more generous and accurate segmentation masks for anchoring our crops.
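A mask-refinement network of this kind can be as small as a one-level U-Net. The PyTorch sketch below is illustrative (the `TinyUNet` name, depth, and channel widths are assumptions, not our exact architecture); it keeps the defining ingredients: an encoder, a bottleneck, and a decoder with a skip connection:

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, the standard U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Minimal one-level U-Net producing single-channel mask logits."""
    def __init__(self, in_ch: int = 3, base: int = 16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = conv_block(base * 2, base)   # skip concat doubles channels
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.pool(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)                     # train with BCEWithLogitsLoss
```

Trained against the provided masks, such a network tends to produce smoother, more generous segmentations than the originals, which is exactly what crop anchoring needs.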

3. Model Architecture

We selected the ResNet family as our primary vision backbone. To combat overfitting on our limited data, we fine-tuned only the last convolutional block of a ResNet50 pretrained on ImageNet. We also experimented with initializing weights from LUNIT [2], which uses momentum contrastive learning (MoCo) [1] on pathology-specific datasets.

Results & Key Takeaways

Transfer learning combined with targeted data-centric techniques — such as Adaptive Patch Extraction and MixUp augmentation — proved essential for this limited-data scenario.

Our most successful configuration was a fine-tuned ResNet50 trained on ROIs Center-Crop patches with MixUp, achieving a Validation F1 score of 0.4834 ± 0.0233 and a Test F1 score of 0.4317.

Pretrained backbones achieved significantly better performance and faster convergence than training from scratch. Deeper models such as ResNet101 overfitted quickly compared to the lighter ResNet50.

Interestingly, some advanced techniques hindered our progress. Applying Macenko stain normalization [3][6] to standardize color variations actually introduced artifacts that degraded performance on certain folds. This project reinforced the importance of thoroughly validating preprocessing techniques before integrating them into a medical imaging pipeline.

References

  • [1] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9729–9738.

  • [2] Kang, M., Song, H., Park, S., Yoo, D., & Pereira, S. (2023). Benchmarking Self-Supervised Learning on Diverse Pathology Datasets. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

  • [3] Macenko, M., Niethammer, M., Marron, J. S., Borland, D., Woosley, J. T., Guan, X., Schmitt, C., & Thomas, N. E. (2009). A Method for Normalizing Histology Slides for Quantitative Analysis. IEEE International Symposium on Biomedical Imaging (ISBI), 1107–1110.

  • [4] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., et al. (2023). DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research.

  • [5] Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66.

  • [6] Vahadane, A., Peng, T., Sethi, A., Albarqouni, S., Wang, L., Baust, M., Steiger, K., Weichert, W., & Navab, N. (2016). Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images. IEEE Transactions on Medical Imaging, 35(8), 1962–1971.