Multi-Modal Datasets for Computational Pathology (CPath)

Multi-modal datasets are becoming increasingly important in Computational Pathology (CPath). This article summarizes the multi-modal CPath datasets currently in use. Whether you work on pathology AI research, train models, or just want to keep up with the latest data resources, it should serve as a quick reference.

The datasets are grouped into Image-Text Pair Datasets and Multi-Modal Instruction Datasets. Each entry lists the data type, a short description, the staining type, the data source, public availability, and whether large models assisted in generating the data.
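For readers who want to work with this catalog programmatically, the per-dataset fields can be captured in a small record type. The following is a minimal sketch, not part of any dataset's official tooling: the `DatasetEntry` class, the `CATALOG` list, and the `with_stain` helper are hypothetical names introduced here for illustration, and only three entries from this article are filled in. Availability and large-model flags default to `None`, meaning "not stated".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Staining abbreviations as defined in the QUILT entry of this article.
STAINS = {"H": "H&E", "I": "IHC", "O": "Others"}


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row (hypothetical schema mirroring the article's fields)."""
    name: str
    category: str                 # "image-text" or "instruction"
    data_type: str
    description: str
    staining: Tuple[str, ...]     # abbreviation codes, e.g. ("H", "I")
    source: str
    public: Optional[bool] = None        # None = not stated
    llm_assisted: Optional[bool] = None  # None = not stated


# A few entries copied from the lists in this article.
CATALOG: List[DatasetEntry] = [
    DatasetEntry("QUILT", "image-text", "Slice-Description Pair",
                 "437,878 slices, 802,404 descriptions, from 4,475 videos",
                 ("H", "I", "O"), "YouTube"),
    DatasetEntry("HistGen", "image-text", "WSI-Report Pair",
                 "75,723 pairs", ("H",), "PMC-OA"),
    DatasetEntry("CLOVER", "instruction", "Instructions",
                 "45,000 question-answer instructions", ("I",), "PathVQA"),
]


def with_stain(entries: List[DatasetEntry], code: str) -> List[str]:
    """Return the names of all entries whose staining list includes `code`."""
    return [e.name for e in entries if code in e.staining]


print(with_stain(CATALOG, "H"))
print(with_stain(CATALOG, "I"))
```

Keeping the staining codes as a tuple rather than a single string makes filtering by stain a simple membership test.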


📸 Image-Text Pair Datasets

1. QUILT
- Data Type: Slice-Description Pair
- Description: 437,878 slices, 802,404 descriptions, from 4,475 videos
- Staining: H&E (H), IHC (I), Others (O)
- Source: YouTube
- Public Availability:
- Large Model Assistance:

2. PathCap
- Data Type: Slice-Description Pair
- Description: 208k pathology slice-description pairs
- Staining: H, I, O
- Source: PubMed
- Public Availability:
- Large Model Assistance:

3. OpenPath
- Data Type: Slice-Description Pair
- Description: 208,014 slice-description pairs
- Staining: I, O
- Source: WSI-Twitter, Open Source Libraries, Internet
- Public Availability:
- Large Model Assistance:

4. CONCH
- Data Type: Slice-Description Pair
- Description: 1,170,674 slice-description pairs
- Staining: H, I
- Source: PMC-OA
- Public Availability:
- Large Model Assistance:

5. HistGen
- Data Type: Whole Slide Image (WSI)-Report Pair
- Description: 75,723 pairs
- Staining: H
- Source: PMC-OA
- Public Availability:
- Large Model Assistance:

6. Mass-3QK
- Data Type: WSI
- Description: 335,665 WSIs covering 20 organs
- Staining: H, M, I
- Source: GTEx
- Public Availability:
- Large Model Assistance:

7. CAPTION-PATCH CAPTION
- Data Type: Slice-Description Pair
- Description: 10.5 million pairs
- Staining: H, I, O
- Source: TCGA
- Public Availability:
- Large Model Assistance:

8. MUNICH
- Data Type: WSI-Report Pair
- Description: 15,129 pairs from 6,705 patients
- Staining: I
- Source: TCGA
- Public Availability:
- Large Model Assistance:

9. PCAPTION-C
- Data Type: Slice-Description Pair
- Description: 1,409,058 pairs after cleaning (non-human pathology data and short texts removed)
- Staining: H, I, O
- Source: PMC-OA, QUILT-1M
- Public Availability:
- Large Model Assistance:

10. ARCHI
- Data Type: Package-Description Pair
- Description: 21,186 packages containing 33,480 slice-description pairs
- Staining: H, I, O
- Source: PubMed
- Public Availability:
- Large Model Assistance:

11. MI-ZERO
- Data Type: Slice-Description Pair
- Description: Slice-description pairs from educational resources
- Staining: H, I, O
- Source: ARCHI
- Public Availability:
- Large Model Assistance:


Multi-Modal Instruction Datasets

1. PathInstrucT
- Data Type: Slice-Level Instructions
- Description: 180k multi-modal instruction samples
- Staining: H, I, O
- Source: YouTube
- Public Availability:
- Large Model Assistance:

2. CAPTION-PATCH Instruction
- Data Type: Slice-Level Instructions
- Description: 351,871 samples covering description generation, visual question answering (VQA), and classification tasks
- Staining: H
- Source: CAPTION-VQA, PathGen, CAPTION-PATCH
- Public Availability:
- Large Model Assistance:

3. CAPTI-WSI Instruction
- Data Type: WSI-Level Instructions
- Description: 7,312 WSI-level samples
- Staining: H
- Source: HistGen
- Public Availability:
- Large Model Assistance:

4. QUILT-Instruct
- Data Type: Question-Answer Pairs (VQA)
- Description: 107,131 question-answer pairs
- Staining: H
- Source: YouTube
- Public Availability:
- Large Model Assistance:

5. PathCapQ&A Bench
- Data Type: Slice-Level Instructions
- Description: 456,916 instructions, 999,022 question-answer pairs
- Staining: H
- Source: PMC-OA, TCGA
- Public Availability:
- Large Model Assistance:

6. CLOVER
- Data Type: Instructions
- Description: 45,000 question-answer instructions
- Staining: I
- Source: PathVQA
- Public Availability:
- Large Model Assistance:
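
The Description fields in both lists report dataset sizes in mixed notations ("180k", "10.5 million", "1,170,674"), which makes them awkward to sort or compare. A minimal sketch of one way to normalize them is shown here; the `parse_count` helper is a hypothetical name introduced for illustration, and it only extracts the first count in a string, so multi-number descriptions such as QUILT's would need extra handling.

```python
import re

# Unit suffixes that appear in this article's Description fields.
_MULTIPLIERS = {"k": 1_000, "million": 1_000_000}

# A number with optional thousands separators and decimals,
# optionally followed by a "k" or "million" suffix.
_COUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(k|million)?\b", re.IGNORECASE)


def parse_count(text):
    """Return the first count found in `text` as an int, or None if absent."""
    match = _COUNT_RE.search(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return int(round(number * _MULTIPLIERS.get(unit, 1)))


for description in (
    "180k multi-modal instruction samples",
    "10.5 million pairs",
    "1,170,674 slice-description pairs",
    "Slice-description pairs from educational resources",
):
    print(description, "->", parse_count(description))
```

Returning `None` for descriptions without a number (as in the MI-ZERO entry) keeps missing sizes distinct from a size of zero.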
