SortMeRNA Benchmark Summary

Generated: 2026-06-05 08:28:07 | SortMeRNA v6.0.2 | Database: smr_v6.0.2_default_db | E-value: 1e-5 | AWS EC2 r6i.16xlarge (64 vCPUs, 512 GB RAM), 40 threads

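Datasets: Benchmark datasets from Deng et al. 2022 (Nucleic Acids Research, doi:10.1093/nar/gkac112). FN = rRNA-only input, where every misclassification is a false negative (measures sensitivity); FP = non-rRNA input, where every misclassification is a false positive (measures specificity); FN+FP = mixed input.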

Dataset | Test | Pairs | Description
--- | --- | --- | ---
SILVA_rRNA | FN | 20,000,000 | SILVA SSU+LSU rRNA sequences
OMA_CDS | FP | 20,000,000 | Prokaryotic and eukaryotic mRNA
ENA_virus | FP | 27,206,792 | Viral gene sequences from ENA
Amplicon_16S | FN | 7,917,920 | Real 16S V1-V2 amplicon reads (oral microbiome)
Human_ncRNA | FP | 6,330,381 | Human non-coding RNA
MetaT | FN+FP | 9,165,829 | Oral metatranscriptome: 4.7M prokaryotic mRNA, 2.5M human mRNA, 73K viral mRNA, 1.9M rRNA (21% rRNA fraction)

Metrics: Sensitivity = (total - misclassifications) / total for FN datasets. FPR = misclassifications / total for FP datasets. MetaT reports reads classified as rRNA vs. the expected ~21% rRNA fraction.
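
The metric definitions above can be sketched in a few lines of Python. This is an illustrative recomputation rather than the script that produced this report: the function names, the `COUNTS` list, and the `summarize` helper are assumptions introduced here, and the counts are copied from three rows of the results table.

```python
"""Recompute the summary metrics from raw pair counts (illustrative sketch)."""


def sensitivity(total: int, misclassified: int) -> float:
    """FN datasets: fraction of rRNA pairs that were classified as rRNA."""
    return (total - misclassified) / total


def false_positive_rate(total: int, misclassified: int) -> float:
    """FP datasets: fraction of non-rRNA pairs that were classified as rRNA."""
    return misclassified / total


def rrna_fraction(total: int, classified_as_rrna: int) -> float:
    """Mixed datasets: fraction of all pairs that were classified as rRNA."""
    return classified_as_rrna / total


# (dataset, type, total pairs, misclassifications or pairs classified as rRNA)
# Values copied from the results table; the structure itself is hypothetical.
COUNTS = [
    ("SILVA_rRNA", "rrna", 20_000_000, 5_124),
    ("OMA_CDS", "nonrrna", 20_000_000, 7_031),
    ("MetaT", "mixed", 9_165_829, 1_888_326),
]

# Map each dataset type to the metric label and function that applies to it.
METRIC_BY_TYPE = {
    "rrna": ("Sensitivity", sensitivity),
    "nonrrna": ("FPR", false_positive_rate),
    "mixed": ("rRNA fraction", rrna_fraction),
}


def summarize(counts):
    """Return one formatted line per dataset, with the metric as a percentage."""
    lines = []
    for name, dataset_type, total, count in counts:
        label, metric = METRIC_BY_TYPE[dataset_type]
        lines.append(f"{name}: {label} = {metric(total, count):.2%}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(COUNTS)))
```

Keeping the three metrics as separate functions reflects the fact that the fourth column of the results table means something different for each dataset type: an error count for rRNA and non-rRNA inputs, but a count of rRNA calls for the mixed input.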

Dataset | Type | Total pairs | Misclassifications / rRNA classified | Metric | Value | Wall time (s) | Memory (MB)
--- | --- | --- | --- | --- | --- | --- | ---
OMA_CDS | nonrrna | 20000000 | 7031 | FPR | 0.04% | 547 | 3821
SILVA_rRNA | rrna | 20000000 | 5124 | Sensitivity | 99.97% | 4160 | 4393
Amplicon_16S | rrna | 7917920 | 7 | Sensitivity | 100.00% | 1357 | 4203
Human_ncRNA | nonrrna | 6330381 | 4493 | FPR | 0.07% | 101 | 3805
MetaT | mixed | 9165829 | 1888326 (20.6% of reads classified as rRNA; ~21% expected) | NA | NA | 999 | 3973
ENA_virus | nonrrna | 27206792 | 1769 | FPR | 0.01% | 342 | 3798