SortMeRNA Database Clustering Summary

Sequence counts after vsearch clustering at multiple identity thresholds

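This table summarises how many representative sequences remain in each reference database after clustering with vsearch at decreasing identity thresholds. Each column group covers a source database and a taxonomic kingdom (or, for RFAM, an rRNA family) and reports the clustering identity threshold (%clust) and the number of sequences (#seqs). Each row corresponds to a SortMeRNA database configuration.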

Baseline rows (SILVA NR99 / RFAM unclustered) show the full unfiltered sequence counts against which the clustered databases can be compared. A dash (—) indicates that no sequences were available for that kingdom/database combination.
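
As a rough illustration of the clustering step described above, the sketch below builds one vsearch command per identity threshold. It assumes clustering with `--cluster_fast` and centroid output via `--centroids`; the source does not state which vsearch clustering mode was used. The input file name, output directory, and the helper `vsearch_cluster_cmd` are hypothetical.

```python
from pathlib import Path

# Identity thresholds taken from the table (97%, 95%, 90%, 85%).
THRESHOLDS = [0.97, 0.95, 0.90, 0.85]


def vsearch_cluster_cmd(fasta: str, identity: float, out_dir: str = "clustered") -> list[str]:
    """Return a vsearch command line that clusters `fasta` at `identity`.

    Assumption: --cluster_fast is used; the original pipeline may differ.
    """
    stem = Path(fasta).stem
    centroids = Path(out_dir) / f"{stem}_id{round(identity * 100)}.fasta"
    return [
        "vsearch",
        "--cluster_fast", fasta,        # input sequences to cluster
        "--id", f"{identity:.2f}",      # pairwise identity threshold
        "--centroids", str(centroids),  # one representative per cluster
    ]


# Hypothetical input file; one command per threshold.
commands = [vsearch_cluster_cmd("silva_ssu_bacteria.fasta", t) for t in THRESHOLDS]
for cmd in commands:
    print(" ".join(cmd))
```

Each command can be passed to `subprocess.run(cmd, check=True)` once vsearch is installed and the output directory exists. The number of sequences in each centroids file is what a #seqs column would report.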

Source database versions:

                          SILVA 138.2 SSURef NR99                            SILVA LSURef NR99                                  RFAM
                          Archaea          Bacteria         Eukaryota        Archaea          Bacteria         Eukaryota        5S               5.8S
Configuration             %clust  #seqs    %clust  #seqs    %clust  #seqs    %clust  #seqs    %clust  #seqs    %clust  #seqs    %clust  #seqs    %clust  #seqs
SILVA NR99 (unclustered)  99%     20,389   99%     431,166  99%     58,940
SMR v4.7 sensitive db     97%     8,934    97%     165,468  97%     28,707
SMR v4.7 default db       95%     5,293    95%     99,721   95%     18,810
SMR v4.7 fast db          90%     1,839    90%     31,979   90%     8,464
SMR v4.7 fast db (85%)    85%     813      85%     10,111   85%     4,199
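
To compare the clustered databases against the baseline, the fraction of sequences retained can be computed directly from the #seqs values. The following is a minimal sketch that uses the SSURef NR99 counts from the table; the names `COUNTS`, `BASELINE`, and `retained_fraction` are illustrative and not part of SortMeRNA or vsearch.

```python
# #seqs values copied from the SILVA 138.2 SSURef NR99 columns of the table.
COUNTS = {
    "SILVA NR99 (unclustered)": {"Archaea": 20389, "Bacteria": 431166, "Eukaryota": 58940},
    "SMR v4.7 sensitive db":    {"Archaea": 8934,  "Bacteria": 165468, "Eukaryota": 28707},
    "SMR v4.7 default db":      {"Archaea": 5293,  "Bacteria": 99721,  "Eukaryota": 18810},
    "SMR v4.7 fast db":         {"Archaea": 1839,  "Bacteria": 31979,  "Eukaryota": 8464},
    "SMR v4.7 fast db (85%)":   {"Archaea": 813,   "Bacteria": 10111,  "Eukaryota": 4199},
}
BASELINE = "SILVA NR99 (unclustered)"


def retained_fraction(config: str, kingdom: str) -> float:
    """Fraction of baseline sequences that remain in `config` for `kingdom`."""
    return COUNTS[config][kingdom] / COUNTS[BASELINE][kingdom]


# Print one line per configuration with the retained share per kingdom.
for config, row in COUNTS.items():
    cells = "  ".join(f"{k}: {retained_fraction(config, k):6.1%}" for k in row)
    print(f"{config:<26}{cells}")
```

For example, `retained_fraction("SMR v4.7 sensitive db", "Bacteria")` divides 165,468 by 431,166, so a little over a third of the bacterial SSU sequences remain after clustering at 97%.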