Sequence counts after vsearch clustering at multiple identity thresholds
This table summarises how many representative sequences remain in each reference database after clustering at decreasing identity thresholds. Each column group covers a source database and a taxonomic kingdom (or, for RFAM, an rRNA family); within each group, %clust is the identity threshold and #seqs is the number of sequences. Each row corresponds to a SortMeRNA database configuration.
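Baseline rows (SILVA NR99 / RFAM unclustered) show the sequence counts before any vsearch clustering, against which the clustered databases can be compared. SILVA NR99 is already dereplicated at 99% identity upstream, which is why its %clust value reads 99%. A dash (—) indicates that no sequences were available for that kingdom/database combination.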
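
As a rough sketch of how counts like these could be produced, the snippet below builds one `vsearch --cluster_fast` command per identity threshold and counts the FASTA headers in a centroid file. The input file name, the output naming scheme, and the helpers `build_cluster_cmd` and `count_fasta_records` are hypothetical; the commands actually used to build these databases are not given here and may use different options.

```python
from __future__ import annotations

from pathlib import Path

# Identity thresholds matching the clustered rows of the table.
THRESHOLDS = [0.97, 0.95, 0.90, 0.85]


def build_cluster_cmd(fasta: str, identity: float) -> list[str]:
    """Build a vsearch command that writes cluster centroids at `identity`.

    The output name (<stem>_id<NN>.fasta) is an illustrative convention only.
    """
    out = f"{Path(fasta).stem}_id{int(round(identity * 100))}.fasta"
    return [
        "vsearch",
        "--cluster_fast", fasta,
        "--id", f"{identity:.2f}",
        "--centroids", out,
    ]


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Print the commands rather than running them; pass each list to
    # subprocess.run(..., check=True) to execute when vsearch is installed.
    for threshold in THRESHOLDS:
        print(" ".join(build_cluster_cmd("silva_ssu_bacteria.fasta", threshold)))
```

Running `count_fasta_records` on each centroid file would then give the kind of value reported in the #seqs columns.
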
Source database versions:
| Configuration | SILVA 138.2 SSURef NR99 | | | | | | SILVA LSURef NR99 | | | | | | RFAM | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Archaea | | Bacteria | | Eukaryota | | Archaea | | Bacteria | | Eukaryota | | 5S | | 5.8S | |
| | %clust | #seqs | %clust | #seqs | %clust | #seqs | %clust | #seqs | %clust | #seqs | %clust | #seqs | %clust | #seqs | %clust | #seqs |
| SILVA NR99 (unclustered) | 99% | 20,389 | 99% | 431,166 | 99% | 58,940 | — | — | — | — | — | — | — | — | — | — |
| SMR v4.7 sensitive db | 97% | 8,934 | 97% | 165,468 | 97% | 28,707 | — | — | — | — | — | — | — | — | — | — |
| SMR v4.7 default db | 95% | 5,293 | 95% | 99,721 | 95% | 18,810 | — | — | — | — | — | — | — | — | — | — |
| SMR v4.7 fast db | 90% | 1,839 | 90% | 31,979 | 90% | 8,464 | — | — | — | — | — | — | — | — | — | — |
| SMR v4.7 fast db (85%) | 85% | 813 | 85% | 10,111 | 85% | 4,199 | — | — | — | — | — | — | — | — | — | — |
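
To make the reduction relative to the baseline easier to read, each count can be expressed as a fraction of the NR99 row. The sketch below does this for the SSURef Bacteria column, with values copied from the table; `retained_fraction` is an illustrative helper, not part of any tool mentioned here.

```python
# SSURef Bacteria #seqs from the table, keyed by %clust.
BACTERIA_SSU = {99: 431_166, 97: 165_468, 95: 99_721, 90: 31_979, 85: 10_111}


def retained_fraction(counts: dict, baseline: int = 99) -> dict:
    """Return each count as a fraction of the baseline threshold's count."""
    base = counts[baseline]
    return {threshold: n / base for threshold, n in counts.items()}


if __name__ == "__main__":
    fractions = retained_fraction(BACTERIA_SSU)
    for threshold in sorted(fractions, reverse=True):
        print(f"{threshold}%: {fractions[threshold]:.1%} of baseline")
```

For this column, roughly 38% of the baseline sequences remain at 97% identity and roughly 2% at 85%.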