ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity
A two-step clustering approach that combines the speed of the Faiss Clustering Library with the accuracy of the Markov Clustering Algorithm.
On a standard machine*, ClusTCR can cluster 1 million CDR3 sequences in under 5 minutes.
 *Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, using 8 CPUs
Compared to other state-of-the-art clustering algorithms (GLIPH2, iSMART and TCRDist), ClusTCR shows comparable clustering quality, but provides a steep increase in speed and scalability.
 
Getting started
To install ClusTCR on Linux or OSX, use conda:
conda install clustcr -c svalkiers -c bioconda -c pytorch -c conda-forge
A GPU version is also available (only for CUDA-enabled GPUs on Linux). It supports the use_gpu parameter of the Clustering interface.
conda install clustcr-gpu cudatoolkit=VERSION -c svalkiers -c bioconda -c pytorch -c conda-forge
where VERSION is the cudatoolkit version that matches your CUDA installation:
- 8.0 for CUDA8
- 9.0 for CUDA9
- 10.0 for CUDA10
To read more about the GPU support, visit the Faiss GPU page.
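If the install step needs to be scripted, the CUDA-to-cudatoolkit mapping listed above can be written as a small lookup. This is a minimal sketch, not part of ClusTCR: the names `CUDATOOLKIT_BY_CUDA_MAJOR`, `CHANNELS` and `build_install_command` are hypothetical, and the function only builds the command string shown in this README without running conda.

```python
# Hypothetical helper, not part of ClusTCR: it only assembles the
# conda commands given in this README and does not execute them.

# cudatoolkit version for each CUDA major version listed above.
CUDATOOLKIT_BY_CUDA_MAJOR = {8: "8.0", 9: "9.0", 10: "10.0"}

# Channels in the same order as in the install commands above.
CHANNELS = ("svalkiers", "bioconda", "pytorch", "conda-forge")


def build_install_command(cuda_major=None):
    """Return the conda command for the CPU build (cuda_major=None)
    or for the GPU build matching the given CUDA major version."""
    channel_args = " ".join(f"-c {name}" for name in CHANNELS)
    if cuda_major is None:
        return f"conda install clustcr {channel_args}"
    if cuda_major not in CUDATOOLKIT_BY_CUDA_MAJOR:
        listed = ", ".join(str(v) for v in sorted(CUDATOOLKIT_BY_CUDA_MAJOR))
        raise ValueError(
            f"No cudatoolkit version listed for CUDA {cuda_major} "
            f"(listed: {listed})"
        )
    toolkit = CUDATOOLKIT_BY_CUDA_MAJOR[cuda_major]
    return f"conda install clustcr-gpu cudatoolkit={toolkit} {channel_args}"


if __name__ == "__main__":
    print(build_install_command())               # CPU build
    print(build_install_command(cuda_major=10))  # GPU build for CUDA10
```

CUDA versions outside the list raise a `ValueError` rather than guessing a cudatoolkit version, since only the three versions above are documented here.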