ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity

A two-step clustering approach that combines the speed of the Faiss Clustering Library with the accuracy of the Markov Clustering Algorithm

On a standard machine*, ClusTCR can cluster 1 million CDR3 sequences in under 5 minutes.
*Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, using 8 CPUs

Compared with other state-of-the-art clustering algorithms (GLIPH2, iSMART and TCRDist), ClusTCR achieves comparable clustering quality while being considerably faster and more scalable.

Getting started

To install ClusTCR on Linux or macOS, use conda:

conda install clustcr -c svalkiers -c bioconda -c pytorch -c conda-forge

A GPU version is also available (Linux only, for CUDA-enabled GPUs). It supports the use_gpu parameter of the Clustering interface.

conda install clustcr-gpu cudatoolkit=VERSION -c svalkiers -c bioconda -c pytorch -c conda-forge

Replace VERSION with the cudatoolkit version that matches your CUDA installation:

  • 8.0 for CUDA8
  • 9.0 for CUDA9
  • 10.0 for CUDA10
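
The version mapping above can be captured in a small helper that prints the matching install command. This is a minimal sketch and not part of ClusTCR: the function name gpu_install_command and the two constants are hypothetical, and the mapping covers only the three CUDA versions listed above.

```python
# Hypothetical helper, not part of ClusTCR.
# Mapping copied from the list above: CUDA major version -> cudatoolkit version.
CUDATOOLKIT_VERSIONS = {8: "8.0", 9: "9.0", 10: "10.0"}

# Channels in the same order as in the install command above.
CHANNELS = ["svalkiers", "bioconda", "pytorch", "conda-forge"]


def gpu_install_command(cuda_major):
    """Return the conda command that installs clustcr-gpu for a CUDA major version."""
    if cuda_major not in CUDATOOLKIT_VERSIONS:
        raise ValueError(f"No cudatoolkit version listed for CUDA {cuda_major}")
    toolkit = CUDATOOLKIT_VERSIONS[cuda_major]
    channel_args = " ".join(f"-c {channel}" for channel in CHANNELS)
    return f"conda install clustcr-gpu cudatoolkit={toolkit} {channel_args}"


if __name__ == "__main__":
    # Example: build the command for a machine with CUDA 10.
    print(gpu_install_command(10))
```

Rejecting unlisted versions with a ValueError, rather than guessing a toolkit version, keeps the helper limited to the combinations stated above.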

To read more about GPU support, see the Faiss GPU page.
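
Once the GPU build is installed, GPU use is switched on through the use_gpu parameter of the Clustering interface mentioned above. The following is a minimal sketch under two assumptions: that Clustering can be imported from the top-level clustcr package, and that its remaining arguments can be left at their defaults. The wrapper build_clusterer is a hypothetical name, not part of ClusTCR.

```python
def build_clusterer(use_gpu=False):
    """Create a ClusTCR Clustering object, optionally requesting the GPU.

    Hypothetical convenience wrapper; only the Clustering class and its
    use_gpu parameter come from ClusTCR itself.
    """
    try:
        # Assumption: Clustering is exposed by the top-level clustcr package.
        from clustcr import Clustering
    except ImportError as exc:
        raise RuntimeError("ClusTCR is not installed in this environment") from exc
    # use_gpu=True is only expected to work with the clustcr-gpu build.
    return Clustering(use_gpu=use_gpu)


# Usage, in an environment where clustcr-gpu is installed:
# clustering = build_clusterer(use_gpu=True)
```

Importing inside the function means the missing-package case surfaces as one clear error at the point of use rather than at module import time.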