Explore clustering results

Initialize the ClusterAnalysis object

clusTCR provides a number of options for exploring clustering results. To perform these analyses, you must first create a ClusterAnalysis object, which takes the cluster features as its only argument.

from clustcr import ClusterAnalysis
analysis = ClusterAnalysis(features)

PCA

To get a quick overview of the newly generated features, you can perform a principal component analysis (PCA). clusTCR has built-in PCA functionality, which you can run by calling the .pca() method on a ClusterAnalysis object.

analysis.pca()

Running the PCA with clusTCR produces a figure of the PCA loadings.

Analyzing cluster quality

An additional feature of clusTCR's analysis module is predicting the quality of a cluster. Here, quality is defined as the purity of a cluster. This feature is particularly useful when no information is available about the target epitopes of the clustered TCR sequences. We trained a classification model that predicts whether an individual cluster will be of good (1) or bad (0) quality. Good clusters have a predicted purity > 0.90; low-quality clusters have a predicted purity < 0.90.
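
To make the purity threshold concrete, the sketch below computes purity for two toy clusters and assigns the good (1) / bad (0) label. It is not clusTCR code: it assumes purity means the fraction of sequences in a cluster that share the cluster's most common epitope, and it assumes a purity of exactly 0.90 counts as bad, since the text above leaves that case open. The function names and example epitopes are hypothetical.

```python
from collections import Counter


def cluster_purity(epitopes):
    """Fraction of sequences in a cluster that share the most common epitope.

    Assumed definition of purity; not taken from the clusTCR source.
    """
    if not epitopes:
        raise ValueError("cluster must contain at least one sequence")
    most_common_count = Counter(epitopes).most_common(1)[0][1]
    return most_common_count / len(epitopes)


def quality_label(purity, threshold=0.90):
    """Return 1 (good) when purity exceeds the threshold, otherwise 0 (bad)."""
    return 1 if purity > threshold else 0


# Hypothetical clusters: each list holds the epitope of every TCR in the cluster.
clusters = {
    0: ["GILGFVFTL"] * 19 + ["NLVPMVATV"],                    # 19 of 20 agree
    1: ["GILGFVFTL", "NLVPMVATV", "GLCTLVAML", "GILGFVFTL"],  # mixed cluster
}

purities = {cid: cluster_purity(eps) for cid, eps in clusters.items()}
labels = {cid: quality_label(p) for cid, p in purities.items()}

for cid in clusters:
    print(f"cluster {cid}: purity={purities[cid]:.2f}, label={labels[cid]}")
```

These labels can only be computed directly when epitope data is available. The pre-trained model is meant for the case where it is not, predicting the label from the cluster features instead.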

Using the pre-trained model

predictions = analysis.predict_quality()

Training your own model

You can also train your own model by creating a ModelTraining object. To do this, you need to provide three arguments: the features (see the features page), the clustering results, and the epitope data corresponding to the CDR3 sequences used for clustering.

from clustcr import ModelTraining
model = ModelTraining(features, clusters, epitopes)
clf = model.fit()

You can evaluate your own model using the .evaluate() method. This performs 10-fold stratified cross-validation and outputs a receiver operating characteristic (ROC) curve with the corresponding area under the curve (AUC) value.

# Evaluate your own model
model.evaluate()
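
For readers unfamiliar with the evaluation procedure, the sketch below illustrates the two ideas behind it in plain Python: stratified fold assignment (every fold keeps roughly the same ratio of good to bad clusters) and AUC computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative. It is an illustration only and not how clusTCR implements .evaluate(). The helper names, toy labels, and scores are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: 30 good (1) and 70 bad (0) clusters.
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, n_folds=10)
fold_positives = [sum(labels[i] for i in fold) for fold in folds]

# Toy scores for four clusters, e.g. predicted probabilities of being "good".
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(fold_positives, auc)
```

In a full cross-validation, a model would be trained on nine folds and scored on the held-out fold, with the procedure repeated once per fold.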

Save your model using the following method:

model.save(clf, '/path_to_model/my_custom_model.pkl')

To use this model for quality predictions, pass the path to your saved model as the model parameter of the predict_quality() method.