Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires

ICLR 2025


T cells are a key component of the adaptive immune system, targeting infections, cancers, and allergens with specificity encoded by their T cell receptors (TCRs), and retaining a memory of their targets. High-throughput TCR repertoire sequencing captures a cross-section of TCRs that encode the immune history of any subject, though the data are heterogeneous, high dimensional, sparse, and mostly unlabeled. Sets of TCRs responding to the same antigen, i.e., a protein fragment, co-occur in subjects sharing immune genetics and exposure history. Here, we leverage TCR co-occurrence across a large set of TCR repertoires and employ the GloVe (Pennington et al., 2014) algorithm to derive low-dimensional, dense vector representations (embeddings) of TCRs. We then aggregate these TCR embeddings to generate subject-level embeddings based on observed subject-specific TCR subsets. Further, we leverage random projection theory to improve GloVe’s computational efficiency in terms of memory usage and training time. Extensive experimental results show that TCR embeddings targeting the same pathogen have high cosine similarity, and subject-level embeddings encode both immune genetics and pathogenic exposure history.
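The pipeline described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy repertoires, the function names, the use of mean pooling for subject-level aggregation, and the choice of a dense Gaussian matrix for the random projection are all assumptions made for illustration. GloVe training itself is omitted; the sketch only shows the pair counts that GloVe would take as input, a generic Johnson-Lindenstrauss-style projection, and cosine similarity between vectors.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy data: subject id -> set of TCR identifiers seen in that subject.
REPERTOIRES = {
    "s1": {"tcrA", "tcrB", "tcrC"},
    "s2": {"tcrA", "tcrB"},
    "s3": {"tcrB", "tcrC", "tcrD"},
}


def cooccurrence_counts(repertoires):
    """Count the subjects in which each unordered TCR pair appears together."""
    counts = Counter()
    for tcrs in repertoires.values():
        # Sorting gives every pair one canonical (a, b) key with a < b.
        for pair in combinations(sorted(tcrs), 2):
            counts[pair] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def subject_embedding(tcrs, tcr_vectors):
    """Mean-pool the vectors of one subject's TCRs (assumed aggregation rule)."""
    known = [tcr_vectors[t] for t in tcrs if t in tcr_vectors]
    if not known:
        raise ValueError("no embedded TCRs found for this subject")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


def gaussian_projection(vectors, out_dim, seed=0):
    """Map every vector to out_dim dimensions with one shared Gaussian matrix.

    Entries are drawn from N(0, 1/out_dim), so squared norms are preserved
    in expectation (the usual Johnson-Lindenstrauss scaling).
    """
    rng = random.Random(seed)
    in_dim = len(next(iter(vectors.values())))
    scale = 1.0 / math.sqrt(out_dim)
    matrix = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return {
        key: [sum(r * x for r, x in zip(row, vec)) for row in matrix]
        for key, vec in vectors.items()
    }


if __name__ == "__main__":
    counts = cooccurrence_counts(REPERTOIRES)
    print(counts[("tcrA", "tcrB")])  # tcrA and tcrB co-occur in s1 and s2

    # Placeholder vectors standing in for trained TCR embeddings.
    toy_vectors = {"tcrA": [1.0, 0.0, 0.0, 0.0], "tcrB": [0.9, 0.1, 0.0, 0.0]}
    print(cosine(toy_vectors["tcrA"], toy_vectors["tcrB"]))
    print(subject_embedding(REPERTOIRES["s2"], toy_vectors))
    print(gaussian_projection(toy_vectors, out_dim=2))
```

In practice the pair counts would be stored sparsely and the embeddings learned by GloVe rather than supplied by hand; how random projection is integrated into GloVe training is not specified in the abstract, so the projection step here should be read only as an illustration of the general idea.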

Paper and publication downloads

Immunomics – JL-GloVe

March 21, 2025

We employ GloVe and random projection theory to infer immunologically meaningful T-cell receptor embeddings from adaptive immune repertoires. This repository contains the PyTorch code to replicate the experiments in our paper, "Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires", accepted at the International Conference on Learning Representations (ICLR 2025).