
A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types

ABSTRACT
Tumor neoepitopes presented by major histocompatibility complex (MHC) class I are recognized by tumor-infiltrating lymphocytes (TIL) and are targeted by adoptive T-cell therapies. Identifying which mutant neoepitopes expressed by tumor cells are recognized by T cells can assist in the development of tumor-specific, cell-based therapies and can shed light on antitumor responses. Here, we generate a ranking algorithm for class I candidate neoepitopes by using next-generation sequencing data and a dataset of 185 neoepitopes that are recognized by HLA class I–restricted TIL from individuals with metastatic cancer. Random forest model analysis showed that including multiple factors that affect epitope presentation and recognition increased output sensitivity and specificity compared with using predicted HLA binding alone. The ranking score output provides a set of class I candidate neoantigens that may serve as therapeutic targets, and provides a tool to facilitate in vitro and in vivo studies aimed at developing more effective immunotherapies.
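The abstract's comparison rests on the sensitivity and specificity of a score used to flag candidate neoepitopes. The sketch below shows how those two quantities are computed at a fixed score threshold. It is an illustration only: the function name, the toy scores and labels, and the 0.6 threshold are hypothetical and are not taken from the study or from the authors' repository.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when candidates scoring >= threshold are called positive.

    labels[i] is True for a candidate known to be recognized by T cells.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fn = fp = tn = 0
    for score, is_positive in zip(scores, labels):
        called = score >= threshold
        if is_positive and called:
            tp += 1
        elif is_positive:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: six candidates, two of which are known neoepitopes.
labels = [True, True, False, False, False, False]
binding_only = [0.9, 0.4, 0.8, 0.7, 0.2, 0.1]  # stand-in for a binding-only score
combined = [0.9, 0.8, 0.5, 0.3, 0.2, 0.1]      # stand-in for a multi-feature score

for name, scores in [("binding only", binding_only), ("combined", combined)]:
    sens, spec = sensitivity_specificity(scores, labels, threshold=0.6)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these toy values the script prints 0.50/0.50 for the binding-only score and 1.00/1.00 for the combined score. A real comparison would sweep the threshold (an ROC curve) rather than fix a single value.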
DATA AVAILABILITY
All next-generation sequencing data are available on dbGaP under accession number phs001003.v1.p1. Source data are available from the NIH figshare repository at https://doi.org/10.35092/yhjc.c.4792338.v2 (ref. 56).

CODE AVAILABILITY
The models developed and presented in this paper are available at https://github.com/JaredJGartner/SB_neoantigen_Models.

REFERENCES
1. Huang, J. et al. T cells associated with tumor regression recognize frameshifted products of the _CDKN2A_ tumor suppressor gene locus and a mutated HLA class I gene product. _J. Immunol._ 172, 6057–6064 (2004).
2. Zhou, J., Dudley, M. E., Rosenberg, S. A. & Robbins, P. F. Persistence of multiple tumor-specific T-cell clones is associated with complete tumor regression in a melanoma patient receiving adoptive cell transfer therapy. _J. Immunother._ 28, 53–62 (2005).
3. Robbins, P. F. et al. Mining exomic sequencing data to identify mutated antigens recognized by adoptively transferred tumor-reactive T cells. _Nat. Med._ 19, 747–752 (2013).
4. Lu, Y. C. et al. Mutated PPP1R3B is recognized by T cells used to treat a melanoma patient who experienced a durable complete tumor regression. _J. Immunol._ 190, 6034–6042 (2013).
5. Lu, Y. C. et al. Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions. _Clin. Cancer Res._ 20, 3401–3410 (2014).
6. Prickett, T. D. et al. Durable complete response from metastatic melanoma after transfer of autologous T cells recognizing 10 mutated tumor antigens. _Cancer Immunol. Res._ 4, 669–678 (2016).
7. Tran, E. et al. Cancer immunotherapy based on mutation-specific CD4+ T cells in a patient with epithelial cancer. _Science_ 344, 641–645 (2014).
8. Tran, E. et al. T-cell transfer therapy targeting mutant KRAS in cancer. _N. Engl. J. Med._ 375, 2255–2262 (2016).
9. Zacharakis, N. et al. Immune recognition of somatic mutations leading to complete durable regression in metastatic breast cancer. _Nat. Med._ 24, 724–730 (2018).
10. Rizvi, N. A. et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. _Science_ 348, 124–128 (2015).
11. McGranahan, N. et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. _Science_ 351, 1463–1469 (2016).
12. Hellmann, M. D. et al. Genomic features of response to combination immunotherapy in patients with advanced non-small-cell lung cancer. _Cancer Cell_ 33, 843–852 (2018).
13. Le, D. T. et al. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. _Science_ 357, 409–413 (2017).
14. Le, D. T. et al. PD-1 blockade in tumors with mismatch-repair deficiency. _N. Engl. J. Med._ 372, 2509–2520 (2015).
15. Peltomaki, P. DNA mismatch repair and cancer. _Mutat. Res._ 488, 77–85 (2001).
16. Peters, B. & Sette, A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. _BMC Bioinf._ 6, 132 (2005).
17. Alvarez, B. et al. NNAlign_MA; MHC peptidome deconvolution for accurate MHC binding motif characterization and improved T-cell epitope predictions. _Mol. Cell. Proteomics_ 18, 2459–2477 (2019).
18. O’Donnell, T. J. et al. MHCflurry: open-source class I MHC binding affinity prediction. _Cell Syst._ 7, 129–132 (2018).
19. Duan, F. et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. _J. Exp. Med._ 211, 2231–2248 (2014).
20. Bulik-Sullivan, B. et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. _Nat. Biotechnol._ 37, 55–63 (2019).
21. Hundal, J. et al. pVACtools: a computational toolkit to identify and visualize cancer neoantigens. _Cancer Immunol. Res._ 8, 409–420 (2020).
22. Bjerregaard, A. M., Nielsen, M., Hadrup, S. R., Szallasi, Z. & Eklund, A. C. MuPeXI: prediction of neo-epitopes from tumor sequencing data. _Cancer Immunol. Immunother._ 66, 1123–1130 (2017).
23. Kim, S. et al. Neopepsee: accurate genome-level prediction of neoantigens by harnessing sequence and amino acid immunogenicity information. _Ann. Oncol._ 29, 1030–1036 (2018).
24. Kosaloglu-Yalcin, Z. et al. Predicting T cell recognition of MHC class I restricted neoepitopes. _Oncoimmunology_ 7, e1492508 (2018).
25. Brown, S. D. et al. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. _Genome Res._ 24, 743–750 (2014).
26. Balachandran, V. P. et al. Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer. _Nature_ 551, 512–516 (2017).
27. Parkhurst, M. R. et al. Unique neoantigens arise from somatic mutations in patients with gastrointestinal cancers. _Cancer Discov._ 9, 1022–1035 (2019).
28. Tran, E. et al. Immunogenicity of somatic mutations in human gastrointestinal cancers. _Science_ 350, 1387–1390 (2015).
29. Lo, W. et al. Immunologic recognition of a shared p53 mutated neoantigen in a patient with metastatic colorectal cancer. _Cancer Immunol. Res._ 7, 534–543 (2019).
30. Jurtz, V. et al. NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. _J. Immunol._ 199, 3360–3368 (2017).
31. Gfeller, D. et al. The length distribution and multiple specificity of naturally presented HLA-I ligands. _J. Immunol._ 201, 3705–3716 (2018).
32. Sarkizova, S. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. _Nat. Biotechnol._ 38, 199–209 (2020).
33. Paul, S. et al. HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity. _J. Immunol._ 191, 5831–5839 (2013).
34. Chen, W., Yewdell, J. W., Levine, R. L. & Bennink, J. R. Modification of cysteine residues in vitro and in vivo affects the immunogenicity and antigenicity of major histocompatibility complex class I-restricted viral determinants. _J. Exp. Med._ 189, 1757–1764 (1999).
35. Chen, J. L. et al. Structural and kinetic basis for heightened immunogenicity of T cell vaccines. _J. Exp. Med._ 201, 1243–1255 (2005).
36. Sachs, A. et al. Impact of cysteine residues on MHC binding predictions and recognition by tumor-reactive T cells. _J. Immunol._ 205, 539–549 (2020).
37. Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. _Nature_ 547, 222–226 (2017).
38. Horton, P. et al. WoLF PSORT: protein localization predictor. _Nucleic Acids Res._ 35, W585–W587 (2007).
39. Abelin, J. G. et al. Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. _Immunity_ 46, 315–326 (2017).
40. Rasmussen, M. et al. Pan-specific prediction of peptide–MHC class I complex stability, a correlate of T cell immunogenicity. _J. Immunol._ 197, 1517–1524 (2016).
41. Jorgensen, K. W., Rasmussen, M., Buus, S. & Nielsen, M. NetMHCstab—predicting stability of peptide–MHC-I complexes; impacts for cytotoxic T lymphocyte epitope discovery. _Immunology_ 141, 18–26 (2014).
42. Groettrup, M., Kirk, C. J. & Basler, M. Proteasomes in immune cells: more than peptide producers? _Nat. Rev. Immunol._ 10, 73–78 (2010).
43. Larsen, M. V. et al. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. _Eur. J. Immunol._ 35, 2295–2303 (2005).
44. Capietto, A. H. et al. Mutation position is an important determinant for predicting cancer neoantigens. _J. Exp. Med._ 217, e20190179 (2020).
45. Calis, J. J. et al. Properties of MHC class I presented peptides that enhance immunogenicity. _PLoS Comput. Biol._ 9, e1003266 (2013).
46. Chowell, D. et al. TCR contact residue hydrophobicity is a hallmark of immunogenic CD8+ T cell epitopes. _Proc. Natl Acad. Sci. USA_ 112, E1754–E1762 (2015).
47. Cohen, C. J. et al. Isolation of neoantigen-specific T cells from tumor and peripheral lymphocytes. _J. Clin. Invest._ 125, 3981–3991 (2015).
48. Gros, A. et al. PD-1 identifies the patient-specific CD8+ tumor-reactive repertoire infiltrating human tumors. _J. Clin. Invest._ 124, 2246–2259 (2014).
49. Gros, A. et al. Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients. _Nat. Med._ 22, 433–438 (2016).
50. Parkhurst, M. et al. Isolation of T-cell receptors specifically reactive with mutated tumor-associated antigens from tumor-infiltrating lymphocytes based on CD137 expression. _Clin. Cancer Res._ 23, 2491–2505 (2017).
51. Stevanovic, S. et al. Landscape of immunogenic tumor antigens in successful immunotherapy of virally induced epithelial cancer. _Science_ 356, 200–205 (2017).
52. Deniger, D. C. et al. T-cell responses to TP53 “Hotspot” mutations and unique neoantigens expressed by human ovarian cancers. _Clin. Cancer Res._ 24, 5562–5573 (2018).
53. Yossef, R. et al. Enhanced detection of neoantigen-reactive T cells targeting unique and shared oncogenes for personalized cancer immunotherapy. _JCI Insight_ 3, e122467 (2018).
54. Gros, A. et al. Recognition of human gastrointestinal cancer neoantigens by circulating PD-1+ lymphocytes. _J. Clin. Invest._ 129, 4992–5004 (2019).
55. Larsen, M. V. et al. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. _BMC Bioinf._ 8, 424 (2007).
56. Gartner, J. Datasets for ‘Development of a model for ranking candidate HLA class I neoantigens based upon datasets of known neoepitopes’. figshare https://doi.org/10.35092/yhjc.c.4792338.v2 (2020).

ACKNOWLEDGEMENTS
We thank members of the NIH High Performance Computing (HPC) group for all of their support, assistance and technical advice. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). We also thank all members of the tissue procurement team for their efforts in acquiring and maintaining the specimens used in this study.

AUTHOR INFORMATION
AUTHORS AND AFFILIATIONS
* Surgery Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA: Jared J. Gartner, Maria R. Parkhurst, Amy Copeland, Ken-Ichi Hanada, Nikolaos Zacharakis, Almin Lalani, Sri Krishna, Abraham Sachs, Todd D. Prickett, Yong F. Li, Maria Florentin, Scott Kivitz, Samuel C. Chatmon, Steven A. Rosenberg & Paul F. Robbins
* Vall d’Hebron Institute of Oncology (VHIO), Cellex Center, Barcelona, Spain: Alena Gros
* Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA: Eric Tran
* Department of Surgery, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA: Mohammad S. Jafferji
CONTRIBUTIONS
J.J.G., P.F.R. and S.A.R. designed the study and drafted the manuscript. J.J.G. trained the models and evaluated all nmers and mmps. T.D.P. and S.C.C. generated the exome and RNA-seq libraries. N.Z., K.H., Y.F.L. and P.F.R. designed minigene constructs encoding candidate neoantigens and generated the in vitro-transcribed RNA used in screening assays. M.R.P., M.F. and S. Kivitz synthesized the peptides used in T-cell screening assays. M.R.P., A.G., E.T., M.S.J., A.C., K.H., N.Z., A.L., S. Krishna and A.S. evaluated the ability of T cells to recognize nmers/mmps in the context of the appropriate HLA class I restriction elements.

CORRESPONDING AUTHOR
Correspondence to Paul F. Robbins.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
PEER REVIEW INFORMATION
_Nature Cancer_ thanks the anonymous reviewers for their contribution to the peer review of this work.

PUBLISHER’S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

EXTENDED DATA

EXTENDED DATA FIG. 1 PERCENTILE RANK COMPARISONS BETWEEN NETMHCPAN4.0 EL AND MHCFLURRY1.6 PERCENTILE RANK.
Percentile ranks of positive mmps were plotted with the MHCflurry1.6 rank on the x-axis and the NetMHCpan4.0 EL model rank on the y-axis. Red triangles correspond to mmps containing cysteine residues at position 2, 3 or the C-terminus (n=12), while orange dots correspond to the remaining positive mmps, which lack a cysteine at those positions (n=107).

EXTENDED DATA FIG. 2 NMER LOCALIZATION PREDICTIONS.
The WoLF PSORT algorithm was used to predict the subcellular localization of all nmer proteins (n=9,541). Blue bars represent CD8+ positive nmers and orange bars represent negative nmers. The y-axis shows the frequency with which each group was predicted to localize to each compartment; the x-axis shows the WoLF PSORT prediction abbreviations: chlo = chloroplast, cyto = cytosol, cysk = cytoskeleton, E.R. = endoplasmic reticulum, extr = extracellular, golg = Golgi apparatus, lyso = lysosome, mito = mitochondria, nucl = nuclear, pero = peroxisome, plas = plasma membrane, vacu = vacuolar membrane. Hyphenated values denote compound predictions. Totals for the positive and negative nmers in each group are given in Supplementary Table 12. P-values comparing positive to negative nmers are displayed above each prediction; they were calculated using a two-sided Fisher’s exact test and adjusted with a Bonferroni correction for multiple comparisons.

EXTENDED DATA FIG. 3 GENE EXPRESSION DECILE OF MMPS.
Gene expression deciles of positive (n=119) and negative (n=2,681,162) mmps. Boxes span the interquartile range (IQR; quartiles 2 and 3), with the median marked by the line in each box; whiskers cover quartiles 1 and 4, extending 1.5× IQR beyond the box or to the minimum/maximum value if that lies within this range. Significance was calculated with the Mann-Whitney U test.

EXTENDED DATA FIG. 4 IEDB IMMUNOGENICITY SCORES OF MMPS.
IEDB immunogenicity scores were generated for each mmp using the IEDB immunogenicity tool. The panels show all mmps (positive n=119, negative n=2,681,162), only mmps with a mutation at anchor position 2, 3 or the C-terminus (positive n=55, negative n=1,167,363), and mmps without a mutation at position 2, 3 or the C-terminus (positive n=64, negative n=1,513,799). Boxes span the IQR (quartiles 2 and 3), with the median marked by the line in each box; whiskers cover quartiles 1 and 4, extending 1.5× IQR beyond the box or to the minimum/maximum value if that lies within this range. Significance was calculated using the Mann-Whitney U test.

EXTENDED DATA FIG. 5 HYDROPHOBICITY SCORES OF T-CELL CONTACT REGIONS.
Hydrophobicity scores were calculated by summing the Kyte-Doolittle hydrophobicity scores of positions 4 through n-1 of each mmp, where n is the peptide length. The panels show all mmps (positive n=119, negative n=2,681,162), only mmps with a mutation at anchor position 2, 3 or the C-terminus (positive n=55, negative n=1,167,363), and mmps without a mutation at position 2, 3 or the C-terminus (positive n=64, negative n=1,513,799). Boxes span the IQR (quartiles 2 and 3), with the median marked by the line in each box; whiskers cover quartiles 1 and 4, extending 1.5× IQR beyond the box or to the minimum/maximum value if that lies within this range. Significance was calculated with the Mann-Whitney U test.

EXTENDED DATA FIG. 6 TOP NMER MODELS USING EITHER MMP SCORE OR MHCFLURRY SCORE AS INPUT.
ROC curves showing the mean performance of the top models using either MMP model scores or MHCflurry scores as input. Solid lines represent the mean for each model across n=5 folds; shaded areas show the standard deviation at each point along the x-axis.
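Extended Data Figs. 1, 4 and 5 group mmps by whether a residue of interest sits at position 2, 3 or the C-terminus. A minimal sketch of that grouping follows. It assumes 1-based positions, one-letter peptide strings and, for the mutation split, a known 1-based mutation position for each mmp; all function names are hypothetical and are not taken from the authors' code.

```python
from typing import Iterable, List, Tuple


def is_anchor_position(position: int, length: int) -> bool:
    """True if a 1-based position is 2, 3 or the C-terminus of the peptide."""
    if not 1 <= position <= length:
        raise ValueError("position must lie within the peptide")
    return position in (2, 3) or position == length


def has_anchor_cysteine(peptide: str) -> bool:
    """True if the peptide has a cysteine at position 2, 3 or the C-terminus."""
    return any(
        residue == "C" and is_anchor_position(i, len(peptide))
        for i, residue in enumerate(peptide, start=1)
    )


def split_by_mutation_anchor(
    mmps: Iterable[Tuple[str, int]],
) -> Tuple[List[str], List[str]]:
    """Split (peptide, 1-based mutation position) pairs into those whose
    mutation falls at an anchor position and those whose mutation does not."""
    anchor: List[str] = []
    non_anchor: List[str] = []
    for peptide, mut_pos in mmps:
        target = anchor if is_anchor_position(mut_pos, len(peptide)) else non_anchor
        target.append(peptide)
    return anchor, non_anchor
```

For example, `has_anchor_cysteine("ACDEFGHIK")` is true (cysteine at position 2), while `has_anchor_cysteine("CADEFGHIK")` is false because its only cysteine is at position 1.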
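Extended Data Fig. 2 compares positive and negative nmers for each localization category with a two-sided Fisher's exact test followed by a Bonferroni correction. The sketch below implements both with the Python standard library. The 2×2 layout (positive/negative nmers versus inside/outside a category) is an assumption about how the tables were built, and a production analysis would more likely call an established statistics package.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible top-left count given the margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_observed = probs[a]
    # Sum every table no more probable than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    p = sum(px for px in probs.values() if px <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]
```

For the table [[8, 2], [1, 5]], the tables at least as extreme as the observed one sum to 400/11440, so `fisher_exact_two_sided(8, 2, 1, 5)` returns about 0.035.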
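The box plots in Extended Data Figs. 3–5 follow the Tukey convention described in their captions. This sketch computes the quartiles, interquartile range and 1.5× IQR whiskers. Using the inclusive (linear-interpolation) quantile method is an assumption, since plotting tools differ in how they estimate quartiles; `BoxStats` and `box_plot_stats` are hypothetical names.

```python
import statistics
from typing import List, NamedTuple, Sequence


class BoxStats(NamedTuple):
    q1: float
    median: float
    q3: float
    iqr: float
    lower_whisker: float
    upper_whisker: float
    outliers: List[float]


def box_plot_stats(values: Sequence[float], whis: float = 1.5) -> BoxStats:
    """Quartiles, IQR and Tukey whiskers for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme observations inside the fences, which is
    # the data minimum/maximum whenever those lie within whis * IQR of the box.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return BoxStats(
        q1=q1,
        median=median,
        q3=q3,
        iqr=iqr,
        lower_whisker=min(inside),
        upper_whisker=max(inside),
        outliers=sorted(v for v in values if v < lower_fence or v > upper_fence),
    )
```

For the values 1 through 9 plus 100, the quartiles are 3.25, 5.5 and 7.75, the whiskers end at 1 and 9, and 100 is drawn as an outlier.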
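Significance in Extended Data Figs. 3–5 was assessed with the Mann-Whitney U test. Below is a minimal standard-library sketch using the large-sample normal approximation with a tie-corrected variance and a 0.5 continuity correction. These settings are assumptions, since the captions do not state which variant was used, and with millions of negative mmps a vectorized library implementation would be far faster.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, p-value). Tied values get average ranks, the
    variance is tie-corrected and a 0.5 continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool the samples, remembering which group each value came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run pooled[i..j] of tied values.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        count_x = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * count_x
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u_x, 1.0  # every value tied: no evidence of a difference
    z = (abs(u_x - mu) - 0.5) / math.sqrt(variance)
    p = min(1.0, math.erfc(z / math.sqrt(2)))  # two-sided normal tail
    return u_x, p
```

For two completely separated samples such as `[1, 2, 3]` and `[4, 5, 6]`, U is 0 and the approximate p-value is about 0.08; such tiny samples are shown only to illustrate the mechanics, as the normal approximation is intended for large groups like those in the figures.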
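The hydrophobicity score in Extended Data Fig. 5 is a sum of Kyte-Doolittle values over positions 4 through n-1. The sketch below uses the standard Kyte-Doolittle scale and treats n as the peptide length. The function name is hypothetical, and non-standard residues raise a `KeyError` rather than being silently skipped.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tcr_contact_hydrophobicity(peptide: str) -> float:
    """Sum Kyte-Doolittle scores over 1-based positions 4 through n-1."""
    if len(peptide) < 5:
        raise ValueError("peptide too short to have positions 4 through n-1")
    # 1-based positions 4..n-1 correspond to the 0-based slice [3:-1].
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()[3:-1])
```

For the 9-mer `KKKIIIIIK`, positions 4 to 8 are five isoleucines, giving 5 × 4.5 = 22.5.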
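Extended Data Fig. 6 averages ROC curves over five cross-validation folds and shades one standard deviation at each false-positive rate. The sketch below shows one common way to do this: compute each fold's empirical ROC curve, interpolate its true-positive rate onto a shared FPR grid, then take the mean and standard deviation pointwise. The 101-point grid, linear interpolation and population standard deviation are assumptions, not details given in the caption, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """Empirical ROC curve; higher scores are treated as more likely positive."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("each fold needs both positive and negative examples")
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Consume every candidate sharing this score so ties form a single step.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def tpr_at(x: float, fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Linearly interpolate TPR at false-positive rate x (0 <= x <= 1)."""
    k = max(i for i, f in enumerate(fpr) if f <= x)  # last ROC point at or left of x
    if fpr[k] == x or k == len(fpr) - 1:
        return tpr[k]
    return tpr[k] + (tpr[k + 1] - tpr[k]) * (x - fpr[k]) / (fpr[k + 1] - fpr[k])


def mean_roc(
    folds: Sequence[Tuple[Sequence[float], Sequence[bool]]], grid_size: int = 101
) -> Tuple[List[float], List[float], List[float]]:
    """Pointwise mean and standard deviation of per-fold ROC curves on a shared FPR grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    curves = []
    for scores, labels in folds:
        fpr, tpr = roc_points(scores, labels)
        curves.append([tpr_at(x, fpr, tpr) for x in grid])
    columns = list(zip(*curves))
    mean = [statistics.fmean(col) for col in columns]
    std = [statistics.pstdev(col) for col in columns]
    return grid, mean, std
```

The returned `grid`, `mean` and `std` lists map directly onto the figure's x-axis, solid line and shaded band (mean ± std).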
SUPPLEMENTARY INFORMATION
Reporting Summary
Supplementary Tables 1–23

ABOUT THIS ARTICLE
CITE THIS ARTICLE
Gartner, J.J., Parkhurst, M.R., Gros, A. _et al._ A machine learning model for ranking candidate HLA class I neoantigens based on known neoepitopes from multiple human tumor types. _Nat Cancer_ 2, 563–574 (2021). https://doi.org/10.1038/s43018-021-00197-6
* Received: 06 December 2019
* Accepted: 11 March 2021
* Published: 03 May 2021
* Issue Date: May 2021