Peptide Sequence Tag Extraction by Graph Convolution Neural Networks
Abstract
Peptide sequence tag extraction plays a vital role in protein identification engines based on tandem mass spectrometry. In practice, the approach faces two significant challenges. First, tag lengths are fixed: shorter tags lack specificity and recall too many non-target peptide sequences, whereas longer tags become less precise as their length increases and may fail to recall the target peptide sequence. Second, the sensitivity and precision of tag extraction remain relatively low. To address these issues, TagEx, a variable-length peptide sequence tag extraction algorithm based on graph convolutional networks, is proposed. The method first trains a scoring model for de novo peptide sequencing using graph convolutional networks. It then constructs a spectral peak connection graph from the mass spectrum and applies a depth-first search to extract variable-length peptide sequence tags, with the trained graph convolutional network model scoring amino acid connections during extraction. Finally, tags are filtered by length and score to obtain the final candidate peptide sequence tag set. To evaluate its performance, TagEx was benchmarked against three representative tag extraction software tools: InsPect, PepNovo+, and DirecTag. The experimental results show that, when the top 100 tags are retained, TagEx improves sensitivity, coverage, and precision by 0.62-2.32, 3.22-11.14, and 3.29-8.31 percentage points, respectively.
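The extraction pipeline described in the abstract (peak connection graph, depth-first search, filtering by length and score) can be illustrated with a minimal sketch. This is not the TagEx implementation: all function names are hypothetical, the residue table is a small subset of standard monoisotopic masses, the 0.02 Da tolerance and the length bounds are assumed values, and `placeholder_edge_score` is a simple mass-error heuristic standing in for the trained graph convolutional network scorer.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic residue masses in Da (subset only, for illustration).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}

Edge = Tuple[int, str, float]  # (target peak index, residue, edge score)


def placeholder_edge_score(mass_error: float, tol: float) -> float:
    """ASSUMPTION: stand-in for the trained GCN scorer; 1.0 is a perfect match."""
    return 1.0 - mass_error / tol


def build_peak_graph(
    peaks: List[float],
    tol: float = 0.02,  # assumed mass tolerance in Da
    score_edge: Callable[[float, float], float] = placeholder_edge_score,
) -> Tuple[List[float], Dict[int, List[Edge]]]:
    """Connect two peaks when their mass difference matches one residue."""
    peaks = sorted(peaks)
    graph: Dict[int, List[Edge]] = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - peaks[i]
            for residue, mass in RESIDUE_MASS.items():
                error = abs(diff - mass)
                if error <= tol:
                    graph[i].append((j, residue, score_edge(error, tol)))
    return peaks, graph


def extract_tags(
    graph: Dict[int, List[Edge]], min_len: int = 3, max_len: int = 6
) -> Dict[str, float]:
    """Depth-first search over the graph, recording every path whose length
    lies in [min_len, max_len] together with its mean edge score."""
    tags: Dict[str, float] = {}

    def dfs(node: int, residues: List[str], total: float) -> None:
        if len(residues) >= min_len:
            tag = "".join(residues)
            mean_score = total / len(residues)
            if mean_score > tags.get(tag, float("-inf")):
                tags[tag] = mean_score  # keep the best-scoring path per tag
        if len(residues) == max_len:
            return
        for target, residue, score in graph[node]:
            dfs(target, residues + [residue], total + score)

    for start in graph:
        dfs(start, [], 0.0)
    return tags


def top_tags(tags: Dict[str, float], k: int = 100) -> List[Tuple[str, float]]:
    """Rank by score, then prefer longer tags, and keep the top k."""
    ranked = sorted(tags.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return ranked[:k]


# Toy spectrum: an ideal fragment ladder spelling G-A-S-P from an arbitrary offset.
ladder = [200.0]
for aa in "GASP":
    ladder.append(ladder[-1] + RESIDUE_MASS[aa])

peaks, graph = build_peak_graph(ladder)
tags = extract_tags(graph)
for tag, score in top_tags(tags, k=5):
    print(tag, round(score, 3))
```

Because edges only run from lighter to heavier peaks, the graph is acyclic and the search always terminates. Variable-length tags fall out naturally, since every path prefix of at least `min_len` residues is recorded rather than only paths of one fixed length.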
This work is licensed under a Creative Commons Attribution 4.0 International License.