Peptide Sequence Tag Extraction by Graph Convolution Neural Networks

XinYe Bian
DongMei Xie
DI Zhang
XiaoYu Xie
Yuyue Feng
Piyu Zhou
Changjiu He
Mingming Lv
Haipeng Wang

Abstract

Peptide sequence tag extraction plays a vital role in protein identification engines based on tandem mass spectrometry. In practice the approach faces two significant challenges. First, tag lengths are fixed: shorter tags lack specificity and recall too many non-target peptide sequences, while longer tags lose precision as their length increases and may fail to recall the target peptide sequences. Second, the sensitivity and precision of tag extraction remain relatively low. To address these issues, TagEx, a variable-length peptide sequence tag extraction algorithm based on graph convolutional networks, is proposed. The method first trains a de novo peptide sequencing scoring model using graph convolutional networks. It then constructs a spectral peak connection graph from the mass spectrum and applies a depth-first search strategy to extract variable-length peptide sequence tags, with the trained graph convolutional network model scoring amino acid connections during extraction. Finally, tags are filtered by length and score to obtain the final candidate peptide sequence tag set. To evaluate its performance, TagEx was benchmarked against three representative tag extraction software tools: InsPect, PepNovo+, and DirecTag. The experimental results show that, when the top 100 tags are retained, TagEx achieves higher sensitivity, coverage, and precision, with improvements of 0.62-2.32, 3.22-11.14, and 3.29-8.31 percentage points, respectively.
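
The extraction stage described above (peak connection graph, depth-first search, filtering by length and score) can be illustrated with a minimal Python sketch. This is not the TagEx implementation: the function names, the mass tolerance, and the length limits are hypothetical, the residue table is abbreviated to five amino acids, the peaks are assumed to be sorted m/z values from a single singly charged ion series, and `score_edge` is a caller-supplied stand-in for the trained graph convolutional network scorer. The tag score here is simply the sum of edge scores, which is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic residue masses in Da (abbreviated; a full table has all 20).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
}

Edge = Tuple[int, str]  # (index of the heavier peak, amino acid letter)


def build_peak_graph(mzs: List[float], tol: float = 0.02) -> Dict[int, List[Edge]]:
    """Connect peak i to peak j when mzs[j] - mzs[i] matches one residue mass.

    Assumes mzs is sorted in ascending order.
    """
    graph: Dict[int, List[Edge]] = {i: [] for i in range(len(mzs))}
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            diff = mzs[j] - mzs[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    graph[i].append((j, aa))
    return graph


def extract_tags(
    mzs: List[float],
    score_edge: Callable[[int, int, str], float],  # placeholder for the GCN scorer
    min_len: int = 3,
    max_len: int = 5,
    top_n: int = 100,
    tol: float = 0.02,
) -> List[Tuple[str, float]]:
    """Depth-first search for tags of min_len..max_len residues, best first."""
    graph = build_peak_graph(mzs, tol)
    tags: List[Tuple[str, float]] = []

    def dfs(node: int, letters: List[str], score: float) -> None:
        if len(letters) >= min_len:
            tags.append(("".join(letters), score))  # record every valid length
        if len(letters) == max_len:
            return
        for nxt, aa in graph[node]:
            letters.append(aa)
            dfs(nxt, letters, score + score_edge(node, nxt, aa))
            letters.pop()

    for start in graph:
        dfs(start, [], 0.0)

    # Filter: highest score first, then longer tags, then alphabetical order.
    tags.sort(key=lambda t: (-t[1], -len(t[0]), t[0]))
    return tags[:top_n]


if __name__ == "__main__":
    # Synthetic ladder whose consecutive gaps spell G, A, S, P.
    peaks = [100.0, 157.02146, 228.05857, 315.0906, 412.14336]
    print(extract_tags(peaks, lambda i, j, aa: 1.0, min_len=3, max_len=4))
    # [('GASP', 4.0), ('ASP', 3.0), ('GAS', 3.0)]
```

Because every prefix of a path that reaches the minimum length is recorded, a single search yields tags of several lengths, which is the property that distinguishes variable-length extraction from a fixed-length scheme. In a real pipeline the constant scorer in the usage example would be replaced by a learned model, and the summed score would likely need normalizing so that longer tags are not favored automatically.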

Article Details

Section
Special Issue - Efficient Scalable Computing based on IoT and Cloud Computing