Modeling protein tandem mass spectrometry data with an extended linear regression strategy

Han Liu, Anthony J. Bonner, Andrew Emili

Research output: Contribution to journal, Conference article, Peer-review

1 Scopus citation


Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics, owing in part to robust spectral interpretation algorithms. The intensity patterns in mass spectra carry useful information for identifying peptides and proteins; however, widely used algorithms cannot predict peak intensity patterns exactly. In this paper, we develop a systematic analytical approach based on a family of extended regression models, which permits routine, large-scale modeling of protein expression profiles. We prove that the regression coefficient vector is the eigenvector corresponding to the smallest eigenvalue of a space-transformed version of the original data. This result reduces the extended regression problem to a singular value decomposition (SVD), which makes the fit robust and efficient. To evaluate the model, we selected 2,859 spectra with high-confidence, non-redundant matches from a pool of 60,960 spectra as training data, and derived problem-specific goodness-of-fit measures showing that the modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.
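The eigenvector-to-SVD reduction described in the abstract can be illustrated with a standard construction. If the fitting criterion is written as minimizing ||A v||² subject to ||v|| = 1, the objective is the Rayleigh quotient of AᵀA, so its minimizer is the eigenvector of AᵀA for the smallest eigenvalue; that same vector is the right singular vector of A for the smallest singular value, which an SVD of A returns without ever forming AᵀA. The Python sketch below assumes that the "space transformation" amounts to centering the augmented data matrix [X | y], which yields an orthogonal (total least squares) fit. The abstract does not specify the actual transformation, and the function name homogeneous_regression is hypothetical.

```python
import numpy as np


def homogeneous_regression(X: np.ndarray, y: np.ndarray):
    """Fit y ~ X @ w + c by minimizing ||A v||^2 subject to ||v|| = 1.

    Hypothetical sketch: ASSUMES the "space transformation" is centering the
    augmented matrix [X | y]; the paper's exact transformation may differ.
    Requires more rows than columns (n > p) so the last singular vector is
    the minimizer.
    """
    # Augment predictors with the response; centering each column absorbs
    # the intercept, leaving the homogeneous problem min ||A v||, ||v|| = 1.
    Z = np.column_stack([X, y])
    mean = Z.mean(axis=0)
    A = Z - mean

    # Right singular vectors of A are eigenvectors of A^T A. NumPy returns
    # singular values in descending order, so the last row of Vt belongs to
    # the smallest singular value (i.e. the smallest eigenvalue of A^T A).
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]

    # v = [w_1, ..., w_p, v_y] satisfies A @ v ~ 0; solve for y to express
    # the fit as ordinary coefficients (assumes v[-1] != 0, i.e. the best
    # hyperplane is not parallel to the y axis).
    w = -v[:-1] / v[-1]
    c = mean[-1] - mean[:-1] @ w

    # The minimized sum of squared orthogonal residuals equals the smallest
    # singular value squared; this is one simple goodness-of-fit quantity.
    ssr_orth = s[-1] ** 2
    return w, c, v, ssr_orth


if __name__ == "__main__":
    # Synthetic example data (not from the paper): y = 2 x1 - 3 x2 + 1 + noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=200)
    w, c, v, ssr = homogeneous_regression(X, y)
    print("coefficients:", w, "intercept:", c, "orthogonal SSR:", ssr)
```

Working on A directly through the SVD, rather than eigendecomposing AᵀA, avoids squaring the condition number of the problem, which is a common reason SVD-based solvers are preferred when robustness matters.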

Original language: American English
Pages (from-to): 3055-3059
Number of pages: 5
Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings
Volume: 26 IV
State: Published - Dec 1 2004
Event: Conference Proceedings - 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2004 - San Francisco, CA, United States
Duration: Sep 1 2004 to Sep 5 2004

ASJC Scopus subject areas

  • Signal Processing
  • Biomedical Engineering
  • Computer Vision and Pattern Recognition
  • Health Informatics

Keywords

  • Goodness of fit
  • Protein expression profile
  • Proteomes
  • Regression
  • Tandem mass spectrometry
