Statistical Models for the Analysis of Isobaric Tags Multiplexed Quantitative Proteomics

D'Angelo, Gina; Chaerkady, Raghothama; Yu, Wen; Hizal, Deniz; Hess, Sonja; Zhao, Wei; Lekstrom, Kristen; Guo, Xiang; White, Wendy; Roskos, Lorin; Bowen, Michael; Yang, Harry

doi:10.1021/acs.jproteome.6b01050


Journal of Proteome Research, vol. 16, no. 9, pp. 3124-3136, 2017 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 16 Issue: 9
Publication Date: 2017
DOI: 10.1021/acs.jproteome.6b01050
Journal Name: JOURNAL OF PROTEOME RESEARCH
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 3124-3136
Keywords: proteomics, mixed models, statistical models, biomarkers, TMT, MASS-SPECTROMETRY DATA, PROTEIN IDENTIFICATION, COMPLEX SAMPLES, DATA SETS, QUANTIFICATION, GUIDELINES, IMPUTATION, RATES
Affiliated with Acıbadem Mehmet Ali Aydınlar University: No

Abstract

Mass spectrometry is being used to identify protein biomarkers that can facilitate the development of drug treatments. Mass spectrometry-based labeling proteomic experiments produce complex data that are hierarchical in nature, often from studies with small sample sizes. The generalized linear model (GLM) is the most popular approach in proteomics for comparing protein abundances between groups. However, GLM does not address all the complexities of proteomics data, such as repeated measures and variance heterogeneity. Linear models for microarray data (LIMMA) and mixed models are two approaches that can address some of these data complexities to provide better statistical estimates. We compared these three statistical models (GLM, LIMMA, and mixed models) under two different normalization approaches (quantile normalization and median sweeping) to demonstrate when each approach is best for tagged proteins. We evaluated these methods using a spiked-in data set of known protein abundances, a systemic lupus erythematosus (SLE) data set, and simulated data from multiplexed labeling experiments that use tandem mass tags (TMT). Data are available via ProteomeXchange with identifier PXD005486. We found median sweeping to be a preferred approach for data normalization, and with this normalization approach the findings overlapped across all methods, with the GLM findings being a subset of the mixed model findings. We conclude that the mixed model had the best type I error with median sweeping, whereas LIMMA had the better overall statistical properties regardless of normalization approach.
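
The abstract names median sweeping without describing its steps. The sketch below illustrates one common formulation of median sweeping for isobaric-tag data, and it is an assumption that the paper follows the same steps: log2-transform the reporter intensities, subtract each spectrum's median across channels, summarize spectra to proteins by the per-channel median, then subtract each channel's median across proteins. The function name `median_sweep`, the input layout, and the toy intensities are hypothetical and are not taken from the paper or its data set.

```python
import math
from collections import defaultdict
from statistics import median


def median_sweep(psms):
    """Illustrative median sweeping (assumed formulation, not the paper's code).

    psms: list of (protein_id, [reporter intensity per channel]) tuples,
          one tuple per identified spectrum; intensities must be positive.
    Returns {protein_id: [normalized log2 relative abundance per channel]}.
    """
    # Step 1: log2-transform, then center each spectrum on its own median
    # across channels, removing spectrum-specific intensity differences.
    by_protein = defaultdict(list)
    for protein, intensities in psms:
        logs = [math.log2(x) for x in intensities]
        spectrum_median = median(logs)
        by_protein[protein].append([v - spectrum_median for v in logs])

    # Step 2: summarize to protein level by taking, within each channel,
    # the median over all spectra assigned to that protein.
    proteins = {
        protein: [median(column) for column in zip(*rows)]
        for protein, rows in by_protein.items()
    }

    # Step 3: center each channel on its median across proteins, which
    # corrects for unequal loading or mixing between channels.
    n_channels = len(next(iter(proteins.values())))
    channel_medians = [
        median(values[c] for values in proteins.values())
        for c in range(n_channels)
    ]
    return {
        protein: [v - channel_medians[c] for c, v in enumerate(values)]
        for protein, values in proteins.items()
    }


# Toy input: three proteins measured in three channels (made-up numbers).
toy_psms = [
    ("P1", [100.0, 200.0, 400.0]),
    ("P1", [50.0, 100.0, 200.0]),
    ("P2", [80.0, 80.0, 80.0]),
    ("P3", [10.0, 20.0, 10.0]),
]
normalized = median_sweep(toy_psms)
```

The normalized values are log2 relative abundances per protein and channel. In a workflow like the one the abstract describes, values of this kind would then be passed to a GLM, LIMMA, or a mixed model for the between-group comparison.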