1 Department of Chemistry, Nanchang University, Nanchang 330031, P.R.China
2 Department of Mathematics, Nanchang University, Nanchang 330031, P.R.China
3 Department of Materials and Chemical Engineering, Pingxiang University, Pingxiang 337055, P.R.China
* To whom correspondence should be addressed. Tel: + 86 791 83969518; Email: firstname.lastname@example.org
Supplemental Text 1.
In this example, we use the cellular tumor antigen p53 from Macaca mulatta as the query and keep the default options to obtain phosphorylation predictions. The result page has two parts: the detailed submission form and the results of the search. The search results are presented as a searchable table in which each row shows one predicted phosphorylation site. For p53, we obtain 32 predictions for 32 phosphorylation sites. The user can choose how many predictions the table displays; by default, only the first 10 are shown (see Supplemental Figure 2D). Each prediction has nine columns. The first and fourth columns give the protein name (UniProt ID) of the query protein and of the protein with the known PTM, respectively. The second and fifth columns give the putative and the experimentally verified modification residues, respectively, together with their positions in the query protein and in the protein with the known PTM. The third and sixth columns show the sequence segments surrounding the putative and the experimentally verified modification residues, respectively. The seventh column gives the number of non-conservative sequence differences between the peptide with the known PTM and the query peptide. The eighth column gives the expect value (E-value) of the alignment between the query peptide and the peptide with the known PTM. The ninth column gives the CPE (cross-promotion expect-value, see the Methods section) status of the alignment between the query protein and the protein with the known PTM. The user can determine the PTM status from the CPE column (yes or no). However, if both the number of sequence differences in the seventh column and the E-value in the eighth column are low, a prediction can still be considered a candidate PTM site even when its CPE status is “no”. The predictions can also be filtered by entering keywords into the text box of the table.
Furthermore, the user can investigate the predictions in greater detail via the web interface. Each known protein and each query protein is linked to its entry in the UniProt database, where the user can find manually curated sequence annotation (see Supplemental Figure 2D).
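The decision rule described above for the nine-column result table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PTMProber implementation: the `Prediction` class, the `is_candidate` function, the thresholds `max_seq_diff` and `max_e_value`, and all example rows (sites, peptides and known-protein names) are hypothetical, since the text specifies only that both values should be "low".

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One row of the result table (nine columns, names are hypothetical)."""
    query_protein: str   # column 1: UniProt ID of the query protein
    query_site: str      # column 2: putative residue and position
    query_peptide: str   # column 3: segment around the putative site
    known_protein: str   # column 4: UniProt ID of the protein with known PTM
    known_site: str      # column 5: verified residue and position
    known_peptide: str   # column 6: segment around the verified site
    seq_diff: int        # column 7: non-conservative sequence differences
    e_value: float       # column 8: E-value of the peptide alignment
    cpe: bool            # column 9: CPE status (True for "yes")


def is_candidate(p: Prediction, max_seq_diff: int = 2,
                 max_e_value: float = 1e-6) -> bool:
    """Accept a row if CPE is "yes", or if CPE is "no" but both the
    sequence differences and the E-value are low.

    The default thresholds are assumptions chosen for illustration only.
    """
    if p.cpe:
        return True
    return p.seq_diff <= max_seq_diff and p.e_value <= max_e_value


# Invented rows for illustration; they are not actual PTMProber output.
rows = [
    Prediction("P56424", "S1", "AAASAAA", "KNOWN_A", "S1", "AAASAAA", 0, 1e-10, True),
    Prediction("P56424", "T2", "AAATAAA", "KNOWN_B", "T2", "AAGTAAA", 1, 1e-8, False),
    Prediction("P56424", "Y3", "AAAYAAA", "KNOWN_C", "Y3", "GGGYGGG", 5, 0.5, False),
]

candidates = [p.query_site for p in rows if is_candidate(p)]
print(candidates)
```

In this sketch the second row is retained despite its "no" CPE status because both its sequence-difference count and its E-value fall under the assumed thresholds, while the third row is rejected.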
Supplemental Figure 1. Identification results for protein Q8SPZ3 from Delphinapterus leucas and protein P61260 from Macaca fuscata in the PTMProber system. Different colors mark the sites identified for different PTM types: phosphorylation, acetylation, ubiquitination, methylation and sumoylation are shown in blue, red, yellow, purple and green, respectively.
Supplemental Figure 2. Main working interface of PTMProber. The human protein TP53 is of particular interest because it is frequently mutated or inactivated in about 60% of cancers. Its homolog in Macaca mulatta (UniProt ID: P56424; taxid=9544) is used as the example protein to search for phosphorylation sites in the search interface (A). An extension tool for setting the BLAST database of the query proteome is used when the required database is not in the list (B). An extension tool for setting the BLAST database of known PTMs lets the user build a custom database from user-provided PTM data (C). The system returns the user-submitted form and the search results, a total of 32 phosphorylation sites presented in a searchable table (only the first 10 predictions are shown). These results can be sent to a user-provided email address and investigated in further detail by following the links to the UniProt database for curated knowledge about the sites (D).
Supplemental Table 1. Protein PTM data used for testing PTMProber, including the numbers of proteins and sites (P. num. and S. num.) in Rattus norvegicus, Gallus gallus and Gallus gallus.
Supplemental Table 2. Comparison of the CPE benchmark with the E-value benchmark. The “Seq. Diff.” column indicates the number of sequence differences between a query site in the testing data and its best known site match from PhosphoSitePlus/UniProt. The “NH” row indicates query sites for which there was no known site match in PhosphoSitePlus/UniProt. The “no homology” row in the “CPE” column indicates that either the query site had no match in PhosphoSitePlus/UniProt or “CPE = no”. The “no homology” row in the “E-value” column indicates that either the E-value of the matched site is smaller than 10^-6 or the query site had no match in PhosphoSitePlus/UniProt. *NH represents no homology.
Supplemental Table 3. PTM data sources and statistics for different PTM types.