Libra in the Trans-Proteomic Pipeline

Libra
Libra is a module within the Trans-Proteomic Pipeline that performs quantitation on MS/MS spectra of multi-reagent labeled peptides. More specifically, at ISB we use Libra on MS/MS spectra of iTRAQ-labeled samples.

Patrick Pedrioli -- original code author of quantitation
Andrew Keller -- peptide assignment to proteins within the pipeline
Nichole King -- code statistics, maintenance, additions, corrections, and satellite applications

Command line syntax for using Libra
The condition file
Details about what Libra does
How accurate is the quantitation?


Command line syntax to use Libra in the pipeline

To run the pipeline using PeptideProphet, ProteinProphet, and Libra, specify your input files (PepXML files or summary HTML files), the Prophet flags, and the Libra flag. The Libra flag has the form -L[conditionFile].

For example, to run PeptideProphet while retaining identifications with probabilities below 0.05 (-p0), run ProteinProphet (-Op), and run Libra to return quantitation, use:

    xinteract -p0 -Op *.html -Lcondition.xml
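If you drive the pipeline from a script, the same command can be assembled as an argument list. The sketch below rests on a few assumptions. The helper name `build_xinteract_args` is hypothetical and not part of the TPP. It only uses the flags shown in the example above (`-p0`, `-Op`, `-L<file>`), and it keeps the example's ordering: Prophet flags, then input files, then the Libra flag. For any other options, check xinteract's own help.

```python
from typing import List, Sequence


def build_xinteract_args(inputs: Sequence[str], condition_file: str,
                         prophet_flags: Sequence[str] = ("-p0", "-Op")) -> List[str]:
    """Assemble the xinteract command from the example (hypothetical helper).

    The default prophet_flags are the ones used in the example: -p0 for
    PeptideProphet and -Op to run ProteinProphet. The Libra flag is -L
    followed directly by the condition file name, with no space.
    """
    if not inputs:
        raise ValueError("at least one input file (PepXML or summary HTML) is required")
    return ["xinteract", *prophet_flags, *inputs, f"-L{condition_file}"]
```

In a script you might call `subprocess.run(build_xinteract_args(sorted(glob.glob("*.html")), "condition.xml"), check=True)`. Passing a list avoids shell quoting. It also means no shell is involved, so the `*.html` wildcard has to be expanded in Python, here with `glob`.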

You'll find the integrated intensities of the reagent m/z fragment ion lines in your interact.xml file, and the protein quantitation in your interact-prot.xml file. Additionally, there is a tab-separated file called quantitation.tsv, which you can load into Excel or your favorite spreadsheet tool to do your own math if you like. The web interface to your interact files provides export options too. (If you want to tailor the peptide-to-protein assignment yourself, use the export options on your interact (PepXML) file.)


The condition file

The condition file specifies the reagent m/z values, mass tolerance, isotopic correction coefficients, method of centroiding, method of normalization, minimum threshold intensity (not required), target MS level, and output type.

The condition.xml elements and attributes (in sequential order) have the following functions:

1. <fragmentMasses> <reagent> specifies the m/z values to be used in the analysis.
2. <isotopicContributions> specifies the isotopic contributions of one line to its adjacent lines, given as fractions. The order is: contribution of channel 1 to channel 2; contribution of channel 2 to channel 1 and then to channel 3; contribution of channel 3 to channel 2 and then to channel 4; contribution of channel 4 to channel 3. If you do not want to apply isotopic corrections, input -2.
3. <massTolerance> specifies the mass tolerance. If in item 1 you have chosen m/z = 114 and you set a mass tolerance of 1, Libra looks for the most intense m/z value in the interval 113 to 115.
4. <centroiding> specifies centroiding preferences. 0: none; 1: mathematical average; 2: intensity-weighted average. For 1 and 2, a number of iterations must be specified as well.
5. <normalization> specifies normalization preferences. 1 ... n: the corresponding m/z value in the sorted (ascending) list of m/z values specified in item 1. The default is channel 1 if not specified, but you'll want to specify it.
6. <targetMs> specifies the level of the MS scans to use in the analysis. 1: MS; 2: MS2; n: MSn.
7. <output> switches between printing the scan number or the retention time in the output file. Retention time can be useful when a link to the native data is required but the scan numbers in the mzXML format and the native output from the MS instrument do not correspond. [NOTE: haven't checked this feature; it might not be working.]
8. <quantitationFile> is the name of the output file with tab-separated information (not active yet; the default filename is quantitation.tsv).
9. <minimumThreshhold> is the minimum threshold intensity (not required). If you have low S/N spectra, you'll want to set this to ignore noise below a certain integrated count for a line.
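The window search in item 3 can be sketched in a few lines. This sketch makes several assumptions:

* The spectrum is a plain list of (m/z, intensity) pairs.
* The window includes its end points.
* A reagent with no peak in range comes back as `None`. How Libra itself fills a missing channel is not modeled here.
* Centroiding (item 4) and isotopic correction (item 2) are left out.

Both function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def most_intense_in_window(peaks: Sequence[Peak], target_mz: float,
                           tolerance: float) -> Optional[Peak]:
    """Most intense peak with target_mz - tolerance <= m/z <= target_mz + tolerance."""
    in_window = [p for p in peaks if abs(p[0] - target_mz) <= tolerance]
    return max(in_window, key=lambda p: p[1]) if in_window else None


def pick_reagent_peaks(peaks: Sequence[Peak], reagent_mzs: Sequence[float],
                       tolerance: float) -> Dict[float, Optional[Peak]]:
    """Pick one peak per reagent m/z; None marks a reagent with nothing in range."""
    return {mz: most_intense_in_window(peaks, mz, tolerance) for mz in reagent_mzs}
```

With m/z = 114 and a tolerance of 1, every peak between 113 and 115 is a candidate, and the most intense one wins. This matches the description in item 3.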
If you would like to generate a condition.xml file, please use http://db.systemsbiology.net/webapps/conditionFileApp/

Here's an example condition file. [NOTE: the format changed on June 24, 2006, so please update your condition file using the web application above or the information here.]

    <SUMmOnCondition description="iTraq">
      <fragmentMasses>
        <reagent mz="114.1"/>
        <reagent mz="115.1"/>
        <reagent mz="116.1"/>
        <reagent mz="117.1"/>
      </fragmentMasses>
      <isotopicContributions>
        <contributingMz value="1">
          <affected mz="2" correction="0.063"/>
        </contributingMz>
        <contributingMz value="2">
          <affected mz="1" correction="0.02"/>
          <affected mz="3" correction="0.06"/>
        </contributingMz>
        <contributingMz value="3">
          <affected mz="2" correction="0.03"/>
          <affected mz="4" correction="0.049"/>
        </contributingMz>
        <contributingMz value="4">
          <affected mz="3" correction="0.04"/>
        </contributingMz>
      </isotopicContributions>
      <massTolerance value="0.2"/>
      <centroiding type="2" iterations="1"/>
      <normalization type="4"/>
      <targetMs level="2"/>
      <output type="1"/>
      <quantitationFile name="quantitation.tsv"/>
      <minimumThreshhold value="20"/>
    </SUMmOnCondition>
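Because the condition file is XML, a script can read it with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not part of the TPP. The function name `read_condition` and the keys of the returned dictionary are hypothetical. The element and attribute names come from the example file, including its `minimumThreshhold` spelling. Note that in `<contributingMz value="i">` / `<affected mz="j">`, both i and j are channel numbers (1-4), not m/z values.

```python
import xml.etree.ElementTree as ET


def read_condition(xml_text):
    """Collect the main Libra settings from a condition file (sketch).

    Assumes every element from the example is present except
    <minimumThreshhold>, which is optional.
    """
    root = ET.fromstring(xml_text)
    # <reagent mz="..."> holds the reagent m/z values, in file order.
    reagents = [float(r.get("mz")) for r in root.iter("reagent")]
    # (contributing channel, affected channel) -> correction fraction.
    corrections = {
        (int(src.get("value")), int(aff.get("mz"))): float(aff.get("correction"))
        for src in root.iter("contributingMz")
        for aff in src.iter("affected")
    }
    threshold = root.find("minimumThreshhold")  # element name spelled as in the file
    return {
        "reagents": reagents,
        "corrections": corrections,
        "mass_tolerance": float(root.find("massTolerance").get("value")),
        "centroiding": int(root.find("centroiding").get("type")),
        "normalization_channel": int(root.find("normalization").get("type")),
        "ms_level": int(root.find("targetMs").get("level")),
        "minimum_threshold": None if threshold is None else float(threshold.get("value")),
    }
```

For the example file, this yields four reagents (114.1 to 117.1), six correction entries, a 0.2 tolerance, normalization to channel 4, and MS level 2.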


Details about what Libra does

Given the conditions, Libra integrates the intensities of the reagent m/z lines (a.k.a. channels throughout this documentation) in an MS/MS spectrum and stores the values at the peptide level in the interact.xml file. ProteinProphet, within the Trans-Proteomic Pipeline, infers the simplest list of proteins consistent with the identified peptides. (Note that Libra excludes peptides with PeptideProphet probabilities less than 0.5.)

Protein quantitation is derived from the group of peptides associated with the protein:

1. Each peptide's integrated intensities are normalized by the sum of its channel intensities.
2. The normalized channels are averaged over all peptides of the protein, and the standard deviation is determined for each normalized channel.
3. Normalized channel values more than 2 sigma from the mean are removed.
4. The protein's average channels are recalculated from the values that survive outlier removal, and the 1-sigma standard errors are calculated from the standard deviation.
5. If the user has specified a reference normalization channel, the protein quantitation is normalized with respect to that channel, and each error becomes the channel error and the reference channel error added in quadrature.

The value 99.99 indicates that a protein's quantitation was calculated using only one peptide, so the standard error is infinite. The value -9.0 indicates that no peptides of the protein survived the threshold filter and outlier removal, so the protein quantitation is undefined. (One day, we would like to use intensity-weighted means and errors in the calculations.)

When a reagent m/z (channel) isn't found in the peptide spectrum, that reagent m/z is replaced with the default value. When the intensity of a reagent line is less than or equal to zero, its value is replaced with zero. Note that there are still a few loose ends to tie up.

Be wary of quantitation from very poor S/N spectra. Is your integrated intensity for a peptide channel less than 20 counts, for example?
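The peptide-level rules above can be sketched as follows: the 0.5 probability cutoff, the zero clamping, the threshold filter, and the two special values. This is an illustration under assumptions, not Libra's code, and the function names are hypothetical. The text does not say whether a channel below the minimum threshold drops the whole peptide or only that channel. This sketch assumes the whole peptide is dropped, and it says so in a comment.

```python
from typing import List, Optional, Sequence

MIN_PROBABILITY = 0.5        # peptides below this PeptideProphet probability are excluded
UNDEFINED_QUANT = -9.0       # no peptide survived the threshold filter and outlier removal
SINGLE_PEPTIDE_ERR = 99.99   # only one peptide was used, so the standard error is infinite


def clean_peptide(intensities: Sequence[float], probability: float,
                  threshold: Optional[float] = None) -> Optional[List[float]]:
    """Apply the peptide-level rules; return None if the peptide is excluded."""
    if probability < MIN_PROBABILITY:
        return None
    # Intensities less than or equal to zero are replaced with zero.
    cleaned = [max(x, 0.0) for x in intensities]
    # ASSUMPTION: any channel below the threshold excludes the whole peptide.
    if threshold is not None and any(x < threshold for x in cleaned):
        return None
    return cleaned


def special_value(n_surviving: int) -> Optional[float]:
    """Special value for a protein with this many surviving peptides, else None."""
    if n_surviving == 0:
        return UNDEFINED_QUANT
    if n_surviving == 1:
        return SINGLE_PEPTIDE_ERR
    return None
```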

A detailed example of the steps that go into the quantitation follows. ProteinA has 8 peptides.

The interact.xml file shows the peptide integrated intensities for each channel:

              libra 114    libra 115    libra 116    libra 117
    pep1         67.100       39.153       49.651       47.567
    pep2       2311.460   167071.800     1847.637     1762.466
    pep3       2311.460     1670.718     1847.637     1762.466
    pep4       2311.460     1670.718     1847.637     1762.466
    pep5       2311.460     1670.718     1847.637     1762.466
    pep6       2311.460     1670.718     1847.637     1762.466
    pep7        224.920      231.700      246.938      241.900
    pep8        287.600      293.121      263.173      268.105

Libra normalizes each peptide channel by the sum of that peptide's channels:

              libra 114    libra 115    libra 116    libra 117
    pep1          0.330        0.192        0.244        0.234
    pep2          0.013        0.966        0.011        0.010
    pep3          0.304        0.220        0.243        0.232
    pep4          0.304        0.220        0.243        0.232
    pep5          0.304        0.220        0.243        0.232
    pep6          0.304        0.220        0.243        0.232
    pep7          0.238        0.245        0.261        0.256
    pep8          0.259        0.264        0.237        0.241
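The normalization step is a single division per channel. This sketch uses a hypothetical function name. It reproduces the pep1 and pep2 rows; in pep2, the very large 115 intensity accounts for almost the whole sum.

```python
def normalize_by_sum(intensities):
    """Divide each channel intensity by the sum of the peptide's channels."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("channel intensities must sum to a positive value")
    return [x / total for x in intensities]


pep2 = [2311.460, 167071.800, 1847.637, 1762.466]
print([round(x, 3) for x in normalize_by_sum(pep2)])
```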

Libra then determines the mean and standard deviation of each normalized channel:

                libra 114    libra 115    libra 116    libra 117
    mean            0.257        0.318        0.216        0.209
    st dev          0.103        0.262        0.083        0.081
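The arithmetic of the first two steps can be written out as follows. It is inferred from the tables rather than from Libra's source. Let I_{ij} be the integrated intensity of peptide i in channel j, for n peptides and 4 channels:

```latex
x_{ij} = \frac{I_{ij}}{\sum_{k=1}^{4} I_{ik}}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}

% Channel 1 (libra 114), ProteinA, n = 8:
\bar{x}_1 \approx \frac{2.0575}{8} \approx 0.257, \qquad
\sum_{i=1}^{8}\left(x_{i1} - \bar{x}_1\right)^2 \approx 0.0740, \qquad
s_1 \approx \sqrt{0.0740 / 7} \approx 0.103
```

Dividing by n instead of n - 1 would give about 0.096. So the st dev row is the sample standard deviation of the normalized values, not the standard error of the mean.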

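Libra then removes values that deviate from the mean by more than 2 sigma. In this example, those are the values outside these ranges:

                libra 114    libra 115    libra 116    libra 117
    range       0.05-0.46    0.00-0.81    0.06-0.37    0.06-0.36

Here pep2 falls outside the range in every channel, so it is removed from all four.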

Libra recalculates the mean and standard deviation with the outliers removed:

                libra 114    libra 115    libra 116    libra 117
    mean            0.292        0.226        0.245        0.237
    st dev          0.032        0.023        0.008        0.009

Finally, Libra re-normalizes with respect to the user-selected channel, channel 4 in this example:

                        libra 114    libra 115    libra 116    libra 117
    mean                    1.232        0.953        1.034        1.000
    standard error          0.013        0.009        0.004        0.005
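The worked example can be reproduced end to end. This is a standard-library Python sketch, not Libra's implementation, and `quantify` and `PROTEIN_A` are hypothetical names. It relies on four assumptions, chosen because they reproduce the numbers above. They have not been checked against the Libra source.

* Outlier removal is applied channel by channel, as the text describes.
* The standard error is the surviving values' standard deviation divided by sqrt(n).
* Each error is the channel's and the reference channel's standard errors added in quadrature.
* Instead of reporting Libra's special values, the sketch raises an error when fewer than two peptides survive in a channel.

```python
from math import sqrt
from statistics import mean, stdev

# Integrated intensities (114, 115, 116, 117) for ProteinA, from the table above.
PROTEIN_A = {
    "pep1": [67.100, 39.153, 49.651, 47.567],
    "pep2": [2311.460, 167071.800, 1847.637, 1762.466],
    "pep3": [2311.460, 1670.718, 1847.637, 1762.466],
    "pep4": [2311.460, 1670.718, 1847.637, 1762.466],
    "pep5": [2311.460, 1670.718, 1847.637, 1762.466],
    "pep6": [2311.460, 1670.718, 1847.637, 1762.466],
    "pep7": [224.920, 231.700, 246.938, 241.900],
    "pep8": [287.600, 293.121, 263.173, 268.105],
}


def quantify(peptides, reference, n_sigma=2.0):
    """Protein quantitation following the steps in the text (sketch).

    reference is the 1-based normalization channel, as in the condition file.
    """
    # Normalize each peptide by the sum of its channels.
    normalized = [[x / sum(row) for x in row] for row in peptides.values()]
    stats = {key: [] for key in
             ("first_mean", "first_sd", "bounds", "kept", "mean", "sd", "se")}
    for j in range(len(normalized[0])):
        column = [row[j] for row in normalized]
        # Mean and sample standard deviation (n - 1 denominator).
        m, s = mean(column), stdev(column)
        lo, hi = m - n_sigma * s, m + n_sigma * s
        # Drop values more than n_sigma standard deviations from the mean.
        kept = [x for x in column if lo <= x <= hi]
        if len(kept) < 2:
            raise ValueError(f"channel {j + 1}: fewer than two peptides survive")
        # Recompute on the survivors; standard error = sd / sqrt(n).
        m2, s2 = mean(kept), stdev(kept)
        stats["first_mean"].append(m)
        stats["first_sd"].append(s)
        stats["bounds"].append((lo, hi))
        stats["kept"].append(len(kept))
        stats["mean"].append(m2)
        stats["sd"].append(s2)
        stats["se"].append(s2 / sqrt(len(kept)))
    # Express channels relative to the reference; add errors in quadrature.
    r = reference - 1
    stats["ratio"] = [m / stats["mean"][r] for m in stats["mean"]]
    stats["error"] = [sqrt(se ** 2 + stats["se"][r] ** 2) for se in stats["se"]]
    return stats
```

With `quantify(PROTEIN_A, reference=4)`, four rows agree with the tables above to three decimals: the first mean and st dev, the recalculated mean and st dev, the re-normalized means, and the standard errors. The reference channel's own error (0.005) comes out as its standard error added in quadrature with itself.

One difference: the sketch's 2-sigma windows for 115, 116, and 117 are wider than the printed ranges. For 115, the upper bound is about 0.84 instead of 0.81. Those three printed ranges appear to match a divide-by-n standard deviation instead. Only pep2 falls outside either way, so the final numbers are the same.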


How accurate is the quantitation?

Test datasets were created by Anne-Claude Gingras and Patrick Pedrioli (and Hookeun Lee?) from a 9-protein mix labeled with iTRAQ. The treatment of the mix was varied slightly, and several measurements were obtained on the Q-TOF, the Qstar, and the TOF-TOF. The preliminary results presented below are from one sample run on the Q-TOF. The expected numbers in the table are the concentrations normalized to channel 4.

Peptides deviating from the mean by more than 2 standard deviations were removed. A minimum intensity threshold of 20 counts was used. The software additionally removes peptides in which a channel mass is not found. IC is an abbreviation for isotopic correction, as provided in the condition.xml file.

                                |                         | P0.9                    | P0.9
                                | EXPECTED                | Libra (w/IC, thresh=20) | Libra (w/IC, thresh=0.01)
    Name                Species |    114   115   116   117|    114   115   116   117|    114   115   116   117
    ----------------------------|-------------------------|-------------------------|--------------------------
    cytochrome c                |    .25     1   .25     1|   0.21  0.88  0.26   1.0|   0.28  0.91  0.26  1.00
    ovalbumin           Chicken |      1     1     1     1|   0.87  0.93  1.07   1.0|   0.90  1.04  1.20  1.00
    transferrin         Bovine  |      8     8     1     1|   6.16  5.64  1.20   1.0|   6.01  5.63  1.06  1.00
    beta lactoglobulin          |   .125  .125     1     1|   0.10  0.11  1.08   1.0|   0.10  0.08  1.10  1.00
    serum albumin       Bovine  |      4     1     4     1|   3.12  0.93  3.91   1.0|   3.13  0.88  4.19  1.00
    catalase            Bovine  |      0   100    10     1|   0.98 82.57 11.42   1.0|   1.15 20.37  4.79  1.00

Please add to/edit this section. From the table above, we can see a handful of experimental errors. There may be errors in the purity of the purchased proteins, errors in the protein concentrations placed in the sample, errors in peptide concentration due to incomplete digestion or the method of digestion, and errors introduced during peptide acquisition and measurement in the mass spectrometer. [The latter measurement errors show up as the standard deviations earlier in this document. These are the smallest uncertainties.] Using the expected numbers and the measurements above, we guesstimate a rough accuracy in the 10-25% range. This is preliminary, as we haven't yet analyzed the gazillion other test datasets.

Note: as the catalase row shows, Libra will have difficulties with zero intensities. Until we have time to modify the code to handle those cases, please check your quantitation.tsv file.
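The rough accuracy figure can be checked channel by channel against the table. This small sketch uses only the expected values and the thresh=20 column, and `relative_errors` is a hypothetical helper. Channel 117 is the reference, so it is 1 by construction and is skipped. A channel whose expected value is 0 (catalase 114) has no defined relative error, which is the zero-intensity difficulty noted above.

```python
# Expected and measured (Libra w/IC, thresh=20) values for channels 114-117,
# copied from the table above.
TABLE = {
    "cytochrome c": ([0.25, 1, 0.25, 1], [0.21, 0.88, 0.26, 1.0]),
    "ovalbumin": ([1, 1, 1, 1], [0.87, 0.93, 1.07, 1.0]),
    "transferrin": ([8, 8, 1, 1], [6.16, 5.64, 1.20, 1.0]),
    "beta lactoglobulin": ([0.125, 0.125, 1, 1], [0.10, 0.11, 1.08, 1.0]),
    "serum albumin": ([4, 1, 4, 1], [3.12, 0.93, 3.91, 1.0]),
    "catalase": ([0, 100, 10, 1], [0.98, 82.57, 11.42, 1.0]),
}


def relative_errors(expected, measured):
    """|measured - expected| / expected per channel; None where expected is 0."""
    return [None if e == 0 else abs(m - e) / e for e, m in zip(expected, measured)]


for name, (expected, measured) in TABLE.items():
    # Channel 117 is the reference (1 by construction), so show 114-116 only.
    errs = relative_errors(expected[:3], measured[:3])
    shown = ["n/a" if e is None else f"{100 * e:.0f}%" for e in errs]
    print(f"{name:20s} " + "  ".join(shown))
```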


last updated Nov 19, 2006, Nichole King