Libra
Libra is a module within the trans-proteomic pipeline to
perform quantification on MS/MS spectra that have multi-reagent
labeled peptides. More specifically, at ISB we use Libra on
MS/MS spectra of iTRAQ labeled samples.
|
|
Patrick Pedrioli
-- original code author of Quantitation
Andrew Keller
-- peptide assignment to proteins within pipeline
Nichole King
-- code statistics/maintenance/additions/corrections and satellite applications
|
|
Command line syntax for using Libra
The condition file
Details about what Libra does
How accurate is the quantitation?
|
|
|
Command line syntax to use Libra in the pipeline
To run the pipeline using PeptideProphet, ProteinProphet,
and Libra, specify your input files (PepXML files or
summary html files), the Prophet flags, and the Libra flag.
The Libra flag is of the form:
-L[conditionFile]
|
|
For example, to run PeptideProphet while retaining identifications
with P less than 0.05 (-p0), run ProteinProphet (-Op),
and run Libra to return quantitation, use:
xinteract -p0 -Op *.html -Lcondition.xml
|
|
You'll find the integrated intensities of the reagent
m/z fragment ion lines in your interact.xml file, and you'll find
the protein quantitation in your interact-prot.xml
file.
Additionally, there's a tab-separated file called
quantitation.tsv which you can load into Excel or your favorite
spreadsheet tool to do your own math if you like.
Your web interface to your interact files provides export options too.
(If you wanted to tailor the peptide to protein assignment yourselves,
use export options on your interact (PepXML) file.)
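If you would rather do your own math in a script than in a spreadsheet, a minimal Python sketch for reading a tab-separated file such as quantitation.tsv could look like the following. The function name read_tsv_rows is hypothetical, and the sketch makes no assumption about which columns Libra writes: it simply returns every line as a list of strings.

```python
import csv


def read_tsv_rows(path):
    """Return every line of a tab-separated file as a list of string fields.

    Hypothetical helper: the column layout of quantitation.tsv is not
    assumed here, so no header handling or type conversion is done.
    """
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


# Example use (assumes quantitation.tsv is in the current directory):
# for fields in read_tsv_rows("quantitation.tsv"):
#     print(fields)
```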
|
|
|
The condition file
The condition file specifies the reagent M/Z values,
mass tolerance, isotopic correction coefficients,
method of centroiding, method of normalization,
minimum threshold intensity (not required),
target MS level, and output type.
|
|
The condition.xml elements and attributes (in sequential order)
have the following function:
1. <fragmentMasses>
<reagent>
specifies the m/z values to be used in the analysis.
2. <isotopicContributions>
specifies the isotopic contributions of one line to its adjacent
lines (specified as fractions).
The order is:
contribution of channel 1 to channel 2
contribution of channel 2 to channel 1 and then to channel 3
contribution of channel 3 to channel 2 and then to channel 4
contribution of channel 4 to channel 3
If you do not want to apply isotopic corrections input -2.
3. Specifies the mass tolerance. If in field one you have chosen
m/z = 114 and you set a mass tolerance of 1, Libra would look
for the most intense m/z value in the interval 113 to 115.
4. Specifies centroiding preferences.
0: none
1: mathematical average
2: intensity weighted average
For 1 and 2 a number of iterations must be specified as well.
5. Specifies normalization preferences.
1 ... n: corresponding m/z value in the sorted (ascending order)
list of m/z values specified in field one.
The default is channel 1 if not specified, but you'll want to specify.
6. Specifies the level of the MS scans to use in the analysis.
1: MS
2: MS2
n: MSn
7. Switch between printing the scan number or the retention time in
the output file.
Retention time can be useful when a link to the native data is
required, but the scan numbers in the mzXML format and the native
output from the MS instrument do not correspond.
[NOTE: haven't checked this feature, it might not be working.]
8. Name of output file with tab separated information (not active yet,
default filename is quantitation.tsv)
9. Minimum threshold intensity (not required). If you have low S/N spectra
you'll want to set this to ignore noise below a certain integrated count
for a line.
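The mass tolerance window in field 3 and the intensity weighted average in field 4 can be illustrated with a short Python sketch. This is not Libra's code: the function names and the (m/z, intensity) peak list are hypothetical, and which peaks Libra actually averages and how it iterates the centroiding is not specified here.

```python
def most_intense_in_window(peaks, reagent_mz, tolerance):
    """Return the (mz, intensity) pair with the highest intensity inside
    [reagent_mz - tolerance, reagent_mz + tolerance], or None if no peak
    falls in that interval.

    peaks is a hypothetical list of (mz, intensity) tuples.
    """
    low, high = reagent_mz - tolerance, reagent_mz + tolerance
    in_window = [peak for peak in peaks if low <= peak[0] <= high]
    if not in_window:
        return None
    return max(in_window, key=lambda peak: peak[1])


def intensity_weighted_mz(peaks):
    """Intensity weighted average m/z of a list of (mz, intensity) tuples."""
    total_intensity = sum(intensity for _, intensity in peaks)
    return sum(mz * intensity for mz, intensity in peaks) / total_intensity


# With m/z = 114 and a tolerance of 1, only peaks between 113 and 115 compete,
# so the strong peak at 115.3 is ignored.
example_peaks = [(113.2, 5.0), (114.1, 80.0), (114.9, 30.0), (115.3, 200.0)]
best = most_intense_in_window(example_peaks, 114, 1)
```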
|
If you would like to generate a condition.xml file, please use
http://db.systemsbiology.net/webapps/conditionFileApp/
|
|
Here's an example condition file: [NOTE: this changed June 24, 2006, so please update your condition file using the web application above or the information here.]
|
<SUMmOnCondition description="iTraq">
<fragmentMasses>
<reagent mz="114.1"/>
<reagent mz="115.1"/>
<reagent mz="116.1"/>
<reagent mz="117.1"/>
</fragmentMasses>
<isotopicContributions>
<contributingMz value="1">
<affected mz="2" correction="0.063"/>
</contributingMz>
<contributingMz value="2">
<affected mz="1" correction="0.02"/>
<affected mz="3" correction="0.06"/>
</contributingMz>
<contributingMz value="3">
<affected mz="2" correction="0.03"/>
<affected mz="4" correction="0.049"/>
</contributingMz>
<contributingMz value="4">
<affected mz="3" correction="0.04"/>
</contributingMz>
</isotopicContributions>
<massTolerance value="0.2"/>
<centroiding type="2" iterations="1"/>
<normalization type="4"/>
<targetMs level="2"/>
<output type="1"/>
<quantitationFile name="quantitation.tsv"/>
<minimumThreshhold value="20"/>
</SUMmOnCondition>
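To check a condition file before running the pipeline, it can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a condition file like the example, with the reagent elements written as self-closing tags so that the XML is well-formed. The function read_condition and the dictionary keys it returns are hypothetical names chosen for illustration; they are not part of Libra.

```python
import xml.etree.ElementTree as ET

CONDITION_XML = """<SUMmOnCondition description="iTraq">
  <fragmentMasses>
    <reagent mz="114.1"/>
    <reagent mz="115.1"/>
    <reagent mz="116.1"/>
    <reagent mz="117.1"/>
  </fragmentMasses>
  <isotopicContributions>
    <contributingMz value="1">
      <affected mz="2" correction="0.063"/>
    </contributingMz>
    <contributingMz value="2">
      <affected mz="1" correction="0.02"/>
      <affected mz="3" correction="0.06"/>
    </contributingMz>
    <contributingMz value="3">
      <affected mz="2" correction="0.03"/>
      <affected mz="4" correction="0.049"/>
    </contributingMz>
    <contributingMz value="4">
      <affected mz="3" correction="0.04"/>
    </contributingMz>
  </isotopicContributions>
  <massTolerance value="0.2"/>
  <centroiding type="2" iterations="1"/>
  <normalization type="4"/>
  <targetMs level="2"/>
  <output type="1"/>
  <quantitationFile name="quantitation.tsv"/>
  <minimumThreshhold value="20"/>
</SUMmOnCondition>"""


def read_condition(xml_text):
    """Parse a condition file into a plain dict (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    reagents = [float(r.get("mz"))
                for r in root.find("fragmentMasses").findall("reagent")]
    # Map (contributing channel, affected channel) -> correction fraction.
    corrections = {}
    for contrib in root.find("isotopicContributions").findall("contributingMz"):
        source = int(contrib.get("value"))
        for affected in contrib.findall("affected"):
            key = (source, int(affected.get("mz")))
            corrections[key] = float(affected.get("correction"))
    return {
        "reagents": reagents,
        "corrections": corrections,
        "mass_tolerance": float(root.find("massTolerance").get("value")),
        "normalization_channel": int(root.find("normalization").get("type")),
        "ms_level": int(root.find("targetMs").get("level")),
        # Element name spelled as in the condition file itself.
        "min_threshold": float(root.find("minimumThreshhold").get("value")),
    }


condition = read_condition(CONDITION_XML)
```

A parser error at this stage usually means a tag was left unclosed, which is easy to do when editing the file by hand.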
|
|
|
Details about what Libra does
|
|
Given conditions, Libra integrates the intensities of the
reagent m/z lines (a.k.a. channels throughout documentation)
in an MS/MS spectrum and stores the values at the peptide
level in the interact.xml file. ProteinProphet, within the
trans-proteomic pipeline, infers the simplest list of proteins
consistent with the identified peptides. (Note that peptides
with PeptideProphet probabilities less than 0.5 are excluded
in Libra.)
Protein quantitation is derived from the group of peptides
associated with the protein, in the following steps:
1. Each peptide's integrated channel intensities are normalized by the
sum of that peptide's channel intensities.
2. The normalized channels are averaged over all peptides of the protein,
and the standard deviation is determined for each normalized channel.
3. Normalized channels more than 2 sigma from the mean are removed.
4. The channel averages of the protein are recalculated from the values
surviving outlier removal, and the 1-sigma standard errors are
calculated using the standard deviation.
5. If the user has specified a reference normalization channel, the
protein quantitation is normalized with respect to that channel, and the
errors become the channel error and the reference channel error
added in quadrature.
The value 99.99 indicates that a protein's quantitation was calculated
using only one peptide, and so the standard error is infinite.
The value -9.0 indicates that no peptides of the protein survived the threshold filter
and outlier removal, so the protein quantitation is undefined.
(One day, would like to use intensity weighted mean and errors in calcs.)
When a reagent m/z (channel) wasn't found in the peptide spectrum, that reagent m/z
is replaced with the default value. When the intensity of a reagent line is
less than or equal to zero, its value is replaced with zero.
Note, there are still a few loose ends to tie up.
* Be wary of quantitation from very poor S/N spectra. Is your
integrated intensity for a peptide channel less than 20 counts, for example?
|
|
A detailed example of the steps going into the quantitation follows. ProteinA has 8 peptides.
|
|
The interact.xml file shows the peptide integrated
intensities for each channel:
libra 114 libra 115 libra 116 libra 117
pep1 67.100 39.153 49.651 47.567
pep2 2311.460 167071.800 1847.637 1762.466
pep3 2311.460 1670.718 1847.637 1762.466
pep4 2311.460 1670.718 1847.637 1762.466
pep5 2311.460 1670.718 1847.637 1762.466
pep6 2311.460 1670.718 1847.637 1762.466
pep7 224.920 231.700 246.938 241.900
pep8 287.600 293.121 263.173 268.105
|
|
Libra normalizes each peptide channel by the sum of that peptide's
channels:
libra 114 libra 115 libra 116 libra 117
pep1 0.330 0.192 0.244 0.234
pep2 0.013 0.966 0.011 0.010
pep3 0.304 0.220 0.243 0.232
pep4 0.304 0.220 0.243 0.232
pep5 0.304 0.220 0.243 0.232
pep6 0.304 0.220 0.243 0.232
pep7 0.238 0.245 0.261 0.256
pep8 0.259 0.264 0.237 0.241
|
|
Determines the mean and standard deviation of each normalized channel:
libra 114 libra 115 libra 116 libra 117
mean 0.257 0.318 0.216 0.209
st dev 0.103 0.262 0.083 0.081
|
|
Removes those that deviate from the mean by more than 2 sigma, which are
those outside of the range below in this example:
libra 114 libra 115 libra 116 libra 117
0.05-0.46 0.00-0.81 0.06-0.37 0.06-0.36
|
|
Recalculates the mean and standard deviation (outliers have been removed):
libra 114 libra 115 libra 116 libra 117
mean 0.292 0.226 0.245 0.237
st dev 0.032 0.023 0.008 0.009
|
|
Re-normalizes with respect to the user selected channel,
channel 4 in this example:
libra 114 libra 115 libra 116 libra 117
mean 1.232 0.953 1.034 1.000
standard error 0.013 0.009 0.004 0.005
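The worked example can be retraced with a short Python sketch. This is not Libra's source code. It is a reading of the steps described in this document, with two assumptions inferred from the numbers in the tables rather than stated in the text: the standard deviation is the sample standard deviation (n - 1 in the denominator), and the final error is the standard error of each normalized channel mean added in quadrature with that of the reference channel. The function names are hypothetical, and the special cases reported as 99.99 and -9.0 are not handled.

```python
from math import sqrt
from statistics import mean, stdev

# Integrated intensities of the 8 peptides of ProteinA (channels 114..117).
PEPTIDES = [
    (67.100, 39.153, 49.651, 47.567),
    (2311.460, 167071.800, 1847.637, 1762.466),
    (2311.460, 1670.718, 1847.637, 1762.466),
    (2311.460, 1670.718, 1847.637, 1762.466),
    (2311.460, 1670.718, 1847.637, 1762.466),
    (2311.460, 1670.718, 1847.637, 1762.466),
    (224.920, 231.700, 246.938, 241.900),
    (287.600, 293.121, 263.173, 268.105),
]


def normalize(intensities):
    """Divide each channel by the sum of the peptide's channels."""
    total = sum(intensities)
    return [value / total for value in intensities]


def quantify(peptides, reference_channel, n_sigma=2.0):
    """Return (ratios, errors) per channel for one protein.

    reference_channel is 1-based, as in the condition file.
    Assumes at least two values survive outlier removal in every channel.
    """
    rows = [normalize(p) for p in peptides]
    means, std_errors = [], []
    for channel in range(len(rows[0])):
        column = [row[channel] for row in rows]
        center, spread = mean(column), stdev(column)
        # Drop normalized values more than n_sigma standard deviations away.
        kept = [v for v in column if abs(v - center) <= n_sigma * spread]
        means.append(mean(kept))
        std_errors.append(stdev(kept) / sqrt(len(kept)))
    ref = reference_channel - 1
    ratios = [m / means[ref] for m in means]
    # Channel error and reference channel error added in quadrature.
    errors = [sqrt(e ** 2 + std_errors[ref] ** 2) for e in std_errors]
    return ratios, errors


ratios, errors = quantify(PEPTIDES, reference_channel=4)
print([round(r, 3) for r in ratios])
print([round(e, 3) for e in errors])
```

Under these assumptions the ratios and errors agree with the last table of the example to within rounding, and pep2 is the only peptide removed as an outlier.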
|
|
|
How accurate is the quantitation?
Test datasets were created by Anne-Claude Gingras, Patrick Pedrioli,
(and Hookeun Lee?) from a 9 protein mix labeled with iTRAQ.
The mix treatment was varied slightly, and several measurements were
obtained on the Q-TOF, the Qstar, and the TOF-TOF.
The preliminary results presented below are from one sample run on the Q-TOF.
The expected numbers in the table are the concentrations normalized to
channel 4.
|
|
Peptides deviating from the mean by more than 2 standard deviations were removed.
A minimum intensity threshold of 20 counts was used. The software additionally
removes peptides in which a channel mass is not found. IC is an abbreviation for
isotopic correction provided in the condition.xml file.
P0.9 P0.9
| EXPECTED | LIBRA (w/IC, thresh=20) | Libra (w/IC, thresh=0.01)
| | |
Name Species | 114 115 116 117 | 114 115 116 117 | 114 115 116 117
--------------------- |-------------------------|-------------------------|-------------------------
cytochrome c | .25 1 .25 1 | 0.21 0.88 0.26 1.0 | 0.28 0.91 0.26 1.00
ovalbumin Chicken | 1 1 1 1 | 0.87 0.93 1.07 1.0 | 0.90 1.04 1.20 1.00
transferrin Bovine | 8 8 1 1 | 6.16 5.64 1.20 1.0 | 6.01 5.63 1.06 1.00
beta lactoglobulin | .125 .125 1 1 | 0.10 0.11 1.08 1.0 | 0.10 0.08 1.10 1.00
serum albumin Bovine | 4 1 4 1 | 3.12 0.93 3.91 1.0 | 3.13 0.88 4.19 1.00
catalase Bovine | 0 100 10 1 | 0.98 82.57 11.42 1.0 | 1.15 20.37 4.79 1.00
|
|
Please add to/edit this section. From the table above, we can see a handful of experimental
errors. There may be errors in the purity of the protein purchased,
errors in the protein concentration placed in the sample, errors in
peptide concentration due to incomplete digestion or method of digestion,
and errors introduced in peptide acquisition and measurement in the mass spectrometer.
[The latter measurement errors are seen as the standard deviations in the worked example earlier in this document.
These are the smallest uncertainties.]
Using the expected numbers and the measurements above, a rough guesstimate of the accuracy is
in the 10 - 25% range. This is preliminary, as the many other test datasets have not been analyzed yet.
** Note: as can be seen above, zero intensities cause difficulties. For these cases, please
check your quantitation.tsv file until the code is modified to handle them.
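One simple way to put numbers on the comparison is the percent deviation of each measured ratio from its expected value. The Python sketch below does this for the thresh=20 columns of the table. The function name percent_deviation is hypothetical, and returning None when the expected value is zero is a choice made here for illustration, since a relative deviation is undefined in that case.

```python
def percent_deviation(measured, expected):
    """Absolute deviation of measured from expected, in percent.

    Returns None when expected is zero, where a relative deviation
    is undefined.
    """
    if expected == 0:
        return None
    return 100.0 * abs(measured - expected) / expected


# (expected, measured) ratios for channels 114..117, taken from the
# EXPECTED and LIBRA (w/IC, thresh=20) columns of the table.
TABLE = {
    "cytochrome c": ((0.25, 1, 0.25, 1), (0.21, 0.88, 0.26, 1.0)),
    "ovalbumin": ((1, 1, 1, 1), (0.87, 0.93, 1.07, 1.0)),
    "transferrin": ((8, 8, 1, 1), (6.16, 5.64, 1.20, 1.0)),
    "beta lactoglobulin": ((0.125, 0.125, 1, 1), (0.10, 0.11, 1.08, 1.0)),
    "serum albumin": ((4, 1, 4, 1), (3.12, 0.93, 3.91, 1.0)),
    "catalase": ((0, 100, 10, 1), (0.98, 82.57, 11.42, 1.0)),
}

for name, (expected, measured) in TABLE.items():
    deviations = [percent_deviation(m, e) for m, e in zip(measured, expected)]
    print(name, deviations)
```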
|
|
|
last updated Nov 19, 2006, Nichole King
|